via LinkedIn
$120K - $200K a year
Provide support and automation for large-scale AI/ML platforms on Databricks, ensuring stability, security, and cost efficiency.
Extensive experience with Databricks, cloud infrastructure, IaC, and support of distributed teams.
Fractal is a strategic AI partner to Fortune 500 companies with a vision to power every human decision in the enterprise. Fractal is building a world where individual choices, freedom, and diversity are the greatest assets; an ecosystem where human imagination is at the heart of every decision. Where no possibility is written off, only challenged to get better. We believe that a true Fractalite is the one who empowers imagination with intelligence. Fractal has been featured as a Great Place to Work by The Economic Times in partnership with the Great Place to Work® Institute and recognized as a ‘Cool Vendor’ and a ‘Vendor to Watch’ by Gartner.

Please visit Fractal | Intelligence for Imagination for more information about Fractal.

Job Title: Databricks SRE and Support Engineer
Location: Remote (Must be in USA)

As Databricks SRE and Support Engineer, you will work on operations related to AI Dojo (an AI/ML upskilling program developed by our client) on Databricks. This individual contributor (IC) role requires experience working on large-scale AI/ML platforms and guaranteeing their stability, reliability, scalability, and performance. Experience with modern infrastructure and DevOps tools and paradigms, as well as proven hands-on knowledge of Databricks, is a must.

Primary Responsibilities:
• Continuous support: Provide continuous SRE support to thousands of geographically distributed users on the AI Dojo Databricks platform: respond to tickets, triage support, liaise with customers.
• Automation & DevOps: Improve existing Infrastructure as Code (IaC) according to best DevOps practices.
• Systems Monitoring: Develop and maintain monitoring frameworks to respond promptly to outages and other service interruptions.
• Security & Compliance: Collaborate with internal cybersecurity teams to ensure all systems and operations comply with industry standards and are secure against evolving threats.
• Capacity Planning & Cost Optimization: Forecast and manage capacity requirements for the AI/ML training environment, while identifying opportunities to reduce costs without compromising performance.

Required Qualifications:
• Bachelor’s degree in computer science, information technology, or a related field.
• 6+ years of infrastructure experience: Proven experience working on large-scale, cloud-based, enterprise-level software platforms and a deep understanding of the Databricks environment. In particular:
• Experience building GitHub Actions pipelines, including composite actions, OIDC federation for cloud provider identity acquisition, and use of environments and deployment controls.
• Experience building Databricks Asset Bundle and Terraform pipelines to manage and deploy Databricks Platform and Workspace resources.
• Fluency in Python, experience with the Databricks Python SDK to perform Workspace operations, and familiarity with PySpark and Delta Lake.
• Deep familiarity with Databricks APIs, and use of the Databricks CLI for provisioning Workspace identities and filesystem resources and for querying account- and workspace-level Users, Groups, and Service Principals.
• Strong understanding of security best practices and experience ensuring compliance with relevant regulatory frameworks.
• 3+ years of practical experience with Infrastructure-as-Code and CI/CD tools such as Terraform, GitHub Actions, Databricks Asset Bundles, and the like.
• 3+ years of experience working with AWS IaC deployments: performing account provisioning, implementing cross-account automation, and building resource and identity policies supporting least-privilege access roles.
• 3+ years of experience working in support teams that are geographically distributed.

Please note that we are currently unable to sponsor work visas for this position.
Pay: The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Fractal, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case.

Benefits: As a full-time employee of the company or as an hourly employee working more than 30 hours per week, you will be eligible to participate in the health, dental, vision, life insurance, and disability plans in accordance with the plan documents, which may be amended from time to time. You will be eligible for benefits on the first day of employment with the Company. In addition, you are eligible to participate in the Company 401(k) Plan after 30 days of employment, in accordance with the applicable plan terms. The Company provides for 11 paid holidays and 12 weeks of Parental Leave. We also follow a “free time” PTO policy, allowing you the flexibility to take time needed for either sick time or vacation.

Fractal provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state or local laws.
This job posting was last updated on 12/12/2025