
Luma AI

via Gem


SRE | Foundation Models

Anywhere
Full-time
Posted 12/6/2025
Direct Apply
Key Skills:
Distributed Systems
Infrastructure Automation
Scheduling and Orchestration Systems
Cluster Health Monitoring
High Availability Systems
Node.js
TypeScript
AWS
CI/CD
REST API Development

Compensation

Salary Range

$120K - $160K a year

Responsibilities

Build and maintain scalable, reliable infrastructure platforms to support large-scale AI research workloads and model serving.

Requirements

Experience maintaining high-availability distributed systems; familiarity with AI/ML infrastructure demands, including GPU resource management; and strong operational automation skills.

Full Description

The Opportunity

Luma AI is training the multimodal models that will define the next era of intelligence. Unlike other software companies, our product roadmap is driven by research breakthroughs. This requires a symbiotic relationship between our infrastructure engineers and our research scientists. We provide the massive compute resources necessary to compete at the top tier of AI, with a team structure that ensures you are in the room where the models are designed.

Where You Come In

You will build the platform that enables scientific discovery. Your work will directly accelerate the velocity of our research team, ensuring they have a stable, performant, and scalable environment to train and test the next generation of Omni models. You will translate the complex requirements of large-scale ML workloads into robust infrastructure reality.

What You Will Build

Research Platforms: Design and maintain the scheduling and orchestration systems that allow researchers to launch and manage massive training jobs with ease.
Observability for Intelligence: Implement deep observability stacks that provide transparency into cluster health, allowing us to predict and prevent interruptions to critical training runs.
Scalable Inference: Architect the production systems that serve our models to the world, balancing the high availability required for consumer products with the massive compute intensity of generative AI.

The Profile We Are Looking For

Service Orientation: You understand that reliable infrastructure is the enabler of innovation, and you care deeply about the developer experience of the researchers you support.
Operational Excellence: You have a track record of maintaining high availability in complex, distributed environments, using automation to reduce toil.
ML Infrastructure Fluency: You are familiar with the unique demands of AI workloads, including the management of GPU resources and the intricacies of distributed training.

This job posting was last updated on 12/8/2025
