Luma AI

via Gem

Site Reliability Engineer | AI Supercomputing

Anywhere

Full-time

Posted 12/6/2025

Key Skills:

High-performance computing (HPC)

GPU architecture

Linux system optimization

Networking (InfiniBand, RDMA)

Kernel and driver debugging

Compensation

Salary Range

$120K - $180K a year

Responsibilities

Design, deploy, and optimize massive GPU supercomputing clusters and low-level networking to support AI model training.

Requirements

Expertise in HPC systems, Linux performance tuning, GPU hardware, and building infrastructure from first principles.

Full Description

The Opportunity

Luma AI is building the engine for multimodal general intelligence. To teach models to understand the world through video, audio, and images, we operate at the absolute frontier of computing power. We have secured the capital to deploy massive-scale GPU clusters that rival the world's largest supercomputers, while maintaining the agility of a focused engineering lab. This role places you at the intersection of hardware and software, where you architect the physical and digital foundation of AGI.

Where You Come In

You will serve as a technical authority on the systems that power our research and product velocity. This is a role for a builder who prefers bare metal to managed services and understands that at our scale, standard cloud abstractions break down. You will architect, optimize, and maintain the massive, multi-vendor GPU supercomputers required to train our foundational models.

What You Will Build

Supercomputing Architecture: Design and deploy high-performance clusters combining thousands of GPUs, CPUs, and high-throughput networking to maximize training efficiency.

The Network Layer: Optimize low-level networking (InfiniBand, RDMA) to ensure seamless communication between accelerators, eliminating bottlenecks in distributed training jobs.

Hardware-Software Synthesis: Collaborate with hardware partners to push the boundaries of what is possible, debugging failures at the intersection of the kernel, driver, and silicon.

The Profile We Are Looking For

HPC Authority: You possess elite knowledge of high-performance computing (HPC), including job schedulers and the nuances of GPU architecture.

Deep Systems Fluency: You are comfortable navigating the Linux terminal to solve complex performance issues, utilizing tools like perf and strace to optimize at the OS level.

First-Principles Engineering: You have a history of building infrastructure from the ground up, demonstrating the ability to design systems where no playbook currently exists.

This job posting was last updated on 12/8/2025
