$150K - $220K a year
Architect and optimize distributed AI training and inference systems, lead ML pipeline production scaling, provide technical leadership, and engage with customers and product teams.
5+ years of cloud and infrastructure experience in senior MLOps or Solutions Architect roles, expertise in scaling AI workloads on multi-node, multi-GPU systems, and deep knowledge of ML frameworks like PyTorch and JAX and of the NVIDIA HPC ecosystem.
Description:
• Architect and optimize distributed training and inference systems for large-scale AI models
• Design and deliver customer-focused solutions that maximize performance and business value
• Lead the transition of ML pipelines from POC to scalable production systems
• Build long-term customer relationships, ensuring satisfaction and alignment with strategic goals
• Create whitepapers, deliver technical presentations, and host webinars to share insights and best practices
• Provide technical leadership and mentor teams on AI infrastructure and deployment strategies
• Collaborate with engineering and product teams to prioritize customer feedback and influence product roadmaps

Requirements:
• 5+ years of experience with cloud technologies and infrastructure, ideally in senior MLOps or Solutions Architect roles
• Proven expertise in scaling and optimizing AI workloads across multi-node and multi-GPU environments
• Demonstrated success delivering ML products, scaling from POC to production
• Deep knowledge of ML frameworks like PyTorch and JAX
• Strong background in the NVIDIA HPC ecosystem (CUDA, NCCL, Infiniband)
• Legal authorization to work in the United States on a full-time basis without sponsorship

Benefits:
• Full medical benefits: 100% company-paid medical, dental, and vision coverage for employees and families
• 401(k) plan with a 4% match program
• Stock options plan
• Flexible remote work environment
• Company-paid short-term, long-term disability, and life insurance coverage
• 20 weeks paid parental leave for primary caregivers, 12 weeks for secondary caregivers
• Up to $85/month for mobile and internet
• Work with state-of-the-art AI and cloud technologies, including the latest NVIDIA GPUs
• Be part of a team that operates one of the most powerful commercially available supercomputers
• Contribute to sustainable AI infrastructure, with energy-efficient data centers that recover waste heat to warm nearby residential buildings
This job posting was last updated on 10/11/2025