via Remote Rocketship
$120K - 180K a year
Lead and grow a team of ML engineers focusing on production ML systems and release processes.
Requires experience with production ML systems, cloud infrastructure, and CI/CD, plus strong leadership skills.
Job Description:
• Lead and grow a team of ML engineers focused on production ML systems
• Lead model improvements in response to production issues, product feedback, and new research or platform advancements
• Lead production release processes for ML services, including release planning, CI/CD, staged rollouts, and rollback procedures
• Build and operate observability and on-call practices for ML features, including monitoring, alerting, dashboards, incident response, and post-incident reviews
• Develop and maintain scalable evaluation frameworks, datasets, and automated regression tests to prevent quality regressions
• Lead reliability, performance, and cost improvements for inference and serving, including capacity planning and meeting SLAs (latency, throughput, availability)
• Partner with researchers, product, and platform teams to define quality bars and production readiness, including Trusted AI requirements
• Establish and evolve production standards and governance across ML features (testing, evaluation methodology, release gates, model versioning and lineage)
• Partner with platform and product teams to integrate ML capabilities into products

Requirements:
• BS/MS in CS/Engineering or equivalent experience
• Experience building and operating software systems, including production ML systems
• People leadership experience, or strong technical leadership experience (mentoring, setting direction, driving delivery)
• Experience with cloud infrastructure (AWS, Azure, or GCP) and production observability
• Experience with CI/CD, reproducible deployments, and operating services in production
• Strong written communication and documentation skills

Benefits:
• Health and financial benefits
• Time away and everyday wellness
This job posting was last updated on 2/23/2026