$150K–$250K a year
Design, build, and optimize container runtimes and Kubernetes deployment patterns for GPU workloads and LLM inference at scale.
10+ years of production software experience focused on Python tooling, Kubernetes, container technologies, and GPU workloads, plus strong collaboration skills.
Description:
• Design, build, and harden containers for NIM runtimes and inference backends; enable reproducible, multi-arch, CUDA-optimized builds.
• Develop Python tooling and services for build orchestration, CI/CD integrations, Helm/Operator automation, and test harnesses; enforce quality with typing, linting, and unit/integration tests.
• Help design and evolve Kubernetes deployment patterns for NIMs, including GPU scheduling, autoscaling, and multi-cluster rollouts.
• Optimize container performance: layer layout, startup time, build caching, runtime memory/IO, network, and GPU utilization; instrument with metrics and tracing.
• Evolve the base image strategy, dependency management, and artifact/registry topology.
• Collaborate across research, backend, SRE, and product teams to ensure day-0 availability of new models.
• Mentor teammates; set high engineering standards for container quality, security, and operability.
• Build enterprise-grade software and tooling for container build, packaging, and deployment; improve reliability, performance, and scale across thousands of GPUs.
• Support disaggregated LLM inference and emerging deployment patterns.

Requirements:
• 10+ years building production software with a strong focus on containers and Kubernetes.
• Strong Python skills building production-grade tooling and services.
• Experience with Python SDKs and clients for Kubernetes and cloud services.
• Expert knowledge of Docker/BuildKit, containerd/OCI, image layering, multi-stage builds, and registry workflows.
• Deep experience operating workloads on Kubernetes.
• Strong understanding of LLM inference features, including structured output, KV cache, and LoRA adapters.
• Hands-on experience building and running GPU workloads in Kubernetes, including the NVIDIA device plugin, MIG, CUDA drivers/runtime, and resource isolation.
• Excellent collaboration and communication skills; ability to influence cross-functional design.
• A degree in Computer Science, Computer Engineering, or a related field (BS or MS), or equivalent experience.
• Expertise with Helm chart design, Operators, and platform APIs serving many teams (preferred).
• Experience with the OpenAI API and Hugging Face API, and an understanding of different inference backends (vLLM, SGLang, TRT-LLM) (preferred).
• Background in benchmarking and optimizing inference container performance and startup latency at scale (preferred).
• Prior experience designing multi-tenant, multi-cluster, or edge/air-gapped container delivery (preferred).
• Contributions to open-source container, Kubernetes, or GPU ecosystems (preferred).

Benefits:
• Competitive salaries
• Generous benefits package
• Eligible for equity
This job posting was last updated on 9/23/2025