via DailyRemote
$120K - $200K a year
Develop and maintain full-stack applications and APIs, lead architecture decisions, and manage CI/CD pipelines.
Extensive full-stack development experience with modern JavaScript frameworks, cloud infrastructure, and DevOps practices.
Job Description:
• Architect and implement HPC clusters for AI, simulation, and distributed training using Kubernetes and schedulers like Slurm.
• Integrate NVIDIA Hopper and Blackwell‑class GPUs with NVLink/NVSwitch and InfiniBand/RoCE.
• Deploy and manage GPU Operator and Network Operator for large fleets.
• Design and validate cloud‑native HPC environments with low latency and high bandwidth.
• Define and document reference architectures for AI model training and MLOps.
• Collaborate with NVIDIA and other partners to evaluate new GPU generations and software stacks.
• Benchmark performance, track down bottlenecks, and recommend concrete changes.
• Lead design sessions and architecture reviews with customers focused on performance and reliability.

Requirements:
• A Bachelor’s or Master’s in Computer Science, Engineering, or a related field (PhD is a plus).
• 3+ years actually building or running HPC or large GPU clusters (on‑prem, cloud, or hybrid).
• Strong Linux background, plus Kubernetes and container runtimes (containerd, CRI‑O, Docker) in real environments, with CI/CD in the loop.
• A solid handle on HPC networking and RDMA: InfiniBand, RoCE, NVLink/NVSwitch.
• Experience with storage and I/O for big workloads: Ceph, Lustre, NFS at scale, GPUDirect Storage, or similar systems.
• Comfort with Terraform, Ansible, Helm, and GitOps‑style workflows.
• Good scripting skills in Python or Bash.
• You write and speak clearly, can lead a design review without losing the room, and can keep both engineers and non‑technical stakeholders on the same page.
• Legal authorization to work in the U.S. on a full‑time basis without visa sponsorship.

Benefits:
• 100% employer‑paid medical, dental, and vision for you and your family
• 4% 401(k) match with immediate vesting
• Company‑paid short‑ and long‑term disability and life insurance
• 20 weeks paid parental leave for primary caregivers, 12 weeks for secondary
• Support for your home office (mobile + internet stipend)
This job posting was last updated on 2/19/2026