via Workday
$272K - $489K a year
Lead and mentor a high-performing engineering team to develop stress and diagnostic software for GPU and server platforms, collaborating with hardware and cloud teams.
Requires extensive experience in system software, hardware interaction, and managing multi-team projects, plus familiarity with server architectures and diagnostics.
NVIDIA's Data Center MODS organization is looking for an Engineering Manager to help Cloud Service Providers (CSPs) and OEMs scale out current and next-generation datacenter products. You will be responsible for validating and scaling NVIDIA's GPU products at the system level, pushing hardware to its limits to ensure adaptability and reliability across diverse environments - from internal validation labs to hyperscale data centers. Our organization partners closely with architecture, ASIC, operations, and data center teams to build methodologies that stress every subsystem of the GPU and server platform. The team also supports diagnostics for customer deployments, tailoring stress workloads to specific configurations and use cases.

What you'll be doing:
• Lead and mentor a high-performing engineering team, fostering technical growth and leadership.
• Collaborate with architecture and hardware teams to drive development of stress and diagnostic software targeting GPUs, CPUs, memory, storage, and interconnects.
• Lead multiple concurrent projects, balancing long-term strategy with short-term execution.
• Work with Cloud Service Providers (CSPs), OEMs, and data center operators to support deployment and customization of diagnostics.
• Champion continuous improvement in product quality, debug efficiency, and operational scalability.

What we need to see:
• Bachelor's or Master's degree in Computer Science, Electrical Engineering, or related field or equivalent experience.
• 10+ overall years of experience in system software development, with 4+ years in engineering management.
• Experience with C/C++/Python.
• Deep understanding of operating systems, kernel drivers, and hardware-software interaction.
• Experience with PC/server architecture, including PCIe, NVLink, InfiniBand, or Ethernet.
• Consistent track record of leading feature development and multi-team debugging efforts.
Ways to Stand Out from the Crowd:
• Experience with diagnostics or stress testing in large-scale data center environments.
• Familiarity with GPU compute, graphics, memory subsystems, or high-speed interfaces.
• Prior experience working with CSPs or OEMs on system-level validation and deployment.
• Strong communication and multi-functional leadership skills.
• Passion for building tools that ensure product excellence and customer success.

Your base salary will be determined based on your location, experience, and the pay of employees in similar positions. The base salary range is 272,000 USD - 431,250 USD for Level 4, and 320,000 USD - 488,750 USD for Level 5. You will also be eligible for equity and benefits.

Applications for this job will be accepted at least until February 15, 2026. This posting is for an existing vacancy. NVIDIA uses AI tools in its recruiting processes.

NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.
This job posting was last updated on 2/17/2026