$161K - 237K a year
Lead end-to-end GPU provisioning and infrastructure programs, ensuring production readiness, reliability, and continuous improvement.
Requires 10+ years in technical program management related to GPU or high-performance compute infrastructure, with experience in observability, hardware/software integration, and cross-functional leadership.
About the Role
We are seeking a highly motivated and experienced Infrastructure and Node Delivery Technical Program Manager (TPM) to join our dynamic team focused on the New Product Introduction (NPI) of our next-generation GPU hardware provisioning and delivery. In this pivotal role, you will be instrumental in leading cross-functional efforts from concept to mass production, ensuring the timely, high-quality, and cost-effective delivery of our innovative GPU products that power the future of AI. You will navigate the complexities of hardware development cycles, collaborate with world-class engineers and external vendors, and influence strategic decisions to bring groundbreaking technology to market.
The Technical Program Manager will:
• Production Readiness: Ensure infrastructure and system software are production-ready for new hardware and compute platforms. Engage in technical discussions with engineering teams, challenge assumptions, and contribute to problem-solving.
• Program Leadership: Drive end-to-end programs spanning GPU provisioning, at-scale deployments, Fleet NPI readiness, and vendor management. Anticipate and identify potential risks, proactively develop mitigation strategies, and drive timely resolution of technical and logistical challenges.
• Reliability & SLA Management: Coordinate with hardware compute engineering, Fleet teams, and external vendors to maintain service reliability, enforce SLAs, and lead incident response efforts.
• Observability & Telemetry: Partner with engineering teams to improve monitoring, telemetry, and fleet observability for proactive performance management.
• Metrics & Insights: Define and track metrics around GPU fleet health, performance, and reliability.
• Postmortems & Continuous Improvement: Run post-incident reviews and drive action items that enhance system reliability and prevent regressions.
• Internal Enablement: Collaborate with internal customers to collect feedback, enable adoption of core infrastructure platforms, and refine onboarding experiences (e.g., K8s Core Interface, CKS, SUNK) for hardware compute NPIs.
• Cross-functional Coordination: Work closely with Product, Infrastructure, Platform Engineering, Vendor, and Customer Experiences to align on roadmap priorities and customer delivery timelines.
• Effective Communication: Communicate program status, risks, and critical decisions to senior leadership and executive stakeholders clearly and concisely. Foster a culture of transparency, collaboration, and continuous improvement within the NPI process.
Minimum Qualifications
• Bachelor's degree in Electrical Engineering, Computer Engineering, or a related technical field.
• 10+ years of experience in technical program management in GPU provisioning, fleet management, or large-scale compute infrastructure.
• Background in observability, monitoring, or telemetry systems (e.g., Prometheus, Grafana, OpenTelemetry).
• Hands-on experience coordinating NPI or GTM readiness for compute products.
• Technical understanding of system software orchestration and hardware/software integration.
• Solid understanding of hardware and fleet development lifecycles.
• Proven ability to lead cross-functional teams, influence without direct authority, and drive consensus in a fast-paced environment.
• Exceptional communication, interpersonal, and presentation skills.
• Proficiency in program management tools (e.g., Jira, Confluence, Sheet).
Preferred Qualifications
• Master's degree in Engineering or an MBA.
• Experience with GPU or other high-performance compute architecture NPI.
• Experience working with international manufacturing partners and supply chains.
• Experience with agile methodologies in a hardware and software development context.
The base salary range for this role is $161,000 to $237,000.
The starting salary will be determined based on job-related knowledge, skills, experience, and market location. We strive for both market alignment and internal equity when determining compensation. In addition to base salary, our total rewards package includes a discretionary bonus, equity awards, and a comprehensive benefits program (all based on eligibility).
This job posting was last updated on 2/11/2026