via Eightfold
$150K - $250K a year
Lead development and execution of manufacturing test strategies for complex hardware systems including GPUs and liquid cooling, integrating AI/ML for test optimization and collaborating across engineering teams.
Requires advanced degree in engineering or equivalent experience with 8-10+ years in manufacturing test engineering for compute platforms, expertise in hardware diagnostics, software development, and data center cooling technologies.
Define and lead end-to-end manufacturing test strategies for PCBAs, storage enclosures, and rack-level systems.
Lead the development of test hardware, software, and firmware to validate the functionality of complex systems including GPUs, CPUs, and liquid-cooled platforms, ensuring alignment with product design and performance goals.
Develop test plans and validation metrics for GPU-based platforms (e.g., NVIDIA HGX, GB200), covering bring-up, functional, performance, and stress diagnostics.
Integrate AI/ML models to dynamically adjust test coverage based on historical data, product complexity, and risk profiles.
Implement AI-driven anomaly detection systems to flag test escapes and reduce false positives in real time.
Design and deliver end-to-end test solutions, particularly for advanced liquid cooling technologies, addressing both macro and micro-level thermal transfer challenges (e.g., fluids, pumps, manifolds, connectors).
Collaborate across multidisciplinary teams (mechanical, electrical, process, and production engineering) to integrate test strategies early in the product lifecycle and ensure seamless execution during manufacturing.
Define and maintain test architecture and core test content, including reusable scripts and test cases, for both blade-level and rack-level systems, ensuring scalability and consistency across product lines.
Monitor production yield and test data, identify systemic issues, and drive root cause analysis and corrective actions to improve test coverage, product quality, and manufacturing efficiency.
Assess manufacturing test readiness before each NPI (New Product Introduction) build, conducting risk assessments and coordinating mitigation plans with internal and external stakeholders.
Initiate early engagement in design phases to identify test coverage gaps, develop new test materials, and establish success metrics to ensure quality and reliability from prototype to production.
Ensure comprehensive documentation and verification coverage across all product stages, mapping test cases to customer impact and business value, including coverage ROI and cost rationalization.
Drive continuous improvement initiatives by analyzing test system data, eliminating non-value-added processes, and enhancing test effectiveness and efficiency.
Engage with CM/ODM partners to understand production capabilities and limitations, ensuring supply chain alignment and consistent delivery of high-quality products.
Leverage predictive analytics and machine learning to forecast failure trends and proactively mitigate risks.
Evaluate AI readiness of supplier test systems and drive adoption of intelligent test solutions across the ecosystem.
Collaborate with data science teams to develop AI tools that support test optimization and decision-making.

Doctorate in Electrical Engineering, Computer Science, Mechanical Engineering (Hardware) or related field AND 3+ years of enterprise computer, consumer electronics design or hyperscale supply chain systems experience
OR Master's Degree in Electrical Engineering, Computer Science, Mechanical Engineering (Hardware) or related field AND 4+ years of enterprise computer, consumer electronics design or hyperscale supply chain systems experience
OR Bachelor's Degree in Electrical Engineering, Computer Science, Mechanical Engineering (Hardware) or related field AND 8+ years of enterprise computer, consumer electronics design or hyperscale supply chain systems experience
OR equivalent experience.
10+ years of experience in manufacturing test engineering for compute and hyperscale platforms (e.g., servers, storage, GPUs), with deep expertise in test development and deployment at the building block, server, and rack levels.
3+ years of experience working with data center cooling infrastructure or advanced thermal management solutions, including liquid cooling technologies.
3+ years of experience developing diagnostic tools or utilities for electronic systems in high-volume or contract manufacturing environments.
Demonstrated success leading test tool development projects (CPU/GPU) from concept through deployment, including integration into manufacturing test flows.
Proficient in Linux-based environments, including scripting, command-line operations, and troubleshooting test systems and storage enclosures.
Analytical and problem-solving skills with a proven ability to collaborate across multidisciplinary technical teams.
2+ years of hands-on experience in data center or test lab environments.
Willingness to travel domestically and internationally, and to support off-hours work as required.
Familiarity with hardware diagnostics and test methodologies, including PCIe/NVLink, NVIDIA NVML APIs, sensor telemetry, and stress testing for memory, CPU, and GPU components.
Knowledge of rack-level test automation frameworks, including barcode scanning, firmware flashing, and test sequencing.
Proficient in telemetry and monitoring systems such as Prometheus and Grafana for real-time test data visualization.
Solid understanding of PCBA and enclosure-level design and manufacturing processes.
Hands-on experience with software development in languages such as Python, SQL, C#, C++, or Rust.
Familiarity with firmware/BIOS development and driver integration for Linux or Windows platforms.
Experience with modern software development workflows and version control tools (e.g., Git).
Direct experience integrating CPU/GPU test tools into high-volume manufacturing test flows.
Proficient in Linux OS, including scripting and command-line operations.
Experience with Azure DevOps Services, Power BI, or Power Automate is a plus.
Ability to navigate ambiguity, translate complex concepts into practical processes, and drive implementation across cross-functional teams.
This job posting was last updated on 12/9/2025