Salary: Not specified
Design and implement cross-platform hardware detection systems for various accelerators. Collaborate with teams to ensure hardware-aware agent deployment across cloud providers.
Candidates should have 4-7 years of experience in systems software or infrastructure engineering, with a focus on AI/ML workloads. Deep expertise in accelerator programming frameworks and strong programming skills in Python and C++ are essential.
Job Description

Hello,

We have the below job opening for a Software Engineer – Infrastructure & Hardware Optimization (remote). If you are interested and your experience matches the job description, please send your updated resume as soon as possible.

Title: Software Engineer – Infrastructure & Hardware Optimization
Location: San Francisco, CA; Portland, OR; or Dallas, TX (remote, but candidates must be local to one of these locations)
Duration: 6+ month contract

Job Description:
We are seeking a skilled low-level systems engineer to join the team. This individual will focus on infrastructure software that detects, configures, and optimizes AI inference pipelines across heterogeneous hardware accelerators (e.g., NVIDIA/AMD GPUs, TPUs, AWS Inferentia, FPGAs). You will work on hardware abstraction layers, containerized runtime environments, benchmarking, telemetry, and driver orchestration logic for multi-cloud agentic inference deployments.

Ideal Experience:
· 4–7 years of experience in systems software or infrastructure engineering, preferably with exposure to AI/ML workloads.
· Deep expertise in CUDA, NCCL, ROCm, or other accelerator programming frameworks.
· Familiarity with LLM inference runtimes (TensorRT-LLM, vLLM, ONNX Runtime).
· Experience with Kubernetes scheduling, device plugin development, and runtime patching for heterogeneous compute.
· Strong Python/C++ and Linux systems programming skills.
· Passion for building scalable, portable, and secure AI infrastructure.

Responsibilities:
· Design and implement cross-platform hardware detection systems for GPUs/TPUs/NPUs using CUDA, ROCm, and low-level runtime interfaces (an illustrative sketch follows this posting).
· Build and maintain plugin-based infrastructure for capability scoring, power efficiency tuning, and memory optimization.
· Develop hardware abstraction layers (HALs) and performance benchmarking tools to optimize AI agents for cloud-native inference.
· Extend container-based MLOps systems (Docker/Kubernetes) with support for hardware-specific runtime containers (e.g., TensorRT, vLLM, ROCm).
· Automate driver validation, container security hardening, and runtime health monitoring across deployments.
· Integrate telemetry systems (Prometheus, Grafana) to surface per-device inference performance metrics and health status.
· Collaborate with solutions and DevOps teams to ensure hardware-aware agent deployment across cloud providers.

Additional Information
All your information will be kept confidential according to EEO guidelines.
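For candidates gauging fit with the first responsibility above, here is a minimal sketch of the kind of cross-platform detection logic the role describes. It is not from the posting itself: it assumes the vendor CLIs nvidia-smi (shipped with the NVIDIA driver) and rocm-smi (shipped with ROCm) are the available probe tools, and the names Accelerator and detect_accelerators are hypothetical.

```python
# Illustrative sketch: best-effort accelerator detection via vendor CLIs.
# Assumes nvidia-smi and/or rocm-smi may be present; Python 3.10+.
import shutil
import subprocess
from dataclasses import dataclass


@dataclass
class Accelerator:
    vendor: str
    description: str


def _run(cmd: list[str]) -> str | None:
    """Run a probe command, returning its stdout or None on any failure."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
        return out.stdout.strip() if out.returncode == 0 else None
    except (OSError, subprocess.TimeoutExpired):
        return None


def detect_accelerators() -> list[Accelerator]:
    """Detect NVIDIA and AMD devices by probing their management CLIs."""
    found: list[Accelerator] = []
    # NVIDIA: nvidia-smi supports machine-readable CSV queries.
    if shutil.which("nvidia-smi"):
        out = _run(["nvidia-smi", "--query-gpu=name,memory.total",
                    "--format=csv,noheader"])
        if out:
            found += [Accelerator("nvidia", line) for line in out.splitlines()]
    # AMD: rocm-smi reports the product name of each ROCm-visible device.
    if shutil.which("rocm-smi"):
        out = _run(["rocm-smi", "--showproductname"])
        if out:
            found.append(Accelerator("amd", out))
    return found


if __name__ == "__main__":
    for acc in detect_accelerators():
        print(f"{acc.vendor}: {acc.description}")
```

Shelling out to vendor CLIs keeps the probe dependency-free; a production system of the kind described above would more likely bind to the NVML and ROCm SMI libraries directly.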
This job posting was last updated on 7/30/2025