Salary: Not specified
Design and implement cross-platform hardware detection systems for various accelerators. Collaborate with teams to ensure hardware-aware agent deployment across cloud providers.
Candidates should have 4-7 years of experience in systems software or infrastructure engineering, with a focus on AI/ML workloads. Deep expertise in accelerator programming frameworks and strong programming skills in Python and C++ are essential.
Job Description

Hello,

We have the below job opening for a Software Engineer – Infrastructure & Hardware Optimization (remote). If you are interested and your experience matches the job description, please send your updated resume as soon as possible.

Title: Software Engineer – Infrastructure & Hardware Optimization
Location: San Francisco, CA; Portland, OR; or Dallas, TX (remote, but candidates must be local to one of these locations)
Duration: 6+ month contract

Job Description:
We are seeking a skilled low-level systems engineer to join the team. This individual will focus on infrastructure software that detects, configures, and optimizes AI inference pipelines across heterogeneous hardware accelerators (e.g., NVIDIA/AMD GPUs, TPUs, AWS Inferentia, FPGAs). You will work on hardware abstraction layers, containerized runtime environments, benchmarking, telemetry, and driver orchestration logic for multi-cloud agentic inference deployments.

Ideal Experience:
· 4–7 years of experience in systems software or infrastructure engineering, preferably with exposure to AI/ML workloads.
· Deep expertise in CUDA, NCCL, ROCm, or other accelerator programming frameworks.
· Familiarity with LLM inference runtimes (TensorRT-LLM, vLLM, ONNX Runtime).
· Experience with Kubernetes scheduling, device plugin development, and runtime patching for heterogeneous compute.
· Strong Python/C++ and Linux systems programming skills.
· Passion for building scalable, portable, and secure AI infrastructure.

Responsibilities:
· Design and implement cross-platform hardware detection systems for GPUs/TPUs/NPUs using CUDA, ROCm, and low-level runtime interfaces (an illustrative sketch follows this posting).
· Build and maintain plugin-based infrastructure for capability scoring, power efficiency tuning, and memory optimization.
· Develop hardware abstraction layers (HALs) and performance benchmarking tools to optimize AI agents for cloud-native inference.
· Extend container-based MLOps systems (Docker/Kubernetes) with support for hardware-specific runtime containers (e.g., TensorRT, vLLM, ROCm).
· Automate driver validation, container security hardening, and runtime health monitoring across deployments.
· Integrate telemetry systems (Prometheus, Grafana) to surface per-device inference performance metrics and health status.
· Collaborate with solutions and DevOps teams to ensure hardware-aware agent deployment across cloud providers.

Additional Information
All your information will be kept confidential according to EEO guidelines.
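For candidates gauging fit with the first responsibility above, here is a minimal sketch of the kind of cross-platform detection logic the role describes. It is not from the posting itself: it assumes the vendor CLIs nvidia-smi (shipped with the NVIDIA driver) and rocm-smi (shipped with ROCm) are the available probe tools, and the names Accelerator and detect_accelerators are hypothetical.

```python
# Illustrative sketch: best-effort accelerator detection via vendor CLIs.
# Assumes nvidia-smi and/or rocm-smi may be present; Python 3.10+.
import shutil
import subprocess
from dataclasses import dataclass


@dataclass
class Accelerator:
    vendor: str
    description: str


def _run(cmd: list[str]) -> str | None:
    """Run a probe command, returning its stdout or None on any failure."""
    try:
        out = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
        return out.stdout.strip() if out.returncode == 0 else None
    except (OSError, subprocess.TimeoutExpired):
        return None


def detect_accelerators() -> list[Accelerator]:
    """Detect NVIDIA and AMD devices by probing their management CLIs."""
    found: list[Accelerator] = []
    # NVIDIA: nvidia-smi supports machine-readable CSV queries.
    if shutil.which("nvidia-smi"):
        out = _run(["nvidia-smi", "--query-gpu=name,memory.total",
                    "--format=csv,noheader"])
        if out:
            found += [Accelerator("nvidia", line) for line in out.splitlines()]
    # AMD: rocm-smi reports the product name of each ROCm-visible device.
    if shutil.which("rocm-smi"):
        out = _run(["rocm-smi", "--showproductname"])
        if out:
            found.append(Accelerator("amd", out))
    return found


if __name__ == "__main__":
    for acc in detect_accelerators():
        print(f"{acc.vendor}: {acc.description}")
```

Shelling out to vendor CLIs keeps the probe dependency-free; a production system of the kind described above would more likely bind to the NVML and ROCm SMI libraries directly.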
This job posting was last updated on 7/30/2025