Nastech Global

via Monster

AI Infra SRE Engineer

Anywhere

Full-time

Posted 10/8/2025

Key Skills:

NVIDIA DGX or equivalent HPC clusters

Cisco UCS C885A

Docker

Python

GoLang

CI/CD systems (GitLab, GitHub Actions, Jenkins)

Terraform

Ansible

Kubernetes (RedHat OpenShift, Google Anthos)

Compensation

Salary Range

$120K - $160K a year

Responsibilities

Manage NVIDIA DGX and Cisco UCS HPC infrastructure and ensure its reliability, scalability, and performance. Automate operations using Python, Ansible, Terraform, and Go, and deliver that automation via CI/CD pipelines.

Requirements

Experience with NVIDIA DGX or equivalent HPC clusters, Cisco UCS C885A, and Docker, plus automation tools such as Python, Ansible, Terraform, and Go, and CI/CD pipelines.

Full Description

Position: AI Infra SRE Engineer – DGX
Location: Remote
Duration: Full-time

Must-have
• NVIDIA DGX or equivalent high-performance computing (HPC) clusters (e.g., Cray, HPE, IBM)
• Cisco UCS C885A
• Docker

Good to have
• DevOps automation
• CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins)
• Terraform, Ansible, Jenkins
• Python
• GoLang, C/C++
• Enterprise-grade Kubernetes cluster (RedHat OpenShift preferred) and/or Google Anthos
• Software development lifecycle, including design, development, testing, packaging, and deployment using Golang

Roles & Responsibilities
• Technical knowledge of high-performance compute, NVIDIA DGX/GPUs, and/or Cisco Unified Computing System.
• Handle availability, latency, scalability, and efficiency of NVIDIA and Cisco UCS infrastructure by instilling engineering reliability into the development life cycle, with a focus on fault-tolerant approaches.
• Drive capacity planning, performance analysis, instrumentation, and other non-functional systems requirements.
• Automate operational capabilities using Python, Ansible, Terraform, Go, etc.
• Deliver automation through CI/CD pipelines, chatbots, etc.
• Implement metrics-driven processes to ensure service quality targets are met.

This job posting was last updated on 10/11/2025