Nastech Global

via Monster

AI Infra SRE Engineer

Anywhere
full-time
Posted 10/8/2025
Key Skills:
NVIDIA DGX or equivalent HPC clusters
Cisco UCS C885A
Docker
Python
GoLang
CI/CD systems (GitLab, GitHub Actions, Jenkins)
Terraform
Ansible
Kubernetes (RedHat OpenShift, Google Anthos)

Compensation

Salary Range

$120K - $160K a year

Responsibilities

Manage NVIDIA DGX and Cisco UCS HPC infrastructure and ensure its reliability, scalability, and performance; automate operations using Python, Ansible, Terraform, and Go; and deliver that automation via CI/CD pipelines.

Requirements

Experience with NVIDIA DGX or equivalent HPC clusters, Cisco UCS C885A, and Docker, along with automation using Python, Ansible, Terraform, and Go, and with CI/CD pipelines.

Full Description

Position: AI Infra SRE Engineer – DGX
Location: Remote
Duration: Full-time

Must-have
• NVIDIA (DGX) or equivalent high-performance-compute (HPC) clusters (e.g., Cray, HPE, IBM)
• Cisco UCS C885A
• Docker

Good to have
• DevOps automation
• CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins)
• Terraform, Ansible, Jenkins
• Python
• GoLang, C/C++
• Enterprise-grade Kubernetes cluster (RedHat OpenShift preferred) and/or Google Anthos
• Software development lifecycle, including design, development, testing, packaging, and deployment using Golang

Roles & Responsibilities
• Technical knowledge of high-performance compute, NVIDIA DGX/GPUs, and/or Cisco Unified Compute System.
• Handle availability, latency, scalability, and efficiency of NVIDIA and Cisco UCS infrastructure by instilling engineering reliability into the development life cycle, with a focus on fault-tolerant approaches.
• Drive capacity planning, performance analysis, instrumentation, and other non-functional systems requirements.
• Automate operational capabilities using Python, Ansible, Terraform, Go, etc.
• Deliver automation through CI/CD pipelines, chatbots, etc.
• Implement metrics-driven processes to ensure service quality targets are met.

This job posting was last updated on 10/11/2025
