Nastech Global

1 open position available

1 location

1 employment type

Actively hiring

Full-time


Latest Positions


AI Infra SRE Engineer

Nastech Global • Anywhere • Full-time


Compensation: $120K - $160K a year

Summary: Manage and ensure the reliability, scalability, and performance of NVIDIA DGX and Cisco UCS HPC infrastructure; automate operations using Python, Ansible, Terraform, and Go; and deliver automation via CI/CD pipelines.

Qualifications: Experience with NVIDIA DGX or equivalent HPC clusters, Cisco UCS C885A, Docker, automation tools such as Python, Ansible, Terraform, and Go, and CI/CD pipelines.

Position: AI Infra SRE Engineer – DGX
Location: Remote
Duration: Full-time

Must-have
• NVIDIA DGX or equivalent high-performance computing (HPC) clusters (e.g., Cray, HPE, IBM)
• Cisco UCS C885A
• Docker

Good to have
• DevOps automation
• CI/CD systems (e.g., GitLab, GitHub Actions, Jenkins)
• Terraform, Ansible, Jenkins
• Python
• Go (Golang), C/C++
• Enterprise-grade Kubernetes clusters (Red Hat OpenShift preferred) and/or Google Anthos
• Experience across the software development lifecycle, including design, development, testing, packaging, and deployment, using Go

Roles & Responsibilities
• Technical knowledge of high-performance computing, NVIDIA DGX systems and GPUs, and/or Cisco Unified Computing System (UCS).
• Handle the availability, latency, scalability, and efficiency of NVIDIA and Cisco UCS infrastructure by instilling reliability engineering into the development life cycle, with a focus on fault-tolerant approaches.
• Drive capacity planning, performance analysis, instrumentation, and other non-functional system requirements.
• Automate operational capabilities using Python, Ansible, Terraform, Go, etc.
• Deliver automation through CI/CD pipelines, chatbots, etc.
• Implement metrics-driven processes to ensure service quality targets are met.

NVIDIA DGX or equivalent HPC clusters

Cisco UCS C885A

Docker

Python

GoLang

CI/CD systems (GitLab, GitHub Actions, Jenkins)

Terraform

Ansible

Kubernetes (Red Hat OpenShift, Google Anthos)

Verified Source

Posted 10 months ago
