via Truemote
$110K - 165K a year
Maintain and support AI infrastructure, ensuring performance and availability of AI tools and services.
Requires 2+ years of experience in AI or cloud infrastructure support, plus knowledge of cloud platforms, GPU systems, and AI frameworks.
Title: AI Infrastructure System Administrator
Location: Fully remote
Salary: $110-165k/year + RSUs + annual bonus

Position Overview
We are seeking a talented AI Infrastructure System Administrator to join our team. The ideal candidate will maintain and support our AI infrastructure, ensuring optimal performance and availability of AI tools and services. This role is critical in providing technical support and troubleshooting assistance to our users, helping to drive customer satisfaction and operational efficiency.

Key Responsibilities
• Provide technical support for AI infrastructure, including GPU and distributed systems.
• Troubleshoot and resolve issues related to AI tools and platforms such as Neocloud, PyTorch, TensorFlow, and JAX.
• Manage incidents and support requests through a ticketing system, ensuring adherence to SLAs.
• Collaborate with DevOps and Site Reliability Engineering teams to maintain cloud infrastructure on AWS, GCP, and Azure.
• Assist in the deployment and maintenance of AI applications and services in a hyperscale environment.
• Monitor system performance and implement optimizations to improve efficiency and reliability.
• Work closely with other teams to ensure seamless integration of AI services into existing infrastructure.

Qualifications
• Bachelor's degree in Computer Science, Engineering, or a related field.
• 2+ years of relevant experience in a support or solutions engineering role in AI or cloud infrastructure environments.
• Low-level networking experience: network command line, network protocols, and network debugging.

Nice to have
• Strong knowledge of cloud platforms such as AWS, GCP, and Azure.
• Experience with GPU systems and distributed computing.
• Familiarity with AI frameworks such as PyTorch, TensorFlow, and JAX.
• Excellent troubleshooting skills and the ability to resolve complex technical issues.
• Strong communication skills and a customer-centric approach.
This job posting was last updated on 3/13/2026