$184K - 357K a year
The Senior Systems Software Engineer will design, build, and operate GPU infrastructure management systems for enterprise and cloud environments. This role involves developing Kubernetes operators and multi-cloud provisioning solutions while enhancing observability and operational efficiency.
Candidates should have a bachelor's degree in Computer Science or a related field and at least 8 years of professional experience with Kubernetes and Site Reliability Engineering. Strong communication skills and the ability to manage multiple priorities in a fast-paced environment are essential.
This position is posted by Jobgether on behalf of a partner company. We are currently looking for a Senior Systems Software Engineer, Containers and Kubernetes in the United States.

This role offers an exciting opportunity to design, build, and operate cutting-edge GPU infrastructure management systems for enterprise and cloud environments. The Senior Systems Software Engineer will lead development of Kubernetes operators, multi-cloud provisioning solutions, and HPC integration frameworks that scale from single-node systems to clusters of thousands of nodes. You will collaborate with cross-functional teams to enhance observability, reliability, and operational efficiency while supporting next-generation AI workloads. The position combines hands-on engineering with strategic problem-solving, impacting cloud and on-prem infrastructure used by internal and external stakeholders. Ideal candidates thrive in a fast-paced, dynamic environment and are motivated by solving complex distributed systems challenges at scale.

Accountabilities:
· Develop, maintain, and operate scalable Go programs in Kubernetes and cloud-native environments.
· Build next-generation multi-cloud infrastructure management systems to support large-scale AI and HPC deployments.
· Enable GPU provisioning, life-cycle management, and end-to-end orchestration using tools such as Kubernetes, Docker, Prometheus, Terraform, and Crossplane.
· Support internal and external users through bug fixes, documentation, feature improvements, and Day 2 operations.
· Maintain high-quality products through robust test coverage, monitoring, and observability systems.
· Collaborate with cross-functional teams to enhance infrastructure reliability, performance, and operational scalability.

Requirements:
· Bachelor’s degree or higher in Computer Science or related engineering field (or equivalent experience).
· 8+ years of professional experience with a strong background in Kubernetes and Site Reliability Engineering (SRE).
· Solid understanding and execution skills across the software development lifecycle.
· Experience with OpenAPI and Kubernetes Custom Resource Definitions (CRDs).
· Excellent written and verbal communication skills in English and strong interpersonal abilities.
· Ability to manage multiple priorities effectively in a fast-paced environment.
· Demonstrated motivation to learn new technologies and adapt to evolving infrastructure needs.

Preferred Qualifications:
· Open-source contributions to the Cloud-Native community and familiarity with AI/LLM workloads.
· Experience with CI/CD pipelines in GitHub/GitLab and advanced application configuration management.
· Deep expertise in containerization, orchestration frameworks, and observability tools.
· Exposure to GPU programming with CUDA and development of Kubernetes operators.
· Experience with HPC schedulers or managing multi-cloud deployments.

Benefits:
· Competitive base salary: $184,000–$356,500 USD depending on level, experience, and location.
· Eligibility for equity and performance-based incentives.
· Comprehensive healthcare coverage including medical, dental, and vision.
· Flexible work environment supporting collaboration, innovation, and professional growth.
· Opportunity to work on state-of-the-art GPU infrastructure, cloud-native systems, and AI solutions.

Jobgether is a Talent Matching Platform that partners with companies worldwide to efficiently connect top talent with the right opportunities through AI-driven job matching. When you apply, your profile goes through our AI-powered screening process designed to identify top talent efficiently and fairly.

🔍 Our AI evaluates your CV and LinkedIn profile thoroughly, analyzing your skills, experience, and achievements.
📊 It compares your profile to the job’s core requirements and past success factors to determine your match score.
🎯 Based on this analysis, we automatically shortlist the 3 candidates with the highest match to the role.
🧠 When necessary, our human team may perform an additional manual review to ensure no strong profile is missed.

The process is transparent, skills-based, and free of bias, focusing solely on your fit for the role. Once the shortlist is completed, we share it directly with the company that owns the job opening. The final decision and next steps (such as interviews or additional assessments) are then made by their internal hiring team.

Thank you for your interest!

#LI-CL1
This job posting was last updated on 10/9/2025