$282K - $332K a year
Design, build, automate, and operate large-scale cloud infrastructure with deployment pipelines and monitoring for high reliability.
10+ years software engineering, 5+ years GCP, expertise in Kubernetes, Terraform, Go, Rust, cloud security, and incident management.
Senior Staff Infrastructure Engineer, GroqCloud

Mission: Design, build, and operate large-scale cloud systems to deliver the fastest inference engine in the world.

Responsibilities & opportunities in this role:
Infrastructure Development: Design, build, and automate cloud infrastructure using Terraform to support a wide variety of needs.
Service Deployment & Orchestration: Build and manage robust deployment pipelines and GitOps workflows into Kubernetes-based environments. Continuously improve CI/CD processes to facilitate rapid, reliable rollouts of new features and services, ensuring minimal downtime and maximum velocity.
System Troubleshooting: Lead investigations to determine root causes of system failures and develop scripts to repair and automate the upkeep of infrastructure components.
Observability Enhancement: Implement comprehensive monitoring (tracing, metrics, logging, alerting) to swiftly pinpoint, diagnose, and resolve system issues.
Efficient Incident Response: Manage critical system incidents as a first responder, ensuring swift resolution and comprehensive post-incident analyses with implemented remediations.
Cross-Functional Collaboration: Collaborate with software engineers, platform and networking engineers, product managers, and sales to enable feature delivery.

Ideal candidates have/are:
10+ years of experience in software engineering or a related field.
5+ years of experience with GCP (especially VPC, Hybrid Networking, IAM, and GKE).
Actively working with modern Infrastructure-as-Code technologies (Kubernetes, Terraform, Flux/ArgoCD, Kustomize, Crossplane).
Experience with open-source monitoring tools (Prometheus, Grafana, VictoriaMetrics, VictoriaLogs, and Alertmanager).
Deep experience in cloud technologies, global-scale applications, and automation.
Familiarity with multi-region deployments, including the associated networking, latency, and failover challenges.
History of debugging and mitigating production issues and driving them to efficient resolution.
Comfortable reading, writing, and debugging software in multiple languages, especially Go and Rust.
Thorough understanding of cloud-security best practices and modern compliance controls.

Compensation: At Groq, a competitive base salary is part of our comprehensive compensation package, which includes equity and benefits. For this role, the base salary range is $282,100 to $331,900, determined by your location, skills, qualifications, experience, and internal benchmarks. This range is specific to roles in the United States; compensation for candidates outside the USA will depend on the local market.
This job posting was last updated on 11/24/2025