Salary: Not specified
The Site Reliability Engineer will focus on maintaining and improving platform reliability and performance. The role requires collaboration with development teams to solve complex technical challenges.
Candidates must have 7 or more years of experience in platform engineering or SRE roles, with a strong background in Kubernetes and public cloud environments. A bachelor's degree in a related field, or an equivalent combination of education and experience, is required, along with experience in CI/CD and observability tools.
Required skills:
- 7+ years of experience in platform engineering/SRE roles using an object-oriented language (Python, Golang, etc.)
- Bachelor's degree in Computer Science, Computer Engineering, or an equivalent combination of education and experience
- Extensive experience working with Kubernetes in a public cloud (GKE, EKS, AKS, etc.)
- Experience working with Istio/service mesh
- Experience working with IaC (Terraform, Pulumi, etc.)
- Experience working within a public cloud environment (GCP, AWS, Azure, etc.)
- Experience working with CI/CD tools such as Argo, Buildkite, Travis CI, Jenkins, Spinnaker, etc.
- Experience working with platform observability tools (Prometheus, Thanos, Grafana, Fluent Bit, Cloud Monitoring, Google Cloud Logging, Datadog, PagerDuty, CloudWatch, Kibana, Elasticsearch, Splunk, VictorOps, etc.)
- Experience with networking
- Experience and desire to work in an agile environment
- Analytical mindset and passion for solving business problems with technology

Nice To Haves:
- Experience working with dev testing tools and patterns such as Garden, Flagger, canary deployments, blue/green testing, and A/B testing
- Experience setting up and working with Kubernetes admission control (Kyverno, OPA, etc.)
- Experience working with workload scaling (HPA, VPA, capacity planning/reservations, etc.)
This job posting was last updated on 8/23/2025