via Talents By Vaia
$120K - 160K a year
Design, manage, and scale core data platforms and observability stacks, drive SRE best practices, modernize CI/CD infrastructure, automate infrastructure management, and collaborate with development teams.
7+ years as Senior DevOps/SRE/Platform Engineer with strong Kubernetes, GitOps, scripting, Linux, cloud, CI/CD, observability, and distributed systems experience.
About the position

Responsibilities
• Own Core Data Platforms: Design, manage, and scale our diverse portfolio of datastores, including Cassandra, RDS/Aurora, Redis, Elasticsearch, and more.
• Evolve Observability: Champion and advance our observability stack (Prometheus/Thanos, Grafana & ELK) to provide critical, real-time insights for hundreds of services.
• Strengthen Reliability (SRE): Drive SRE best practices, including automating disaster recovery drills, managing our alerting strategy (AlertC), and improving system-wide resilience.
• Modernize CI/CD: Help administer and optimize our CI/CD infrastructure, which includes Jenkins, TeamCity, and GitHub Actions.
• Automate Everything: Leverage our Kubernetes and GitOps (ArgoCD/Flux) foundation to manage infrastructure as code, enhance developer self-service, and eliminate toil.
• Collaborate & Consult: Act as a subject matter expert, partnering with development teams to help them choose, implement, and operate their data and observability solutions effectively.

Requirements
• 7+ years of industry experience as a Senior DevOps, SRE, or Platform Engineer.
• Proven experience designing, analyzing, and troubleshooting large-scale distributed systems.
• Strong hands-on experience with Kubernetes in a large-scale production environment.
• Strong experience with GitOps principles and tools (e.g., ArgoCD, FluxCD).
• Strong scripting or programming skills in languages like Python, Go, or Bash.
• Deep, hands-on knowledge of Linux systems.
• Strong experience with at least one major public cloud (AWS, GCP, or Azure).
• Experience managing and scaling CI/CD systems (e.g., Jenkins, GitHub Actions, TeamCity).
• Experience with modern observability stacks (e.g., Prometheus/Thanos, Grafana, ELK/OpenSearch).
• Excellent problem-solving and collaboration skills, with a "customer-first" attitude toward supporting internal developers.
Nice-to-haves
• Experience with production datastores (e.g., Cassandra, RDS/Aurora, Redis, Elasticsearch). This is a strong advantage.

Benefits
• An attractive package providing financial peace of mind, including competitive compensation, profit-sharing, daily meal vouchers (Swile), family health insurance (Alan), and a personalized relocation package (if needed).
• Continuous investment in our employees' skills: in-house and external training, tech conference opportunities, and internal mobility (individual contributor or management career ladder).
• Work-life balance is one of our top priorities: 35+ days off per year, hybrid work (2 days of remote work per week), fully covered parental leave, and reserved daycare places.
• A focus on employee well-being: premium work equipment, an enjoyable work environment (work-life balance, team building events, summits), a remote work subsidy, internal and external Diversity & Inclusion initiatives (women speaking groups, dedicated school partnerships), dedicated charitable time, and sustainability actions (Eco Tree, subsidy for eco-mobility).
This job posting was last updated on 11/27/2025