Chess.com

via Remote Rocketship

Senior SRE – Distributed Systems, Cloud Infrastructure

Anywhere

Full-time

Posted 10/7/2025

Key Skills:

Kubernetes

Terraform

GitOps

Golang

Distributed Systems

Cloud-native Services

Observability

Incident Response

Compensation

Salary Range

$120K - $180K a year

Responsibilities

Lead design and optimization of cloud-native services using Kubernetes, Terraform, and GitOps, develop high-performance distributed systems, and improve infrastructure reliability and scalability.

Requirements

5+ years managing large-scale cloud-native distributed systems with expertise in Kubernetes, Terraform, GitOps, Golang, observability, and incident response.

Full Description

Description:

• Lead the design and optimization of cloud-native services using Kubernetes, Terraform, and GitOps tools like ArgoCD
• Develop high-performance integration patterns and manage scalable, distributed systems handling extensive data volumes
• Dive into Golang and TypeScript codebases to identify and resolve performance bottlenecks at scale
• Optimize infrastructure and application code to achieve aggressive performance and reliability targets, with a focus on chess programming at the bit level
• Work closely with development teams to refine cloud service integration architectures and implement best practices
• Monitor and enhance system reliability and performance through effective collaboration and innovative solutions
• Participate in incident response for critical infrastructure issues, ensuring rapid resolution and minimal downtime
• Drive improvements in infrastructure reliability, scalability, and operational efficiency
• Utilize Terraform and Kubernetes to manage and scale our cloud infrastructure, ensuring robust, automated deployment processes

Requirements:

• 5+ years of experience managing and scaling large-scale, cloud-native distributed systems
• Deep understanding of Kubernetes, Terraform, and GitOps practices
• Expertise in observability practices and the ability to support incident response / on-call
• Extensive experience in high-performance service development with Golang
• Proven ability to profile and optimize applications for high throughput and reliable operation
• Strong knowledge of distributed systems design, failure modes, and robust architectural principles
• Experience with data modeling and indexing strategies to support efficient service operations
• Demonstrated experience improving system reliability and performance through deep code-level and architectural analysis
• Excellent written and verbal communication skills
• Experience working in globally distributed teams

Benefits:

• 100% remote (work from anywhere!)

This job posting was last updated on 10/11/2025
