Chess.com

via Remote Rocketship


Senior SRE – Distributed Systems, Cloud Infrastructure

Anywhere
full-time
Posted 10/7/2025
Key Skills:
Kubernetes
Terraform
GitOps
Golang
Distributed Systems
Cloud-native Services
Observability
Incident Response

Compensation

Salary Range

$120K - $180K a year

Responsibilities

Lead design and optimization of cloud-native services using Kubernetes, Terraform, and GitOps, develop high-performance distributed systems, and improve infrastructure reliability and scalability.

Requirements

5+ years managing large-scale cloud-native distributed systems with expertise in Kubernetes, Terraform, GitOps, Golang, observability, and incident response.

Full Description

Description:
• Lead the design and optimization of cloud-native services using Kubernetes, Terraform, and GitOps tools like ArgoCD
• Develop high-performance integration patterns and manage scalable, distributed systems handling extensive data volumes
• Dive into Golang and TypeScript codebases to identify and resolve performance bottlenecks at scale
• Optimize infrastructure and application code to achieve aggressive performance and reliability targets, with a focus on chess programming at the bits level
• Work closely with development teams to refine cloud service integration architectures and implement best practices
• Monitor and enhance system reliability and performance through effective collaboration and innovative solutions
• Participate in incident response for critical infrastructure issues, ensuring rapid resolution and minimal downtime
• Drive improvements in infrastructure reliability, scalability, and operational efficiency
• Utilize Terraform and Kubernetes to manage and scale our cloud infrastructure, ensuring robust, automated deployment processes

Requirements:
• 5+ years of experience managing and scaling large-scale, cloud-native distributed systems
• Deep understanding of Kubernetes, Terraform, and GitOps practices
• Expert in observability practices and ability to support incident response / on call
• Extensive experience in high-performance service development with Golang
• Proven ability to profile and optimize applications for high throughput and reliable operation
• Strong knowledge of distributed systems design, failure modes, and robust architectural principles
• Experience with data modeling and indexing strategies to support efficient service operations
• Demonstrated experience improving system reliability and performance through deep code-level and architectural analysis
• Excellent written and verbal communication skills
• Experience working in globally distributed teams

Benefits:
• 100% remote (work from anywhere!)

This job posting was last updated on 10/11/2025
