via Remote Jobs
Salary: Not specified
Design, build, and maintain IT infrastructure platforms with automation and endpoint management.
Proficient in IT systems administration, automation, and endpoint management with experience in patching and RMM tools.
Job Description:
• Design and implement edge traffic routing that directs requests to the correct Cell in a way that's transparent to users.
• Build and evolve the Topology Service that serves as the authoritative source of cluster state for routing, resource assignment, and Cell lifecycle decisions.
• Collaborate across the GitLab Rails monolith and supporting services to make features and data models Cell-aware with feature teams across the product.
• Operate and improve the routing and topology systems you build by participating in tier-2 on-call, responding to escalated incidents, and strengthening observability and operational tooling.
• Author Architecture Decision Records (ADRs), operational runbooks, and documentation so other teams can understand, adopt, and extend the Cells platform.
• Review merge requests from GitLab team members and community contributors, maintaining high standards for correctness, performance, and security across the stack.

Requirements:
• Experience building observable, resilient production services using Go or Ruby on Rails (TypeScript experience is a plus).
• Background delivering and operating production systems in high-scale environments, including incident response and operational ownership.
• Ability to reason about distributed systems, including consistency models, partitioning strategies, failure modes, and operational tradeoffs.
• Experience building high-throughput networking services (gRPC and protocol buffers knowledge is a plus).
• Familiarity working in large, multi-team codebases and coordinating changes across teams and services, including making features and data models Cell-aware.
• Knowledge of observability practices such as metrics, tracing, and alerting, with an approach focused on building systems you'd be confident operating on-call.
• Strong written communication skills for an async-first, globally distributed team, including documenting decisions (for example, architecture decision records) and runbooks.
• Experience working with relational databases in production, including schema design, migrations, and query performance tuning (PostgreSQL experience is a plus).

Benefits:
• Benefits to support your health, finances, and well-being
• Flexible Paid Time Off
• Team Member Resource Groups
• Equity Compensation & Employee Stock Purchase Plan
• Growth and Development Fund
• Parental leave
• Home office support
This job posting was last updated on 3/3/2026