$130K - $180K a year
Lead and own the quality and reliability program, drive cross-functional initiatives, manage production readiness, and improve incident management processes.
8+ years of program management experience, including 3+ years in technical, reliability, or quality-focused domains; strong technical understanding of distributed systems, the SDLC, and CI/CD; and experience with Atlassian tools.
Description:
• Own the Quality & Reliability Program: Define and drive the vision for quality across proactive practices (testing, deployment, observability), reactive processes (incident response, external communications), and cultural expectations (quality ownership, readiness).
• Lead Cross-Functional Programs: Drive reliability and quality initiatives across Engineering, Product, Operations, and Customer Success.
• Production Readiness: Own the Production Readiness Review (PRR) process; ensure all releases meet reliability standards before they go live.
• Define and Drive SLOs: Establish and track Service Level Objectives (SLOs). Build visibility into reliability metrics and lead efforts to meet or exceed targets.
• Improve Incident Management: Streamline incident response and postmortems. Drive structural improvements in tooling, communication, and ownership.
• Scale Tooling & Automation: Collaborate across teams to enhance observability, alerting, testing automation, and response tooling.
• Mitigate System Risk: Identify risk vectors early, build mitigation plans, and drive resolution with urgency.
• Drive Alignment: Influence Eng, Product, Ops, and GTM teams to prioritize reliability and integrate quality into every initiative.
• Track Progress: Use tools such as Atlas, Jira, and internal dashboards to maintain clarity on goals, risks, and outcomes.
• Embed Continuous Learning: Build programs that ensure we learn from every incident, test edge cases, and continuously harden our systems.
Requirements:
• 8+ years of program management experience, with at least 3 years in technical, reliability, or quality-focused domains.
• Strong understanding of system architecture, distributed systems, and reliability engineering principles.
• Familiarity with SDLC models, CI/CD pipelines, deployment automation, observability, and incident management tooling.
• Demonstrated success defining and improving SLOs, SLIs, and production readiness processes.
• Proven ability to lead large-scale, cross-functional programs across Engineering, Product, Operations, and Customer Success.
• Skilled at translating complex technical goals into clear, actionable, and measurable outcomes.
• Experienced in using Atlassian tools (e.g., Jira, Atlas) for program tracking, reporting, and executive communication.
• Adept at navigating ambiguity, building alignment, and driving decision-making without formal authority.
• Comfortable balancing technical depth with business priorities to influence outcomes.
• Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
• Bonus: Experience in regulated or high-availability industries such as fintech, healthcare, or infrastructure.
Benefits:
• Base salary per year (paid semi-monthly)
• Stock options with standard startup vesting: 1-year cliff, 4 years total
• $50 monthly communication expense stipend to go towards your phone/internet bill
• $250 stipend to enhance your WFH setup
• Reimbursement for peripheral equipment: monitor (up to $400), keyboard and mouse (up to $200)
• Premium medical benefits including vision and dental (100% coverage for employees)
• Company-sponsored life and disability insurance
• Paid parental bonding leave
• Paid sick leave, jury duty, and bereavement leave
• 401k plan
• Flexible Time Off (our team members typically take ~3-4 weeks off per year)
• Volunteer Time Off
• 13 scheduled holidays
• In-person team meet-ups twice a year
This job posting was last updated on 9/11/2025