via Breezy
$146K - $156K a year
Ensure reliability, uptime, and operational excellence of cloud infrastructure and platforms, leading incident response, automating operations, and improving system resilience.
Over 8 years in SRE/DevOps roles supporting production environments, with expertise in Kubernetes, cloud-native infrastructure, automation, monitoring, and incident management.
Quick Details

Location: Fully Remote (US)
Experience: 8+ Years
Rate: $70-75/hour
Duration: 6 months+

About One Dynamic

One Dynamic is a Service-Disabled Veteran-Owned Small Business (SDVOSB) headquartered in Fairfax, VA. We specialize in digital transformation, cloud infrastructure, quality assurance, and enterprise architecture for federal and healthcare organizations. We are currently seeking a Lead Site Reliability Engineer to support our client ARC, a rapidly growing device management company revolutionizing how frontline workers interact with enterprise mobile devices.

About the Role

The Lead Site Reliability Engineer is a senior technical leadership role responsible for the reliability, availability, and operational excellence of the cloud infrastructure and kiosks platform. This role owns uptime, SLAs, and incident response while driving long-term improvements to system resilience, observability, and operational maturity. The Lead SRE serves as both a hands-on technical leader and a force multiplier across platform, QA, and development teams.

This role is well-suited for an experienced engineer who thrives in high-ownership environments and can balance real-time operational demands with strategic reliability initiatives. Strong communication, sound technical judgment, and a bias toward preventative engineering are critical to success.

Key Responsibilities

- Own uptime, SLAs, and overall reliability of the cloud infrastructure and kiosks platform
- Lead incident response, root-cause analysis, and drive actionable postmortems
- Automate infrastructure, deployments, and operational tasks using modern IaC and scripting in collaboration with the Platform Engineering team
- Maintain and improve monitoring, alerting, and observability (e.g., Grafana, Prometheus, New Relic)
- Execute and continuously improve disaster recovery and business continuity plans
- Partner with platform engineering, QA, and development teams to ensure operational readiness
- Establish and maintain runbooks, operational standards, and reliability best practices
- Provide leadership, mentorship, and clear communication during both normal operations and incidents
- Optimize cloud and Kubernetes environments for reliability, performance, and scalability

Required Qualifications

- 8+ years in SRE, DevOps, or Platform Engineering roles; 2+ years in a senior or lead capacity
- Strong experience supporting production environments with strict SLAs and high uptime requirements
- Deep knowledge of Kubernetes, containers, and cloud-native infrastructure
- Proficiency in automation and scripting using Bash, Python, or Go
- Hands-on experience with CI/CD pipelines and release engineering in modern environments
- Expert-level familiarity with IaC tools (Terraform preferred)
- Strong understanding of monitoring, alerting, logging, and observability tooling
- Experience implementing and managing GitOps workflows (ArgoCD or similar)
- Demonstrated ability to lead incidents and communicate effectively with technical and non-technical stakeholders
- Solid understanding of disaster recovery planning, resilience practices, and system hardening
- Must be authorized to work in the United States (US-based candidates only)

The Ideal Candidate

- You think several steps ahead.
- You are relentless, strategic, and a long-term thinker.
- You believe the details are essential, and so you get them right.
- You are a fast learner.
- You take feedback well and implement it.
- You care about achieving the best outcome and do not focus on being right or wrong.

About the Client

ARC is a device management solution integrated with smart lockers, designed to store, secure, and charge company-owned handheld devices (e.g., Zebra, Honeywell) used by frontline workers to perform core job functions. Launched in late 2021, ARC was spun off from ChargeItSpot, a consumer-facing phone-charging technology company established in 2012.

ARC's Mission: Minimize Device Waste. Maximize Worker Productivity. Make Life Easier.

How to Apply

If you have the unique combination of skills and qualities we are seeking, please submit your resume via One Dynamic's careers portal. We look forward to hearing from you!

One Dynamic is an Equal Opportunity employer. Personnel are chosen based on ability without regard to race, color, religion, sex, national origin, disability, marital status, or sexual orientation, in accordance with federal and state law.
This job posting was last updated on 12/23/2025.