Microsoft

via Eightfold

Senior Reliability Engineer

Aliso Viejo, California; Mountain View, California; San Diego, California; San Jose, California; Sunnyvale, California; Boise, Idaho; Hillsboro, Oregon; Austin, Texas; Redmond, Washington
Full-time
Posted 12/4/2025
Key Skills:
Go programming
Cloud infrastructure (Azure preferred)
Large-scale distributed systems
Monitoring tools
Incident management
DevOps practices
Network engineering
System design
On-call rotation experience

Compensation

Salary Range

$120K - $160K a year

Responsibilities

Ensure production reliability through monitoring, incident response, root cause analysis, and collaboration with hardware/firmware teams, supporting 24x7 data center operations.

Requirements

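Master's degree with 2+ years, or Bachelor's degree with 4+ years, of technical experience in software engineering, network engineering, or systems administration (or equivalent experience); proficiency in one or more programming languages (C#, Python, Go, or similar); understanding of cloud infrastructure and networking; and familiarity with monitoring and incident management.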

Full Description

Build and bring specialized knowledge across multiple production aspects (monitoring, release engineering, testing, live site excellence, buildout, performance optimization, capacity management).
Analyze large-scale telemetry and operational data to uncover insights and drive data-informed decisions.
Use a proven set of principles and practices such as safe deployment, testing for reliability, elimination of single points of failure, disaster recovery, SLO-based monitoring, throttling, infrastructure management automation, post-mortem excellence, and adoption of common systems.
Respond to alerts and incidents. Build and follow playbooks to drive root cause analysis and reviews.
Partner with hardware and firmware teams to understand system behavior and identify opportunities for predictive analytics.
Participate in an on-call rotation, be available during non-standard business hours, and contribute to service reliability and incident resolution.

Master's Degree in Computer Science, Information Technology, or related field AND 2+ years technical experience in software engineering, network engineering, or systems administration, OR Bachelor's Degree in Computer Science, Information Technology, or related field AND 4+ years technical experience in software engineering, network engineering, or systems administration, OR equivalent experience.
3+ years of experience in software engineering or operations for large-scale distributed systems.
Ability to support a 24x7 data center environment, including participation in an on-call rotation and availability during non-standard business hours (evenings, nights, weekends, or holidays) as operational needs require.
Proficiency in one or more programming languages (C#, Python, Go, or similar).
Understanding of cloud infrastructure (Azure preferred), networking, and system design.
Familiarity with monitoring tools, incident management frameworks, and DevOps practices.

This job posting was last updated on 12/5/2025
