$120K - 235K a year
Evaluate and optimize datacenter operational readiness and performance through standardized assessments and risk mitigation.
Requires advanced engineering degree or extensive technical engineering experience in datacenters or critical environments, plus strong leadership and communication skills.
Microsoft’s Cloud Operations & Innovation (CO+I) is the engine that powers our cloud services. At its core, datacenter availability isn't just a metric, but a promise of continuity. It is imperative to identify availability improvements & opportunities across Microsoft datacenters. This goal will continually allow our operational cloud to scale in a safe, secure, and reliable manner for our customers.
The Continuous Evaluation Program (CEP) is a strategic initiative within Microsoft’s global datacenter operations, designed to systematically assess, monitor, and optimize the ongoing operational readiness of our infrastructure. CEP plays a critical role in strengthening Microsoft’s availability, business reputation, and customer experience by proactively identifying risks, mitigating exposure, and driving consistency in operational excellence. As we accelerate our speed to market, CEP ensures scalable, reliable, and high-quality solutions through continuous evaluation and behavioral influence.
Key Program Focus Areas
• Availability: Provides impartial assessments of operational readiness across the datacenter fleet, ensuring consistent uptime and performance.
• Standardized Evaluation Framework: Utilizes clear, measurable benchmarks derived from Microsoft’s datacenter operational standards to guide ongoing site evaluations.
• Data-Driven Risk Mitigation: Leverages historical data to identify patterns in equipment and system failures, enabling proactive risk identification and elimination.
• Scalable Operational Processes: Implements optimized and standardized procedures that support rapid growth without compromising reliability or quality.
• Proactive Issue Detection: Identifies potential risks early, with a goal to prevent disruptions and ensure higher availability across operational sites.
• Culture of Continuous Improvement: Promotes innovation and agility, ensuring Microsoft’s datacenter infrastructure remains resilient, adaptable, and future-ready.
Responsibilities
• Align with Microsoft’s culture, objectives and operational standards.
• Deliver a best-in-class, objective and impartial evaluation program monitoring Microsoft’s datacenter infrastructure, operational capabilities and performance against our standards, best practices and programs.
• Drive global consistency of processes, procedures, and reporting with local operations teams.
• Develop methodologies and metrics to validate datacenter performance, system control parameters and operational efficiency against design intent.
• Support Microsoft’s datacenter portfolio expansion to include new country and facility onboarding through operational and site risk reviews.
• Manage programs associated with operational readiness.
• Review compliance with existing corrective and preventive maintenance programs to enhance operational readiness.
• Evolve operational excellence with key focus areas of risk management, uptime availability and safety.
• Focus on improved environmental performance, compliance, and risk management.
• Support and promote improvement, best practices, and corrective and preventive actions.
• Engage with appropriate partner teams to support initiatives, tasks or projects.
• Establish working relationships and engagement with our Engineering Groups (EGs), key partners and Landlord partners (including contributing to MBRs and QBRs).
• Work with regional and global peers to share and build best practices across the entire datacenter portfolio.
• Partner with regional operational leadership and local teams to reduce high-impacting and human-error Critical Environment (CE) incidents year over year.
• Monitor and verify the implementation and effectiveness of remediation action plans.
• Create an environment to promote learning and innovation opportunities.
• Obtain a clear understanding of Microsoft’s day-to-day operation, management and maintenance expectations for all critical equipment, controls and processes, including (but not limited to) operating procedures, standards, change management and drills.
Qualifications
Required/Minimum Qualifications:
• Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 2+ years technical engineering experience
• OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 4+ years technical engineering experience
• OR Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 5+ years technical engineering experience
• OR 12+ years relevant technical engineering experience.
• 2+ years of relevant work experience in datacenter, pharma, or large-scale manufacturing operations environments.
Background Check Requirements
Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
• Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
• Citizenship & Citizenship Verification: This position requires verification of citizenship due to citizenship-based legal restrictions.
Specifically, this position supports United States federal, state, and/or local government agency customers and is subject to certain citizenship-based restrictions where required or permitted by applicable law. To meet this legal requirement, and as a condition of employment, the successful candidate’s citizenship will be verified with a valid passport.
Preferred Qualifications
• Doctorate Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 4+ years technical engineering experience
• OR Master's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 7+ years technical engineering experience
• OR Bachelor's Degree in Mechanical Engineering, Materials Engineering, Reliability Engineering, Electrical Engineering, or related field AND 9+ years technical engineering experience.
• 4+ years experience working in critical environments.
• 9+ years in Data Center or Critical Environment Mechanical, Electrical and Control systems.
• Communication and leadership skills with the ability to influence and drive change.
• Experience leading and managing a business-critical function.
• Understanding of datacenter and uptime methodologies or equivalent Mission Critical facilities.
Reliability Engineering IC4 - The typical base pay range for this role across the U.S. is USD $119,800 - $234,700 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $158,400 - $258,000 per year.
Certain roles may be eligible for benefits and other compensation. Find additional benefits and pay information here: https://careers.microsoft.com/us/en/us-corporate-pay
Microsoft will accept applications for the role until September 15, 2025.
#COICareers | #EPCCareers | #DCDCareers Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
This job posting was last updated on 8/30/2025