CrowdStrike

via SimplyHired


Principal Engineer, Operational Excellence – Resilience

Anywhere

full-time

Posted 10/15/2025


Key Skills:

Technology resilience

Disaster recovery

Site reliability engineering

Infrastructure redundancy

Chaos engineering

Cloud-native environments

Feature management systems

Progressive deployment

Multi-tenant architecture

Scalability engineering

Strategic initiative leadership

Stakeholder influence

Resilience KPIs

Executive communication

Compensation

Salary Range

$150K - $220K a year

Responsibilities

Lead and coordinate enterprise-wide technology resilience initiatives, develop and implement recovery strategies, drive chaos engineering programs, and provide strategic guidance across business and engineering teams.

Requirements

10+ years in technology resilience or related fields, expertise in cloud-native resilience architectures, disaster recovery certifications, strong leadership and communication skills, and a relevant bachelor's degree or equivalent experience.

Full Description

Job Description:

• Facilitate coordination between stakeholders across IT, Product, Engineering, and business units, serving as the central point for technology resilience initiatives and ensuring alignment with business objectives
• Own and maintain enterprise-wide technology resilience standards, ensuring consistent implementation and reducing organizational drift from established frameworks across infrastructure, application, and product domains
• Drive comprehensive technical resilience architecture including infrastructure redundancy and fault tolerance, application resilience and graceful degradation strategies, and chaos engineering frameworks for continuous resilience validation
• Lead enterprise technical recovery strategy development and implementation, including backup and redundancy systems, recovery time/point objectives (RTO/RPO) for technical systems, and data recovery/restoration procedures
• Partner to define and implement resilience standards, including feature flagging, release, testing, multi-tenancy frameworks, and scalability frameworks to manage growth
• Provide technical oversight and aggregation of technology resilience risks across the enterprise, establishing and monitoring key performance indicators including system uptime
• Drive chaos engineering and resilience testing programs, establishing enterprise-wide practices for proactive resilience validation and continuous improvement
• Own shared resilience tooling strategy, evaluation, and implementation to support enterprise-wide capabilities including monitoring, testing, and recovery automation
• Build and maintain formal networks with key constituents across business units, engineering teams, and external partners
• Serve as senior technical advisor during major incident response, providing expertise on technical recovery strategies and coordinating cross-functional recovery efforts
• Drive innovation in resilience practices, identifying emerging technologies and methodologies to advance CrowdStrike's competitive resilience advantage
• Provide strategic guidance and expertise to junior team members and cross-functional partners on resilience engineering best practices

Requirements:

• 10+ years of direct experience in technology resilience, disaster recovery, site reliability engineering, or related technical disciplines, with demonstrated expertise in enterprise-scale cloud-native environments
• Deep understanding of infrastructure redundancy patterns, application resilience design, chaos engineering principles, and enterprise disaster recovery strategies across hybrid cloud architectures
• Proven experience with feature management systems, progressive deployment strategies, multi-tenant architecture resilience, and scalability engineering practices
• Proven ability to drive strategic initiatives across large technology organizations, with experience influencing senior stakeholders and leading complex, cross-functional resilience programs
• Experience establishing and monitoring resilience KPIs, including system uptime, MTTR, RTO/RPO objectives, and deployment success metrics
• Advanced certifications in disaster recovery, cloud architecture, or site reliability disciplines (e.g., DRCS, CISSP, AWS/Azure/GCP architecture certifications)
• Exceptional written and oral communication skills, including experience developing and delivering strategic briefings to executive leadership and technical teams
• Advanced analytical and conceptual thinking abilities, with proven track record of solving complex, ambiguous resilience challenges with enterprise-wide impact
• Demonstrated ability to build formal networks and influence stakeholders across engineering, product, and business organizations
• Bachelor's degree in Computer Science, Information Systems, Engineering, Risk/Resilience, or equivalent practical experience

Benefits:

• Remote-friendly and flexible work culture
• Market leader in compensation and equity awards
• Comprehensive physical and mental wellness programs
• Competitive vacation and holidays for recharge
• Paid parental and adoption leaves
• Professional development opportunities for all employees regardless of level or role
• Employee Networks, geographic neighborhood groups, and volunteer opportunities to build connections
• Vibrant office culture with world class amenities
• Great Place to Work Certified™ across the globe


This job posting was last updated on 10/18/2025
