Salary: Not specified
The Site Reliability Engineer will design, implement, and manage Kubernetes environments, ensuring scalable and reliable infrastructure. They will also develop monitoring solutions, conduct incident response, and collaborate with development teams to enhance application reliability.
Candidates should have 5-7 years of experience in SRE or DevOps roles with strong expertise in Kubernetes and Linux/Unix systems. Proficiency in programming and database administration is also required.
Role: Site Reliability Engineer (Ex - Fidelity Exp)
Location: Remote
Position Type: Contract

Key Responsibilities
• Design, implement, and manage Kubernetes environments from deployment to configuration, monitoring, and troubleshooting
• Build and maintain scalable and reliable infrastructure using infrastructure as code principles
• Develop comprehensive monitoring solutions and implement alerting strategies
• Analyze system performance bottlenecks and implement improvements
• Implement and maintain CI/CD pipelines for seamless deployments
• Conduct incident response, root cause analysis, and implement preventative measures
• Create and enhance automation tools leveraging AI/ML where applicable
• Collaborate with development teams to improve application reliability and performance

Required Qualifications
• 5-7 years of experience in SRE or DevOps roles
• Strong expertise with Kubernetes ecosystem and container orchestration
• Deep understanding of Linux/Unix operating systems and performance analysis tools (NMON, etc.)
• Experience with log analysis, monitoring systems, and observability tools
• Proficiency in database administration and performance tuning (Oracle, SQL Server)
• Strong programming skills in at least one of: Python, Go, Java, or Node.js
• Experience developing automation tools and frameworks
• Proven track record of proactive problem identification and resolution

Preferred Qualifications
• Experience with AI/ML integration into operational workflows
• Cloud platform experience (AWS, GCP, Azure)
• Knowledge of service mesh technologies
• Experience with distributed systems architecture
• Familiarity with security best practices and compliance requirements

Personal Qualities
• Proactive mindset with strong analytical and problem-solving abilities
• Collaborative approach to working across development and operations teams
• Excellent communication skills and ability to explain complex technical concepts
• Self-motivated with the ability to work independently and as part of a team
• Passion for continuous improvement and learning
This job posting was last updated on 8/5/2025