via Remote Rocketship
$120K - $200K a year
Develop and improve large-scale observability data collection and analysis systems to enhance operational insights.
8+ years of experience in distributed systems, proficiency with open-source observability tools, Python, and collaboration with data teams.
Job Description:
• Collaborate with HW and SW engineering teams to deliver observability solutions that meet their needs in EDA clusters.
• Develop, test, and deploy data collectors, pipelines, and visualization and retrieval services.
• Define data collection and retention policies that balance network bandwidth, system load, and storage capacity costs against data analysis requirements.
• Work in a diverse team to provide operational and strategic data that empowers our engineers and researchers to improve performance, productivity, and efficiency.
• Continuously improve quality, workloads, and processes through better observability.

Requirements:
• Experience developing large-scale, distributed observability systems.
• Ability to collaborate with data scientists, researchers, and engineering teams to identify high-value data for collection and analysis.
• Experience turning raw data into actionable reports.
• Experience with observability platforms such as Apache Spark, Elastic/OpenSearch, Grafana, Prometheus, and similar open-source tools.
• Python programming experience, including use of API calls.
• Passion for improving the productivity of others.
• Excellent planning and interpersonal skills.
• Flexibility and adaptability in a dynamic environment with changing requirements.
• MS (preferred) or BS in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
• 8+ years of proven experience.

Benefits:
• Equity
• Benefits
This job posting was last updated on 12/15/2025