$120K–$180K a year
Design and build scalable alerting systems with query-based rules, intelligent routing, and frontend interfaces for monitoring and incident detection.
Strong TypeScript/Node.js skills; experience with query languages such as PromQL and SQL, alerting systems, time-series databases, React frontends, and observability tools; and knowledge of distributed systems.
Description:
• Design and build sophisticated alerting systems that enable proactive monitoring and incident detection across distributed systems
• Develop query-based alert rules and expressions using PromQL, SQL, and other query languages to surface meaningful insights
• Create intelligent alert routing, deduplication, and correlation mechanisms to reduce noise and improve signal quality
• Build scalable backend services for alert evaluation, notification delivery, and alert management workflows
• Optimize time-series data storage and query performance for high-volume metrics and telemetry data
• Develop intuitive interfaces for alert configuration, visualization, and management using React and modern frontend technologies
• Collaborate with cross-functional teams to understand monitoring requirements and deliver comprehensive alerting solutions
• Mentor and guide engineers on best practices for observability and alerting architecture
Requirements:
• Strong proficiency in TypeScript/Node.js with a proven track record of building production-grade services
• Experience with query languages for metrics and monitoring (PromQL, SQL, or similar) and the ability to write complex queries for data analysis
• Hands-on experience building or maintaining alerting systems, including rule evaluation engines and notification pipelines
• Experience with time-series databases and columnar storage systems (ClickHouse experience is a plus)
• Frontend development skills with React and modern JavaScript frameworks for building data visualization and management interfaces
• Strong understanding of distributed systems, data structures, and algorithms
• Experience with observability concepts, including metrics, logs, traces, and their correlation
• Ability to work independently with minimal supervision, and a track record of learning quickly
• Dedication to writing clean, maintainable, and well-tested code
• Experience with the Prometheus ecosystem, including Alertmanager
• Background in building rule engines or expression evaluation systems
• Experience with notification systems and integrations (PagerDuty, Slack, webhooks, etc.)
• Familiarity with observability tools such as Grafana, the ELK stack, or similar solutions
• Experience with CI/CD tools such as Bitbucket, Jenkins, CircleCI, etc.
• Understanding of alert fatigue mitigation strategies and intelligent alerting patterns
• Experience with high-cardinality data and performance optimization
• Willingness to speak your mind and share ideas
• Appreciation for humor and a love for goats
• Comfort working remotely
Benefits:
• Health, dental, and vision insurance
• Short-term disability and life insurance
• Paid holidays and paid time off
• Fertility treatment benefit
• 401(k)
• Equity
• Eligibility for a discretionary company-wide bonus
This job posting was last updated on 10/11/2025