Intellibus

via LinkedIn


Senior Site Reliability Engineer – Performance & Observability

Anywhere

Contract

Posted 8/26/2025


Key Skills:

Java

UNIX/Linux

Kafka

Terraform

Jenkins

Docker

AWS (ECS, S3, Lambda, VPC)

BlazeMeter/JMeter

Python

CI/CD

Infrastructure as Code (IaC)

Compensation

Salary Range

$130K - $180K a year

Responsibilities

Lead platform performance testing, deployment automation, observability, and reliability engineering for financial trading systems.

Requirements

10+ years SRE experience with Java and UNIX/Linux, expertise in performance testing tools, AWS infrastructure, Terraform, Jenkins, Docker, Python scripting, and finance domain knowledge.

Full Description

Imagine working at Intellibus to engineer platforms that impact billions of lives around the world. With your passion and focus we will accomplish great things together!

Our Platform Engineering Team is working to solve the Multiplicity Problem. We are trusted by some of the most reputable and established FinTech firms. Recently, our team spearheaded the conversion and go-live of apps that support the backbone of the financial trading industry.

We are looking forward to you joining our Platform Engineering Team as an SRE Architect/Engineer who specializes in performance testing, system reliability, and platform optimization. This role is focused on configuration, deployment, automation, networking, monitoring, logging, and environment management.

We are looking for Architects who can do the following, among other things:

• Conduct a comprehensive review of a mission-critical platform across Incidents, Architecture, Code, Testing, Governance, Network, Monitoring/Alerting, and Data Layer.
• Identify quick wins for stability and resiliency (e.g., single points of failure, required automation, operational gaps).
• Define a pragmatic remediation plan with clear priorities, owners, and success metrics (SLIs/SLOs).
• Establish a delivery cadence (standups, checkpoints, and readouts) to drive remediation through to production.
• Demonstrate effectiveness via measurable outcomes (reduced MTTR, error budgets honored, latency/throughput targets met).

Core SRE Responsibilities

• Lead performance testing (load, stress, soak) using BlazeMeter/JMeter; analyze results and tune platforms (caching, thread/connection pools, GC, autoscaling, query/index tuning).
• Own deployment and configuration automation for highly available systems; manage environment versioning and drift control.
• Build and operate observability (Datadog, Splunk/ELK, CloudWatch/New Relic): dashboards, alerts, traces, logs, and SLO/error-budget policy.
• Architect secure, scalable infrastructure on AWS (ECS, S3, Lambda, VPC) with IaC (Terraform); containerize and run services with Docker.
• Optimize and maintain CI/CD in Jenkins (gates for quality, security, and performance); integrate automated tests into delivery pipelines.
• Run SRE programs: on-call, incident response, post-mortems, and continuous improvement; partner with teams on Kafka/microservices best practices.

Key Skills & Qualifications

• 10+ years in SRE/Infrastructure Engineering (Java + UNIX/Linux background).
• Hands-on BlazeMeter/JMeter expertise and platform performance tuning.
• Proficient with Kafka, Terraform, Jenkins, Docker, AWS (ECS, S3, Lambda, VPC), and IaC patterns.
• Strong with Datadog / Splunk / ELK / CloudWatch / New Relic; builds usable dashboards and actionable alerts.
• Networking fundamentals (VPCs, load balancers, DNS, TLS, peering) and environment management at scale.
• Scripting/automation in Python and Bash.
• Must have experience in finance/trading/transactional systems; SQL/Snowflake/Postgres; container internals.
• Excellent communicator with proven technical leadership experience; startup/high-growth experience preferred.

We work closely with

• Java
• UNIX/Linux
• Kafka
• Event-Driven Architecture
• AWS (ECS, S3, Lambda, VPC)
• Terraform
• Jenkins
• Docker
• CI/CD
• Infrastructure as Code (IaC)

Our Process

• Schedule a 15-minute video call with someone from our team
• 4 proctored GQ tests (< 90 minutes)
• 30-45 minute final video interview
• Receive job offer

If you are interested in reaching out to us, please apply and our team will contact you within the hour.

This job posting was last updated on 8/28/2025
