Find your dream job faster with JobLogr
AI-powered job search, resume help, and more.
Try for Free
DE

DeepRec.ai

via Indeed

All our jobs are verified from trusted employers and sources. We connect to legitimate platforms only.

AI Evaluation Engineer

Anywhere
Full-time
Posted 11/24/2025
Verified Source
Key Skills:
TypeScript
OpenAI API
LLM ecosystems
Prompt engineering
Evaluation frameworks
Data pipelines
Statistical analysis
Monitoring and observability

Compensation

Salary Range

$180K - 180K a year

Responsibilities

Design, build, and own AI evaluation systems to ensure safe, reliable, and scalable deployment of AI-powered features.

Requirements

Strong software engineering skills with TypeScript, deep experience with OpenAI or similar LLMs, practical knowledge of prompting and evaluation techniques, and familiarity with statistical analysis.

Full Description

Location Remote work, United States Job Type Permanent Salary $180000 per annum AI Evaluation Engineer $180,000 Remote (US-based) Are you passionate about shaping how AI is deployed safely, reliably, and at scale? This is a rare opportunity to join a mission-driven tech company as their first AI Evaluation Engineer, a foundational role where you’ll design, build, and own the evaluation systems that safeguard every AI-powered feature before it reaches the real world. This organization builds AI-enabled products that directly helps governments, nonprofits, and agencies deliver financial support to people who need it most. As AI capabilities race forward, ensuring these systems are safe, accurate, and resilient is critical. That’s where you come in. You won’t just be testing models, you’ll be creating the frameworks, pipelines, and guardrails that make advanced LLM features safe to ship. You’ll collaborate with engineers, PMs, and AI safety experts to stress test boundaries, uncover weaknesses, and design scalable evaluation systems that protect end users while enabling rapid innovation. What You’ll Do • Own the evaluation stack – design frameworks that define “good,” “risky,” and “catastrophic” outputs. • Automate at scale – build data pipelines, LLM judges, and integrate with CI to block unsafe releases. • Stress testing – red team AI systems with challenge prompts to expose brittleness, bias, or jailbreaks. • Track and monitor – establish model/prompt versioning, build observability, and create incident response playbooks. • Empower others – deliver tooling, APIs, and dashboards that put eval into every engineer’s workflow. Requirements: • Strong software engineering background (TypeScript a plus) • Deep experience with OpenAI API or similar LLM ecosystems • Practical knowledge of prompting, function calling, and eval techniques (e.g. LLM grading, moderation APIs) • Familiarity with statistical analysis and validating data quality/performance • Bonus: experience with observability, monitoring, or data science tooling

This job posting was last updated on 11/26/2025

Ready to have AI work for you in your job search?

Sign-up for free and start using JobLogr today!

Get Started »
JobLogr badgeTinyLaunch BadgeJobLogr - AI Job Search Tools to Land Your Next Job Faster than Ever | Product Hunt