
Sierra Business Solution LLC

via Dice


DevOps Engineer, LLM & GPU Inference Services (Remote)

Anywhere
contractor
Posted 10/17/2025
Key Skills:
Cloud services
Distributed systems
LLM inference
Containerization (Kubernetes, Docker)
Infrastructure as code
CI/CD pipelines
APIs
Authentication and authorization
Logging and monitoring

Compensation

Salary Range

$120K–$180K a year

Responsibilities

Build and maintain distributed cloud services for LLM inference including deployment, monitoring, and optimization.

Requirements

Experience with cloud distributed systems, LLM hosting and inference, benchmarking tools, GPU software stacks, and distributed inference optimization.
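
As context for the "LLM hosting and inference" requirement, here is a minimal offline-inference sketch using vLLM's documented LLM/SamplingParams API (vLLM is one of the engines named in the full description below). The model name is a placeholder and sufficient GPU memory is assumed; this is an illustration, not part of the posting.

```python
# Minimal sketch: host an LLM and run inference with vLLM.
# Assumes vLLM is installed and a GPU with enough memory is available.
from vllm import LLM, SamplingParams

# Placeholder model ID; any vLLM-supported model would work here.
llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain KV cache reuse in one sentence."], params)

for out in outputs:
    # Each RequestOutput holds one or more completions; print the first.
    print(out.outputs[0].text)
```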

Full Description

Job Description:

Looking for developers with general cloud distributed-services experience, with LLM experience as a secondary skill. GPU experience is now low on the list of preferred skills.

Dedicated Inference Service Requirements:

- Deep experience building services in modern cloud environments on distributed systems: containerization (Kubernetes, Docker), infrastructure as code, CI/CD pipelines, APIs, authentication and authorization, data storage, deployment, logging, monitoring, alerting, etc.
- Experience working with Large Language Models (LLMs), particularly hosting them to run inference
- Strong verbal and written communication skills; the job involves communicating with local and remote colleagues about technical subjects and writing detailed documentation
- Experience building or using benchmarking tools for evaluating LLM inference across model, engine, and GPU combinations
- Familiarity with LLM performance metrics such as prefill throughput, decode throughput, TPOT (time per output token), and TTFT (time to first token)
- Experience with one or more inference engines, e.g., vLLM, SGLang, Modular Max
- Familiarity with one or more distributed inference serving frameworks, e.g., llm-d, NVIDIA Dynamo, Ray Serve
- Experience with AMD and NVIDIA GPUs, using software such as CUDA, ROCm, AITER, NCCL, RCCL
- Knowledge of distributed inference optimization techniques: tensor/data parallelism, KV cache optimizations, smart routing, etc.
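
As a rough, hypothetical illustration of the performance metrics the posting names (prefill throughput, decode throughput, TPOT, TTFT), the sketch below computes them from per-token arrival timestamps of a streamed response. The function and the numbers are invented for illustration, and prefill throughput is approximated as prompt tokens over TTFT, which in practice also includes queueing and scheduling time.

```python
# Hypothetical sketch: derive the metrics named above from a streamed
# response. TTFT is the gap from request start to the first token; TPOT
# is the mean gap between subsequent output tokens.
def inference_metrics(t_start: float, token_times: list[float],
                      prompt_tokens: int) -> dict:
    ttft = token_times[0] - t_start                 # time to first token (s)
    decode_time = token_times[-1] - token_times[0]  # time spent decoding (s)
    n_out = len(token_times)
    tpot = decode_time / (n_out - 1) if n_out > 1 else 0.0
    return {
        "ttft_s": ttft,
        "tpot_s": tpot,
        # Rough approximation: the whole prompt is prefilled before the
        # first token appears, so prompt_tokens / TTFT bounds prefill rate.
        "prefill_throughput_tok_s": prompt_tokens / ttft if ttft > 0 else 0.0,
        "decode_throughput_tok_s": (n_out - 1) / decode_time if decode_time > 0 else 0.0,
    }

# Example with made-up timestamps (seconds since request start):
print(inference_metrics(0.0, [0.35, 0.40, 0.45, 0.50], prompt_tokens=512))
```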

This job posting was last updated on 10/18/2025
