$120K - 180K a year
Build and maintain distributed cloud services for LLM inference, including deployment, monitoring, and optimization.
Experience with cloud distributed systems, LLM hosting and inference, benchmarking tools, GPU software stacks, and distributed inference optimization.
Job Description:
>> Looking for developers with general distributed cloud services experience, with LLM experience as a secondary skill. GPU experience is now low on the list of preferred skills.

Dedicated Inference Service Requirements:
>> Deep experience building services in modern cloud environments on distributed systems (e.g., containerization (Kubernetes, Docker), infrastructure as code, CI/CD pipelines, APIs, authentication and authorization, data storage, deployment, logging, monitoring, alerting, etc.)
>> Experience working with Large Language Models (LLMs), particularly hosting them to run inference
>> Strong verbal and written communication skills. Your job will involve communicating with local and remote colleagues about technical subjects and writing detailed documentation.
>> Experience building or using benchmarking tools for evaluating LLM inference across various model, engine, and GPU combinations
>> Familiarity with LLM performance metrics such as prefill throughput, decode throughput, TPOT, and TTFT
>> Experience with one or more inference engines, e.g., vLLM, SGLang, and Modular Max
>> Familiarity with one or more distributed inference serving frameworks, e.g., llm-d, NVIDIA Dynamo, and Ray Serve
>> Experience with AMD and NVIDIA GPUs, using software such as CUDA, ROCm, AITER, NCCL, and RCCL
>> Knowledge of distributed inference optimization techniques: tensor/data parallelism, KV cache optimizations, smart routing, etc.
This job posting was last updated on 10/18/2025