via Glassdoor
Salary: Not specified
Lead patching and endpoint management teams using automation and RMM tools to maintain security and compliance.
Experienced systems engineer with strong automation and endpoint management skills but no demonstrated AI/ML infrastructure or software engineering experience.
Remote

At CloudGeometry, we're redefining how modern data and AI systems are built. As a leading cloud-native engineering firm, we work with pioneering technology companies to deliver high-impact solutions across infrastructure, machine learning, and intelligent applications.

We are looking for five highly skilled AI Infrastructure Engineers to join our growing team supporting large-scale AI/ML systems. This is a hands-on engineering role focused on building scalable, secure, and production-ready infrastructure that powers ML workflows end-to-end, from experimentation to deployment and monitoring.

What You'll Do
• Design, implement, and maintain robust infrastructure for ML workflows across real-time and batch environments.
• Build and support production-grade model lifecycle systems, including registration, versioning, and deployment workflows.
• Develop APIs and backend services in TypeScript and Python to support model integration and orchestration.
• Manage and optimize infrastructure using AWS and infrastructure-as-code (CDK preferred).
• Work with Databricks MLflow for end-to-end model management, including asset bundling and serving pipelines.
• Collaborate with cross-functional teams, including ML scientists, backend engineers, and DevOps, to deliver high-impact features.
• Monitor and improve infrastructure reliability, security, and performance across diverse deployment targets.
• Contribute to CI/CD workflows, container orchestration (Docker, ECS), and automation for ML pipelines.

Why Join CloudGeometry?
You'll work alongside top-tier engineers across the US, LATAM, and Europe on cutting-edge projects in AI, cloud, and enterprise SaaS. We value deep technical curiosity, strong collaboration, and a bias for action in solving meaningful problems.
• Seniority Level: Mid-Senior level
• Industry: Software Development
• Employment Type: Full-time
• Job Functions: Engineering, Information Technology
• Skills: Large Language Models (LLM), Software as a Service (SaaS), Databricks Products, Python (Programming Language), Infrastructure, TypeScript, MLflow, MLOps, Amazon Web Services (AWS)

Requirements

What We're Looking For
• 7+ years in software or infrastructure engineering with proven experience supporting AI/ML systems.
• Deep hands-on experience with AWS services and modern IaC practices (Terraform/CDK).
• Strong backend programming skills in TypeScript and Python.
• Production-level use of MLflow for model management and deployment.
• Expertise in containerization (Docker), CI/CD automation, and orchestration tools.
• Solid understanding of designing scalable and secure systems in cloud-native environments.
• Strong communication skills, with the ability to bridge gaps between engineering and product stakeholders.
• Comfortable in fast-paced, collaborative environments working across time zones.

Nice to Have
• Exposure to LLM infrastructure and frameworks (e.g., DSPy, LangChain).
• Knowledge of LLM performance metrics: latency, cost monitoring, and usage optimization.
• Familiarity with semantic search tools and vector stores (e.g., OpenSearch, Pinecone).

Benefits
• Remote anywhere
• Coworking space financial coverage
• Flexible working hours
• B2B with multiple benefits
• Paid days off annually
• Workspace program: $2,500 for work equipment of your choice
• Paid courses and certifications (for example, AWS, CKA, and ML certifications)
• Participation in international conferences such as CNCF summits, KubeCon, and others

Submit Resume
Send us an application at jobs@cloudgeometry.com, and we'll contact you shortly.
This job posting was last updated on 3/3/2026