2 open positions available
Ensure high availability and performance of core services, develop deployment tools, and optimize infrastructure.
Requires expertise in cloud providers, Kubernetes, IaC, networking, CI/CD, and programming languages like Python or Golang.

Senior Site Reliability Engineer

About the Company
Clarifai is a leading compute orchestration AI platform specializing in computer vision and generative AI. We empower organizations to transform unstructured image, video, text, and audio data into actionable insights, significantly faster and more accurately than manual processes. Founded in 2013 by Matt Zeiler, Ph.D., Clarifai has been at the forefront of AI innovation since taking the top five places in the 2013 ImageNet Challenge. Our diverse, globally distributed team operates across the United States, Canada, Estonia, Argentina, and India. We have secured $100M in funding, including a $60M Series C round, backed by industry leaders such as Menlo Ventures, Union Square Ventures, Lux Capital, NEA, LDV Capital, Corazon Capital, Google Ventures, NVIDIA, Qualcomm, and Osage. Clarifai is proud to be an equal-opportunity workplace committed to building and maintaining a diverse and inclusive team.

Your Impact
Clarifai's platform is a Kubernetes-native distributed system that requires the orchestration of many components. Efficiently serving and training large neural networks presents unique design and infrastructure challenges. You will be critical to solving these challenges both in the cloud and in on-premise environments. Additionally, you will be responsible for our broader cloud infrastructure, development tools, and environments.

The Opportunity
- Ensure the smooth operation and high availability of Clarifai's core services
- Monitor system performance, identify bottlenecks, and implement optimizations to enhance reliability and efficiency
- Develop Kubernetes resources and custom tooling for seamless cloud and on-premise deployments
- Design and implement scalable, secure, and cost-effective infrastructure solutions
- Partner with teams across the organization to identify and solve engineering challenges

Requirements
- BS/BA in Computer Science or related degree
- Good knowledge of cloud providers (AWS, GCP, or similar)
- Expertise with Kubernetes (EKS, GKE, self-hosted) and Infrastructure as Code using Terraform and Helm
- Solid understanding of web and networking fundamentals (HTTP, TLS, DNS, certificates, etc.)
- Experience with CI/CD pipelines using tools such as GitHub Actions, ArgoCD, and Atlantis
- Strong interpersonal skills working with teams across different time zones and regions

Great to Have
- Knowledge of basic microservice architecture principles
- Familiarity with security best practices for cloud-based systems
- Experience with relational databases, message queues, and key-value stores
- Experience writing Python, Golang, or any other popular programming language
- Familiarity with any RPC framework
- Experience developing and building custom Kubernetes operators
Developing full-stack solutions, leading API and backend development, and implementing scalable data architectures.
Extensive experience with full-stack development, cloud services, and API integration; no specific ML infrastructure experience or machine learning frameworks mentioned.

About the Company
Clarifai is a leading, full-lifecycle deep learning AI platform for computer vision, natural language processing, LLMs, and audio recognition. We help organizations transform unstructured image, video, text, and audio data into structured data significantly faster and more accurately than humans could on their own. Founded in 2013 by Matt Zeiler, Ph.D., Clarifai has been a market leader in AI since winning the top five places in image classification at the 2013 ImageNet Challenge. Clarifai continues to grow, with employees based remotely throughout the United States, Canada, Argentina, India, and Estonia. We have raised $100M in funding to date, with $60M coming from our most recent Series C, and are backed by industry leaders like Menlo Ventures, Union Square Ventures, Lux Capital, New Enterprise Associates, LDV Capital, Corazon Capital, Google Ventures, NVIDIA, Qualcomm, and Osage. Clarifai is proud to be an equal opportunity workplace dedicated to pursuing, hiring, and retaining a diverse workforce.

Your Impact
You will be essential in contributing to our core ML infrastructure, helping researchers and users train and serve state-of-the-art models for our customers!

The Opportunity
- Work with research teams to design and build our training infrastructure
- Prototype new training frameworks and productionize solutions at scale
- Design, optimize, and test model integration infrastructure

Requirements
- Experience owning large technical initiatives and leading small teams
- Experience designing and architecting distributed microservice systems
- Experience with a cloud platform (AWS, GCP, Azure)
- Hands-on experience implementing production machine learning systems at scale
- Familiarity with setting up ML lifecycle systems
- Experience developing machine learning infrastructure
- Comfortable working with open source software

Great to Have
- Prior experience with TensorFlow, PyTorch, ONNX, NVIDIA Triton, or Kubeflow
- Experience with multiple cloud platforms (AWS, GCP, Azure)
- Experience with Golang and relational databases
- Experience with real-time and asynchronous data processing

The salary range for this position is $150,000 - $240,000 and is flexible depending on relevant experience.