via Remote Rocketship
$100K - $140K a year
Design and build scalable AI-powered data pipelines and services integrating large datasets, embeddings, and semantic search.
5+ years of software engineering experience with a backend focus, covering Python, AWS, data warehousing, and LLM APIs, plus production experience at a startup.
Job Description:
• Design and build data-driven tools that operate on large datasets stored in S3 and Snowflake
• Implement pipelines that:
  • Extract specific columns or datasets from Snowflake
  • Generate vector embeddings via APIs such as OpenAI
  • Store and manage embeddings in vector databases like Pinecone
  • Enable semantic search and similarity-based retrieval
• Develop enrichment workflows that:
  • Query structured data
  • Use LLM APIs to generate new derived columns
  • Write enriched results back into Snowflake
• Build reusable internal services and SDKs around embedding generation, prompt orchestration, and data augmentation
• Optimize performance and cost across AWS infrastructure
• Work closely with product and data teams to turn use cases into scalable engineering solutions
• Ensure reliability, observability, and maintainability of AI-powered pipelines

Requirements:
• 5+ years of software engineering experience
• Strong backend engineering skills (Python preferred; other modern languages acceptable)
• Solid experience with:
  • AWS (IAM, Lambda, ECS/EKS, S3, networking, security best practices)
  • Data warehousing (Snowflake preferred)
  • API design and distributed systems
• Hands-on experience working with LLM APIs (e.g., OpenAI) and embedding workflows
• Experience with vector databases (Pinecone or similar)
• Strong understanding of data modeling, ETL/ELT patterns, and performance optimization
• Production experience in at least one startup environment
• Ability to operate independently and ship high-impact systems end-to-end

Benefits:
• Work on practical, production-grade AI systems
• Direct impact on how data is leveraged across the company
• Startup speed with real ownership and autonomy
• Opportunity to define the internal AI platform from the ground up
This job posting was last updated on 3/2/2026