Collaborate on advanced machine learning research projects, present findings, and contribute to research community activities during a 12-week internship.
Must be enrolled in a master's or PhD program in a STEM field, have at least 3 years of programming experience in Python or C++, and have deep knowledge of ML frameworks, transformer models, and GPU programming.
Research Interns put inquiry and theory into practice. Alongside fellow doctoral candidates and some of the world's best researchers, Research Interns learn, collaborate, and network for life. Research Interns not only advance their own careers; they also contribute to exciting research and development strides. During the 12-week internship, Research Interns are paired with mentors and expected to collaborate with other Research Interns and researchers, present findings, and contribute to the vibrant life of the community. Research internships are available in all areas of research and are offered year-round, though they typically begin in the summer.

Required qualifications:
- Currently enrolled in a master's or PhD program in Computer Science, Electrical Engineering, or a related STEM field.
- Completed at least 2 academic courses or projects involving machine learning systems.
- At least 3 years of experience programming in Python, C++, or a similar systems-oriented language through work, projects, or research.

In addition to the qualifications below, you'll need to submit a minimum of two reference letters for this position, as well as a cover letter and any relevant work or research samples. After you submit your application, a request for letters may be sent to your list of references on your behalf. Note that reference letters cannot be requested until after you have submitted your application, and that they might not be automatically requested for all candidates. You may wish to alert your letter writers in advance so they will be ready to submit your letter.

Preferred qualifications:
- Demonstrated contributions to an open-source ML framework or ML systems software.
- Deep understanding of transformer-based model architectures, including attention mechanisms, KV cache behavior, and common training and inference bottlenecks.
- Experience with modern ML frameworks and runtimes such as PyTorch, Hugging Face Transformers, SGLang, vLLM, or TensorRT-LLM.
- Experience with GPU or accelerator programming using CUDA, Triton, or similar tools, and familiarity with profiling and performance analysis.
- Familiarity with benchmarking and performance profiling tools for ML workloads.
- Working knowledge of low-precision numerics, quantization methods, or hardware-software co-design considerations for large-scale model efficiency is a plus.
- Coursework, research, or project experience in areas such as ML systems, model optimization, kernel development, or numerical computing.
- Strong analytical and problem-solving skills, with an interest in ML systems and computational performance.
This job posting was last updated on 12/3/2025