via Remote Rocketship
Lead HPC system integration, develop and maintain scheduling software, support HPC environments, and collaborate with vendors and users.
Requires 10+ years of HPC systems development, Linux/UNIX expertise, proficiency in C/C++/Fortran, scripting skills, and experience with HPC applications and system configuration tools.
Job Description:
• Oversee and directly contribute to significant ongoing HPC integrations to the environment
• Write and shepherd scalable feature designs through the entire software development process, from requirements and use cases to release
• Design and develop enhancements to the PBSPro batch scheduler based on customer-driven requirements
• Work extensively with the PBS vendor, Altair, on bug fixes and feature releases
• Apply best practices in software engineering, delivering projects on time, on budget, and with excellent quality
• Provide support to staff and end users to resolve batch scheduler issues
• Modify existing software to correct errors and/or improve performance
• Mentor junior staff and cross-train peers
• Provide after-hours/weekend support as required
• Moderate and contribute to supercomputing system administration, including:
  • Day-to-day operations of the Linux HPC clusters and storage systems
  • Proactively monitoring, analyzing, and correcting system issues
  • Developing scripts to automate repetitive tasks and tools to enhance support of the HPC systems
  • System performance analysis and tuning
  • Building, installing, and supporting user-requested software
  • Supporting evaluation and assessment of new HPC technology
  • Resolving user-reported issues and managing support ticket requests in Remedy
  • Creating, maintaining, and validating documentation pertaining to the environment
  • Assisting with the management of low-latency hypercube and fat-tree based networks
  • Assisting with the administration of multiple filesystem environments, including parallel filesystems, traditional NFS, tape-based hierarchical storage, and vendor-specific solutions
Requirements:
• Bachelor of Science degree in Computer Science or a related field
• Strong computer science background with in-depth systems-level knowledge of operating systems and networking
• Solid understanding of the software development process, including requirements, use cases, design, coding, documentation, and testing of scalable, distributed applications in a Linux environment
• A minimum of 10 years of experience with integration development of HPC systems and scheduling software (PBS, Slurm, or Moab/Torque)
• A minimum of 10 years of experience developing system software in heterogeneous, multi-platform HPC environments
• Strong ability to analyze, debug, and maintain the integrity of an existing code base
• Demonstrated equivalent of 10 years of Linux/UNIX user support experience and hands-on experience with administration of Linux systems
• Experience working with HPC applications and proficiency in at least one of C, C++, or Fortran
• Superior scripting skills and excellent attention to detail; proficiency in at least one of Python, Perl, or Bash
• Strong ability to interact with customers to understand needs, elicit requirements, and get feedback on prototype solutions
• Excellent communication and people skills; excellent time management and organizational skills
• Experience with system configuration management tools (e.g., Puppet, Chef, Ansible)
• Experience with revision control software (e.g., CVS, SVN, Git)
• Track record of delivering commercial-quality software on schedule through multiple release cycles
• Proficiency in technical writing

Additional requirements:
• U.S. citizenship and the ability to obtain a Public Trust security clearance are mandatory
• Travel to the customer site is required 2-4 times a year
This job posting was last updated on 12/17/2025