via LinkedIn
$120K - $160K a year
Manage and support HPC systems including installation, configuration, troubleshooting, and maintenance of hardware and software in a large enterprise environment.
Requires 5+ years HPC system engineering experience, proficiency with Linux OS, scripting, cloud platforms, and relevant certifications or equivalent experience.
Job Title: Solutions Integration Engineer (HPC System Engineer)
Job Type: 12-month contract (temp to perm)
Job Location: Houston, TX 77042

Specific Work Requirements:
• A minimum of 5 years’ experience working in a large HPC enterprise environment comprising thousands of servers, large storage solutions, tape and tape automation.
• Proficient in the installation, configuration and management of Linux-based operating systems, preferably RHEL, CentOS, or Rocky Linux.
• Experience with IBM’s xCAT distributed computing management software.
• Experience with installation and maintenance of computer hardware, including servers, tape drives, robotic tape libraries, GPGPU, SSD, and disk arrays.
• Experience with containerization.
• Knowledge of networking and datacenter technologies: switching, routing, high availability, LAN / WAN / WLAN topologies, and system configuration for Ethernet, InfiniBand, and Fibre Channel SAN.
• Experience with HPC storage solutions, for example configuration and operation of HPE ClusterStor systems, NetApp, Dell Isilon, and Pure Storage.
• Ability to write and troubleshoot Bourne, Bash and C shell, Perl, Python, Ruby and MRTG scripts.
• Experience with PostgreSQL and database installation and support.
• Experience with Google Cloud Platform and Azure public clouds. Able to provision and manage instances, build images, and write installation scripts.
• Experience with configuration tools like Ansible and Terraform.
• Experience with backup and recovery tools such as IBM Spectrum and Dell NetWorker.
• Good knowledge of Linux security, including configuration of endpoint security tools.
• Ability to evaluate HPC system environments and make recommendations for improvement in performance and manageability.
• Ability to investigate, debug and diagnose system-level issues.

General Work Requirements:
• Conform to local change management philosophies, including full testing on non-production systems prior to production deployment.
• Effectively communicate all change activities to all affected parties, including a clear description of the change, related service outages and possible effects on the different environments we support.
• Ensure IT deployment standards are maintained, with verification through reporting systems.
• Meet KPO requirements for InTouch support processing, including full documentation of problem resolution and creation of knowledge content and best practice items.
• Show a good understanding of computer equipment and its care and maintenance.
• Work with other internal support groups (systems, networking, programming, desktop support, computer operations, and facilities) as required to complete administration functions.
• Work with a variety of vendors in technical environments and in the reporting and investigation of system problems.
• Provide a written weekly status report to the team manager and be prepared to present and discuss it with the team at a weekly status meeting.
• Be prepared to work outside of normal hours, as system maintenance must often be performed outside of prime time; provide 24/7 support to computer operations; work with other remote support locations, for example Kuala Lumpur, backing follow-the-sun support.
• Participate in the support on-call schedule, in weekend power outages (normally two per year), and in emergency data center activities.
• Peer-review all major projects as part of the normal deployment philosophy.
• Ensure compliance with all quality assurance, best practice procedures and QHSE requirements, as defined by job position.

Qualifications:
• Bachelor's degree from a four-year college in computer science studies or 5-10 years of equivalent work experience, and current industry-recognized training and certification, for example from Cisco, Red Hat or Microsoft.
This job posting was last updated on 12/9/2025