
HPC System Engineer (System), NSCC

A*star Research Entities

Job Summary:

The HPC System Engineer will be responsible for managing, monitoring and optimizing the operation of the supercomputing system. This role involves collaborating with various research and technical teams to optimize HPC resource utilization. Candidates with demonstrated experience in the HPC field may be considered for a Senior position.

Roles and Responsibilities:

System administration and optimization

  • Work with Managed Services teams in managing and administering HPC systems, including servers, storage, and internal network components.
  • Ensure the reliability and availability of HPC infrastructure.
  • Provide support on technical queries and troubleshoot HPC-related problems.
  • Implement best practices for system monitoring and reporting.
  • Develop utility tools to support monitoring, tuning, and troubleshooting activities.
  • Document incident details, resolution, and lessons learned to enhance future problem-solving.
  • Implement security measures and monitoring to protect HPC systems.
  • Conduct regular security checks and assessments of the HPC system infrastructure.
  • Monitor system performance and optimize it through tuning and troubleshooting.

Resource and workload management

  • Monitor HPC resource utilization.
  • Develop and evaluate policies for allocating HPC resources.
  • Optimize job scheduling to maximize resource utilization.

Designing and planning

  • Assess future computational requirements and plan for system expansion.
  • Assist in designing future HPC system acquisitions.
  • Study and evaluate emerging technologies and trends, including but not limited to:
      • processors and accelerators
      • interconnect technology
      • storage solutions
      • programming models

Qualifications:

  • Degree in Computer Science, Engineering, IT or another relevant area.
  • At least 3 years of experience in managing HPC systems.
  • Highly proficient in UNIX/Linux environments and command line interface (CLI).
  • Experience with cluster management software (xCAT, BCM, PHPC, HPCM).
  • Experience with job scheduling and workload management software (Slurm or PBS Pro).
  • Strong knowledge of HPC storage principles and experience in managing parallel file systems (Lustre, GPFS, BeeGFS).
  • Strong knowledge of RDMA-based interconnects (InfiniBand, RoCE).
  • Understanding of basic network protocols like DHCP, DNS, TFTP, SMTP, etc.
  • Good knowledge of scripting languages like Python, Bash or Perl.
  • Demonstrated ability to analyse complex issues and develop effective solutions.
  • To be considered for a Senior position, candidates should have at least 5 years of experience in roles that involve the deployment of HPC systems, covering key areas such as design, installation, configuration, documentation and admin/user training.
