
Global Operations Centre HPC Engineer

Firmus Metal International Pte. Ltd.

ROLES AND RESPONSIBILITIES


The Global Operations Centre (GOC) HPC Engineer is a technical specialist responsible for the daily operations and maintenance of the company's high-performance computing (HPC) environment. The HPC Engineer collaborates closely with senior engineers, monitors system health, troubleshoots issues (especially those related to NVIDIA H100, InfiniBand, and Mellanox), assists the Global Operations Centre, and creates clear documentation to ensure smooth and efficient operations.

  • Assist in the deployment, configuration, and maintenance of HPC hardware and software components.
  • Monitor the health and performance of HPC systems, identifying and resolving issues proactively.
  • Participate in the on-call rotation to ensure 24/7 availability and responsiveness to critical issues.
  • Provide technical support to the GOC Support Specialist team in troubleshooting HPC-related problems.
  • Analyze system logs, performance data, and user reports to diagnose and resolve issues.
  • Document incident details, resolutions, and lessons learned to enhance future problem-solving.
  • Create and maintain comprehensive SOPs for common HPC tasks, incident response procedures, and system configurations.
  • Ensure documentation is clear, accurate, and up to date, contributing to knowledge sharing within the team.
  • Communicate effectively with the GOC team, IT stakeholders, and end-users to ensure clear understanding of issues and resolutions.
  • Participate in team meetings, project discussions, and knowledge-sharing sessions to foster a collaborative environment.

SKILLS AND EXPERIENCE

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 8+ years of experience in HPC system administration, Linux/Unix environments, and troubleshooting complex technical problems.
  • Strong understanding of HPC architecture, networking, storage, and job scheduling systems.
  • In-depth knowledge of InfiniBand fabric topology and Mellanox hardware capabilities.
  • Proficiency in Linux/Unix operating systems and command-line tools.
  • Experience with scripting languages (e.g., Bash, Python) for automation and problem-solving.
  • Familiarity with HPC software, administration, and tools (e.g., Slurm, Kubernetes).
  • Excellent problem-solving and analytical skills.
