HPC Systems Manager

KLA-Tencor (Singapore) Pte. Ltd.

Job Type   /   Job Level

Full-time   /   Manager

Job Location

Singapore, Singapore, Singapore

Salary Offered

As a System Design Engineering Manager specializing in AI Systems and High-Performance Computing (HPC), you’ll play a pivotal role in shaping the future of AI-driven solutions. Your leadership will drive innovation, optimize performance, and foster collaboration across cross-functional teams. Let’s delve into the details:

Key Responsibilities:

Technical Leadership:

Lead a team of engineers and system architects to develop the platform for deep learning (DL) training and inference.
Define the technical vision, strategy, and roadmap for DL/AI systems within the HPC domain.

AI Infrastructure Design and Optimization:

Collaborate with hardware and software teams to design and optimize AI infrastructure.
Ensure seamless integration of AI workloads with existing HPC clusters.

GPU Cluster Management and Scalability:

Oversee the management of GPU-based clusters.
Scale AI infrastructure to handle large-scale training and inference workloads.

Performance Tuning and Benchmarking:

Drive performance improvements by analyzing bottlenecks and optimizing system components.
Benchmark AI models and algorithms on HPC clusters.

Collaboration and Communication:

Work closely with product managers, researchers, and stakeholders to align AI initiatives with business goals.
Communicate technical progress, risks, and opportunities to senior leadership.

Requirements:

Proven track record in managing engineering teams, preferably in AI, HPC, or related fields.
Familiarity with AI frameworks (TensorFlow, PyTorch, etc.) and HPC tools (Slurm, OpenMPI, etc.).
Strong understanding of GPU architectures, CUDA programming, and parallel computing.
Knowledge of containerization (Docker, Kubernetes) and cloud-based AI deployments.
Ability to mentor and develop team members.
Excellent decision-making, problem-solving, and conflict resolution skills.
Effective communication with both technical and non-technical stakeholders.
Experience presenting technical concepts to executive leadership.