Engineer, SRE (Sea Labs)

Garena Online Private Limited

Job Type   /   Job Level

Full-time   /   Others/Any

Job Location

Singapore

We are seeking a highly skilled Site Reliability Engineer (SRE) with a strong background in maintaining self-hosted Kubernetes clusters. Your primary focus will be ensuring the stability and reliability of our production environment. A smoothly running infrastructure supports the work of our AI researchers by giving them a steady and dependable platform.

Job Description

Work closely with AI researchers to understand their workflow and infrastructure needs, optimizing the cluster configurations accordingly.
Implement monitoring, alerting, and self-healing systems to ensure high availability and performance of the clusters.
Collaborate with development teams to design and implement best practices for infrastructure as code (IaC).
Drive automation initiatives to reduce manual toil and improve system resilience and scalability.
Document system design and procedures, and provide guidance to researchers on advanced cluster usage.

Job Requirements

Bachelor's degree or higher in Computer Science, Engineering, or related fields.
Proven experience in managing self-hosted Kubernetes clusters in a production environment.
Strong understanding of containerization, orchestration, and the Kubernetes ecosystem.
Familiarity with AI workflows; a machine learning/deep learning research background is a plus.
Proficiency in at least one programming language (e.g., Python, Go) and scripting skills for automation.
Good working attitude and strong problem-solving, critical-thinking, and communication skills.

✱ This job post has expired ✱
