Roles & Responsibilities
· Design, deploy, and manage CI/CD pipelines to ensure smooth and consistent delivery of software to production.
· Administer, scale, and optimize Kubernetes (k8s) deployments, ensuring high availability and fault tolerance for all applications.
· Architect and maintain microservices infrastructure, ensuring seamless communication, efficient scaling, and robust security.
· Implement, customize, and oversee monitoring solutions for proactive detection of system anomalies and performance bottlenecks.
· Lead capacity planning and resource management efforts, using load and stress testing to benchmark, tune, and improve system performance.
· Swiftly identify, troubleshoot, and remediate issues affecting critical service operations, ensuring maximum uptime and minimal disruption to users.
· Proactively participate in the creation, maintenance, and enhancement of Standard Operating Procedures (SOPs) to ensure uniformity in operational tasks.
· Manage high-severity incidents and events with significant customer impact, focusing on rapid detection, analysis, and recovery while coordinating with cross-functional teams.
· Innovate and develop automated operational tools and systems to minimize manual interventions and streamline processes.
· Collaborate closely with development, QA, and business teams to align infrastructure and operations with organizational goals and customer needs.
Knowledge & Competencies
· Bachelor's or higher degree in Computer Science, Information Systems or related fields
· Hands-on experience with at least one of the following programming languages: Bash, Go, or Python
· Good command of the Linux environment, with a deep understanding of the Linux operating system, including the kernel, memory, processes, threads, static and shared libraries, IPC, and signals
· Understanding of standard networking protocols such as HTTP, DNS, SSL, TCP/IP, ICMP
· Experience in large-scale distributed environments, with familiarity with distributed systems concepts such as the CAP theorem and microservices
· Experience with container technology such as Docker, Kubernetes
· Experience with monitoring tools like Prometheus, Zabbix
· Demonstrated strong sense of ownership, customer service, and integrity
· Passion for eliminating repetitive manual processes using automation
· Fast learner and good team player
· Fluency in both English and Mandarin to communicate with international stakeholders and stakeholders based at HQ