Key Responsibilities
· Lead an infrastructure team that maintains close relationships with the technology, application development and enterprise architecture teams across the organization, and that establishes consistent infrastructure practices and principles for all solutions delivered. These practices and principles should align with the overall organizational strategy and product.
· Drive implementation of Site Reliability Engineering (SRE) and Chaos Engineering design for all systems.
· Champion production resiliency and availability, with a focus on superior client experience, by working with the business, technology, application and architecture teams.
· Monitor, assess and recommend new technologies and advances that can help progress the organization's business strategies.
Responsibilities
· Manage teams across multiple countries in ASEAN, Greater China and India, and partner with external teams.
· Develop and embed consistent practice patterns that align with the organization's product strategy, build reusable operational frameworks and drive their adoption by multiple teams across the organization.
· Maintain an open library of shared infrastructure components for teams to reuse.
· Work with application teams across the organization to support their efforts to modernize their infrastructure platform architecture.
· Articulate technical solutions to senior stakeholders to secure buy-in.
· Support the design and development of infrastructure solutions that improve the resiliency, scalability and reliability of applications.
· Be responsible for setting up and configuring an enterprise-wide shareable IaaS platform, working with global production service teams.
· Ability to conduct research into software issues and products as required.
· Hands-on development and experimentation with new technologies and techniques.
· Ability to work with the latest tools and techniques.
· Ability to effectively prioritize and execute tasks in a high-pressure, fast-paced global environment.
· Knowledge of a range of open-source technologies and techniques.
· Drive effective communication between business and technology teams with regard to production service reliability and performance.
· Drive continuous improvement of processes and systems using Site Reliability Engineering methods.
· Improve the reliability and availability of systems by gathering hard data and designing systems for greater service reliability and performance.
· Provide expert advice and training to our application development teams on which technology solutions and advanced reliability techniques to use in each situation.
Requirements
· Bachelor's Degree in Computer Science and at least 15 years of relevant experience.
· Experience driving major transformation programmes for production resilience, performance and client experience.
· Experience in Site Reliability Engineering is essential.
· Subject matter expert in addressing problems related to system, network and application design, performance, integration and security.
· Experience with complex, highly scalable distributed systems in cloud and multi-cloud environments.
· Experience in end-to-end enterprise architecture, with a deep understanding of cloud-native and microservice architecture and of service monitoring on cloud and PaaS environments such as OpenShift and Kubernetes, is essential.
· Experience with Core Java 8, CloudFormation (or equivalent), Amazon Web Services (or equivalent), relational and non-relational databases, and Linux/Unix systems.
· Experience in Agile and Test-Driven Development (TDD) methodologies
· Experience with high-availability, high-scale, high-performance systems.
· Experience managing cloud-based environments is highly desired.