In this role, you will ensure the reliability and stability of AIA Singapore’s applications and services.
WHAT YOU’LL BE DOING
You will have the opportunity to manage a variety of complex system and service incidents, champion their resolution, and learn as much as possible from each one:
- Ensure the reliability and availability of critical production services and applications.
- Monitor system performance, identify potential issues and implement proactive measures to prevent downtime.
- Work closely with development and infrastructure teams to build and maintain scalable and resilient applications, automate processes and improve overall system health.
- Analyze system performance metrics, identify bottlenecks and optimize performance.
- Provide guidance on engineering practices for building and maintaining resilient services.
- Participate in on-call rotations.
- Be accountable for duties assigned by your supervisor to meet operational and/or other requirements.
WHAT WE ARE LOOKING FOR
- A degree from a recognized university, preferably in Information Technology, Computer Science or Computer Engineering.
- A minimum of 5 years in an operations leadership position managing on-call rotations, leading incident response and conducting no-blame post-mortem analysis.
- At least 3 years of software development experience in one or more of the following: Java, Node.js, AEM, ASP, ReactJS, React Native, etc. Full-stack experience and knowledge are preferred.
- At least 5 years of experience managing an engineering team, with hands-on knowledge of applications, networking, OS (Unix and Windows), databases and storage in on-premises environments.
- Experience with distributed systems and public cloud services, such as Azure Kubernetes Service, Azure Application Insights, etc.
- Strong experience in a Continuous Integration/Continuous Delivery (CI/CD) environment, with a strong appreciation of change and version control processes and methodologies.
- Experience working with DevOps and automation tools (e.g. Selenium, SoapUI, Bamboo, Jenkins, Ansible, Marvin, GitHub, Bitbucket, Nexus, Jira, Confluence, etc.).
- Ability to work in a high-pressure environment, quickly troubleshoot complex issues across on-premises and cloud environments, and successfully handle multiple priorities.
- A systematic problem-solving approach, effective communication skills and a sense of ownership and drive.
- Works independently with minimal guidance.
- Ability to manage resources and perform capacity planning.
- Applies best practices and knowledge of internal/external business issues to improve products or services in own discipline.
- Solves moderately complex problems; takes a new perspective on existing solutions.
- Interprets customer needs, assesses requirements and identifies solutions to non-standard requests.
- Is accountable for technical contributions to the project team or sub-team, and builds awareness of costs related to own work.