Contract: 8 months, with potential to extend
Work Location: Harbourfront Centre
Experience: 6-10 Years
Role Description/Responsibilities:
- Accountable for day-to-day operational activities in TRUST systems, to ensure optimum system performance and determine the system strategy for business continuity.
- Review, implement and uphold IT policies and operations protocols within TRUST.
- Lead a team of operational engineers to work with outsourced service providers and stakeholders in delivering TRUST operational objectives, e.g. data conveyance, tool updates and maintenance, and troubleshooting system issues.
- Review and plan for systems integration efforts to ensure the TRUST system is built in an efficient and robust manner towards a high level of maintainability, reliability and availability based on established guidelines and best practices.
- Monitor and ensure that ATFM and Service Desk operations are functional, and assist in triaging issues.
- Review and ensure that backup solutions for TRUST data and systems are in place.
- Ensure that the data pipeline remains operational and that data integrity and accessibility are maintained.
- Manage application/platform and security incidents, working with internal teams and vendors to resolve issues in a timely manner to meet SLAs and escalating to higher management if necessary. Report incidents and short- and long-term resolution plans at the appropriate forums.
Job Requirements and Skillsets:
- Formal AWS Certification.
- Substantial technical experience in AWS (direct or related work).
- Degree in Computer Science/Engineering, Information Technology, or a related discipline.
- At least 6 years of working experience in cloud-based services, IT operations and vendor management.
- Proactive and dedicated individual with strong leadership and multi-tasking capabilities.
- Ability to build and maintain relationships with a wide array of people at both junior and senior levels
- Experience in running incident, problem and change management processes.
- Experience implementing processes per the ITIL framework: incident, problem and change management, and service transition.
- Familiarity with security and access control measures to control privileged access to test and production environments.
- Experience in networking technologies such as WAN, LAN, Network Security, Firewall rules, Load Balancers, VPNs and DNS.
- Knowledge of disaster recovery, system backup and restore
- Experience with cloud-based services (e.g. AWS, including Redshift, EMR, QuickSight, Lambda, Glue, etc.) and project management tools (e.g. Atlassian, JIRA) is an added advantage.
- Infrastructure as Code (IaC): Familiarity with Infrastructure as Code tools (e.g., Terraform, CloudFormation) for managing AWS resources.
- Cost Management: Ability to optimize costs and manage AWS budgets effectively.
- Database and Data Warehousing: Knowledge of database management systems, data warehousing concepts, and SQL.
- Monitoring and Troubleshooting: Proficiency in monitoring tools (e.g., CloudWatch) and troubleshooting issues in AWS environments.
Candidates with work experience on any of the following will be considered favorably:
- Experience with setting up and/or running operations for research projects and machine learning platforms will be an advantage.
- Experience in, and ability to navigate, the Public Agency IT/System environment is preferred.
- Experience in data analytics systems is preferred.
- Experience working in a Government Commercial Cloud (GCC) environment and familiarity with the relevant change control framework will be an advantage.
- Understanding of DevOps principles and practices.