Trust is the first of a new breed of banks in Singapore – digitally native and focused on delivering a delightful customer experience. You will work in a fast-paced, collaborative environment solving new and interesting challenges each day. Together with our Trust team, you will help shape the future of our bank.
As a Tech Operations Lead, you will tackle many of the interesting challenges we face, learn new ways of working, and help build delightful, high-quality products for our customers.
Job Responsibilities
As Tech Operations Lead, you will operate at the intersection of three key pillars (Product, Technology and Operations), all of which are aligned to security, risk, stability and resilience. This senior leadership role requires a visionary, strategic thinker who will be responsible for ensuring the security, risk mitigation, stability and resilience of our systems and services.
You will guide and train the TechOps team to troubleshoot, optimize and improve our stack, whether that is our cloud infrastructure, our microservice applications or our data platform. Understanding the full stack and being able to support and debug issues across it is a core requirement for this role.
Your team will drive troubleshooting, debug customer issues, enhance observability and work towards improving release quality. You will be responsible for system reliability, aiming to increase productivity and reduce time to market by reducing the technical debt of the services the TechOps team supports.
Responsibilities Include:
- Issue Resolution and Troubleshooting: Lead the TechOps team in troubleshooting and debugging customer issues. Enhance observability and work towards improving the quality of product releases.
- System Reliability: Take ownership of system reliability, with the goal of increasing productivity and reducing time to market by minimizing technical debt in the services supported by the TechOps team.
- Feedback and Issue Prioritization: Review feedback from clients and internally identified gaps. Collaborate with Product and Operations teams to prioritize and address system issues. Report on trends in client issues and prioritize them based on their impact on clients, reputation, and finances.
- Bank-wide Prioritization for TechOps: Keep track of prioritized items across the organization and provide early warnings for potential delays in completion.
- Standardized Engagement Model: Standardize the engagement model with partner systems, including contacts, contracts, and maintenance schedules.
- Alert Monitoring and Incident Management: Oversee a team responsible for monitoring and triaging alerts for services and platforms. Know the Incident Management procedures thoroughly and ensure the team follows them to prioritize and resolve issues. Troubleshoot incidents, identify root causes and work on permanent resolutions.
- Runbook Preparation: Ensure that runbooks are prepared and made available to support staff before new features are released to production. This may add lead time to feature delivery, but it's essential for system stability.
- Service Level Adherence: As the Incident Manager, ensure that production issues are responded to and resolved within agreed turn-around times. Collaborate with Level 3 teams for scheduling and delivery of fixes.
- Infrastructure Support and Optimization: Support, monitor, and optimize AWS cloud infrastructure. Identify and resolve issues while scaling the platform to accommodate customer growth.
- Real-time Observability: Refine real-time observability and metrics to improve visibility into the performance of the full stack.
- Automation: Recommend process and technical improvements that lead to automation and scalability. Create, maintain and optimize automation scripts and configurations for infrastructure provisioning, configuration management and application deployment.
- Documentation: Maintain detailed documentation of TechOps processes, procedures, and configurations to ensure knowledge sharing and compliance.
- Performance Optimization: Identify and implement performance improvements to enhance system efficiency and reduce operational costs.
- Disaster Recovery Planning: Develop and test disaster recovery plans and procedures to ensure business continuity in case of system failures.
To be successful in this role, you must have the following:
- 12+ years of overall experience, most of it in technology operations.
- Excellent problem-solving and troubleshooting skills.
- Strong communication and collaboration abilities.
- Experience with incident management and system reliability.
- Familiarity with automation and optimization techniques.
- An open, transparent mindset and a thirst for learning.
- An educational background in Computer Science or a related field, with a strong grasp of computer science fundamentals.
- A strong understanding of IT infrastructure and of best practices across its different domains.
Role-Specific Technical Competencies
- Familiar with Kubernetes (K8s) and microservice architectures
- Familiar with distributed systems, performance tuning, handling massive concurrency, and caching mechanisms
- Familiar with databases such as Postgres and DynamoDB, and messaging solutions such as Kafka
- Knowledge of AWS resources such as EC2, RDS, ALB, ElastiCache, Auto Scaling groups, S3, VPC, EKS, ECS, CloudFront, Route 53, CloudWatch and Lambda is required
If you apply for a job with Trust or submit any personal information in connection with a possible job opportunity, you agree to our privacy notice for job applicants.
Come as you are! Trust is an inclusive and open-minded workplace. If you are good at what you do and care about doing a good job, that's what we focus on and want from you. So come as you are.