Business Function
Group Technology and Operations (T&O) enables and empowers the bank with an efficient, nimble and resilient infrastructure through a strategic focus on productivity, quality & control, technology, people capability and innovation. In Group T&O, we manage the majority of the Bank's operational processes and aspire to delight our business partners through our multiple banking delivery channels.
DBS Bank is looking for a Platform SRE Engineer with experience working on enterprise-level data engineering, analytics, and observability applications. The SRE engineer will be responsible for ensuring high availability of the platform services and for making continuous improvements to the platform's efficiency and resiliency. The SRE engineer will also develop automation to remove toil and increase the team's productivity.
Responsibilities
- Develop monitoring and onboarding guidelines for various applications in AppDynamics, ensuring accurate monitoring and data collection.
- Identify and resolve performance issues through detailed analysis of transaction traces, application logs, and system metrics.
- Automate routine tasks and reporting processes using AppDynamics APIs and scripting, reducing manual effort and improving efficiency.
- Participate in the on-call support rotation, addressing and resolving critical performance issues in a timely manner.
- Collaborate with stakeholders to define performance metrics and monitoring requirements aligned with business goals.
- Design and implement monitoring solutions to track application performance, identifying bottlenecks and optimising system efficiency.
- Conduct performance tuning and capacity planning to ensure applications meet scalability and reliability requirements.
- Develop custom dashboards and reports to provide actionable insights and drive decision-making processes.
- Collaborate with development and operations teams to integrate AppDynamics with CI/CD pipelines and other DevOps tools.
- Configure and fine-tune alerts to proactively detect and address performance issues before they impact end-users.
- Continuously review and enhance monitoring processes and methodologies to improve efficiency and effectiveness.
- Contribute to internal knowledge bases, create documentation, and share insights with the team to promote a culture of learning and collaboration.
- Work with LOBT application teams to develop long-term monitoring strategies that align with business goals and technology roadmaps.
- Create data retention policies and role-based access control (RBAC) to manage user permissions.
- Perform application maintenance, including patching and upgrading controllers and agents, and ensure all components remain within their end-of-support/end-of-life (EOS/EOL) timelines.
Requirements
- University graduate (computer science or related field) with good experience working with contemporary technologies and scripting languages.
- Strong communication skills and the ability to explain protocols and processes to the team and management.
- A passion for learning and using new technologies from the open-source community.
- A passion for coding.
- Minimum of 8 years of IT work experience.
- Working knowledge of AppDynamics on-premises and SaaS.
- In-depth experience in Unix/Linux shell and Python scripting, with an emphasis on quality, scalability, and extensibility.
- Experience standardising PII settings in AppDynamics to ensure sensitive data is handled correctly, minimising risk and protecting user privacy.
- Experience triaging and troubleshooting application problems quickly in AppDynamics using techniques such as Transaction Snapshots, Diagnostic Sessions, and Data Collectors.
- Knowledgeable and experienced in SRE (Site Reliability Engineering) practices covering monitoring, observability, performance management, automation, and resiliency.
- Knowledge of Elastic Stack, Confluent Kafka, Grafana, Prometheus and other APM or observability tools is a plus.
- Knowledge of AI/ML capabilities to automate root cause analysis (RCA) and shorten MTTR when issues arise.
- Good understanding of network routing, load balancing and networking protocols, including a basic knowledge of TCP/IP and an understanding of HTTP and DNS.
- Ability to contribute to discussions on design and strategy.
- Adequate knowledge of database systems (RDBMS such as MariaDB, SQL, NoSQL), object-oriented programming and web application development.
- Good problem diagnosis and creative problem-solving skills.
- Experience with Node.js or Spring Boot is a plus.
- Experience with automation tools (e.g. Ansible) and DevOps pipelines is a plus.
- Self-driven, committed, and reliable team player.
We offer a competitive salary and benefits package and the professional advantages of a dynamic environment that supports your development and recognises your achievements.