About SGB:
SGB is a new digital bank that will offer a secure and integrated platform to access and
manage conventional and digital assets and financial solutions, including round-the-clock
real-time settlement, trading connectivity, custody and asset management. It serves global
investors, innovators and institutions looking for a differentiated digital banking experience.
SGB is licensed by the Central Bank of Bahrain (CBB).
About the Team:
The Site Reliability Engineering (SRE) team is responsible for ensuring the stability, reliability,
and performance of the digital bank's services and infrastructure. Key responsibilities include
system availability, performance and capacity management, incident management, change
management, automation, CI/CD, backup and disaster recovery, security and compliance.
Responsibilities:
● Design and set SLIs and SLOs for various systems in SGB, and drive stability-related
workstreams with cross-functional teams.
● Define change management processes, and drive their implementation and continuous
improvement with relevant teams.
● Define incident management processes, including incident response and resolution
workflows and root cause analysis, and drive corrective actions.
● Define and improve backup and disaster recovery strategies to ensure the stability and
availability of systems in SGB.
● Design and maintain CI/CD pipelines, collaborate with developers to improve pipeline
efficiency, and support testing and deployment activities.
● Ensure security and compliance of the development, operation and change management
practices through collaboration with relevant teams.
Qualifications required:
● A bachelor's degree in computer science, information systems, or an equivalent field.
● Strong sense of responsibility and a passion for system operations and stability work;
excellent communication, problem-solving, and critical-thinking skills.
● Extensive experience in system and service stability work; familiar with
high-availability architecture and backup and disaster recovery strategies.
● Extensive hands-on experience with monitoring platforms such as Zabbix, Prometheus, and
Grafana, and automation tools such as Ansible and Terraform.
● Familiar with AWS Cloud, Linux OS, TCP/IP, load balancers, NGINX, HTTP,
databases, and storage systems.
● Solid programming skills; well versed in at least one of the following programming
languages: Python, Java, Golang.