Head of Infrastructure
About Us
DFI Retail Group (“Group”) is a leading pan-Asian retailer. At 30th June 2021, the Group and its associates and joint ventures operated over 10,000 outlets and employed some 230,000 team members. The Group had total annual sales in 2020 exceeding US$28 billion. The Group provides quality and value to Asian consumers by offering leading brands, a compelling retail experience and great service, all delivered through a strong store network supported by efficient supply chains. The Group (including associates and joint ventures) operates under a number of well-known brands across food, health and beauty, home furnishings, restaurants and other retailing.
The Role
DFI Retail Group is currently seeking a Head of Infrastructure/SRE/DevSecOps with experience supporting a global company.
Reporting to the CIO, you will be in charge of the infrastructure (on-premises and across various Cloud Service Providers), databases, network (traditional and SDN), and observability and monitoring to support application teams (SRE), as well as developing and maintaining the DevSecOps pipelines that support various product development teams. You will be key in driving the shift from traditional infrastructure to the public cloud, making use of cloud-native technologies.
You will build out expertise in Site Reliability Engineering, promote DevSecOps methodologies and focus on automation to improve operations. You will own the tools and initiatives for scaling the systems we support, optimizing performance, and improving the reliability and availability of our systems. You will work closely with the rest of the Engineering team (Architecture, Cybersecurity, Product Development, Enterprise Systems) and provide insight and visibility into production systems performance based on a solid data-driven approach.
You will identify needs, weaknesses and risks, build the roadmap to address them, and bring the platform and services to the next level. You will need to be familiar with supporting container/Kubernetes and serverless environments.
- Shape, manage, lead and mentor the Infrastructure, Network, Middleware, DevSecOps & SRE Team in a decentralized and global environment
- Define and drive the roadmap and priorities to bring the platform to the next level and support business requirements and operational excellence
- Manage, maintain, upgrade and monitor the critical infrastructure in a highly available environment
- Keep the overall platform, systems, data and information secure by applying best practices and techniques
- Identify risks early on and ensure they are addressed before they become actual problems
- Organize the team and define related processes to achieve 24/7 level 3 support
- Work closely with the rest of the Engineering team to design and architect the platform and services, productize services through configuration management, monitoring, alerting, and documentation
- Identify parts of the system that do not scale, provide palliative measures and drive long-term resolution of these issues
- Ensure public cloud deployments follow a well-architected framework and meet compliance prerequisites
- Propose ideas and solutions within the infrastructure team to reduce the workload through automation (Terraform, Ansible, …)
- Perform and run blameless root cause analyses on incidents and outages while looking for answers that will prevent the incident from ever happening again
- Keep up to date with trends and innovation in engineering, including containers and orchestration, serverless and other programming paradigms, microservices, etc.
- Sustain a learning and knowledge-sharing culture in the organization and aim to achieve a high level of technical excellence and stability
- Have a proactive, go-for-it attitude: when you see something broken, you can't help but fix it
- Prioritize tasks, work independently, and call out exceptions effectively
SRE Specifics
- Further build out our monitoring and alerting solution for all our production platforms, APIs and systems
- Identify the SLIs (Service Level Indicators) that will align the team to meet the availability and latency objectives (Service Level Objectives)
- Measure and optimize performance and solve issues across the entire stack: hardware, software, application, and network
- Define relevant KPIs and metrics to assess and track the performance of the platform and systems
- Actively look for opportunities to improve the availability and performance of the system by applying the learnings from monitoring and observation with the rest of the team
- Represent the SRE organization in design reviews and operational readiness exercises for new and existing services
- Promote and spread the SRE culture across the organization and teams
DevSecOps Specifics
- Build and maintain a “golden pipeline” to support the product development teams
- Identify the SLIs (Service Level Indicators) that will align the team to meet the availability and latency objectives (Service Level Objectives)
- Help product teams standardise on build processes and integrate with cybersecurity appsec functions
- Automate the pipeline to ensure smooth delivery of products with minimal human intervention
- Champion the Agile approach to infrastructure management
About You
· Highly driven, self-motivated, technically hands-on individual who is truly excited about creating meaningful impact
· Previous experience in building and leading teams of engineers
· Combine a startup mindset with the scale of an industry leader
· Experience in project and change management
Benefits
· Team Member Discount
· Subsidized Medical and Dental Benefits
· Training Opportunities
· Career Advancement
Working Location: Tampines