You will:
- Be responsible for the scalability, availability, performance, cost and resource efficiency, and capacity planning of the cloud infrastructure.
- Spend less than 50% of your time working through our vendors to carry out day-to-day IT operations such as performance monitoring, attending to issues, manual intervention and service requests (infrastructure provisioning, deployment, data/system backups, patching and disaster recovery).
- Be responsible for system administration tasks such as OS & application patching, software upgrades, backup, restore, etc. for our cloud infrastructure (AWS, Azure).
- Drive troubleshooting, incident response/resolution and blameless post-mortems.
- Be responsible for maintaining services once they go live by measuring and monitoring availability, performance, and overall system health.
- Strive to streamline and secure cloud infrastructure management by proactively monitoring and protecting system boundaries, application deployment and release status, automating manual tasks, and keeping systems secured.
- Spend most of your time on development tasks such as writing Infrastructure-as-Code (IaC), continuous improvement, and driving initiatives to improve automation, scalability and reliability.
- Develop automation code for change control, configuration management, deployment and maintenance of infrastructure and applications through CI/CD pipelines.
- Improve service resiliency through high levels of automation to effectively detect, predict and prevent issues in the environment.
- Develop and fine-tune change & incident management processes across teams.
- Scale systems sustainably through mechanisms like automation; evolve systems by pushing for changes that improve reliability and velocity.
- Automate provisioning with IaC and Configuration-as-Code (CaC), and automate other tasks with serverless functions (e.g. AWS Lambda, Azure Functions).
- Review resources and workloads to optimise cost.
You have:
- At least a diploma in Computing/Computer Studies/Information Systems/ICT or equivalent.
- A passion for cloud, systems and security management, and for relentlessly automating work.
- Related cloud experience attested by certifications in SysOps, DevOps (AWS preferred).
- At least 3 years of hands-on experience operating and maintaining systems running on cloud infrastructure (AWS preferred).
- Familiarity with AWS IAM.
- Proven experience with various IaC (e.g. CloudFormation and Terraform) and CaC (e.g. Ansible) tools.
- Hands-on experience administering Unix & Windows operating systems as well as automating with shell scripts.
- Deep appreciation of infrastructure and application monitoring, logging, alerting, release and configuration management.
- Deep understanding of networking (e.g. HTTP/TLS protocols, TCP/IP, routing tables, network topology, load balancers, DNS, NTP, VPC/VNet peering).
- Experience in standard IT security practices (e.g. encryption, certificates, key management, SSH).
You will catch our attention if you:
- Have proven experience in Site Reliability Engineering (SRE) and DevSecOps practices and methodologies.
- Have experience in operating containerised workloads (using Docker/Docker Compose, Kubernetes).
- Have experience operating internet-facing 24/7 high-load applications (e.g. eCommerce).
- Have experience in administering WordPress Multisite and Bitnami systems.