Responsibilities:
As a Reliability Engineer, you will:
• Develop automation and processes to enable and continuously improve the deployment and management of runtimes at scale (whether namespaces or Kubernetes clusters).
• Monitor and troubleshoot Kubernetes clusters, identifying and resolving performance bottlenecks, security vulnerabilities, and other operational issues.
• Stay up to date with the latest Kubernetes developments, best practices, and industry trends, and recommend relevant improvements to our platform.
• Collaborate with development teams to containerize applications and deploy them on Kubernetes, applying best practices for scalability, availability, and performance.
• Develop automation and processes to enable and continuously improve the deployment and management of applications on the runtime platform.
• Participate in on-call rotations, respond to incidents in a timely manner, conduct post-incident reviews, and implement preventive measures.
• Monitor services to identify bottlenecks, forecast system behaviour, and scale infrastructure as needed.
• Implement comprehensive monitoring solutions that provide real-time insight into application and infrastructure health.
• Manage incidents and outages efficiently, minimizing MTTR.
• Build automation for system health assessment and self-remediation.

Requirements:
• Bachelor's degree or diploma in Computer Science, Engineering, or a related field (or equivalent experience).
• Proven experience as a Reliability Engineer or in a similar role, with a strong background in containerization, orchestration, and cloud-native technologies.
• In-depth understanding of Kubernetes architecture, components, and operational best practices.
• Hands-on experience with container technologies such as Kubernetes (especially AWS EKS) and Helm.
• Proficiency in scripting and automation using Bash, Python, or similar languages.
• Solid understanding of networking, security, and storage concepts in Kubernetes.
• Ability to troubleshoot and resolve complex technical issues involving Kubernetes and containerized applications.
• Experience integrating Kubernetes with AWS services such as Secrets Manager and load balancers.
• Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.
• Experience with CI/CD tools (Jenkins, GitLab CI/CD, ArgoCD) and version control systems (Git).
• Experience using error budgets to balance reliability with the pace of innovation.
• Familiarity with other cloud platforms (GCP, Azure) and infrastructure as code (Terraform) is advantageous.
• Certifications such as Certified Kubernetes Administrator (CKA) or Certified Kubernetes Application Developer (CKAD) are a plus.
• Experience with observability and monitoring tools (Prometheus, Grafana, ELK Stack) is a plus.
• Experience with on-call paging applications is a plus.
• Experience with automated testing tools (Testkube, Ginkgo) is a plus.
• Experience implementing and maintaining Kubernetes operators in Go is a plus.
• Experience with service mesh technologies is a plus.
• Experience with chaos engineering is a plus.

Soft skills:
• Excellent problem-solving skills and strong analytical abilities.
• Clear and effective communication skills.
• Adaptability and a continuous learning mindset.