Job Description:
As a Kubernetes Operations Specialist, you will play a critical role in ensuring the reliability, scalability, and performance of our cloud-based systems. Your responsibilities include:
1. Infrastructure as Code (IaC) Automation:
- Utilize tools like Terraform and Ansible to automate cloud resource provisioning.
- Implement infrastructure changes as code, ensuring consistency and repeatability.
2. Kubernetes Cluster Management:
- Deploy and manage Kubernetes clusters for efficient container orchestration and microservices architectures.
- Optimize cluster performance and ensure high availability.
3. Monitoring, Logging, and Alerting:
- Implement robust monitoring, logging, and alerting solutions.
- Proactively identify and address performance bottlenecks and issues.
4. Collaboration with Product and Development Teams:
- Work closely with product and development teams to troubleshoot and correct errors promptly.
- Optimize systems to enhance operational efficiency.
5. Automation and Standardization:
- Continuously iterate on operational processes, moving towards automation and standardization.
- Improve overall service quality by streamlining workflows.
Job Requirements:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- Minimum of 5 years of experience as a DevOps/SRE engineer, specializing in Kubernetes, Infrastructure as Code, and cloud-native tools.
- Familiarity with the Linux kernel, including practical experience with kernel networking, storage, file systems, memory management, the scheduler, and cgroups.
- Demonstrated understanding of Kubernetes and containerization technologies, including deploying and managing Kubernetes clusters.
- Senior-level Kubernetes operations experience, familiarity with Helm and Istio, and practical expertise in optimizing high-availability and disaster-recovery architectures.
- Strong troubleshooting skills for system-layer and network-layer performance issues and failures.
- Operational experience with open-source software such as etcd, ZooKeeper, Kafka, ELK, and Nginx/HAProxy is preferred.