Job Description & Responsibilities:
- Collaborate with application development teams to identify and address reliability and performance concerns during the development lifecycle.
- Monitor and maintain the health of our applications, both in production and pre-production environments.
- Proactively identify and mitigate risks, bottlenecks, and potential points of failure in the application stack.
- Develop and maintain monitoring and alerting systems to quickly detect and respond to incidents.
- Participate in incident response and post-mortem activities to minimize downtime and prevent future occurrences.
- Automate repetitive tasks and processes to improve efficiency and reduce human error.
- Contribute to the design, implementation, and optimization of our deployment pipelines.
- Work on capacity planning, scaling, and load balancing strategies to accommodate growing user traffic.
- Collaborate with development teams to ensure that applications are designed with reliability and scalability in mind.
Requirements:
- Experience in application support.
- Proficiency in programming languages and platforms such as Java, C#, and .NET.
- Strong knowledge of containerization technologies (Docker, Kubernetes) and cloud services (e.g., AWS, GCP, Azure).
- Experience with monitoring and observability tools (e.g., Datadog, Prometheus, Grafana, ELK Stack).
- Familiarity with CI/CD pipelines and automation tools (e.g., CloudBees, Jenkins, GitLab CI/CD).
- Solid understanding of infrastructure as code (IaC) principles and tools.
- Excellent problem-solving skills and the ability to work effectively in high-pressure situations.
- Strong communication and collaboration skills to work closely with development and operations teams.
SKILLS: .NET, Java, C#, Docker, Kubernetes, AWS, GCP, Azure, Datadog, Prometheus, Grafana, ELK Stack, CloudBees, Jenkins, GitLab CI/CD