Observability Principal Engineer, Site Reliability Engineer (SRE), or Monitoring Specialist
2 weeks ago
We are seeking a skilled Observability Principal Engineer with 2-3 years of experience in observability to join our dynamic team. In this role, you will be responsible for implementing, managing, and optimizing observability tools. You will work closely with cross-functional teams to ensure that our systems are monitored effectively and that issues are identified and resolved proactively.
Key Responsibilities:
- Design, implement, and maintain observability frameworks using tools such as Prometheus, Grafana, ELK Stack, Tableau, or similar.
- Design, implement, and maintain monitoring tools such as BMC, CA, SolarWinds, SCOM, Dynatrace, Datadog, or similar.
- Create and manage dashboards, visualizations, and reports to communicate system health and performance metrics.
- Collaborate with the sales team to understand client requirements and demonstrate how our observability solutions can address their specific needs.
- Prepare and deliver presentations, demos, and workshops to potential clients showcasing the capabilities and benefits of our observability tools.
- Troubleshoot and resolve tool-related issues in a timely manner.
- Assist in the training and mentoring of team members on observability and monitoring tools and practices.
Job Requirements:
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- 2-3 years of experience in software development, implementation, operations, or a related field with a focus on observability tools.
- Proficiency in implementing and managing observability tools.
- Solid understanding of cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes).
- Experience with scripting languages (Python, Bash, etc.) for automation tasks.
- Knowledge of best practices in monitoring, logging, and incident management.
- Strong analytical skills with the ability to diagnose issues and propose effective solutions.
- Excellent communication and collaboration skills, with a proactive approach to problem-solving.
- Technical experience with enterprise monitoring tools such as Dynatrace, Grafana, or BMC.
- Knowledge of automation tools, cloud technologies, DevOps concepts, open systems, and networking technologies.
- Good knowledge of various monitoring tools (e.g., BMC, SolarWinds, CloudWatch, and Azure).
- Experience with configuration management and infrastructure-as-code tools (Ansible, Terraform, etc.).
- Familiarity with APM (Application Performance Management) tools such as New Relic, Dynatrace, or similar.
- Understanding of network protocols and architectures.
- Experience with orchestration tools (e.g., BMC, Kubernetes, Apache Airflow, Jenkins) to create and manage automated workflows for deploying, monitoring, and scaling observability solutions.
Preferred Qualifications:
- Proficiency in observability tools (e.g., Grafana, ELK Stack, Datadog, Prometheus, etc.).
- Proficiency in ITOM tools (e.g., BMC, Dynatrace, CA, SCOM, IBM, SolarWinds, etc.).
- Strong understanding of monitoring and logging frameworks.
- Experience with distributed systems and microservices architecture.
- Ability to write scripts for automation and data analysis.
- Experience with cloud platforms (AWS, Azure, GCP) and their monitoring services.
- Experience with CI/CD pipelines and infrastructure as code (IaC) tools like Terraform or Ansible.
- Relevant certifications in cloud computing, DevOps, or observability tools are a plus.
Official account of Jobstore.