Job Description
- Platform Development & Integration: Partner with vendors and internal teams to design and develop the Synapxe observability platform, integrating AI/ML functionalities for advanced monitoring, logging, and anomaly detection.
- Infrastructure Onboarding: Lead efforts to onboard existing infrastructure devices to the observability platform, ensuring comprehensive system coverage.
- Anomaly Detection & Self-Healing: Implement and optimize capabilities for early anomaly detection, pattern analysis, self-healing, infrastructure resizing, noise reduction, and outage prediction to improve system reliability and reduce response times.
- Visualization & Reporting: Create and maintain visualizations in observability tools, offering unified views of infrastructure and security. Support users in customizing reports and creating dashboards to facilitate data-driven insights.
- Team Leadership: Guide and mentor team members and junior engineers, fostering a collaborative environment to promote skill development and alignment with organizational goals.
- Technical Support & Troubleshooting: Provide advanced troubleshooting and analysis for platform users, and assist team members in resolving complex issues. Document processes and create user guides for ongoing support.
Requirements
- At least 6 years of experience in an enterprise-level infrastructure environment.
- Strong technical expertise in infrastructure monitoring, logging, and visualization tools such as Elasticsearch.
- Proficiency in server and network technologies and infrastructure automation.
- Knowledge of programming and scripting languages for automation.
- Excellent troubleshooting and pattern analysis abilities.
- Strong leadership, communication, and documentation skills.
- Ability to work collaboratively, fostering a team-oriented dynamic.