About the Role:
We are seeking a highly skilled and experienced onsite Manager to lead our Digital Operations Center (DOC) team. The ideal candidate will have a strong background leading teams responsible for monitoring, alerting, and escalation within a 24/7/365 DOC environment. This role is critical to ensuring the stability, performance, and reliability of our network and services. The onsite DOC Manager will provide technical expertise, leadership, and support for the daily activities of the U.S. DOC Duty Managers (watch officers).
What You’ll Do:
- Leadership:
- Lead the monitoring of key internal services and incident teams, including escalations and alerts to our crisis team(s).
- Champion DOC Playbook best practices across the entire DOC team.
- Guide the implementation, continuous improvement, and documentation of new and existing DOC policies, procedures, and processes.
- Coordinate with facilities and security teams on requirements for the DOC's physical location(s).
- Monitoring and Alerting:
- Continuously monitor internal IT and product-related incidents using various tools and platforms.
- Develop, configure, and manage alerting systems to promptly identify and alert relevant parties to emerging issues.
- Facilitate crisis team assessments and activations by providing rapid situational awareness to incident coordinators.
- Incident Management and Escalations:
- Coordinate and escalate issues to appropriate teams and stakeholders as needed.
- Maintain clear and concise communication during incidents, providing regular updates to stakeholders.
- Collaborate with cross-functional teams to support root cause analysis of complex issues.
- Document and maintain standard operating procedures for DOC response and escalation processes.
- Continuous Improvement:
- Identify opportunities for improving monitoring and alerting systems and processes.
- Participate in post-incident reviews and contribute to the development of preventive measures.
- Stay up to date with industry trends, best practices, and technologies for all-hazards operations centers.
- Documentation and Reporting:
- Maintain detailed and accurate incident logs and documentation.
- Generate regular reports on tracked incidents, assessments, and status.
- Provide insights and recommendations based on incident analysis and trends.
- Facilitate post-mortems and root cause investigations with Site Reliability Engineers.
What You’ll Need:
- Education:
- Bachelor’s degree in Computer Science, Information Technology, or a related field. Equivalent work experience will be considered.
- Experience:
- Minimum of 8 years of experience in a leadership position, ideally overseeing a Digital Operations Center (DOC), Major Incident Management Operations (IMOC), or a similar environment.
- Proven experience maintaining a common operating picture using existing monitoring tools and situational awareness dashboards.
- Strong understanding of network protocols, systems, and infrastructure.
- Skills:
- Excellent problem-solving and analytical skills.
- Strong communication and interpersonal skills.
- Ability to work effectively under pressure and manage multiple priorities.
- Experience with ServiceNow Major Event Management modules is a plus.
- Familiarity with ITIL practices and frameworks is desirable.
- Familiarity with Incident Command System (ICS) principles and their application as best practice in the technology industry.
- Work Environment:
- This position may require shift work to ensure 24/7 coverage of the DOC.
- Ability to work in a high-stress, fast-paced environment.