Operational Resilience Engineering provides resilience measurement and monitoring for the firm’s Important Business Services, a priority for regulators globally, with key deadlines over the next few years. The future-state vision for this function is to design for resilience as we launch engineering platforms and products, share the availability of critical services publicly with our clients, reduce operational loss and support costs, and continuously decommission unused systems.
A heightened focus on Operational Resilience is key to our strategic theme of enhancing Engineering risk management. The measurement and monitoring of operational resilience includes:
• Defining a consistent taxonomy for Business Services, associated functions and Engineering Services
• Enabling dependency mapping of those Services to the assets they rely on, including systems, people, processes, vendors and facilities
• Creating a scenario catalog with impact tolerances/error budgets, ensuring recovery plans are in place and automating regular testing
• Maintaining a Resilience Dashboard where users across the firm can define the above and view the resulting analytics
The tooling to enable this initiative is a joint venture between Operational Resilience Engineering, Site Reliability Engineering, Operational Risk and Resilience Metrics, Enterprise Technology Operations and Risk Engineering.
The program involves architecting and building the tooling from scratch on a cutting-edge technology stack that is scalable and performant enough to support 4,000+ users firmwide. The tooling is intended to become fully cloud-native over the next few years; to shorten time to market, it will start in a hybrid mode using containerized deployments. Given the regulatory nature of the program, all data will be modelled in the firm’s strategic data platform (Alloy) and go through data governance sign-offs. The tooling will be built in tight alignment with site reliability principles to ensure a highly resilient architecture.