Business Resumption Planning for Banks
by Aaron Cohen, Technology Architect, Federal Reserve Bank of Chicago, and Anthony Toins, Examiner, Federal Reserve Bank of Chicago
Business resumption planning is a comprehensive bankwide process that defines how a bank is to respond to and recover from business disruptions, enabling a bank to continue to support constituents and stakeholders alike. The plans incorporate business processes, people, and technology.
Many community banks rely heavily on third-party service providers to deliver core banking solutions. When a provider’s key services fail, that reliance becomes a single point of failure for the bank. In 2012, when Superstorm Sandy disrupted payment processing and thereby affected liquidity levels, many community bankers realized the importance of having cost-effective solutions to manage single-point-of-failure situations.
Business resumption planning is not only about third-party risk but should also address everything from pandemics, like the 1918 flu pandemic,1 to terrorist attacks, such as the September 11th attacks, to natural disasters, to nation- or state-sponsored cyberattacks2 against financial sector institutions. The planning process should address the range of disruptions or failures that could occur and include mitigants for each type of disruption or failure.
This article discusses business resumption in the context of business continuity and disaster recovery planning.3 The goals are to provide banks with concepts and ideas to consider when developing or strengthening their business continuity planning processes as well as to encourage a dialogue between institutions and examiners about business resumption planning based on a shared language.4
Business Continuity Planning Process
The business continuity planning process includes developing strategies for the resumption of critical business processes and the technical recovery of critical information systems supporting those functions. A bank should approach business continuity planning as a bankwide responsibility that should prioritize business objectives. Business continuity planning should consider how essential processes, business units, departments, and information systems will contribute to a coordinated response to a bankwide disruption. The approach should include plans for both short-term and long-term disruptions and recovery operations. A tight integration of the institution’s overall planning process with that of the individual business units’ plans for resumption of essential processes is critical for business resumption and recovery. Bank senior management should set the tone at the top that business continuity is everyone’s responsibility and not just an information technology (IT) issue handled by the IT function.
Banks should consider adopting an iterative approach to business continuity planning. The four steps for an effective program are (1) business impact analysis, (2) risk assessment, (3) risk management, and (4) monitoring and testing.5 Additionally, when key bank functions are outsourced, third-party risk should be considered during the planning process. The business continuity planning process should evolve continuously in response to changes in potential threats and business operations and to address audit recommendations and test results.
Business Impact Analysis
The first step in the business continuity planning process is the business impact analysis, which identifies mission-critical business functions and quantifies the impact (for example, operational and financial) that a loss of those functions may have on the organization.6 It also should determine how quickly essential business units and/or processes can return to full operation following a disruption, as well as identify the resources required to resume operations. It is important that the analysis include a bankwide view, with contributions from senior management representatives from all lines of business, not just the IT function. Finally, the business impact analysis should be approved by both the bank’s senior management and board of directors and should be updated at least annually or when there are significant changes at the bank to either business processes or the IT infrastructure.
A business impact analysis should include:
- an assessment and prioritization of all business processes;
- identification of the potential impact of business disruptions resulting from uncontrolled, unknown events on the bank’s business functions and processes;
- identification of the legal and regulatory requirements;
- an estimate of maximum allowable downtime; and
- an estimate of recovery time objectives,7 recovery point objectives,8 and critical path recovery (banks should document how recovery times/objectives are determined and whether they are validated by testing).
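The estimates listed above can be kept in a simple structured record so that they are easy to review and compare. The following is a minimal sketch, not a prescribed format: the class name, field names, priority scale, and sample figures are all hypothetical. It also assumes that a recovery time objective should not exceed the maximum allowable downtime for the same process.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass
class BusinessProcess:
    """One entry in a business impact analysis (hypothetical layout)."""
    name: str
    priority: int                        # 1 = most critical (assumed scale)
    max_allowable_downtime: timedelta
    recovery_time_objective: timedelta
    recovery_point_objective: timedelta
    validated_by_testing: bool = False


def review_findings(process: BusinessProcess) -> List[str]:
    """Return issues a reviewer might raise for one entry."""
    findings = []
    # Assumption: the recovery target must fit inside the downtime the bank can tolerate.
    if process.recovery_time_objective > process.max_allowable_downtime:
        findings.append("recovery time objective exceeds maximum allowable downtime")
    if not process.validated_by_testing:
        findings.append("recovery objectives not yet validated by testing")
    return findings


# Illustrative figures only; real values come from the bank's own analysis.
processes = [
    BusinessProcess("wire transfers", 1, timedelta(hours=4), timedelta(hours=2),
                    timedelta(minutes=15), validated_by_testing=True),
    BusinessProcess("statement printing", 3, timedelta(days=3), timedelta(days=5),
                    timedelta(days=1)),
]

# Review the most critical processes first.
for process in sorted(processes, key=lambda p: p.priority):
    print(process.name, review_findings(process))
```

Recording whether each objective has been validated by testing alongside the objective itself keeps untested estimates visible during the annual update.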
Risk Assessment
Risk assessment is the second step in the business continuity planning process. While a risk assessment determines what could cause an outage, a business impact analysis attempts to measure the effects should an outage occur. The risk assessment identifies threats, vulnerabilities, and the potential impact on a bank’s critical activities and supporting resources. Senior management should use this information to identify where risks exceed risk appetite and develop a program to reduce the likelihood and impact of disruptions.
The risk assessment should include:
- an evaluation of business impact analysis assumptions using various disruption scenarios;
- analyses of potential disruptions based on the impact to the bank, its customers, and the local economies served;
- identification of the legal and regulatory requirements;
- prioritization of potential business disruptions based on severity;9 and
- an analysis of the gap between existing business continuity planning and the policies and procedures that should be implemented.
A bank’s senior management should be responsible for maintaining a current risk assessment based on changes to the bank’s IT environment, audit findings, and business continuity/disaster recovery planning test results.
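One way to make the severity-based prioritization concrete is to score each disruption scenario on likelihood and impact. The sketch below is illustrative only: the 1-to-5 scales, the ratings, and the likelihood-times-impact score are assumptions rather than a required method, and the third scenario is a hypothetical addition to the two examples given in footnote 9.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    description: str
    likelihood: int  # 1 (rare) to 5 (frequent); assumed scale
    impact: int      # 1 (minor) to 5 (severe); assumed scale

    @property
    def severity(self) -> int:
        # Likelihood times impact is one common convention, not a requirement.
        return self.likelihood * self.impact


# Ratings are invented for illustration.
scenarios = [
    Scenario("train derailment with chemical spill", likelihood=1, impact=5),
    Scenario("weather-related power outage", likelihood=4, impact=2),
    Scenario("core processing outage at a service provider", likelihood=2, impact=5),
]

# Highest score first; ties are broken by impact.
ranked = sorted(scenarios, key=lambda s: (s.severity, s.impact), reverse=True)
for s in ranked:
    print(f"{s.severity:>2}  {s.description}")
```

A simple product can push low-probability/high-impact events to the bottom of the list, so a bank might also flag any scenario with the highest impact rating regardless of its score.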
Bankwide Risk Management
Risk management is the third step in the development and maintenance of a sound business continuity planning process. Risk management in this context should be able to measure and reduce risks to an acceptable level through a well-developed business continuity planning process. This process should be based on the business impact analysis and risk assessment. While the development and maintenance of the business continuity plan may be outsourced, the ultimate responsibility for risk management resides with the bank’s board and senior management.10 The business impact analysis and risk assessment should be an integral part of the formally documented business continuity plan. The impact analysis and risk assessment should provide the bank with sufficient information to monitor its business continuity plan and to determine when material and significant changes in internal and external conditions have occurred that necessitate revisions to the plan. The business continuity plan should focus on threats that have a relatively high likelihood of disrupting operations and should describe the various types of realistic events that could prompt the formal declaration of a disaster and the process for invoking the business continuity plan. Also, the business continuity plan should be updated by each business unit, reviewed and approved by the board and senior management at least annually, and communicated to employees for timely implementation.11
Monitoring and Testing the Plan
Monitoring and testing make up the final step and validate that the business continuity planning process remains viable and does not overlook significant changes that may require revisions to the plan. Therefore, senior bank management should commit sufficient budget, staff, and time to a robust bankwide testing program to validate that the business resumption plans would actually work in the event of a disruption. Bank testing programs should define roles and responsibilities; outline test strategies and test plans; analyze and report testing results, including lessons learned; and lead to the development of action plans to address weaknesses identified through the testing.
Business Continuity Planning for Outsourced Technology Services Management
Banks are increasingly outsourcing critical operations to third-party service providers. However, this practice does not relieve bank management of its oversight responsibility for ensuring that outsourced activities are conducted in a safe and sound manner. An effective vendor management program should provide the framework for bank management to identify, measure, monitor, and mitigate the risks associated with outsourcing. The bank’s oversight process should provide sufficient information to monitor its third-party service providers’ performance and to identify issues that could negatively affect the bank’s ability to recover IT systems and return critical functions to normal operations in a timely manner. There are four key areas of business continuity planning that banks should address with respect to the resilience of technology services:12
- Third-Party Management addresses the bank’s responsibility to control the business continuity risks associated with its technology service providers and their subcontractors.
- Third-Party Capacity addresses the potential impact of a significant disruption of a third-party servicer’s ability to restore services to multiple clients.
- Testing with Third-Party Technology Service Providers addresses the importance of validating business continuity plans with technology service providers and provides considerations for a robust third-party testing program.
- Cyber Resilience addresses aspects of business continuity planning unique to disruptions caused by cyber events.13
Test Strategies and Approaches
After building out an effective business continuity planning program and incorporating third-party risk, a bank should test its plans at least annually.14 However, there may be situations that require a bank to test the plans more frequently. For instance, if a bank undergoes a merger or acquisition or if there have been material changes to business processes or the IT infrastructure, the bank should consider retesting the business resumption plans to reflect the new environment.
There are four testing approaches15 (listed in order of least to most rigorous):
- Tabletop exercise
- Walk-through drill
- Functional drill
- Full-interruption test
Preliminary Exercises. Tabletop exercises and walk-through drills should be viewed as preliminary tests to the more rigorous testing methods discussed below. In these preliminary tests, representatives from each of the bank’s functional areas meet and review the business resumption plans. In a tabletop exercise, the bank’s business line representatives review and evaluate the plans in the context of objectives, scope, assumptions, and organizational structure, as well as review testing, maintenance, and training requirements. In a walk-through drill, the representatives take testing one step further and identify a specific potential disruptive event scenario. The representatives talk through the steps that would be performed as part of the restoration and recovery of the bank’s business operations. The challenge with these two methods is that they give minimal insight into how the bank would actually respond in the event of a real disruption because none of the business resumption plan components are actually engaged and evaluated for real-world effectiveness.
Real-World Testing. Functional drills and full-interruption tests involve implementing and executing the bank’s business resumption plans in a setting that closely mimics real-world disruptive events. A functional drill is a full test of the bank’s plans and generally includes running the bank’s business operations from an alternate site and the primary site concurrently and comparing the results. The end goal is to determine if the alternate site can support the bank’s business operations. By contrast, a full-interruption test shuts down the primary site’s operations and has the alternate site support the bank. The full-interruption method should be thoroughly planned before executing to ensure that business operations will not be negatively affected.
Senior bank management should ensure that the appropriate staff is assigned to participate in testing. Senior bank management should also evaluate the inherent tradeoffs between testing rigor and the level of confidence provided by the testing approaches and select a method that is most appropriate for the bank. The selected testing method should reflect the bank’s experience with business resumption for its current environment in the context of size, complexity, and nature of its business. Some banks have addressed the inherent tradeoffs in testing methods by performing an annual functional drill test and benchmarking their results against formally defined recovery time and point objectives.
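Benchmarking a functional drill against recovery time and point objectives comes down to two comparisons. The sketch below assumes the drill records three timestamps: when the simulated disruption began, when service was restored, and when the last usable backup was taken. The function name, field names, and sample values are hypothetical.

```python
from datetime import datetime, timedelta


def benchmark_drill(disruption_start, service_restored, last_good_backup, rto, rpo):
    """Compare one drill's measured results with its recovery objectives."""
    # Time needed to bring the process back, measured from the start of the disruption.
    recovery_time = service_restored - disruption_start
    # Data created after the last usable backup would be lost, so this window
    # is the measured data loss expressed in time.
    data_loss_window = disruption_start - last_good_backup
    return {
        "recovery_time": recovery_time,
        "met_rto": recovery_time <= rto,
        "data_loss_window": data_loss_window,
        "met_rpo": data_loss_window <= rpo,
    }


# Illustrative drill data, not results from any real test.
result = benchmark_drill(
    disruption_start=datetime(2024, 3, 9, 8, 0),
    service_restored=datetime(2024, 3, 9, 11, 30),
    last_good_backup=datetime(2024, 3, 9, 7, 30),
    rto=timedelta(hours=4),
    rpo=timedelta(minutes=15),
)
print(result)
```

In this made-up drill, the recovery time of 3.5 hours is within the 4-hour objective, while the 30-minute data loss window exceeds the 15-minute objective. A gap of that kind would typically be recorded as an issue with a remediation plan.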
Business Resumption Testing Documentation
Banks should document the following when performing any test:
- Date/time of testing
- Locations tested
- Business processes tested
- A summary comparing testing objectives with actual testing results
- Identification of material deviations from test plans, including whether or not intended participation levels were achieved
- Issues identified during testing, including remediation plans
- Evaluation by a qualified independent party not involved in the testing
For testing results to have meaning, senior bank management should review the results and provide a report on its assessment of the results to the board, audit function, functional business units, and the IT function. Consistent with conducting testing at least annually, reporting should also be performed at least annually. The reporting that is presented to the board should provide enough information to allow the board to determine if the business resumption plans meet the objectives embodied in the business impact analysis.
When there are material changes to the environment either from a business process or technology perspective, bank examiners expect that the business resumption plans will be updated to reflect the new environment and tested to determine that the plans are still valid. Examples include regulatory changes (such as data retention requirements), mergers and acquisitions activity, changes in vendor relationships, and changes to the IT infrastructure.
Typical Business Continuity and Disaster Recovery Planning Deficiencies Noted by Examiners
Typical deficiencies noted during examinations have included the following:
- Business continuity/disaster recovery test plans and/or testing not completed or updated in a timely manner
- Business impact analyses that do not
  - Identify critical business processes
  - Identify supporting systems, maximum allowable downtime, recovery time objectives, or recovery point objectives
- Inadequate staff training
- Testing inadequacies
  - Failure to demonstrate recovery capability
  - Failure to test alternate site relocation, including connectivity tests
  - Failure to test all critical systems at least annually
- Inadequate or infrequent annual reporting of test results to the bank’s board of directors, including the failure to provide timely information about
  - Overall program status
  - Testing and training results
  - Lessons learned
  - Test results against recovery time and point objectives
Business resumption concerns have the potential to go to the very heart of a community bank’s ability to serve its key stakeholders, including customers, vendors, and business partners, as well as its ability to maintain appropriate liquidity levels. Therefore, when reviewing the business resumption program, a bank’s senior management should make sure that there is a well-defined and comprehensive process incorporating appropriate real-world scenarios and corresponding response plans based on those scenarios. The process should transcend business resumption planning for just the IT function and embrace all lines of the bank’s business. In the final analysis, examiners need the bank to demonstrate that it has an appropriate recovery mechanism for the entire bank and has the wherewithal to maintain ongoing operations and support key stakeholders when a disruptive event occurs.
- 1 See Supervision and Regulation letter 07-18, “FFIEC Guidance on Pandemic Planning,” available at www.federalreserve.gov/boarddocs/srletters/2007/SR0718.htm.
- 2 See the Federal Financial Institutions Examination Council (FFIEC) Cybersecurity Awareness website, available at www.ffiec.gov/cybersecurity.htm.
- 3 Bank senior management should not view business continuity and disaster recovery as one and the same. The goal of business continuity planning is to restore essential business processes. Disaster recovery is a subset of business continuity planning that focuses on bringing information systems back online.
- 4 While a business resumption examination is traditionally performed by information technology (IT) examiners, business resumption planning should extend beyond the bank’s IT area and include all bank functions and departments.
- 5 See the discussion of the Business Continuity Planning Process (page 3) in the FFIEC Business Continuity Planning IT Examination Handbook, available at http://ow.ly/STGbe.
- 6 See the discussion of the Business Impact Analysis (page 6) in the FFIEC Business Continuity Planning IT Examination Handbook, available at http://ow.ly/STGbe.
- 7 Recovery time objective is the targeted amount of time within which operations should be recovered after a disruptive event.
- 8 Recovery point objective is the acceptable amount of data loss, measured in time, resulting from a disruptive event.
- 9 Prioritization should reflect a continuum of disruptions. For example, if a rural bank is located near a railroad track, the bank should perform a risk assessment that would include a train derailment and chemical spill representing a low-probability/high-impact disruption in contrast to a temporary weather-related power outage representing a high-probability/low-impact disruption.
- 10 Note that some aspects of development and maintenance could be outsourced, such as IT and documentation generation and updating; however, the bank is better positioned to address other aspects, such as succession planning and the identification of critical personnel.
- 11 Note that this may indicate that a bank and not the servicer should perform the development and maintenance function.
- 12 See Appendix J, “Strengthening the Resilience of Outsourced Technology Services,” in the FFIEC Business Continuity Planning IT Examination Handbook, available at http://ow.ly/SUk9o.
- 13 See the FFIEC Cybersecurity Assessment Tool, which helps in the evaluation of the cyber-related risk profile and the maturity of the control environment for an institution, available at http://ow.ly/SUmAc.
- 14 See the discussion of Action Summary items in the FFIEC Business Continuity Planning IT Examination Handbook, available at http://ow.ly/STGbe.
- 15 These test methods are also commonly referred to as “structured walk-through test,” “simulation test,” “parallel test,” and “full-scale test,” respectively.