Disaster Recovery Plan - SpeedyWriters

Information Technology Risk Management

• Reading(s), from Gibson
•  Chapter 14: Mitigating Risk with a Disaster Recovery Plan
Book:
• Information systems security & assurance series
• Jones & Bartlett Learning
• Managing Risk in Information Systems – Darril Gibson, second edition

“Disaster Recovery Plan” Please respond to the following:
• Read the Disaster Recovery Plan (DRP) below. Next, explain its relationship to MIT’s BCP listed below from the previous week’s discussion. Then, based on the established relationship, assess whether or not there is anything missing from the DRP. Provide a rationale for your response.

Please:
• List all references.
• Cover page is not needed.
• Number of pages needed: 4 + 1 full page summary-Total 5 pages
• Please use additional info (slides script) on this attachment when elaborating on this subject.

COMPANY

Disaster Recovery Plan (DRP) for [PRODUCT]

(Note: this DRP document is somewhat specific to SharePoint, but can be adapted for other products/technologies as needed)

Plan and related Business Processes

Business Process
Feature
Relevant Technical Components



December 29, 2014

Table of Contents
1. Purpose and Objective
Scope
2. Dependencies
3. Disaster Recovery Strategies
4. Disaster Recovery Procedures
Response Phase
Resumption Phase
Data Center Recovery
Internal or External Dependency Recovery
Significant Network or Other Issue Recovery (Defined by quality of service guidelines)
Restoration Phase
Data Center Recovery
Internal or External Dependency Recovery
Significant Network or Other Issue Recovery (Defined by quality of service guidelines)
Appendix A: Disaster Recovery Contacts – Admin Contact List
Appendix B: Document Maintenance Responsibilities and Revision History
Appendix C: Server Farm Details
Appendix D: Glossary/Terms
1. Purpose and Objective

COMPANY developed this disaster recovery plan (DRP) to be used in the event of a significant disruption to the features listed in the table above. The goal of this plan is to outline the key recovery steps to be performed during and after a disruption to return to normal operations as soon as possible.

Scope

The scope of this DRP document addresses technical recovery only in the event of a significant disruption. The intent of the DRP is to be used in conjunction with the business continuity plan (BCP) COMPANY has. A DRP is a subset of the overall recovery process contained in the BCP. Plans for the recovery of people, infrastructure, and internal and external dependencies not directly relevant to the technical recovery outlined herein are included in the Business Continuity Plan and/or the Corporate Incident Response and Incident Management plans COMPANY has in place.

The link to the specific BCP document related to this DRP can be found here: LINK TO BCP

This disaster recovery plan provides:
◦ Guidelines for determining plan activation;
◦ Technical response flow and recovery strategy;
◦ Guidelines for recovery procedures;
◦ References to key Business Resumption Plans and technical dependencies;
◦ Rollback procedures that will be implemented to return to standard operating state;
◦ Checklists outlining considerations for escalation, incident management, and plan activation.

The specific objectives of this disaster recovery plan are to:
◦ Immediately mobilize a core group of leaders to assess the technical ramifications of a situation;
◦ Set technical priorities for the recovery team during the recovery period;
◦ Minimize the impact of the disruption to the impacted features and business groups;
◦ Stage the restoration of operations to full processing capabilities;
◦ Enable rollback operations once the disruption has been resolved if determined appropriate by the recovery team.

Within the recovery procedures there are significant dependencies between supporting technical groups within and outside COMPANY. This plan is designed to identify the steps that are expected to be taken to coordinate with other groups / vendors to enable their own recovery. This plan is not intended to outline all the steps or recovery procedures that other departments need to take in the event of a disruption, or in the recovery from a disruption.

2. Dependencies

This section outlines the dependencies identified and the assumptions made during the development of this SharePoint disaster recovery plan. When needed, the DR TEAM will coordinate with its partner groups to enable recovery.
Dependency (component type) and assumptions

User Interface / Rendering (Presentation components)
• Users (end users, power users, administrators) are unable to access the system through any part of the instance (e.g. client or server side, web interface or downloaded application).
• Infrastructure and back-end services are still assumed to be active/running.

Business Intelligence / Reporting (Processing components)
• The collection, logging, filtering, and delivery of reported information to end users is not functioning (with or without the user interface layer also being impacted).
• Standard backup processes (e.g. tape backups) are not impacted, but the active / passive or mirrored processes are not functioning.
• Specific types of disruptions could include components that process, match, and transform information from the other layers. This includes business transaction processing, report processing, and data parsing.

Network Layers (Infrastructure components)
• Connectivity to network resources is compromised and/or significant latency issues in the network exist that result in lowered performance in other layers.
• Terminal connections, serially attached devices, and inputs are assumed to be still functional.

Storage Layer (Infrastructure components)
• Loss of SAN, local area storage, or other storage component.

Database Layer (Database storage components)
• Data within the data stores is compromised and is either inaccessible, corrupt, or unavailable.

Hardware/Host Layer (Hardware components)
• Physical components are unavailable or affected by a given event.

Virtualization / VMs (Virtual Layer)
• Virtual components are unavailable.
• Hardware and hosting services are accessible.

Administration (Infrastructure Layer)
• Support functions such as management services, backup services, and log transfer functions are disabled.
• Other services are presumed functional.

Internal/External Dependencies
• Interfaces and intersystem communications are corrupt or compromised.

In addition, the assumptions within the Business Continuity Plan for this work stream still apply.

3. Disaster Recovery Strategies

The overall DR strategy of xyz is summarized in the table below and documented in more detail in the supporting sections. These scenarios and strategies are consistent across the technical layers (user interface, reporting, etc.)

4. Disaster Recovery Procedures

A disaster recovery event can be broken out into three phases: response, resumption, and restoration. These phases are managed in parallel with any corresponding business continuity recovery procedures summarized in the business continuity plan.
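
As a minimal sketch of how the three phases could be tracked in a recovery tool, the ordering can be modeled as an enumeration. The names Phase and next_phase are illustrative assumptions and do not come from the plan itself.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    """The three DR phases named in this section, in the order they occur."""
    RESPONSE = 1
    RESUMPTION = 2
    RESTORATION = 3


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after restoration."""
    members = list(Phase)  # Enum members iterate in definition order
    index = members.index(current)
    return members[index + 1] if index + 1 < len(members) else None
```

A tracking sheet or script could call next_phase each time the DR TEAM declares a phase complete, so that the recorded phase never skips ahead.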

Response Phase

The following are the activities, parties and items necessary for a DR response in this phase. Please note these procedures are the same regardless of the triggering event (e.g. whether caused by a Data Center disruption or other scenario).
Response Phase Recovery Procedures – All DR Event Scenarios

Step: Identify issue, page on call / Designated Responsible Individual (DR TEAM)
Owner: DR TEAM
Duration: x minutes
Components:
• Issue communicated / escalated
• Priority set

Step: Identify the team members needed for recovery
Owner: DR TEAM
Duration: x minutes
Components: Selection of core team members required for restoration phase from among the following groups:
• Operations

Step: Establish a conference line for a bridge call to coordinate next steps
Owner: DR TEAM or Ops
Duration: x minutes
Components:
• Primary bridge line: NUMBER
• Secondary bridge line: NUMBER
• Alternate / backup communication tools: email, communicator

Step: Communicate the specific recovery roles and determine which recovery strategy will be pursued
Owner: DR TEAM
Duration: x minutes
Components:
• Documentation / tracking of timelines and next decisions
• Creation of disaster recovery event command center / “war room” as needed

This information is also summarized by feature in Appendix A: Disaster Recovery Contacts – Admin Contact List.

Resumption Phase

During the resumption phase, the steps taken to enable recovery will vary based on the type of issue. The procedures for each recovery scenario are summarized below.

Data Center Recovery

Full Data Center Failover

Step: Initiate Failover
Owner: DR TEAM
Duration: TBD
Components:
• Restoration procedures identified
• Risks assessed for each procedure
• Coordination points between groups defined
• Issue communication process and triage efforts established

Step: Complete Failover
Owner: DR TEAM
Duration: TBD
Components:
• Recovery steps executed, including handoffs between key dependencies

Step: Test Recovery
Owner: DR TEAM
Duration: TBD
Components:
• Tests assigned and performed
• Results summarized and communicated to group

Step: Failover deemed successful
Owner: DR TEAM
Duration: TBD
Components:
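
The failover steps are sequential, and the last one (failover deemed successful) only makes sense if every earlier step succeeded. A minimal sketch of that gating follows. The step names come from the table, while run_failover and the placeholder check functions are hypothetical stand-ins for whatever each team actually runs.

```python
from typing import Callable, List, Tuple

# Each step pairs a name from the failover table with a check function.
# The lambdas below are hypothetical placeholders, not real procedures.
Step = Tuple[str, Callable[[], bool]]


def run_failover(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order and stop at the first one that fails.

    Returns (success, names of the steps that completed).
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            return False, completed
        completed.append(name)
    return True, completed


example_steps: List[Step] = [
    ("Initiate Failover", lambda: True),
    ("Complete Failover", lambda: True),
    ("Test Recovery", lambda: True),
]
ok, done = run_failover(example_steps)
# Failover is deemed successful only if every earlier step succeeded.
print("Failover deemed successful" if ok else f"Stopped after: {done}")
```

Returning the list of completed steps gives the DR TEAM a record of where a failed run stopped, which is useful for the status updates described elsewhere in this plan.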


Below is a sample timeline for recovery actions associated with the failover of the technical components between different data centers to provide geo-redundant operations. Coordination of recovery actions is crucial, and a timeline is necessary to manage recovery between different groups and layers.

Reroute critical processes to alternate Data Center

Step
Owner
Duration
Components



Operate at degraded service level – prioritize critical feeds

Step
Owner
Duration
Components



Take no action – monitor for Data Center recovery
This recovery procedure would be chosen only if no other options were available (e.g. the cause and recovery of the Data Center is fully in the control of another department or vendor).

Step: Track communication and status with the core recovery team.
Owner: DR TEAM
Duration: As needed
Components:

Step: Send out frequent updates to core stakeholders with the status.
Owner: DR TEAM
Duration: As needed
Components:

Internal or External Dependency Recovery

Reroute operations to backup provider

Step
Owner
Duration
Components




Execute available recovery procedures

Step
Owner
Duration
Components
Inform other teams about
technical dependencies
DR TEAM
As needed

DR TEAM
As needed

Take no action – monitor status
This recovery procedure would be chosen only if no other options were available (e.g. the cause and recovery of the disruption is fully in the control of another department or vendor).

Step: Track communication and status with the core recovery team.
Owner: DR TEAM
Duration: As needed
Components:
• Provide feedback about SharePoint service availability

Step: Send out frequent updates to core stakeholders with the status.
Owner: DR TEAM
Duration: As needed
Components:

Significant Network or Other Issue Recovery (Defined by quality of service guidelines)

Reroute operations to backup provider

Step
Owner
Duration
Components




Execute available recovery procedures

Step
Owner
Duration
Components
Inform other teams about technical dependencies
DR TEAM
As needed
• Hardware (CPU, Memory, Hard disk, Network requirements)

teams
DR TEAM
As needed

DR TEAM
As needed


Take no action – monitor status
This recovery procedure would be chosen only if no other options were available (e.g. the cause and recovery of the internal or external dependency is fully in the control of another department or vendor).

Step: Track communication and status with the core recovery team.
Owner: DR TEAM
Duration: As needed
Components:
• Provide feedback about SharePoint service availability

Step: Send out frequent updates to core stakeholders with the status.
Owner: DR TEAM
Duration: As needed
Components:

Restoration Phase

During the restoration phase, the steps taken to enable recovery will vary based on the type of issue. The procedures for each recovery scenario are summarized below.

Data Center Recovery

Full Data Center Restoration

Step: Determine whether failback to original Data Center will be pursued
Owner: DR TEAM
Duration: TBD
Components:
• Restoration procedures determined

Step: Original data center restored
Owner: DR TEAM
Duration: TBD
Components:
• Server Farm level recovery

Step: Complete Failback
Owner: DR TEAM
Duration: TBD
Components:
• Failback steps executed, including handoffs between key dependencies

Step: Test Failback
Owner: DR TEAM
Duration: TBD
Components:
• Tests assigned and performed
• Results summarized and communicated to group
• Issues (if any) communicated to group

Step: Determine whether failback was successful
Owner: DR TEAM
Duration: TBD
Components:
• Declaration of successful failback and communication to stakeholder group.
• Disaster recovery procedures closed.
• Results summarized, post mortem performed, and DRP updated (as needed).

The following section contains steps for the restoration procedures.

Full Server Farm Recovery
This section describes the process for recovering from a farm-level failure for a three-tier server farm consisting of a database server, an application server that provides Index, Query, InfoPath Forms Services, and Excel Calculation Services, and two front-end Web servers that service search queries and provide Web content.

Front-End Web Servers (Queries, Content)

Application Server (Excel Calculation Services, Indexing Services)
Dedicated SQL Server

The following diagram describes the recovery process for a farm.

Full Farm Recovery Process

Phases: Prepare servers and install; Prepare to restore; Restore backups; Redeploy customizations

SQL Server
1. Install Operating System and patches. Install SQL Server and patches.
7a. Restore Office SharePoint Server or SQL Server backups, highest priority first.

Application Server
2. Install Operating System and patches. Install Office SharePoint Server (type: Complete). Run SharePoint Products and Technologies Wizard to create Central Administration site, Configuration database.
4. Start Search services.
5. Recreate Web applications.
7b. Restore Office SharePoint Server backups for Search, other applications, highest priority first.
11. Optional. Reconfigure alternate access mappings.
12. Restart timer jobs.

Front-End Web Server
3. Install Operating System and patches. Install IIS, ASP.NET, and the .NET Framework 3.0. Install Office SharePoint Server (type: front-end Web).
6. Run SharePoint Products and Technologies Wizard.
9. Optional. Set one or more front-end Web servers to serve queries.
10. Reconfigure IIS settings.
13. (Optional) Redeploy solutions and reactivate features or restore 12 hive files.

Overview of a farm-level recovery

Add Farm restore steps
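
Because the farm recovery steps are numbered across three server roles, it can help to merge them into a single execution order. The sketch below does that with a sort on the step label. The assignment of steps to servers follows one reading of the diagram and should be checked against the original figure; the names STEPS, step_key, and ordered_steps are illustrative, and the action text is abbreviated.

```python
import re
from typing import List, Tuple

# (step label, server role, abbreviated action), grouped by server role.
# The source does not list a step 8, so none appears here.
STEPS: List[Tuple[str, str, str]] = [
    ("1", "SQL Server", "Install OS, SQL Server, and patches"),
    ("7a", "SQL Server", "Restore backups, highest priority first"),
    ("2", "Application Server", "Install OS and SharePoint (Complete); run wizard"),
    ("4", "Application Server", "Start Search services"),
    ("5", "Application Server", "Recreate Web applications"),
    ("7b", "Application Server", "Restore backups for Search, other applications"),
    ("11", "Application Server", "Optional: reconfigure alternate access mappings"),
    ("12", "Application Server", "Restart timer jobs"),
    ("3", "Front-End Web Server", "Install OS, IIS, ASP.NET, .NET 3.0, SharePoint"),
    ("6", "Front-End Web Server", "Run SharePoint Products and Technologies Wizard"),
    ("9", "Front-End Web Server", "Optional: set servers to serve queries"),
    ("10", "Front-End Web Server", "Reconfigure IIS settings"),
    ("13", "Front-End Web Server", "Optional: redeploy solutions or restore files"),
]


def step_key(label: str) -> Tuple[int, str]:
    """Split a label such as '7b' into (7, 'b') so steps sort numerically."""
    match = re.fullmatch(r"(\d+)([a-z]?)", label)
    if match is None:
        raise ValueError(f"unrecognized step label: {label!r}")
    return int(match.group(1)), match.group(2)


def ordered_steps(steps: List[Tuple[str, str, str]]) -> List[Tuple[str, str, str]]:
    """Return the steps in execution order across all server roles."""
    return sorted(steps, key=lambda step: step_key(step[0]))


for label, server, action in ordered_steps(STEPS):
    print(f"{label:>3}  {server:<22} {action}")
```

Sorting on a numeric key rather than on the raw label keeps step 10 after step 9, which a plain string sort would not.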

Internal or External Dependency Recovery

Execute available recovery procedures

Step
Owner
Duration
Components

DR TEAM
As needed


DR TEAM
As needed

Take no action – monitor status
This recovery procedure would be chosen only if no other options were available (e.g. the cause and recovery of the disruption is fully in the control of another department or vendor).

Step: Track communication and status with the core recovery team.
Owner: DR TEAM
Duration: As needed
Components:
• Provide feedback about SharePoint service availability

Step: Send out frequent updates to core stakeholders with the status.
Owner: DR TEAM
Duration: As needed
Components:

Significant Network or Other Issue Recovery (Defined by quality of service guidelines)

Execute available recovery procedures

Step
Owner
Duration
Components

DR TEAM
As needed

Take no action – monitor status
This recovery procedure would be chosen only if no other options were available (e.g. the cause and recovery of the internal or external dependency is fully in the control of another department or vendor).

Step: Track communication and status with the core recovery team.
Owner: DR TEAM
Duration: As needed
Components:

Step: Send out frequent updates to core stakeholders with the status.
Owner: DR TEAM
Duration: As needed
Components:

Appendix A: Disaster Recovery Contacts – Admin Contact List

The critical team members who would be involved in recovery procedures for feature sets are summarized below.

Feature Name
Contact Lists

For the key internal and external dependencies identified, the following are the primary contacts.

Dependency Name
Contact Information

In addition the key BCP individuals are:


Appendix B: Document Maintenance Responsibilities and Revision History

This section identifies the individuals and their roles and responsibilities for maintaining this Disaster Recovery Plan.

Primary Disaster Recovery Plan document owner is:
Primary Designee:

Alternate Designee: [NAME]

Name of Person Updating Document
Date
Update Description
Version #
Approved By

Appendix C: Server Farm Details

Farm Configuration (status as of date) Table

Appendix D: Glossary/Terms

Standard Operating State: Production state in which services function at standard levels, in contrast to the recovery state, which supports business functions at minimum but degraded levels.

Presentation Layer: Layer which users interact with. This typically encompasses systems that support the UI, manage rendering, and capture user interactions. User responses are parsed and system requests are passed for processing and data retrieval to the appropriate layer.

Processing Layer: System layer which processes and synthesizes user input, data output, and transactional operations within an application stack. Typically this layer processes data from the other layers. These services are typically folded into the presentation and database layers; for intensive applications, however, processing is usually broken out into its own layer.

Database Layer: The database layer is where data typically resides in an application stack. Typically data is stored in a relational database such as SQL Server, Microsoft Access, or Oracle, but it can be stored as XML, raw data, or tables. This layer typically is optimized for data querying, processing and retrieval.

Network Layer: The network layer is responsible for directing and managing traffic between physical hosts. It is typically an infrastructure layer and is usually outside the purview of most business units. This layer usually supports load balancing, geo-redundancy, and clustering.

Storage Layer: This is typically an infrastructure layer and provides data storage and access. In most environments this is usually regarded as SAN or NAS storage.

Hardware/Host Layer: This layer refers to the physical machines that all other layers are reliant upon. Depending on the organization, management of the physical layer can be performed by the stack owner or fall under the purview of an infrastructure support group.

Virtualization Layer: In some environments virtual machines (VMs) are used to partition/encapsulate a machine’s resources to behave as separate distinct hosts. The virtualization layer refers to these virtual machines.

Administrative Layer: The administrative layer encompasses the supporting technology components which provide access, administration, backups, and monitoring of the other layers.

MIT BUSINESS CONTINUITY PLAN

This is an external release of the MIT Business Continuity Plan.

For information on the plan or Business Continuity Planning at MIT, call Jerry Isaacson, MIT Information Security Office, at (617) 253-1440 or send e-mail to [email protected]

To Page the BCMT Duty Person:

Duty Person
To just leave a phone number to call back, dial:
To leave an 80-character message, call and give PIN #:
1

For recorded disaster recovery status reports and announcements during the emergency
call:

Table of Contents

Part I. Introduction

Introduction to This Document
Part II. Design of the Plan
Overview of the Business Continuity Plan
Purpose
Assumptions
Development
Maintenance
Testing
Organization of Disaster Response and Recovery
Administrative Computing Steering Committee
Business Continuity Management Team
Business Continuity Management Team
Institute Support Teams
Disaster Response
Disaster Detection and Determination
Disaster Notification
Initiation of the Institute’s Business Continuity Plan
Activation of a Designated Hot Site
Dissemination of Public Information
Disaster Recovery Strategy
Scope of the Business Continuity Plan
Category I – Critical Functions
Category II – Essential Functions
Category III – Necessary Functions
Category IV – Desirable Functions
Part III. Team Descriptions
Institute Support Teams
Business Continuity Management Team
Damage Assessment/Salvage
Campus Police
MIT News Office – Public Information
Insurance
Telecommunications
Part IV. Recovery Procedures
Notification List
To reach the BCMT Duty Person
Business Continuity Management Team Coordinator
Damage Assessment/Salvage
Salvage Operations
Campus Police
MIT News Office – Public Information
Insurance Team
Telecommunications
Appendix A – Recovery Facilities
Emergency Operations Centers
Appendix B – Category I, II & III functions
Appendix C – Plan Distribution List
Business Continuity Management Team
BCMT Duty Person Procedures
GUIDE TO BCMT ACTIVATION

Part I. Introduction
Part I contains information about this document, which provides the written record of the Massachusetts Institute of Technology Business Continuity Plan.

Introduction to This Document

Planning for the business continuity of MIT in the aftermath of a disaster is a complex task. Preparation for, response to, and recovery from a disaster affecting the administrative functions of the Institute requires the cooperative efforts of many support organizations in partnership with the functional areas supporting the “business” of MIT. This document records the Plan that outlines and coordinates these efforts, reflecting the analyses by representatives from these organizations and by the MIT Information Security Officer, Gerald I. Isaacson.

For use in the event of a disaster, this document identifies the computer recovery facilities (hot sites and shell sites; see Appendix A – Recovery Facilities) that have been designated as backups if the functional areas are disabled.

How To Use This Document

Use this document to learn about the issues involved in planning for the continuity of the critical and essential business functions at MIT, as a checklist of preparation tasks, for training personnel, and for recovering from a disaster. This document is divided into four parts, as the table below describes.

Part Contents

I Information about the document itself.

II Design of the Plan that this document records, including information about the overall structure of business continuity planning at MIT.

III General responsibilities of the individual Institute Support Teams that together form the Business Continuity Management Team, emphasizing the function of each team and its preparation responsibilities.

IV Recovery actions for the Institute Support Teams and important checklists such as the notification list for a disaster and an inventory of resources required for the environment. [Note: If a “disaster” situation arises, Part IV of the Plan is the only part that needs to be referenced. It contains all of the procedures and support information for recovery.]

Audience

This document addresses several groups within the MIT central administration with differing levels and types of responsibilities for business continuity, as follows:
◦ Administrative Computing Steering Committee
◦ Business Continuity Management Team
◦ Institute Support Teams
◦ Functional Area Recovery Management (FARM) Teams

It should be emphasized that this document is addressed particularly to the members of the Business Continuity Management Team, since they have the responsibility of preparing for, responding to, and recovering from any disaster that impacts MIT. Part III of this document describes the composition of the Business Continuity Management Team in detail.

Distribution

As the written record of the Institute’s Business Continuity Plan, this document is distributed to each member of the Business Continuity Management Team, including members of the Institute Support Teams (see Appendix C – Plan Distribution List).

It is also distributed to members of the Administrative Computing Steering Committee, FARM Team Coordinators, Information Systems Directors, and others not primarily involved with the direct recovery effort.
Part II. Design of the Plan
Part II describes the philosophy of business continuity planning at MIT generally, and the kind of analysis that produced this Plan. It also provides an overview of the functions of the Business Continuity Management Team in implementing this Plan.

Overview of the Business Continuity Plan

Purpose

MIT increasingly depends on computer-supported information processing and telecommunications. This dependency will continue to grow with the trend toward decentralizing information technology to individual organizations within MIT administration and throughout the campus.

The increasing dependency on computers and telecommunications for operational support poses the risk that a lengthy loss of these capabilities could seriously affect the overall performance of the Institute. A risk analysis identified several systems as belonging to risk Category I, comprising those functions whose loss could cause a major impact to the Institute within hours. It also categorized a majority of Institute functions as Essential, or Category II – requiring processing support within week(s) of an outage. This risk assessment process will be repeated on a regular basis to ensure that changes to our processing and environment are reflected in recovery planning.

MIT administration recognizes the low probability of severe damage to the data processing, telecommunications, or support services capabilities that support the Institute. Nevertheless, because of the potential impact to MIT, a plan for reducing the risk of damage from a disaster, however unlikely, is vital. The Institute’s Business Continuity Plan is designed to reduce the risk to an acceptable level by ensuring the restoration of Critical processing within hours, and all essential production (Category II processing) within week(s) of the outage.
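
The category scheme lends itself to a simple lookup that orders functions for restoration. The sketch below is illustrative only. The category labels come from the Plan's table of contents and the targets for Categories I and II from this Purpose section; no target is stated here for Categories III and IV, so those are left empty. The function names in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Category:
    rank: int               # 1 is restored first
    label: str
    target: Optional[str]   # restoration target as stated in the Plan, if any


CATEGORIES: Dict[str, Category] = {
    "I": Category(1, "Critical", "within hours"),
    "II": Category(2, "Essential", "within week(s)"),
    "III": Category(3, "Necessary", None),  # no target stated in this section
    "IV": Category(4, "Desirable", None),   # no target stated in this section
}


def restoration_order(functions: Dict[str, str]) -> List[str]:
    """Sort function names by category rank, keeping input order within a rank."""
    return sorted(functions, key=lambda name: CATEGORIES[functions[name]].rank)


# Hypothetical function names, for illustration only.
example = {"Alumni mailing": "IV", "Payroll": "I", "Purchasing": "II"}
print(restoration_order(example))
```

A DRP written to support this BCP could use the same ranking to set its own technical priorities, so that the systems behind Category I functions are recovered first.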

The Plan identifies the critical functions of MIT and the resources required to support them. The Plan provides guidelines for ensuring that needed personnel and resources are available for both disaster preparation and response and that the proper steps will be carried out to permit the timely restoration of services.

This Business Continuity Plan specifies the responsibilities of the Business Continuity Management Team, whose mission is to establish Institute level procedures to ensure the continuity of MIT’s business functions. In the event of a disaster affecting any of the functional areas, the Business Continuity Management Team serves as liaison between the functional area(s) affected and other Institute organizations providing major services. These services include the support provided by Physical Plant, security provided by the Campus Police, and public information dissemination handled by the MIT News Office, among others.

Assumptions
The Plan is predicated on the validity of the following three assumptions:

◦ The situation that causes the disaster is localized to the data processing facility of Operations and Systems in ; the building or space housing the functional area; or to the communication systems and networks that support the functional area. It is not a general disaster, such as an earthquake or the “Blizzard of ’78,” affecting a major portion of metropolitan Boston.

It should be noted, however, that the Plan will still be functional and effective even in an area-wide disaster. Even though the basic priorities for restoration of essential services to the community will normally take precedence over the recovery of an individual organization, the Institute’s Business Continuity Plan can still provide for a more expeditious restoration of our resources for supporting key functions.

◦ The Plan is based on the availability of the hot sites or the back-up resources, as described in Part IV. The accessibility of these, or equivalent back-up resources, is a critical requirement.
◦ The Plan is a document that reflects the changing environment and requirements of MIT. Therefore, the Plan requires the continued allocation of resources to maintain it and to keep it in a constant state of readiness.

Development

MIT’s Information Security Officer, with assistance from key Institute support areas, is responsible for developing the Institute’s Business Continuity Plan. Development and support of individual FARM Team Plans are the responsibility of the functional area planning for recovery.

Maintenance

Ensuring that the Plan reflects ongoing changes to resources is crucial. This task includes updating the Plan and revising this document to reflect updates; testing the updated Plan; and training personnel. The Business Continuity Management Team Coordinators are responsible for this comprehensive maintenance task.

Quarterly, the Business Continuity Management Team Coordinators ensure that the Plan undergoes a more formal review to confirm the incorporation of all changes since the prior quarter. Annually, the Business Continuity Management Team Coordinators initiate a complete review of the Plan, which could result in major revisions to this document. These revisions will be distributed to all authorized personnel, who exchange their old plans for the newly revised plans. At that time the Coordinators will provide an annual status report on continuity planning to the Administrative Computing Steering Committee.
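
The quarterly and annual review cadence can be turned into due dates with a small date helper. This is a sketch under the assumption that "quarterly" means three calendar months after the last review and "annually" means twelve; the Plan does not define the intervals more precisely, and the function names are illustrative.

```python
import calendar
from datetime import date
from typing import Dict


def add_months(start: date, months: int) -> date:
    """Return `start` shifted by `months`, clamping the day to the month's end."""
    index = start.year * 12 + (start.month - 1) + months
    year, month_index = divmod(index, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


def next_reviews(last_quarterly: date, last_annual: date) -> Dict[str, date]:
    """Due dates for the next quarterly and annual Plan reviews."""
    return {
        "quarterly": add_months(last_quarterly, 3),
        "annual": add_months(last_annual, 12),
    }
```

Clamping the day handles month-end cases, so a review held on 30 November falls due on the last day of February rather than raising an error.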

Testing

Testing the Business Continuity Plan is an essential element of preparedness. Partial tests of individual components and recovery plans of specific FARM Teams will be carried out on a regular basis. A comprehensive exercise of our continuity capabilities and support by our designated recovery facilities will be performed on an annual basis.

Organization of Disaster Response and Recovery
The organizational backbone of business continuity planning at MIT is the Business Continuity Management Team. In the event of a disaster affecting an MIT organization or its resources, the Business Continuity Management Team will respond in accordance with this Plan and will initiate specific actions for recovery. The Business Continuity Management Team is called into action under the authority of the Administrative Computing Steering Committee which has the responsibility for approving actions regarding Business Continuity Planning at MIT.

Administrative Computing Steering Committee

◦ Senior Vice President, Chairman of the Committee. Manages and directs the recovery effort. Provides liaison with senior MIT management for reporting the status of the recovery operation.
◦ Vice President for Financial Operations. Provides liaison with the Committee for support of critical business functions affected by the disaster.
◦ Vice President for Information Systems. Coordinates all data processing and telecommunications systems recovery, including operational restoration of Building O&S and operations at the designated hot site.
◦ Vice President for Research. Provides liaison with the Committee for support of critical business functions affected by the disaster.
◦ Vice President for Resource Development. Provides liaison with the Committee for support of critical business functions affected by the disaster.
◦ Executive Vice President, Alumni Association. Provides liaison with the Committee for support of critical business functions affected by the disaster.
◦ Assistant to Provost. Provides liaison with the Committee for support of critical business functions affected by the disaster.

Business Continuity Management Team

For the business continuity of MIT systems, two organizations are primary: the Business Continuity Management Team, with its Institute Support Teams, and the Functional Area Recovery Management (FARM) Team for the area affected. In the event of a disaster, the BCMT provides general support, while the FARM Team is concerned with resources and tasks integral to running the specific functional area.

This section provides general information about the organization of recovery efforts and the role of the Business Continuity Management Team. Part III of this document describes the Business Continuity Management Team and the responsibilities of each Institute Support Team in detail.

Business Continuity Management Team.
◦ The Business Continuity Management Team is composed of upper-level managers in MIT administration. The following is a list of each position on the Business Continuity Management Team, and a brief overview of each member’s responsibilities:
◦ Information Security Officer. As Co-Coordinator of the Business Continuity Management Team, with the Coordinator of the O&S FARM Team, provides liaison between the Institute’s operational and management teams and the FARM teams in affected areas. Also responsible for ongoing maintenance, training and testing of the Institute’s Business Continuity Plan. Coordinates the Institute Support Teams under the auspices of the Business Continuity Management Team.
◦ Director, Operations and Systems. Coordinates support for data processing resources at the main data center and the designated recovery sites.
◦ Director, Telecommunications Systems. Provides alternate voice and data communications capability in the event normal telecommunication lines and equipment are disrupted by the disaster. Evaluates the requirements and selects appropriate means of backing up the MIT telecommunications network.
◦ Chief, Campus Police. Provides for physical security and emergency support to affected areas and for notification mechanisms for problems that are or could be disasters. Extends a security perimeter around the functional area affected by the disaster.
◦ Director, Physical Plant. Coordinates all services for the restoration of plumbing, electrical, and other support systems as well as structural integrity. Assesses damage and makes a prognosis for occupancy of the structure affected by the disaster.
◦ Director of Insurance and Legal Affairs. Provides liaison to insurance carriers and claims adjusters. Coordinates insurance program with continuity planning programs.
◦ Director, MIT News Office. Communicates with the news media, public, staff, faculty, and student body who are not involved in the recovery operation.
◦ Personnel Department. Provides support for human resources elements of recovery and staff notification through the emergency broadcast service.
◦ Director, Distributed Computing & Network Services. Provides network support for Administrative and Academic Computing and other distributed services and networks.
◦ Assistant to the Vice President, for Information Systems. Represents the Office of the President. Liaison to FARM Teams in the President’s Office.
◦ Associate Comptroller, Comptroller’s Accounting Office. Represents the Vice President for Financial Operations. Liaison to Financial Operations FARM Teams.
◦ Manager, Audit Division. Provides audit support during the emergency. Makes recommendations on changes to the normal control procedures necessitated by the recovery process.
◦ Safety Office. Coordinates risk reduction and avoidance activities and emergency response with the BCMT.
◦ Emergency Response Team. This unit, headed by the Physical Plant Mechanical Engineering Manager, provides the initial response to the majority of campus emergencies.

Institute Support Teams:

Under the overall direction of the Business Continuity Management Team, Institute Support Teams assist a functional area’s recovery. These teams, described below, work in conjunction with the FARM Team of the area affected by the problem condition to restore services and provide assistance at the Institute level. In many cases, the organizations comprising these support teams have as their normal responsibility the provision of these support services. This support is generally documented in a procedures manual for the organization. The Business Continuity Plan is an adjunct to that documentation and highlights, in particular, the interfaces between the campus level service and the individual FARM Team operations requirements. In cases where the documentation in this Plan and the organization’s documents differ, the organization’s documentation has precedence.

◦ Damage Assessment/Salvage Team. Headed by the Administrative Officer for Physical Plant and activated during the initial stage of an emergency, the team reports directly to the Business Continuity Management Team, evaluates the initial status of the damaged functional area, and estimates both the time to reoccupy the facility and the salvageability of the remaining equipment. This team draws members from the Physical Plant Office, from Operations and Systems, Telecommunications Systems, Distributed Computing & Network Services and from the FARM team of the affected area as well as appropriate vendors supporting our environment.
◦ Following the assessment of damage, the team is responsible for salvaging equipment, data and supplies; identifying which resources remain; and determining their future utilization in rebuilding the data center and recovering from the disaster. The members of the Damage Assessment Team become the Salvage Team.
◦ Transportation Team. A temporary Institute Support Team headed jointly by the Computer Operations Manager in Operations and Systems and by the Associate Director of Operations for Physical Plant, responsible for transporting resources (personnel, equipment, and materials) to back-up sites as necessary. This team draws members from two organizations: Information Systems personnel who normally operate the shuttle bus between and Physical Plant personnel who normally transport heavy equipment within the Institute.
◦ Public Information. The interface with the media, the general public and faculty, staff and students who are not participating in the recovery process is handled by the MIT News Office, working closely with the Personnel Department.
◦ Telecommunications Team. Headed by the Director of the Information Systems Telecommunications Department, this team is responsible for establishing voice and data communications between the affected site and the remainder of the campus.

Disaster Response

This section describes six required responses to a disaster, or to a problem that could evolve into a disaster:

1. Detect and determine a disaster condition

2. Notify persons responsible for recovery

3. Initiate the Institute’s Business Continuity Plan
4. Activate the designated hot site

5. Disseminate Public Information

6. Provide support services to aid recovery

Each subsection below identifies the organization(s) and/or position(s) responsible for each of these six responses.

Disaster Detection and Determination

The detection of an event which could result in a disaster affecting information processing systems at MIT is the responsibility of Physical Plant Operations (PPO), Campus Police, Information Systems, or whoever first discovers or receives information about an emergency situation developing in one of the functional areas, Building other building on campus housing major information processing systems or about the communications lines between these buildings.

Disaster Notification

PPO will follow existing procedures and notify the individuals who are acting as the Business Continuity Management Team Duty Persons (DP). The DP on call will monitor the evolving situation and, if appropriate, will then notify the Business Continuity Management Team representative based upon a predefined set of notification parameters. (Page – 22)

When a situation occurs that could result in interruption of processing of major information processing systems or networks on campus, the following people must be notified:

• Normally, Physical Plant Operations and/or the Campus Police receive the initial notice through their alarm monitoring capabilities. If the problem does not activate a normal alarm system, immediately notify these two areas.

• Chairman of the Administrative Computing Steering Committee

• Vice President for Information Systems

• The Business Continuity Management Team Coordinator (Information Security Officer)

• The Operations and Systems FARM Team Coordinator

• The Telecommunications and Distributed Computing & Network Services FARM Team Coordinators (if the situation affects the data or voice transmission lines or facilities)
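
As an illustration only, the notification list above can be expressed as a small lookup. This is a minimal sketch, not part of the Plan: the function name `build_notification_list` and its two flags are hypothetical, and the role names are simply copied from the bullets above.

```python
from typing import List

# Roles copied from the notification list above; the grouping is an assumption.
ALARM_MONITORS = ["Physical Plant Operations", "Campus Police"]

ALWAYS_NOTIFY = [
    "Chairman of the Administrative Computing Steering Committee",
    "Vice President for Information Systems",
    "Business Continuity Management Team Coordinator (Information Security Officer)",
    "Operations and Systems FARM Team Coordinator",
]

NOTIFY_IF_TRANSMISSION_AFFECTED = [
    "Telecommunications FARM Team Coordinator",
    "Distributed Computing & Network Services FARM Team Coordinator",
]


def build_notification_list(transmission_affected: bool,
                            alarm_activated: bool = True) -> List[str]:
    """Return the roles to notify, in the order the Plan lists them.

    Hypothetical helper: if no normal alarm was activated, Physical Plant
    Operations and Campus Police have not yet been told, so they go first.
    """
    recipients: List[str] = []
    if not alarm_activated:
        recipients.extend(ALARM_MONITORS)
    recipients.extend(ALWAYS_NOTIFY)
    if transmission_affected:
        # Only when data or voice transmission lines or facilities are affected.
        recipients.extend(NOTIFY_IF_TRANSMISSION_AFFECTED)
    return recipients
```

For example, `build_notification_list(transmission_affected=True, alarm_activated=False)` would put the two alarm-monitoring areas first and the two network-related FARM Team Coordinators last.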

Initiation of the Institute’s Business Continuity Plan
Initiation of this Plan is the responsibility of the Business Continuity Management Team Coordinator or any member of the Business Continuity Management Team or the Administrative Computing Steering Committee.

Activation of a Designated Hot Site

The responsibility for activating any of the designated hot sites or back-up resources is delegated to the Vice President for Information Systems. In the absence of the Vice President, responsibility reverts to the Director of Information Systems Operations & Systems or the Coordinator of the O&S Functional Area Recovery Management Team. Within hours of the occurrence, the Vice President for Information Systems, or alternate, determines the prognosis for recovery of the damaged functional area through consultation with the Information Security Officer and the Damage Assessment Team, headed by Physical Plant, which also includes representatives from Operations and Systems, Telecommunications Systems and the functional areas affected.

If the estimated occupancy or recovery of the damaged functional area cannot be accomplished within hours, the usual occupants of the designated back-up site are notified of the intention to occupy their facility.

Dissemination of Public Information

The Director of the MIT News Office is responsible for directing all meetings and discussions with the news media and the public, and in conjunction with the Personnel Department, with MIT personnel not actively participating in the recovery operation. In the absence of the MIT News Office representative, the responsibility reverts to the senior official present at the scene.

Recovery Status Information Number (617) has been established as a voice mail information number for posting recovery status and information notices. All reports will be placed by the Continuity Planning Coordinators or the Telecommunication FARM team leader.

Provision of Support Services to Aid Recovery

During and following a disaster, Institute Support Teams, as described on page 14, are responsible for aiding the FARM Teams. They operate under the direction of the Business Continuity Management Team through the Recovery Coordinator (the Information Security Officer).

Disaster Recovery Strategy

The disaster recovery strategy explained below pertains specifically to a disaster disabling the main data center. This functional area provides mainframe computer and major server support to MIT’s administrative applications. Especially at risk are the critical applications, those designated as Category I systems (see below). The O&S FARM Team Plan provides for recovering the capacity to support these critical applications within hours. Summarizing the provisions of the O&S Plan, subsections below explain the context in which the Institute’s Business Continuity Plan operates. The Business Continuity Plan complements the strategies for restoring the data processing capabilities normally provided by Operations & Systems.

This section addresses three phases of disaster recovery:

• Emergency

• Backup

• Recovery

Strategies for accomplishing each of these phases are described below. It should be noted that the subsection describing the emergency phase applies equally to a disaster affecting the Administration Building or other building on campus, the functional area that provides support for the maintenance of the critical system.

Emergency Phase

The emergency phase begins with the initial response to a disaster. During this phase, the existing emergency plans and procedures of Campus Police and Physical Plant direct efforts to protect life and property, the primary goal of initial response. Security over the area is established as local support services such as the Police and Fire Departments are enlisted through existing mechanisms. The BCMT Duty Person is alerted by pager and begins to monitor the situation.

If the emergency situation appears to affect the main data center (or other critical facility or service), either through damage to data processing or support facilities, or if access to the facility is prohibited, the Duty Person will closely monitor the event, notifying BCMT personnel as required to assist in damage assessment. Once access to the facility is permitted, an assessment of the damage is made to determine the estimated length of the outage. If access to the facility is precluded, then the estimate includes the time until the effect of the disaster on the facility can be evaluated.

If the estimated outage is less than hours, recovery will be initiated under normal Information Systems operational recovery procedures. If the outage is estimated to be longer than hours, then the Duty Person activates the BCMT, which in turn notifies the Chairman of the Administrative Computing Steering Committee and the Vice President for Information Systems, and the Business Continuity Plan is activated. The recovery process then moves into the back-up phase.

The Business Continuity Management Team remains active until recovery is complete to ensure that the Institute will be ready in the event the situation changes.
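
The escalation rule in the emergency phase can be sketched as a single comparison. This is a hedged illustration, not the Plan's procedure: the names `RecoveryPath` and `choose_recovery_path` are hypothetical, the threshold is left as a required parameter because the text does not state the number of hours, and treating an estimate exactly equal to the threshold as an escalation is an assumption made here for caution.

```python
from enum import Enum


class RecoveryPath(Enum):
    """Hypothetical labels for the two outcomes described in the text."""
    NORMAL_OPERATIONAL_RECOVERY = "normal Information Systems operational recovery"
    ACTIVATE_BCMT = "Duty Person activates the BCMT; Business Continuity Plan is activated"


def choose_recovery_path(estimated_outage_hours: float,
                         threshold_hours: float) -> RecoveryPath:
    """Pick the recovery path from the damage assessment's outage estimate.

    threshold_hours must be supplied by the caller; the Plan text does not
    give the value. An estimate equal to the threshold escalates (assumption).
    """
    if estimated_outage_hours < 0 or threshold_hours <= 0:
        raise ValueError("outage estimate must be >= 0 and threshold must be > 0")
    if estimated_outage_hours < threshold_hours:
        return RecoveryPath.NORMAL_OPERATIONAL_RECOVERY
    return RecoveryPath.ACTIVATE_BCMT
```

A call such as `choose_recovery_path(3, threshold_hours=8)` uses `8` purely as a placeholder value; the actual threshold would come from the Plan's own figure.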

Back-up Phase
The back-up phase begins with the initiation of the appropriate FARM Team Plan(s) for outages enduring longer than hours. In the initial stage of the back-up phase, the goal is to resume processing critical applications. Processing will resume either at the main data center or at the designated hot site, depending on the results of the assessment of damage to equipment and the physical structure of the building.

In the back-up phase, the initial hot site must support critical (Category I) applications for up to weeks and as many Category II applications as resources and time permit. During this period, processing of these systems resumes, possibly in a degraded mode, up to the capacity of the hot site. Within this -week period, the main data center will be returned to full operational status if possible.

However, if the damaged area requires a longer period of reconstruction, then the second stage of back-up commences. During the second stage, a shell facility (a pre-engineered temporary processing facility that we have contracted to use for this purpose) is assembled on the parking lot and equipment installed to provide for processing all applications until a permanent site is ready. See Page 33 for a list of the designated recovery sites.

Recovery Phase

The time required for recovery of the functional area and the eventual restoration of normal processing depends on the damage caused by the disaster. The time frame for recovery can vary from several days to several months. In either case, the recovery process begins immediately after the disaster and takes place in parallel with back-up operations at the designated hot site. The primary goal is to restore normal operations as soon as possible.

Scope of the Business Continuity Plan
The object of this Plan is to restore critical (Category I) systems within hours, and Essential (Category II) systems within week(s) of a disaster that disables any functional area and/or essential equipment supporting the systems or functions in that area.

The initial Risk Assessment of the computer applications that support MIT administration assigned systems to Category I Critical. This risk category identifies applications that have the highest priority and must be restored within hours of a disaster disabling a functional area. Specifically, each function of these systems was evaluated and allocated a place in one of four risk categories, as described below.

Category I – Critical Functions
Category II – Essential Functions
Category III – Necessary Functions
Category IV – Desirable Functions
Note: Category IV functions are important to MIT administrative processing, but due to their nature, the frequency they are run and other factors, they can be suspended for the duration of the emergency.
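
To make the prioritization concrete, the four risk categories can be modeled as an ordered enumeration and used to sort functions for restoration. This is a sketch under stated assumptions: `RiskCategory`, `BusinessFunction`, and `restoration_order` are hypothetical names, and dropping Category IV from the queue reflects the note above that those functions can be suspended for the duration of the emergency.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class RiskCategory(IntEnum):
    """The four risk categories; a lower value means higher restoration priority."""
    CRITICAL = 1    # Category I
    ESSENTIAL = 2   # Category II
    NECESSARY = 3   # Category III
    DESIRABLE = 4   # Category IV


@dataclass(frozen=True)
class BusinessFunction:
    name: str  # illustrative label, not taken from the Plan
    category: RiskCategory


def restoration_order(functions: List[BusinessFunction]) -> List[BusinessFunction]:
    """Return functions to restore, most critical first.

    Category IV functions are left out because they can be suspended
    for the duration of the emergency.
    """
    active = [f for f in functions if f.category != RiskCategory.DESIRABLE]
    # sorted() is stable, so functions in the same category keep their input order.
    return sorted(active, key=lambda f: f.category)
```

Because the sort is stable, any finer ordering within a category would have to be supplied by the caller through the input order.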

The administrative systems in Categories I – IV are those that provide Institute-wide services. There are many departmental and laboratory systems as well as non-information processing systems (such as ) that are also essential either for the Institute or for the local area(s) they support. Recovery for these systems too must be based upon an assessment of the impact of their loss and the cost of their recovery. See the Departmental FARM Team Plan document for further information on assessing risk at the departmental level.
Part III. Team Descriptions
Part III describes the organization and responsibilities of the Business Continuity Management Team. Composed of sub-teams (the Institute Support Teams), the Business Continuity Management Team as a whole plans and implements the responses and recovery actions in the event of a disaster disabling either a functional area, Central Administration or the main data center. Its primary role is to provide Institute level support services to any functional area affected by the problem.

• Information Security Officer. As Business Continuity Management Team Co-coordinator, provides liaison between the Institute’s operational and management teams and the FARM teams in affected areas. Also responsible for ongoing maintenance, training and testing of the Business Continuity Plan. Coordinates the Institute Support Teams under the auspices of the Business Continuity Management Team. The Co-coordinator of the BCMT is the Coordinator of the O&S FARM Team, who will take responsibility for recovery in the absence of the Information Security Officer.

• Director, Operations and Systems. Provides support for data processing resources, with primary responsibility for restoration of O&S processing. Recovery plans for the computing facilities are the responsibility of the Coordinator of the O&S FARM Team and are described in the O&S FARM Team plan.

• Director, Telecommunications Systems. Provides alternate voice and data communications capability in the event normal telecommunication lines and equipment are disrupted by the disaster. Evaluates the requirements and selects appropriate means of backing up the MIT telecommunications network. Recovery plans for the primary 5ESS telephone switching equipment in and satellite facilities in other buildings on campus are described in the Telecommunications FARM Team plan.

• Chief, Campus Police. Provides for physical security and emergency support to affected areas and for notification mechanisms for problems that are or could be disasters. Extends a security perimeter around the functional area affected by the disaster. Provides coordination with public emergency services (Cambridge Police, etc.) as required.

• Director, Physical Plant. Coordinates all services for the restoration of plumbing and electrical systems and structural integrity. Assesses damage and makes a prognosis for occupancy of the structure affected by the disaster.

• Director, Safety Office. Coordinates safety and hazardous materials related issues with other organizations involved in recovery planning and response as well as governmental and other emergency services.

• Director, Personnel Department. Coordinates all activities of the recovery process with key attention to the personnel aspects of the situation. This includes releasing staff from areas affected, initiating emergency notification systems and working with the MIT News Office on dissemination of information about the recovery effort.

• Director, Distributed Computing & Network Services. Coordinates all services in support of the restoration of network services and support facilities. This includes support for Athena communications services and external network service support.

• Director, MIT News Office. Communicates with the news media, public, staff, faculty, and student body who are not involved in the recovery operation.

• Assistant to the Vice President, for Information Systems. Represents the Office of the President.

• Associate Comptroller, Comptroller’s Accounting Office. Represents the Vice President for Financial Operations.

• Audit Manager, Audit Division. Provides consultation on compensating controls and suggestions on maintaining the appropriate level of controls during the recovery process.

Institute Support Teams
Business Continuity Management Team

1. Function

To oversee the development, maintenance and testing of recovery plans addressing all Category I and II business functions. In the event of a “disaster,” to manage the backup and recovery efforts and facilitate the support for key business functions and restoration of normal activities.

2. Organization

The BCMT is co-chaired by the MIT Information Security Officer and the Coordinator of the O&S FARM Team, who serves in the absence of the Security Officer. The Team is composed of key management personnel from each of the areas involved in the recovery process.

3. Interfaces

The team interfaces with and is responsible for all business continuity plans and planning personnel at MIT.

4. Preparation Requirements

On a quarterly basis, the team will meet to review FARM Team plans that have been completed in the last quarter.
On an annual basis, the Team will review the overall status of the recovery plan, and report on this status through the Information Security Officer, to the Administrative Computing Steering Committee.

Individual Team members will prepare recovery procedures for their assigned areas of responsibility at MIT. They will ensure that changes to their procedures are reflected in any interfacing procedures.

The BCMT will ensure that continuing levels of support are available for the FARM Teams that require it.

The BCMT will also review and approve FARM Team plans as they are submitted, re-evaluate the criticality of MIT operating functions at regular intervals and provide for awareness and training in recovery planning. They will also participate in emergency preparedness drills initiated by the Safety Office or other appropriate campus organizations.

Damage Assessment/Salvage

1. Function

To report to the Business Continuity Management Team (BCMT), within two to four hours after access to the facility is permitted, on the extent of the damage to the affected site, and to make recommendations to the BCMT regarding possible reactivation and/or relocation of data center or user operations. Existing Physical Plant emergency procedures are documented in a manual known as the “Black Book” maintained by Physical Plant. The Business Continuity Plan procedures supplement, and are subordinate to, those in the Black Book, which takes precedence in the case of any difference. Following assessment of the damage, the team is then responsible for salvage operations in the area affected.

2. Organization

Headed by the Administrative Officer for Physical Plant and activated during the initial stage of an emergency, the team reports directly to the Business Continuity Management Team, evaluates the initial status of the damaged functional area, and estimates the time to reoccupy the facility and the salvageability of the remaining equipment. During an emergency situation, the individual designated in the Black Book will take operational responsibility for implementation of damage assessment. This team draws members from the Physical Plant Office, from Operations and Systems, and from the FARM team of the affected area. Following assessment, the team is responsible for salvaging equipment, data, and supplies following a disaster; identifying which resources remain; and determining their future utilization in rebuilding the data center and recovery from the disaster.

3. Interface

The Damage Assessment/Salvage Team will interface with other Physical Plant operations groups, the Campus Police and Information Systems operations functions, including vendor and insurance representatives, to keep abreast of new equipment, physical structures, and other factors relating to recovery.

4. Preparation Requirements

Identification of all equipment to be kept current. A quarterly report will be stored off-site. The listing will show all current information, such as engineering change levels, book value, lessor, etc. Configuration diagrams will also be available. Emergency equipment, including portable lighting, hard hats, boots, portable two-way radios, floor plans and equipment layouts will be maintained by Physical Plant.

A listing of all vendor sales personnel, customer engineers and regional sales and engineering offices is to be kept and reviewed quarterly. Names, addresses and phone numbers (normal, home, and emergency) are also to be kept.

Campus Police

1. Function

To provide for all facets of a positive security and safety posture, to assure that proper protection and safeguards are afforded all MIT employees and Institute assets at both the damaged and backup sites.

2. Organization

The team will consist of the Campus Police Department Supervisor and appropriate support staff. The team will report through the Chief who is a member of the Business Continuity Management Team.

3. Interfaces

The Campus Police Team will interface with the following teams or organizational units, relative to security and safety requirements:

Personnel
Physical Plant
Safety Office
Environmental Medical Services
MIT News Office
Other appropriate departments as required

4. Preparation Requirements

Provide emergency medical services, if necessary.

Identify the number of Campus Police personnel needed to provide physical security protection of both the damaged and backup sites.

Identify the type of equipment needed by Campus Police personnel in the performance of their assigned duties.

Coordinate and arrange for additional security equipment and manpower, as applicable, if needed.

Identify and provide security protection required for the transport of confidential information to and from both off-site and backup sites. Coordinate with the appropriate MIT Department.

Periodically review the level of security needed at both the damaged and backup sites.

MIT News Office – Public Information

1. Function

The most difficult time to maintain good public relations is when there is an accident or emergency. Public relations planning is required so that when an emergency arises, inquiries from the news media, friends and relatives of staff, faculty, and students can be handled effectively. While we cannot expect to turn a bad situation into a good one, we can assist in making sure facts presented to the public are accurate and as positive as possible given the situation.

It is in our best interest to cooperate with the media as much as possible, so that they will not be forced to resort to unreliable sources to get information that could be untrue and more damaging to the Institute than the facts.

Therefore, it is the policy of MIT in time of emergency, to:

Have the MIT News Office serve as the authorized spokesperson for the Institute. All public information must be coordinated and disseminated by their staff.

Refrain from releasing information on personnel casualties until families have been notified. Once families have been notified, names of those personnel should be released quickly to alleviate the fears of relatives of others.

Provide factual information to the press and authorities as quickly as facts have been verified, and use every means of communications available to offset rumors and misstatements.
Avoid speculating on anything that is not positively verified, including cause of accident, damage estimates, losses, etc. (Fire Officials normally release their own damage estimates.)

Emphasize positive steps taken by the Institute to handle the emergency and its effects.

Situations calling for implementation of the Emergency Public Information Plan may include, but are not limited to:

Systems malfunctions disrupting the normal course of operations.

Accidents, particularly when personal injury results.

Natural disasters, such as fires, floods, tornadoes and explosions.

Civil disorders, such as riots and sabotage.

Executive death.

Scandal, including embezzlement and misuse of funds.

Major litigation initiated by or against the Institute.

2. Organization

The Director of the MIT News Office, a member of the Business Continuity Management Team, will act as the Public Information Officer for the Institute. The News Office alternates are listed in Appendix A. In their absence the responsibility will revert to the Senior Manager on the scene.

3. Interfaces

The MIT News Office will be the interface between MIT and the public or news media. Copies of all status reports to the Business Continuity Management Team or Administrative Computing Steering Committee will be forwarded to the Public Information Officer for potential value in information distribution for good public relations. They will work with the Personnel Department in dissemination of information to staff.

4. Preparation Requirements

Existing relationships with local media will be utilized to notify the public of emergency and recovery status. The Public Information Officer will maintain up-to-date contact information for the media and other required parties.

A facility will be identified to be used as a press room. Arrangements will be made to provide the necessary equipment and support services for the press. Coordination with the Telecommunications Team for additional voice communication, if required, will also be made.
Insurance

1. Function

To provide for all facets of insurance coverage before and after a disaster and to ensure that the recovery action is taken in such a way as to assure a prompt and fair recovery from our insurance carriers.

2. Organization

The team will consist of the Director of Insurance and Legal Affairs and required staff and insurance carrier personnel. The team reports through the Business Continuity Management Team, of which it is a member.

3. Interfaces

The Insurance Team will interface with the following teams, relative to insurance matters:

MIT News Office
Campus Police
Damage Assessment/Salvage
Information Systems Operations
Appropriate FARM Teams

This team will be activated upon the initial notification of a disaster.

4. Preparation Requirements

Determine needs for insurance coverage. Identify the coverage required for hardware, media, media recovery, liability and extra expense.

Prepare a procedure outlining recommended steps to be followed by the Damage Assessment/Salvage Team during the initial stage of a disaster (Appendix A).

List appropriate contacts (Appendix B).

Arrange for availability of both still and video recording equipment to record the damage.

Ensure that an equipment inventory is available, to include model and serial number of all devices.
Evaluate all new products and services offered by MIT for potential liability in the event of a disaster.

Telecommunications

1. Function

To provide voice and data communications to support critical functions, and to restore damaged lines and equipment.

2. Organization

The team will consist of appropriate Telecommunications Systems staff. Telecommunications Systems will also coordinate with and supervise outside contractors as necessary. The team will report through the Director of Telecommunications Systems, who is a member of the Business Continuity Management Team.

3. Interfaces

The Telecommunications Systems team will interface with the following teams or organizational units, relative to telecommunications requirements:

Physical Plant
Campus Police
Distributed Computing & Network Services
Other Information Systems departments as necessary
Other MIT departments requiring emergency telecommunications
Outside contractors and service providers as necessary

4. Preparation Requirements

Provide critical voice and data communications services in the event that normal telecommunications lines and equipment are disrupted or relocation of personnel is necessary.

Consult with outside contractors and service providers to ensure that replacement equipment and materials are available for timely delivery and installation.

Utilize available resources, such as the MIT Cable Television network and voice mail system, to broadcast information relevant to the disaster.
Part IV. Recovery Procedures
Notification List
This appendix contains the names and telephone numbers of managers and personnel who must be notified in the event of a disaster. The Business Continuity Management Team Coordinator is responsible for keeping this notification list up-to-date.

Administrative Computing Steering Committee Chairman
Members

Business Continuity Management Team

Two individuals are assigned responsibility for the interface with other campus organizations, such as Physical Plant Operations, to monitor emergencies as they occur. These Early Warning Duty people are then responsible for activation of the full Business Continuity Management Team and necessary Functional Area Recovery Management Teams.

The BCMT Duty People are equipped with pagers, which can be activated by Physical Plant Operations or paged directly.

In addition, each Duty Person is equipped with a cellular phone for emergency use.

To reach the BCMT Duty Person:

By Pager:

Duty Person Number    To leave phone number, call:    To leave an 80-character text message, call (and give PIN # of pager):
1
2

By Cellular Phone:

2
Note: these numbers are to be used only in emergencies or for testing.

The people on duty will monitor the situation and determine if it has the potential to impact our processing ability. [See Duty Person procedure for details]

Coordinators
Members
I/S Operations & Systems
Telecommunications
Campus Police
MIT News Office – Public Information
Insurance
Physical Plant:
Emergency Response Team
Operations Center
Safety Office
President’s Office
Comptrollers Accounting Office
Personnel Office
Distributed Computing & Network Services

BCMT Liaison
Housing:
Nuclear Reactor
Plasma Fusion Lab
Medical Department

FARM Team Coordinators
Bursar’s Office Category
Financial Planning & Management Category
Freshman Admissions Category
Operations & Systems Category
Payroll Category
Physical Plant Category
Property Office Category
Purchasing & Stores Category
Registrar’s Office Category
Resource Development Category
Technology Licensing Office Category
Telecommunications Category

Business Continuity Management Team Coordinator

This appendix contains instructions to the Business Continuity Management Team Coordinators for overseeing disaster response and recovery efforts.

Action Procedures Player Action
Coordinator Ensure entire Business Continuity Management Team (BCMT) has been notified. Then notify Vice President for Information Systems and Chairman of Administrative Computing Steering Committee.

Coordinator Activate the Emergency Operations Center (See Page 33) and notify staff to meet there.

Coordinator Meet with Damage Assessment Team to review their findings and present results to BCMT.

Coordinator Present recommendations to BCMT for next steps in recovery effort.
Coordinator Begin notification of all recovery teams. Check to ensure all recovery participants have been notified.

Coordinator Monitor the activities of the recovery teams. Assist them as required in their recovery efforts.

Coordinator Report to BCMT on a regular basis on the status of recovery activities. Report to Administrative Computing Steering Committee as appropriate on recovery status.

Coordinator On an hourly basis, or other appropriate interval, update the Recovery Status information message on .

Damage Assessment/Salvage

This appendix contains instructions to the Damage Assessment/Salvage Team for disaster response and recovery efforts.

Action Procedures Player Action
Building Services Notify team members, and vendors to report to the site for initial damage assessment and clean-up.

Physical Plant AO Notify insurance representative

Operations Center Issue Work Orders and call appropriate personnel.

Team Leader Request permission to enter site from Fire Department (if required).

Take a service representative from each of the appropriate vendors, the insurance claims representative and appropriate Physical Plant and Information Systems personnel into the site.

Team Members Review and assess the damage to the facility. List all equipment and the extent of damage. List damage to all support systems (power, A/C, fire suppression, communications, etc.).

Team Leader Notify the BCMT as to the severity of the damage and what can potentially be salvaged.

Team Leader Notify the BCMT whether the area can be restored to the required level of operational capability in the required time frame.

Salvage Operations

Player Action
Team Leader Initiate the Emergency Notification List and have all members report to the Staging Area.

Salvage Team Have the Building Services Supervisor determine which equipment and furniture can be salvaged. Photograph all damaged areas as soon as possible for potential insurance claims.

Salvage Team Important: Prior to performing any salvage operation, contact the Insurance Team to coordinate with possible insurance claims requirements and appraisals.

Have the Physical Plant Supervisor and staff start salvaging any furniture and equipment.

Based upon advice from the Insurance Team and customer engineering, contact computer hardware refurbishers regarding reconditioning of damaged equipment.

Team Leader Meet with the Business Continuity Management Team Coordinator to provide status on salvage operations.

Configuration List

A sample of the configuration and full equipment inventory report from the Fixed Asset Control Systems or other automated equipment inventories should be inserted here. The Continuity Plan Masters in off-site storage will contain the full listing.

Blueprints

Complete sets of blueprints of the buildings housing critical processing and the data center are maintained at [ ] and in off-site storage.

Campus Police

This appendix contains instructions to the Campus Police for disaster response and recovery efforts.

Action Procedures Player Action
Campus Police Duty Sgt. An MIT Police Case Report will be completed upon stabilization of the disaster situation. As per standard police procedure, this report will detail the names of all victims, witnesses, injuries, facility damage description, etc., as well as list all notifications.

Campus Police Duty Sgt. Initiate the notification listing of appropriate Campus Police Department Command Staff and personnel (App. A)
Campus Police Day/Night Notify the Business Continuity Management Team if the emergency affects Data Processing or Telecommunications operations in any way.

Campus Police Duty Sgt. Assign Campus Police personnel to both the damaged and backup sites, as required.

Campus Police Duty Sgt. Ensure that all Campus Police personnel are properly equipped at each affected location and the recovery sites. (Page 33)

Campus Police Duty Sgt. Coordinate the need for additional manpower and equipment as required.

Campus Police Command Staff Periodically submit status reports to the Continuity Coordinator at the Emergency Control Center.

Campus Police Command Staff Ensure that all facets of security protection are afforded, relative to entry/exit controls, transportation of information, etc., at both the damaged and backup sites.

MIT News Office – Public Information

Action Procedures Player Action
Campus Police Notify MIT News Office when an emergency occurs.

Public Information Officer Assess the public relations scope of the emergency, in consultation with senior management if necessary, and determine the appropriate public relations course of action.

In instances where media are notified immediately, due to fire department or police involvement, the Public Information Officer will proceed to the scene at once to gather initial facts. Emphasis must be placed upon getting pertinent information to the news media as quickly as possible.

PIO Staff Assistant Maintain a log of all incoming calls to ensure a quick response to media and other requests.

Public Information Officer Maintain a log of all information which has been released to the media.

Public Information Officer When appropriate, prepare news releases on a periodic basis for distribution to the local media list.

Public Information Officer If employee injuries or fatalities are involved, notify Personnel to send appropriate management personnel to the homes of the involved families.
Personnel Notify Public Information Officer as soon as families have been informed. This will permit the release of names and addresses of victims so that families of those not involved can be relieved of anxiety.

Public Information Officer Contact the public relations director(s) at the hospitals where injured have been taken to coordinate the release of information.

Public Information Officer In cases where long-term media coverage is anticipated, establish a Press Room in the (location to be selected). Provide for telephone requirements of the press.

Public Information Officer Schedule periodic press conferences, taking into consideration Management personnel who will be participating.

Public Information Officer If media want to photograph physical damage, clear the request with Campus Police prior to approving it. Then accompany all photographers.

Public Information Officer Coordinate follow-up news releases after the immediate emergency has passed to present the Institute in as positive a light as possible. Possible topics could include:

What has been done to prevent recurrence of this type of emergency?

What are plans for reconstruction?

What has been done to express gratitude to the community for its help?

What has been done to help employees, students and faculty?
Insurance Team

This appendix contains instructions to the Insurance Team Coordinator for disaster response, salvage and recovery efforts.

Action Procedures Player Action
Insurance Team Leader Contact appropriate Insurance people upon first advice of disaster.

Insurance Team Leader Meet with Damage Assessment/Salvage team at site.

Insurance Team Leader Go through disaster scene with Damage Assessment/Salvage team and advise on matters relating to insurance and claims. Ensure that nothing is done to compromise recovery from insurance carrier. Photograph all applicable areas.

Insurance Team Leader File all appropriate claims forms with all involved insurance carriers. Report status of claims activity to the Business Continuity Management Team.
Telecommunications

This appendix contains instructions to the Telecommunications Systems team for disaster response and recovery efforts.

Action Procedures Player Action
HELP Line Personnel or after-hours Duty Person Receives report of disaster from Physical Plant or Campus Police and notifies appropriate Telecommunications Systems and other personnel.

Director, Telecommunications Systems Oversees assessment of damage to telecommunications facilities. Directs contingency and recovery efforts. Provides updates to Business Continuity Management Team and MIT administration.

Operations and Customer Service Arranges for voice and dial-up data communications services to support critical functions. Procures stock to repair or replace damaged equipment. Restores full services in a timely manner.

Transmission Services Provides data communications facilities or circuits to support critical functions. Assists with restoration of cable and wire plant, as needed. Assists Information Systems and other departments with relocation and restoration of data facilities.

Appendix A – Recovery Facilities

The following facilities have been identified as designated recovery sites for restoration of processing under the MIT Business Continuity Planning strategy.

Emergency Operations Centers

The Emergency Operations Center is the location to be used by the Business Continuity Management Team and their support staff as a location from which to manage the recovery process. As such, the specific location will be selected by the Coordinator at the time of the occurrence. The following are the locations available:

Emergency Operations Center is located in

Central Administration building out of service – Immediately after evacuation of the building, the BCMT will convene in Building to coordinate initial response to the event. If the problem appears to be long term – or affects the local area, the BCMT will activate the primary EOC in .

Hot Sites (Operational data centers providing emergency computing resources)
Facilities provided: (See O&S FARM Team Plan)

Shell Sites (Computer conditioned space available to install equipment)

Facilities provided: (See O&S FARM Team Plan)

Appendix B – Category I, II & III functions

For details about each of these functions see the appropriate FARM Team Plan

Appendix C – Plan Distribution List

PLAN DISTRIBUTION MATRIX

ORGANIZATION    RECIPIENT    LOCATION    MIT PLAN COPIES    FARM TEAM COPIES

Business Continuity Management Team

Coordinators    2    1
Audit Division    2    1
Campus Police    2    1
Comptrollers Accounting Office    2    1
CAO – Payroll    2
Emergency Response Team    2    1
Insurance    2    1
I/S Operations & Systems    2    1
MIT News Office    2    1
Personnel Office    2    1
Physical Plant    2    1
President’s Office    2    1
Safety Office    2    1
Telecommunications    2    1
Distributed Computing & Network Services    1

Administrative Computing Steering Committee

Chairman    2    1

FARM Team Coordinators

Bursar’s Office    1
Comptrollers Accounting Office    1
CAO – Payroll    1
Freshman Admissions Office    1
Lincoln Fiscal Office    1
Office of Financial Planning & Management    1
Purchasing & Stores    1
Office of the Registrar    1
Technology Licensing Office    1
Academic Computing Services    1
Administrative Systems Development    1
Computing Support Services    1
Documentation & Training Services    1    1
I/S VP Office    1    1

Business Continuity Management Team

EARLY WARNING DUTY PROCEDURES
For information call:

BCMT Duty Person Procedures

This booklet contains instructions for the individuals currently assigned to be the active Business Continuity Management Team contact for emergency situations that may develop. The Duty Person is on call 24 hours a day for the one month assignment. The two people assigned as Duty Persons (DP) will be equipped with a pager and a cellular phone – both to be used for BCMT testing and emergencies only. Each person will pass the equipment to the next person on the Duty Person roster when the one month assignment ends. The equipment information is as follows:

Duty Person Number    To just leave phone number to call back, dial:    To leave an 80-character message, call and give PIN #:
1

To reach by cellular phone:
1
2

Preparation Procedures

Upon receipt of the equipment, read the directions for the equipment and familiarize yourself with the pager and the phone. Ensure that phone batteries are charged properly (see instructions). Note: the pager takes one AAA battery, which lasts about a month.

Call the other duty person to ensure the phone is operable. Send a page to your own unit to ensure it is also functioning correctly.

At the end of your assignment, pass the equipment and documentation to the next person on the duty roster. Notify the BCMT coordinators, and by e-mail that the duty has been transferred. If an individual cannot serve for a temporary period (e.g., going to a conference), it is their responsibility to provide a trained alternate as their replacement. The BCMT Coordinators and the other person on duty are to be notified in advance about the replacement.

If there is a need to contact all the people on the Duty Roster send e-mail to:

, an Athena mail list maintained by the Information Security Officer for this purpose.

GUIDE TO BCMT ACTIVATION

1. The first indication of a problem will probably be a page alert from Physical Plant Operations. This will be a short text message outlining the problem. Unless it’s obvious that the problem is long term and severe, wait 30 minutes (for things in the Operations Center to quiet down) and call them at . Tell them you’re calling for the BCMT and get the latest status about the problem reported by the page.

2. Does the problem prevent normal access, occupation or usage of any part of any of the areas listed under the FARM Team Contact List, or does the disaster disrupt service provided by telephones, the network, or the mainframe computers?

If no, go back to sleep!

If yes, continue.

3. Will expected recovery of the affected area last into normal business hours?

If no, go back to sleep!

If yes, continue.

4. Does the FARM Team Coordinator of the affected service indicate that the disaster will affect that service? The FARM Team Contact List below provides the phone numbers of the FARM Team coordinators and the buildings their functions operate in.

If no, go back to sleep!

If yes, continue.

5. ACTIVATE THE BCMT!
Call the coordinators first:

If they can’t be reached, call the BCMT members directly. The numbers are on the list attached. The BCMT has three possible assembly points:

If the problem is related, meet in the meeting room.

If related, meet in the Conference Room

All other problems, meet in the Emergency Operations Center
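
The activation guide above is a chain of yes/no questions in which any "no" ends the process. A minimal Python sketch of that logic follows; the class and field names are hypothetical and are not part of the MIT plan, and step 1 (calling the Operations Center for status) is assumed to have been done already.

```python
from dataclasses import dataclass


@dataclass
class ProblemReport:
    """Answers the Duty Person gathers for steps 2-4 (hypothetical field names)."""
    affects_listed_area_or_service: bool        # step 2
    recovery_lasts_into_business_hours: bool    # step 3
    farm_coordinator_confirms_impact: bool      # step 4


def should_activate_bcmt(report: ProblemReport) -> bool:
    """Return True only when steps 2, 3 and 4 are all answered 'yes'."""
    if not report.affects_listed_area_or_service:
        return False  # step 2 answered "no": stand down
    if not report.recovery_lasts_into_business_hours:
        return False  # step 3 answered "no": stand down
    if not report.farm_coordinator_confirms_impact:
        return False  # step 4 answered "no": stand down
    return True       # step 5: activate the BCMT


if __name__ == "__main__":
    overnight_outage = ProblemReport(True, False, True)
    print(should_activate_bcmt(overnight_outage))  # False
```

Writing the steps as early returns mirrors the guide: the Duty Person stops at the first "no" and only reaches activation when every check passes.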

Business Continuity Management Team Duty Roster

Name MIT Home From To Pager No
Phone ID

1
24

2
10

FARM Team Contact List

# Area(s) FARM Team Contact Ext. Home E-mail Phone

Business Continuity Management Team

BCMT Contact    Office Ext.    Home Phone    E-mail    #

BCMT Coordinator    04
BCMT Coordinator    05
Physical Plant    02
Campus Police    03
Operations Center Supervisor    40
Emergency Response Team    41, 42
Safety Office    43
Safety Office    44
DCNS    45
DCNS    11
CAO    46
I/S O & S    06
Telecomm    47, 14
MIT News Office    48, 49
Insurance    50
Physical Plant    51

Cellular Phone Memory Assignments

# Contact Phone

Slides Script From Lecture

CIS527 Week #9_ P1 IT Risk Management Mitigating Risk with a Disaster Recovery Plan
Slide #
Slide Title
Slide Narration
Slide 1
Introduction
Welcome to IT Risk Management.

In this lesson we will discuss Mitigating Risk with a Disaster Recovery Plan

Next slide

Slide 2
Topics

The following topics will be covered in this lesson:

What a disaster recovery plan (DRP) is;

What critical success factors are;

What the elements of a DRP are;

How a DRP mitigates an organization’s risk; and

What best practices for implementing a DRP are

Next slide

Slide 3
What is a Disaster Recovery Plan?
A disaster recovery plan is a plan to restore a critical business process or system to operation after a disaster. The DRP can be used to respond to a wide range of disasters, including weather events like hurricanes, tornadoes, or floods. Natural events also include earthquakes. The DRP is also used to rebuild systems after hardware or software failures following critical system crashes where all operations cease. The plan is written and used for the purpose of recovering the systems to restore operations.

You may see disaster recovery planning described with many different terms, but the terms are basically synonymous. Instead of the term “DRP”, you may see:

Contingency planning,
Business resumption planning,
Corporate contingency planning,
Business interruption planning, and
Disaster preparedness

Earlier chapters that have been studied presented several terms related to DRPs, which include:

Critical Business Function (CBF),
Maximum Acceptable Outage (MAO),
Recovery Time Objectives (RTO),
Business Impact Analysis (BIA), and
Business Continuity Plan (BCP)

Next slide
Slide 4
What is a Business Disaster Recovery Plan? (continued)
Every organization that has a critical mission needs to plan for disaster because if the operations that support the mission stop, the business stops. Unless an organization can do without critical business systems for a long period of time, a DRP is needed.

The time to plan for a disaster is before the disaster, not during it, because once the disaster occurs, it is too late to determine which systems are critical, which critical systems are more important than others, and what the best methods to restore and recover the most important systems are. If the BCP identifies the critical systems and the DRP provides details on how to recover these systems, the organization is ready to respond. Without the DRP, the organization may not be able to recover.

Most DRPs include a purpose statement that helps identify the goals of the DRP. The DRP often has multiple goals or purposes, which include:

Saving lives,
Ensuring business continuity, and
Recovering after a disaster.

DRPs are often written to target different phases of the disaster. For instance, one phase identifies the steps and procedures to restore CBFs as soon as a disaster strikes. Another phase identifies the steps and procedures to normalize operations. This phase returns operations to the original location. A single DRP can address both phases.

Next slide

Slide 5
Critical Success Factors
The success of a DRP depends on several critical success factors (CSFs). A CSF is an element necessary for the plan’s success. Elements that are critical to the success of a DRP include:

Management support;
Knowledge and authority for DRP developers;
Identification of primary concerns, such as recovery time objectives and alternate location needs; and
A disaster recovery budget.

DRPs require the support of management just as risk management and security plans do. Some support can be as simple as publicly endorsing the plan, while other support can be much more material, such as providing funds. Without management support, a DRP is likely to fail. If management does not support the DRP, others within the organization won’t support it either.

The primary resource that management provides is labor. Personnel are needed to create, test, and update the DRP. These personnel can be in-house employees or outside consultants who are specialists in disaster recovery. Financial support from management is also needed.

Leadership must be supported by management in any DRP. Leaders understand the importance of the DRP and know that its success can only be achieved with combined teamwork. Leaders also help disaster recovery and business continuity teams recognize the value of the DRP.

Management helps teams identify project priorities and leads by example. If management wants others to support the requirements of the DRP, it must support them also. If management wants others to give the DRP their time and attention, managers must provide their time and attention to it when needed.

Next slide

Slide 6
Critical Success Factors (continued)
The developers of the DRP need specific knowledge and authority to succeed. Subject Matter Experts (SMEs) cannot simply be expected to have the ability to write a DRP, nor can disaster recovery experts be expected to fully understand the organization. Personnel need combined skills, so developers need to have an understanding of disaster recovery, along with an understanding of how the organization functions.
The DRP team must have an understanding of disaster recovery in general and should understand what a BIA and a BCP are. They should also understand how a DRP fits in with a BIA and a BCP. The BCP includes both BIAs and DRPs. DRPs provide details to restore specific systems. Some DRPs address system recovery of CBFs immediately after a disruption, while other DRPs address system recovery of all business functions after a disaster has passed. If the DRP developer doesn’t understand the purpose of the DRP, or how it fits in the overall business continuity plan, the focus of the DRP is lost.
It is also critical to understand how the organization functions to write the DRP. For example, the DRP for a military base will be much different from the DRP for a small business. The military base would require some support 24 hours a day, seven days a week while a small business may require support only from nine a.m. to five p.m., Monday through Friday. There is a vast difference in these two types of support.
DRP developers need authority when creating the DRP. Before the DRP can be written, the developer needs to gather data. In order to gather the data, the DRP developer needs authority to interview experts who understand the systems. If the DRP crosses departmental lines, the DRP author needs to make decisions that can affect multiple departments, and thus needs management support to make the initial decisions.
Next slide
Slide 7
Critical Success Factors (continued)
The DRP developer must have a clear idea of the primary concerns as the DRP is being developed. One important concern is recovery time objectives. These objectives identify the critical nature of the DRP. Some systems need to be restored almost immediately, while others can be offline for days before they are recovered. Consideration of which off-site resources are needed is also important – for instance, at a minimum, a copy of backups needs to be stored off-site. The DRP may also address the use of alternate locations for operations.
The recovery time objectives identify when a system must be recovered and are derived from the maximum acceptable outage identified in the BIA. Outages longer than the MAO will have a significant negative effect on the organization. If the outage isn’t resolved within the RTO, the mission is impacted.
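The relationship between the RTO and the MAO can be checked mechanically. The following is a minimal Python sketch, assuming that an RTO should never exceed the MAO it is derived from; the function names and the hour values are hypothetical and do not come from the lecture.

```python
from datetime import timedelta


def rto_is_consistent(rto: timedelta, mao: timedelta) -> bool:
    """An RTO derived from the MAO should not be longer than the MAO (assumption)."""
    return rto <= mao


def mission_impacted(outage: timedelta, rto: timedelta) -> bool:
    """The mission is impacted when the outage is not resolved within the RTO."""
    return outage > rto


if __name__ == "__main__":
    mao = timedelta(hours=4)   # hypothetical value taken from a BIA
    rto = timedelta(hours=2)   # hypothetical target set below the MAO
    print(rto_is_consistent(rto, mao))                 # True
    print(mission_impacted(timedelta(hours=3), rto))   # True
```

Setting the RTO below the MAO leaves a margin, so a recovery that slightly overruns its target still finishes before the outage becomes unacceptable.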
Performing backups of critical data is an integral part of any recovery plan. It is certain that at some point, data will be lost. If the data cannot be restored, the result can be catastrophic to the organization. Backup plans are often included as part of the DRP and are derived from backup policies.
The backup policy identifies details, such as what data should be backed up and how long to keep backup data. The backup plan identifies the steps to take to back up and restore the data and is primarily focused on data.
Another critical element of backups is ensuring that copies of backups are stored off-site. All of the backups should not be stored in the same location as the servers. Consider that if a fire destroys the building, it destroys the servers and also destroys all of the backups, but if copies of backups are stored in a separate location, the ability to restore the data will be there.
Traditional backups are not always possible. In that case, other technical processes can be used, such as the data replication supported by database applications like Oracle and Microsoft SQL Server.
The following two terms identify different types of redundant transfers, and are often used as part of an overall disaster recovery plan:
Electronic vaulting - This method transfers the backup data to an off-site location. Electronic vaulting transfers the data over wide area network (WAN) links or tunnels through the Internet.
Remote journaling - This method starts with full copies of the data at the remote location. It then sends a log of the changes from the primary location to the secondary location. These changes are applied as a batch to the secondary location. After they are applied, the secondary location is up to date.
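Remote journaling as described above can be illustrated with a small Python sketch. It uses in-memory dictionaries in place of real databases, and the record keys, the journal format, and the function names are all hypothetical; a real product would ship the journal over a network link.

```python
def apply_journal(replica: dict, journal: list) -> dict:
    """Apply a batch of (operation, key, value) changes to a copy of the replica."""
    updated = dict(replica)
    for operation, key, value in journal:
        if operation == "set":
            updated[key] = value
        elif operation == "delete":
            updated.pop(key, None)
        else:
            raise ValueError(f"unknown journal operation: {operation}")
    return updated


# The remote (secondary) location starts with a full copy of the data.
primary = {"invoice-1": "open", "invoice-2": "open"}
secondary = dict(primary)
journal = []


def record_set(key, value):
    """Change the primary copy and log the change for later shipment."""
    primary[key] = value
    journal.append(("set", key, value))


def record_delete(key):
    """Delete from the primary copy and log the deletion."""
    primary.pop(key, None)
    journal.append(("delete", key, None))


record_set("invoice-1", "paid")
record_set("invoice-3", "open")
record_delete("invoice-2")

# Ship the log and apply it as one batch; the secondary is then up to date.
secondary = apply_journal(secondary, journal)
journal.clear()
print(secondary == primary)  # True
```

Only the log of changes crosses the link after the initial full copy, which is what distinguishes this approach from electronic vaulting of whole backups.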
Next slide
Slide 8
Critical Success Factors (continued)
Many companies need to ensure that their business stays in operation even if a significant disruption occurs. Consider that businesses working along the San Andreas Fault in California have a constant threat of an earthquake. If an earthquake hits, the business location may not be able to function in the same location, so an alternate location needs to be in place.
An alternate location can be in a different building, city, or state. There are three types of alternate locations: cold sites, warm sites, and hot sites. Each site must have the capability to host the critical data and programs of the primary location. Each location has different costs and initial capabilities as well.
A cold site is an available building that has electricity, running water, and restrooms, but none of the equipment or data needed for critical operations. The cold site may have raised floors if needed to support a server environment, but it does not include any equipment, data, or applications.
A hot site includes all the equipment and data necessary to take over business functions. A hot site will be able to assume operations within hours and sometimes within minutes. The site usually has personnel at the location twenty-four hours a day, seven days per week. Hot sites are expensive to maintain, but some newer technologies such as cloud computing and virtualization can make the sites a little easier to manage.
A warm site is a compromise between a cold site and a hot site. A warm site includes most or all of the equipment needed, but data is not usually kept up to date. The equipment is maintained in an operational state. If a disaster occurs, the systems are updated with current data and brought online.
Warm sites are often fully functioning sites for non-critical business functions, meaning they are used for regular business. When a disaster occurs, non-critical functions stop and the site is used for critical business functions.
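The differences among the three site types can be summarized as a small data structure. This Python sketch is a simplification under stated assumptions: a warm site is treated as having all needed equipment (the narration says "most or all"), and the class, field names, and step wording are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlternateSite:
    """Simplified model of an alternate location (hypothetical field names)."""
    name: str
    has_equipment: bool
    data_up_to_date: bool


COLD = AlternateSite("cold", has_equipment=False, data_up_to_date=False)
WARM = AlternateSite("warm", has_equipment=True, data_up_to_date=False)
HOT = AlternateSite("hot", has_equipment=True, data_up_to_date=True)


def steps_before_takeover(site: AlternateSite) -> list:
    """List the work still needed before the site can assume operations."""
    steps = []
    if not site.has_equipment:
        steps.append("install equipment and applications")
    if not site.data_up_to_date:
        steps.append("load current data")
    steps.append("bring systems online")
    return steps


if __name__ == "__main__":
    for site in (COLD, WARM, HOT):
        print(site.name, steps_before_takeover(site))
```

The shorter the list of remaining steps, the faster the site can take over, which is why hot sites cost more to maintain than warm or cold sites.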
Next slide
Slide 9
Critical Success Factors (continued)
A redundant backup site makes it possible to outsource a data recovery site. Instead of maintaining an alternate location, an organization can contract a third-party vendor to host the data and services in a redundant backup site. A fully redundant backup site can host all of the data and services, so it can serve as either the primary or the secondary environment.
It is important that users have access to data and services if the move to an alternate location takes place. To meet this need, an understanding of how the users actually use the data and services must be obtained. If users access services over the Internet, the alternate location must include Internet access.
Management may also need access to data and services during the disaster. Management often represents another user, so when providing user access, management access is also provided.
Customers need to have the access they require. Customer access needs vary from one organization to another, and this depends on how customers normally access the organization’s network, and what the customer expectations are during a disaster.
Next slide
Slide 10
Critical Success Factors (continued)
The last critical success factor to consider for a DRP is money. A budget ensures that there is enough money to pay for preparation and for executing a DRP if a disaster strikes. There are multiple costs to consider when preparing for disasters, which include:
Backups;
Alternate locations;
Fuel costs during a disaster;
Food and water during a disaster; and
Emergency funds during a disaster
In addition to establishing the budget, identification of who can release the funds needs to be determined.
Next slide
Slide 11
Check Your Understanding

Slide 12
Elements of a DRP
There are no specific rules that identify what must be in a DRP. However, many elements are commonly included:
Purpose;
Scope;
Disaster/emergency declaration;
Communications;
Emergency response;
Activities;
Recovery steps and procedures;
Critical business operations;
Recovery operations; and
Critical operations, customer service, and operations recovery.
In addition to including these elements in the DRP, the DRP also needs to be tested and maintained. Testing verifies that the procedures are valid. The DRP should be regularly reviewed and updated to ensure it stays current.
Next slide
Slide 13
Elements of a DRP (continued)
The purpose of the DRP needs to be defined early in the process and included in the final product. When considering the purpose, the following activities should be included:

Recovery,
Sustaining business operations, and
Normalization

The scope of any project helps identify the boundaries and helps all parties understand what is and is not covered. Without an identified scope, well-meaning people can cause the project to grow and expand, which is known as scope creep.

The purpose of the DRP drives the scope. When developing the scope, the following areas should be considered:

Hardware,
Software,
Data, and
Connectivity

When a disaster or emergency occurs or is imminent, the DRP is implemented. As a reminder, the DRP is a part of the BCP. Usually, the overall BCP is activated first. Then, depending on what the DRP covers, the DRP is activated to support the overall BCP.

Several communications elements are important to the success of a DRP. These include:

Recall,
Users,
Customers, and
Communications plan

Next slide

Slide 14
Elements of a DRP (continued)
The DRP could include an emergency response element, which is used for short-fused disasters. For example, an earthquake strikes without warning, while a tornado strikes with very little warning. Similarly, a fire could result in a disaster that requires an emergency response.

Emergency response steps could include:

Recall and notification of personnel;
Damage assessment;
Plan activation; and
Implementation of specific steps and procedures

The Emergency Response section identifies several emergency response steps to take, such as recalling personnel and assessing the damage. The DRP also identifies other activities to take in response to the disaster.

A primary activity of any DRP is ensuring that personnel safety has been addressed. Ensuring personnel safety and the protection of life should always be at the forefront of any DRP.

The recovery steps and procedures describe all the specific actions required to recover systems or functions. This section often includes multiple procedures. Recovery steps and procedures usually include specific recovery plans and backup plans.

Next slide

Slide 15
Elements of a DRP (continued)
The recovery plan identifies the steps for rebuilding and recovering a system after a disaster. This plan often includes steps for recovering a system from scratch, which includes installing the operating system and all applications. If data is needed, the plan specifies how to restore the data.

The backup plan identifies several elements, which include:

The data to back up;
Backup procedures for data;
Length of time to keep the data;
Types of backups, such as regular, electronic vaulting, or remote journaling;
Off-site storage location, including how to retrieve a backup during a disaster;
Testing of restore procedures and schedules; and
Disaster restore procedures

Recovery point objectives (RPOs) identify the maximum amount of data loss that is acceptable for a given set of data. The RPO is considered in the backup plan. If the RPO is a short period of time, such as minutes instead of hours or days, backups must be performed more frequently.
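
The link between the RPO and backup frequency can be sketched in a few lines of Python. This is only an illustration, not part of the lesson material: the data set names, RPO values, and backup intervals are hypothetical, and the sketch assumes the worst-case data loss equals the time between two consecutive backups (it ignores how long a backup takes to run).

```python
import math
from datetime import timedelta

# Hypothetical backup plan entries: data set -> (RPO, current backup interval).
BACKUP_PLAN = {
    "orders_db": (timedelta(minutes=15), timedelta(hours=1)),
    "file_share": (timedelta(hours=24), timedelta(hours=24)),
    "archive": (timedelta(days=7), timedelta(days=1)),
}


def rpo_violations(plan):
    """Return the data sets whose backup interval is longer than their RPO.

    Assumption: data written just after a backup is unprotected until the
    next backup, so the worst-case loss equals the backup interval.
    """
    return sorted(name for name, (rpo, interval) in plan.items() if interval > rpo)


def backups_per_day(rpo):
    """Minimum number of evenly spaced backups per day that stays within the RPO."""
    return math.ceil(timedelta(days=1) / rpo)


if __name__ == "__main__":
    print("Backup interval exceeds RPO for:", rpo_violations(BACKUP_PLAN))
    for name, (rpo, _interval) in BACKUP_PLAN.items():
        print(name, "needs at least", backups_per_day(rpo), "backup(s) per day")
```

In this sketch, a data set with an RPO measured in minutes ends up needing far more backups per day than one with an RPO measured in days, which is the point the backup plan has to account for.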

Next slide
Slide 16
Elements of a DRP (continued)
The DRP also identifies critical business operations that are supported by critical business functions (CBFs). Specific servers and services in turn support CBFs. By identifying critical business functions, the DRP helps ensure that the critical servers and services continue to operate.

Specific recovery procedures are identified for all of the servers and services in the DRP. Separate written documents can be created for each procedure, as procedures will be different for each of the systems being recovered. These procedures are one of the most important elements of the DRP to test. Although many of the steps in the DRP may be generic in nature, these recovery procedures are often very technical.

Critical operations, customer service, and operations recovery are main components of the DRP.

The DRP identifies critical business operations and CBFs to support, but it is also important to specify the other elements of the business to recover, including critical operations, customer service, and operations recovery.

Testing DRPs to ensure they perform as expected is critical. The DRP is written to restore CBFs, so testing of the DRP should not affect the operation of these CBFs. The goal of the tests is to identify any problems or omissions in the DRP.

As with a BCP, different testing methods can be used, including:

Desktop exercise,
Simulation, and
Full-blown DRP test

Next slide

Slide 17
Elements of a DRP (continued)
The DRP needs to be regularly reviewed and updated to ensure that it is ready when needed. IT systems are regularly updated and upgraded, and any of these changes to the IT systems could affect the usability of the DRP.

Change management processes should be in place to ensure that changes to systems are reviewed before the changes occur. The changes need to be documented. DRP developers should be involved in this process to ensure they are aware of system changes.

The review should include the following elements:

Systems,
Critical business functions,
Alternate sites, and
Contacts
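
One way to tie the documented changes to the DRP review is sketched below in Python. This is an illustration only: the change records, dates, and function name are hypothetical, and the sketch assumes each documented change is tagged with one of the four review areas listed above.

```python
from datetime import date

# The four review areas named in the lesson.
REVIEW_AREAS = ("systems", "critical business functions", "alternate sites", "contacts")


def areas_needing_review(last_review, changes):
    """Return the review areas that changed after the last DRP review.

    changes is an iterable of (area, change_date) pairs, assumed to come
    from the organization's change management records (hypothetical format).
    """
    changed = {area for area, changed_on in changes if changed_on > last_review}
    return [area for area in REVIEW_AREAS if area in changed]


if __name__ == "__main__":
    # Hypothetical change records for illustration only.
    changes = [
        ("systems", date(2015, 3, 2)),
        ("contacts", date(2014, 11, 20)),
        ("alternate sites", date(2015, 1, 15)),
    ]
    print(areas_needing_review(date(2014, 12, 29), changes))
```

Here only the areas with a change dated after the last review are flagged, so a contact update made before the review does not trigger another one.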

Next slide
Slide 18
How Does a DRP Mitigate an Organization’s Risk?
DRPs reduce risk by reducing the impact of disasters. The disaster is the threat. If an earthquake, tornado, or hurricane is going to hit, it cannot be stopped, but the impact of the threat can be reduced by being prepared. The DRP helps an organization prepare for and mitigate both the short-term and long-term damage.

With a disaster recovery plan, the organization will be much better prepared to recover critical business functions. Even when it is planned for, a cold site takes a lot of work to put together, because all of the equipment needs to be set up and configured. With some extra resources and preplanning, the organization may decide to use a warm site instead. With no planning, the job will be twice as difficult.

Next slide
Slide 19
Best Practices for Implementing a DRP for Your Organization
When implementing DRPs, several best practices can be used, including:

Ensure BIAs have been completed
Start with a clear purpose and scope
Review and update the DRP regularly
Test the DRP

A checklist can be used to ensure that all relevant concerns are addressed.

Next slide
Slide 20
Check Your Understanding

Slide 21
Summary
We have reached the end of this lesson. Let’s take a look at what we’ve covered.

First, we defined a disaster recovery plan (DRP): a plan to restore a critical business process or system to operation after a disaster.

Next, we discussed all of the critical success factors (CSFs) that play a role in the DRP. The CSFs are used to identify the critical business operations and critical servers and services.

The elements of a DRP were then presented and reviewed individually. We looked at purpose, scope, disaster/emergency declaration, communications, emergency response, activities, recovery steps and procedures, critical business operations, recovery operations, and critical operations, customer service, and operations recovery.

Next we considered how a DRP mitigates an organization’s risk by reducing the impact of disasters.

Finally, we looked at best practices for implementing a DRP. These included ensuring the BIAs are completed, starting with a clear purpose and scope, reviewing and updating the DRP regularly, and testing the DRP.

This completes this lesson.