Creating an Actionable Disaster Recovery Plan

1 Creating an Actionable Disaster Recovery Plan

2 Presentation Outline
- Plan Justification
  - Disaster Definitions & Facts
  - Costs of a Disaster
  - Benefits of Planning
- Building an Actionable Disaster Recovery Plan
  - Program Initiation
  - Risk Assessment
  - Detailed Risk Assessment
  - Disaster Recovery Plan
  - Maintenance Plan
  - Test Plan & Results

3 Plan Justification

4 What's a disaster?
A disaster is an occurrence that disrupts the functioning of an organization, resulting in the loss of data, loss of personnel, loss of business or loss of time. (Hiatt)

5 Disaster Facts
Common IT disasters:
- Power outages: 28%
- Storm damage: 12%
- Floods: 10%
- Hardware error: 8%
- Bombing: 7%
- Hurricanes: 6%
- Fires: 6%
- Software errors: 5%
- Power surge/spike: 5%
- Earthquake: 5%

6 Terms
- Business Continuity Planning: advance planning and preparations to ensure continuity of critical business functions
- Disaster Recovery: advance planning and preparations to minimize loss and facilitate recovery of core IT assets

7 Tangible and Intangible Costs
- Patient care and patient safety
- Paying staff who are idle
- Added work related to manual operations
- Other hard cash costs
- Lost business
- Lost customer loyalty (your reputation!)

8 Recovery Planning Benefits
- Reducing legal liability
- Minimizing potential economic loss
- Decreasing potential exposure to disaster
- Reducing the probability of a disaster occurrence
- Reducing disruption to normal operations
- Ensuring organizational stability
- Ensuring orderly, systematic, and timely recovery

9 Recovery Planning Benefits
- Minimizing insurance premiums
- Reducing reliance on key individuals
- Increasing asset protection
- Ensuring the safety of personnel and patients
- Complying with legal, statutory, and regulatory requirements

10 Why have the stakes risen?
- E-business has moved many businesses from an 8am-5pm model to a 24 x 7 x 365 model.
- Patient care could be compromised without information systems.
- Operations are running too lean to switch to manual processes and still conduct business as usual.
- Technology companies no longer maintain the inventories they once did to provide quick disaster shipment capabilities.
- New exposures: viruses, cyber-crime, terrorism

11 Getting Approval & Funding
- Historical data
  - The National Climatic Data Center (NCDC) is the Nation's Scorekeeper in terms of addressing severe weather events in their historical perspective
- National initiatives
  - Hospital Emergency Incident Command System (HEICS)
- Regulatory audit compliance
  - HIPAA
  - JCAHO

12 Building an Actionable Disaster Recovery Plan

13 A Practical Approach
Phasing:
1. Initiation
2. Risk Assessment
3. Detailed Assessment
4. Plan Development
5. Testing & Maintenance

14 Program Initiation

15 Strategic Objectives & Scope
Objective: Develop overall Strategic Objectives and Scope for DRP Program
Practical Approach:
- Develop high-level Business Case to support DRP Program
- Gather and review existing documentation related to DRP
- Identify areas of alignment with other Organization Initiatives
- Define Program Objectives and Scope
Deliverables:
- DRP Program Definition

16 Organizational Structure
Objective: Develop DRP Program Organizational Structure
Practical Approach:
- Identify Sponsorship, Stakeholders and Program Manager
- Define Program Organization, Roles and Responsibilities
- Dedicate existing Staff and supplement with External Resources
Deliverables:
- Identification of Sponsor(s), Stakeholders and Program Manager
- Definition of Program Organization, Roles and Responsibilities
- Initial staffing of Core Team(s)

17 Communication Strategy
Objective: Establish ongoing Communication Strategy
Practical Approach:
- Define Communication Objectives, Approach and Channels (e.g. Status Reports, Company Publications, etc.)
- For each Channel, define Audience, Message, Mechanism, Tactics, Measures and Timing Recommendations
Deliverables:
- DRP Communication Strategy and Timing Recommendations

18 Program Plan & Budget
Objective: Define High-Level DRP Program Plan and Budget
Practical Approach:
- Define and obtain consensus on Approach and Plan for the overall DRP Program
- Estimate DRP Program Cost and Resource Requirements
Deliverables:
- High-level DRP Approach, Plan and Budget Assessment

19 Kick-Off Meeting
Objective: Facilitate Program Kick-Off Meeting
Practical Approach:
- Host Program Kick-Off Meeting, obtaining stakeholder consensus on Program Scope, Objectives, Communication Strategy, Plan and Budget
Deliverables:
- Program Kick-Off Meeting Presentation / Agenda
- Kick-Off Meeting

20 Risk Assessment

21 Process Risk Analysis
Objective: Perform Business Process Risk Analysis
Practical Approach:
- Interview Business and IT Subject Matter Experts (SMEs) to define disaster scenarios, create an inventory of the major business processes, define the impact of an interruption and the tolerance for downtime, and prioritize major business processes
- Complete Risk Assessment for Business Process
Deliverables:
- High-Level Business Process Current State Definition
- Business Process Risk Assessment

22 Business Process Inventory
(Primary Inputs, Primary Processing and Primary Outputs are grouped under "Process Dependencies".)
Business Line | Business Function | Primary Inputs | Primary Processing | Primary Outputs | Impact of Interruption | Downtime Tolerance | Applications Used
Patient Care | Order Entry | ADT | Orders | Order requisition to ancillary system | H | 0 | Application A
Patient Care | Lab | ADT, Orders | Labs | Results to HIS | H | 0 | Application B
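The inventory above lends itself to a simple record structure that can be sorted for prioritization. The following is a minimal sketch, not a prescribed format: the `BusinessProcess` type, the H/M/L impact scale and the "Billing" row are assumptions added for illustration (the slide shows only "H" impact and the two Patient Care rows).

```python
from dataclasses import dataclass

# Hypothetical record type; fields mirror the inventory columns.
@dataclass
class BusinessProcess:
    business_line: str
    function: str
    inputs: list[str]
    processing: str
    outputs: str
    impact: str                      # assumed scale: "H", "M" or "L"
    downtime_tolerance_hours: float
    applications: list[str]

# Assumed ranking: lower number means recover sooner.
IMPACT_RANK = {"H": 0, "M": 1, "L": 2}

def prioritize(processes):
    """Order by highest impact first, then by least downtime tolerance."""
    return sorted(
        processes,
        key=lambda p: (IMPACT_RANK[p.impact], p.downtime_tolerance_hours),
    )

inventory = [
    # Hypothetical row, not taken from the slide.
    BusinessProcess("Finance", "Billing", ["Charges"], "Claims",
                    "Claims to payers", "M", 48, ["Application C"]),
    BusinessProcess("Patient Care", "Order Entry", ["ADT"], "Orders",
                    "Order requisition to ancillary system", "H", 0,
                    ["Application A"]),
    BusinessProcess("Patient Care", "Lab", ["ADT", "Orders"], "Labs",
                    "Results to HIS", "H", 0, ["Application B"]),
]

for p in prioritize(inventory):
    print(p.function, p.impact, p.downtime_tolerance_hours)
```

Sorting on the pair (impact, downtime tolerance) keeps the two interview outputs separate, so either scale can be refined later without changing the ordering logic.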

23 Technology Inventory
Objective: Perform Technology Inventory and Risk Assessment
Practical Approach:
- Interview IT Subject Matter Experts (SMEs) to identify Technology Assets, define interdependencies and prioritize according to time sensitivity and criticality
- Audit existing, relevant processes and procedures
- Complete Risk Assessment for Technology Assets
Deliverables:
- High-Level Technology Asset Current State Definition
- Technology Risk Assessment

24 Technology Inventory
Columns: Technology Assets | Quantity | Location | Interdependencies | Downtime Tolerance | Criticality
Rows, grouped by asset category:
- Applications (1): Application 1
- Supported Desktops (1): Desktop config 1
- Networking Infrastructure (1): Network device 1
- PBX / Telephony (1): Telephony device 1
- Total Valuation
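Once interdependencies are recorded in the inventory, a restoration order can be derived from them. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+); the dependency relationships between the sample assets are hypothetical and are not stated in the slides.

```python
from graphlib import TopologicalSorter

# Hypothetical interdependencies: each asset maps to the assets
# that must be restored before it.
depends_on = {
    "Network device 1": set(),
    "Telephony device 1": {"Network device 1"},
    "Application 1": {"Network device 1"},
    "Desktop config 1": {"Network device 1", "Application 1"},
}

def recovery_order(dependencies):
    """Return assets in an order where prerequisites always come first."""
    return list(TopologicalSorter(dependencies).static_order())

order = recovery_order(depends_on)
print(order)
```

A circular dependency makes `static_order()` raise `graphlib.CycleError`, which is a useful signal in its own right: two assets that each require the other cannot be recovered independently.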

25 Detailed Assessment

26 Detailed Assessment
Objective: Perform Business Process Gap Analysis and identify Remediation Approaches
Practical Approach:
- Identify opportunities to prevent a disaster and other quick-hits
- Evaluate existing Policies, Workflow, and IT systems
- Complete Detailed Current State Definition
- Define and assess Remediation Options
- Develop Recommendations and select Remediation Solution
- Define Future State based on implementation of selected Solution
- Define and obtain consensus on the Objectives, Scope, Approach, Plan and Budget for Remediation Approach

27 Detailed Assessment
Deliverables:
- Detailed Current State Definition
- Remediation Options and Recommended Solution(s)
- Future State Definition
- Remediation Estimates and Plan

28 Downtime Tolerance Costs
[Chart: implementation costs ($ to $$$$) plotted against downtime tolerance (hours)]

29 Disaster Recovery Planning

30 Plan Development
Objective: Develop DRP plan
Practical Approach:
- Deploy quick-hit solutions
- Develop high-level recovery strategies and recovery phases
- Define roles and responsibilities, including line of command
- Define disaster assessment and declaration definitions and procedures
- Develop emergency/evacuation procedures that incorporate DRP activities
- Document organization, staff and system functions and recovery requirements and procedures

31 Plan Development
Practical Approach (continued):
- Establish recovery locations and document steps to make them functional during a disaster
- Develop business partner and vendor agreements
- Develop communications plan and identify alternative communication tools
- Create contingency plans for missing people and failed procedures
- Document insurance information and procedures
- Build maintenance schedule and procedures
Deliverables:
- Actionable Disaster Recovery Plan

32 Plan Structure
Section 1: Plan Information
Objective: To provide information that will enable the reader or user of this plan to execute it while fully understanding the intentions and parameters with which it was created.
Content: Scope, Approach, Objectives, Team Organization, Pre-Disaster Action Checklist

33 Plan Structure
Section 2: Actionable Recovery Steps ***Disaster: Start Here***
Objective: To provide a step-by-step checklist of activities that will be performed in the event of a disaster. This section contains the detail for each disaster level, by business line, by recovery option.
Content: Evacuation Checklist, Disaster Declaration Checklist, Recovery Team Activation Checklist, Level 1-4 Recovery Steps for all teams and for all recovery options

34 Plan Structure
Section 3: Addendums
Objective: To provide one place to access key information and resources required to efficiently and knowledgeably carry out the actionable recovery steps.
Content: Phone list, Insurance Information, Legal Considerations, Key Communication Messages, Facilities Considerations, Security Considerations, Transportation Options

35 Section 1 - Plan Information
Goal: Enable the user to execute the Plan while fully understanding the intentions and parameters with which it was created
Contents:
- Scope
- Approach
- Objectives
- Team Organization
- Plan Activation Process
- Distribution
- Communication Strategies
- Contingency Plans (missing people or failed procedures)

36 Disaster Event Types
Event Types:
- Event Level 1
- Event Level 2
- Event Level 3
- Event Level 4

37 Recovery Strategies
[Matrix: event types (Event Level 1 to Event Level 4) against recovery strategy areas: Staff, Facility, Process, Technology]

38 Recovery Strategies
[Matrix: recovery strategies by event level, business area and application]
Event Level 1 Strategies:
- Business Areas 1-3, each with App 1 and App 2
- Strategies shown: Execute manual procedures; Restore from backup; Failover to redundant systems
Event Level 2 Strategies:
- Business Areas 1-3, each with App 1 and App 2
- Strategies shown: Strategy; Strategy; Strategy
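A matrix like this can be kept as a lookup keyed by event level, business area and application, so that a recovery team finds its strategy without interpretation. This is a sketch under assumptions: the slide does not say which strategy belongs to which cell, so the pairings below are illustrative only, and `strategy_for` is a hypothetical helper.

```python
# (event level, business area, application) -> recovery strategy.
# The pairings are hypothetical; only the strategy names come from the slide.
STRATEGIES = {
    (1, "Business Area 1", "App 1"): "Execute manual procedures",
    (1, "Business Area 1", "App 2"): "Restore from backup",
    (1, "Business Area 2", "App 1"): "Failover to redundant systems",
}

def strategy_for(level, area, app):
    """Look up the documented strategy, failing loudly when none is defined."""
    try:
        return STRATEGIES[(level, area, app)]
    except KeyError:
        raise LookupError(
            f"No strategy defined for level {level}, {area}, {app}"
        ) from None

print(strategy_for(1, "Business Area 1", "App 2"))
```

Raising an error for an undefined cell, rather than returning a default, surfaces gaps in the matrix during testing instead of during a real event.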

39 Recovery Team Structure
- Structured using the team approach
- Each team has a separate section of the Plan within each outage event level
- Recovery teams are the operational and technical groups responsible for restoring specific functions
- Each team only has the authority to carry out the procedures contained in its section of the Plan
- The teams are:
  - Command Team
  - Administrative Recovery Team
  - Operational Recovery Teams
  - Technical Recovery Teams

40 Recovery Team Structure
[Organization chart]
- Command Team
  - Operational Recovery Team: Business Function 1, Business Function 2, Business Function 3
  - Administrative Team
  - Technical Recovery Team: Phones, Applications, Infrastructure

41 Plan Activation Process
Outage Alert
- Command Team: Receive initial alert; Determine disaster level; Activate recovery teams
- Operational: Receive notification; Evacuate area; Notify team members; Activate plan
- Administrative: Receive notification; Evacuate area; Notify team members; Activate plan; Establish command center; Determine disaster level; Supervise recovery steps
- Technical: Receive notification; Evacuate area; Notify team members; Activate plan
Authority to declare a disaster is a crucial element of the plan:
- Assigned to a restricted number of individuals
- The only group authorized to declare a disaster is the Command Team
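The declaration rule above (only the Command Team declares, then the recovery teams are activated) can be expressed as a small guard. This is an illustrative sketch: the `Team` enum and `declare_disaster` function are hypothetical names, and the 1-4 level check assumes the four event levels used elsewhere in the Plan.

```python
from enum import Enum

class Team(Enum):
    COMMAND = "Command"
    ADMINISTRATIVE = "Administrative"
    OPERATIONAL = "Operational"
    TECHNICAL = "Technical"

def declare_disaster(declaring_team, level):
    """Validate the declaration and return the teams to activate."""
    # Only the Command Team is authorized to declare a disaster.
    if declaring_team is not Team.COMMAND:
        raise PermissionError("Only the Command Team may declare a disaster")
    # Assumes the Plan defines Event Levels 1 through 4.
    if level not in (1, 2, 3, 4):
        raise ValueError("Event level must be between 1 and 4")
    return [team for team in Team if team is not Team.COMMAND]

for team in declare_disaster(Team.COMMAND, 2):
    print(f"Notify {team.value} Recovery Team: Level 2 declared")
```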

42 Section 2 - Recovery Steps
Goal: To provide a step-by-step checklist of activities that will be performed in the event of a disaster. This section contains the detail for each disaster level, by business line, by recovery option.
In the event of a disaster, start here.
Contents:
- Evacuation Checklist (OSHA)
- Recovery Locations
- Disaster Declaration Checklist
- Recovery Team Activation Checklist
- Level 1-4 Recovery Steps for Command, Administrative, Business Lines and Technical Recovery Teams and for all recovery options

43 Command Team Checklist
# | Start Day | Start Time | End Day | End Time | Activity | Team / Owner | Complete Date/Time | Comments
1 | 1 | E+00: | 1 | E+00:15 | Execute emergency response | ALL | | Refer to your facility (fire, tornado, etc.) emergency action plan
2 | 1 | E+00: | 1 | E+00:30 | Determine the disaster level based on the Event Level Definitions below and proceed to Initiate Activation Checklist accordingly | Command | |
3 | 1 | E+00: | 1 | E+00:40 | Notify Administration | Command | |
4 | 1 | E+00:40 | 1 | E+01:00 | Notify and activate the Recovery Team Leads, stating what disaster level is being declared: Operational Team, Administrative Team, Technical Team | Command | |
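The checklist expresses its times relative to the event (E+hh:mm). During an actual event it can help to turn those offsets into clock times. A minimal sketch, assuming the end offsets shown in the checklist (15, 30, 40 and 60 minutes); the event time and the `deadlines` helper are hypothetical.

```python
from datetime import datetime, timedelta

# End offsets in minutes after the event, as listed in the checklist.
CHECKLIST = [
    ("Execute emergency response", 15),
    ("Determine the disaster level", 30),
    ("Notify Administration", 40),
    ("Notify and activate the Recovery Team Leads", 60),
]

def deadlines(event_time):
    """Convert each E+ offset into a wall-clock completion time."""
    return [
        (activity, event_time + timedelta(minutes=minutes))
        for activity, minutes in CHECKLIST
    ]

event = datetime(2024, 1, 1, 14, 5)  # hypothetical event time
for activity, due in deadlines(event):
    print(f"{due:%H:%M}  {activity}")
```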

44 Command Team Questions
Goal: Remind staff about key action items that don't necessarily belong in another checklist
Examples:
- Need Risk Management?
- Need Safety Team? Questions about safety procedures, personal injury.
- Need Purchasing Team? Need to purchase supplies, furniture, computers, etc.
- Need Facilities Team? Issues with HVAC, security, parking, restrooms, coffee?
- Need Communications Team? Issues with reporters, announcements, etc.

45 Recovery Locations
DRP Locations:
Hospital
- Command Center (Conf. Room G & H)
  - Who Goes Here? DRP Command Team, Communication, Administrative Support
  - What Happens Here? Disaster Level Decisions, Issue Management, Activity Direction, Status Reporting, Communication
- Data Center (B900 - Basement)
  - Who Goes Here? SysAdmins, Network OPS, Telecommunications
  - What Happens Here? Server / Network / Systems Assessments, Backup restorations, Failover Activities, System Monitoring
Medical Office Building
- Help Desk / Desktop Services (MOB 215)
  - Who Goes Here? Help Desk, Desktop Services
  - What Happens Here? Help Desk 1st Level Support, Desktop Deployment / Support
- Resource Center (MOB 605b)
  - Who Goes Here? Application Support, Interface Team, DBAs
  - What Happens Here? Application Assessments / Recovery, Interface Assessments / Recovery, Database Assessments / Recovery

46 Section 3 - Addendums
Goal: To provide one place to access key information and resources required to efficiently and knowledgeably carry out the actionable recovery steps
Contents:
- Phone List (staff, emergency contact, vendor)
- Insurance Checklist
- Transportation Checklist
- Legal Checklist
- Key Communication Messages
- Security Checklist

47 Test Plan & Results

48 Testing
Objective: Perform testing
Practical Approach:
- Perform conference room test (passive testing)
- Perform full test (active testing)
Deliverables:
- Passive Test Plan and Test Results
- Active Test Plan and Test Results

49 Plan Structure
Section 1: Testing Plan Information
Objective: To provide information that will enable the reader or user of this plan to execute it while fully understanding the intentions and parameters with which it was created.
Content: Scope, Approach, Objectives, Roles and Responsibilities, Testing Environment and Locations, Assumptions, Known Risks and Issues

50 Plan Structure
Section 2: Actionable Testing Scenarios and Steps
Objective: To provide step-by-step conference room testing activities that address all levels of disasters that are represented in the Plan.
Content: Testing Checklists for Level 1-4 Disasters, Issue Management Process

51 Plan Structure
Section 3: Testing and Maintenance Schedule
Objective: To provide a schedule that will ensure that the Plan is tested and executed in a conference room setting at least two times per year, and to develop a maintenance schedule that will ensure that the Plan is current and relevant.
Content: Testing Activities and Schedule, Participants, Start Dates, End Dates; Maintenance Schedule, Owners, Due Dates

52 Walk-Through Test
- Intended to orient stakeholders to, and educate them on, the organization and content of the Plan
- Intended to evaluate the Plan for completeness and accuracy, assuring all information is up-to-date
- Should include all stakeholders of the DRP and take 1-2 hours to execute
- Example: Walk-through Test Script

53 Conference Room Tests
Objectives:
- Intended to evaluate the detailed checklists of the DRP
- By creating scenarios (Level 1, 2, 3) to test different levels of the Plan, all stakeholders will have the opportunity to review individual checklists in addition to evaluating interdependencies between the checklists
- Should include all stakeholders of the DRP and take 2-4 hours
Approach:
- For each level, develop a scenario
- For each scenario, define Type of Test, Participants, Type of Disaster, Day and Time of Disaster Event, Disaster Incident Description, Impact

54 Discussion Items
To start the scenario:
- Who does what at that time?
- How long does it take?
- When is it finished?
- What were the disaster event discovery procedures?
- What notifications need to occur?
- What documentation needs to be prepared?
How should a system outage be handled?
- What notification should occur?
- How do you validate the outage?
- How do you evaluate the impact on related systems?
- How do you document the process?

55 Discussion Items
- What do you do in the meantime?
- How long do you continue manual processes?
- What if it is a hardware-related problem? The vendor says it will be three days before it can be resolved. What do you do?
- How long can they be used?
- Do they have adequate staff?
- How will they operate without access to the web?
- What should be communicated internally and externally?
- What decisions need to be made and how quickly?
- Company personnel need to use their temporary operating procedures. What steps need to be taken?

56 Active Tests
- Intended to evaluate the execution of the checklists and ensure everyone is comfortable executing their tasks
- Should include all stakeholders of the DRP and take 4-8 hours

57 Test Results
- Imperative to track test problems in a Test Problem Log
  - Fields: Problem Number, Problem Description, Assigned To, Action Items
- Intended to ensure action is taken on problems or issues that arose during the testing so that each iteration brings you closer to a complete plan
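A Test Problem Log with the four fields named above can be as simple as a list of records. The sketch below is one possible shape, not a required tool: `Problem` and `ProblemLog` are hypothetical names, and the `resolved` flag is an added assumption used to show which problems still need action before the next test iteration.

```python
from dataclasses import dataclass, field

@dataclass
class Problem:
    number: int
    description: str
    assigned_to: str
    action_items: list[str] = field(default_factory=list)
    resolved: bool = False  # assumed extra field, not named on the slide

class ProblemLog:
    def __init__(self):
        self._problems = []

    def add(self, description, assigned_to, action_items=()):
        """Record a problem and assign it the next problem number."""
        problem = Problem(len(self._problems) + 1, description,
                          assigned_to, list(action_items))
        self._problems.append(problem)
        return problem

    def open_problems(self):
        """Problems that still need action before the next test."""
        return [p for p in self._problems if not p.resolved]

log = ProblemLog()
first = log.add("Vendor phone number out of date", "Technical Team Lead",
                ["Confirm number with vendor", "Update phone list"])
log.add("Checklist step order unclear", "Command Team")
first.resolved = True
print([p.number for p in log.open_problems()])
```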

58 Maintenance Plan

59 Ongoing Maintenance
Objective: Ensure DRP plans are maintained on an ongoing basis
Practical Approach:
- Maintain DRP command team and recovery team roles
- Maintain Vendor List and Supply List
- Perform periodic Internal Audits/Reviews
- Ensure change management processes incorporate DRP plan maintenance
Deliverables:
- Actionable DRP plans

60 Timeline and Activities
2 Weeks Prior to Test:
- DRP Coordinator sends a message to all Command Team and Recovery Team Leads indicating the time of the testing and requesting that Recovery Team Leads make checklist updates
- Recovery Team Leads update checklists and distribute them to the DRP Coordinator
1 Week Prior to Test:
- DRP Coordinator updates BIA, Recovery Strategies, DRP and Test Plan

61 Timeline and Activities
Testing:
- Testing occurs over ½ day
- DRP Coordinator facilitates all testing activities
3 Weeks After Test:
- Updates and other action items identified during testing complete
4 Weeks After Test:
- New DRP compiled and distributed to all Command Team and Recovery Team Leads and Executive Management
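Because every milestone in this timeline is defined relative to the test date, the calendar can be generated once a date is chosen. A small sketch, assuming the week offsets listed above; the `maintenance_timeline` function and the example test date are hypothetical.

```python
from datetime import date, timedelta

def maintenance_timeline(test_date):
    """Map each milestone to its calendar date, relative to the test date."""
    return {
        "Request checklist updates from Recovery Team Leads":
            test_date - timedelta(weeks=2),
        "Coordinator updates BIA, Recovery Strategies, DRP and Test Plan":
            test_date - timedelta(weeks=1),
        "Testing (half day)": test_date,
        "Updates and action items complete":
            test_date + timedelta(weeks=3),
        "New DRP compiled and distributed":
            test_date + timedelta(weeks=4),
    }

timeline = maintenance_timeline(date(2024, 6, 12))  # hypothetical test date
for milestone, due in timeline.items():
    print(due.isoformat(), milestone)
```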

62 Summary

63 Summary
- Disaster Recovery Planning is essential
- Your approach needs to be practical and the plan needs to be executable
- Test much and test often
- Ensure the plan is maintained

64 Jonathan Thompson
StoneBridge Group
701 Xenia Ave. South, Suite 170
Minneapolis, MN
(763)
(763) fax