PLANNING TO SUCCEED: EXECUTING A BC/DR STRATEGY DURING A DIGITAL TRANSFORMATION

WHITEPAPER
By Nick Cavalancia

The age-old concept of a business being defined by its location is long dead. Employees and the technology they use no longer need to exist within the bricks and mortar of a traditional (or dare I say "legacy") business.

Organizations today are undergoing a fundamental change: the digital transformation. Once defined by 9-to-5, bricks & mortar, pen & paper, and hand & tool, business is replacing these old concepts with the need to be always-on and the leveraging of interconnected devices and users, all with an expectation of 24/7 access to data, applications, and systems, and zero downtime.

The transformed business operates completely differently from its legacy predecessor. A legacy business looks much like the simple shoe or watch repair shop you have in your city: they're open 9-to-5, probably have no web presence, and their product is little more than skilled labor. But the transformed business relies heavily upon technology to deliver its services. This creates an expectation among employees, partners, and customers alike of a fast, anytime/anywhere/any-device interaction with your organization's internal and external services, with the user's productivity in mind.

So what does this transformation look like and, more importantly, how does it impact IT when it comes to backup and recovery?

OPERATING A TRANSFORMED BUSINESS: BRAVE NEW WORLD

In many ways, the transformed business changes everything. Legacy businesses are transforming into software companies, offering access to applications, services, and data, where customer, partner, and employee interaction with the software must delight or the company risks losing business to a competitor.

Take the example of a modern-day airport. While the business of the airport itself seems to be about providing a way for passengers to get on planes, it's much more.
When you consider all the systems that make up an airport (flight information, routing of baggage, HVAC, security checkpoints, and more), an airport servicing millions of customers annually has changed from one predominantly managed by paper tickets and people to one based on countless software-based services that go unnoticed until an outage occurs and passengers are delayed.

The increased expectation on a business is passed on to IT to deliver all the services that automate everything about the business, and that includes backup and recovery.

The challenge with the digital transformation is that it puts additional requirements on IT, in some cases never before seen. That previously mentioned expectation of 24/7 access can, for some organizations, literally mean 24/7, 100 percent uptime, without interruption: none, zero, nada. Roll the clock back 10 years and we were still talking "five 9s" of availability, but without the context of how long you'd actually be down should a recovery need to take place. That percentage was more about how reliable the technology was, not so much a statement about whether you could actually be operational with services running in any disaster scenario.

Lastly, the transformation has changed the conversation for IT when it comes to keeping the business operational. It's no longer about recovery or even recoverability of data, applications, and systems. As you'll see, because of the increase in expectations from customers and employees, the business is changing its expectations of IT.

All this has an impact on what constitutes a proper BC/DR strategy. What used to be simply a focus on having backup copies with a plan on how and where to restore is now shifting to one concerned with minimizing and mitigating downtime.

THE COST OF DOWNTIME

One major factor in the pressure to provide highly available services is the simple cost of downtime itself. The average cost of a single data center outage is approximately $740,[...] [1]. But the costs go well beyond just the downtime and cost to recover. Recently, Delta Air Lines was in the news due to a complete outage of its Atlanta-based data center. The data center was down for five hours. Delta had to issue refunds and suffered cancellations that put the cost of the outage at $150 million.
While your outage may not be as costly, you do need to take into consideration some of the following more intangible losses:

- Damage to or loss of mission-critical data
- Loss of organizational productivity
- Costs associated with recovering systems and remediating business processes
- Legal and regulatory impact, including litigation defense cost
- Lost confidence and trust among key stakeholders
- Diminishment of marketplace brand and reputation

[1] Ponemon, Cost of Data Center Outages Report (2016)

All of these add up to the ever-present risk: loss of revenue.

So, how do you build and execute a BC/DR plan that actually keeps your business running? The first step is to understand the organization's detailed requirements and determine IT's ability to meet them.

PROTECTING THE TRANSFORMED BUSINESS: BACKUP VS. AVAILABILITY

OK, so you've got backups. That's important, but there's a factor missing here: your organization is looking for availability, not recovery.

You see, data defines a backup: everything on that server, all of your on-premises Exchange environment, every domain controller. But the business defines the availability of those systems, applications, and environments. Your organization may need Exchange (and, therefore, Active Directory) to be available with a very small window of downtime, say, 15 minutes. In contrast, it may only require that file servers be available within four hours. It's all about looking at the business of your organization and allowing the organization to define which data sets, systems, and applications need to be available (and to what degree).

The digitally transformed business focuses its BC/DR strategy on availability. You need to as well. With availability, the transformed business looks to achieve three goals:

- Enterprise Continuity that delivers recovery time and recovery point objectives of less than 15 minutes for ALL applications and data, and disaster recovery orchestration that keeps your business up and running when a disaster strikes and aligns with your business and regulatory requirements.
- Workload Mobility that provides availability for all your workloads, virtual or physical, wherever those workloads reside: in the public or private cloud, your main data center, or a remote office. This allows you to maximize IT investments and increase flexibility.
- Maintain Compliance & Visibility, so your business can proactively monitor data and application availability and be alerted to issues before they occur, using automated testing and documentation to ensure business and regulatory requirements are met.

To achieve these goals, the business needs to determine how it defines availability. After all, it's not simply "keep everything up all the time." It's far more detailed than that.

DEFINING AVAILABILITY

To put some tangible terms on defining availability, your organization should focus on the following BC/DR factors for each critical workload:

- Recovery Time Objective (RTO): The amount of time the business is willing to allow a recovery operation to take.
- Recovery Point Objective (RPO): The amount of data the business is willing to lose.
- Maximum Tolerable Period of Disruption (MTPoD): The total amount of time a recovery operation can take before the business starts feeling some of those outage losses previously mentioned.
- Service Level Agreement (SLA): This sums up the other factors in this list and defines what availability IT needs to deliver.

The problem here is that a gap exists: the gap between what the business defines as availability and what IT can deliver. Think about it: you make backups of a given tier 1 application that will get you back up and running within a few hours, but the business needs that application back up in one hour. That's your backup-availability gap.

The challenge is that legacy backup is not built to meet the demands of today's application environments, nor their use of hybrid cloud. Legacy backup cannot meet RPO and RTO SLAs: it fails far too often (17 percent on average) and yields too much downtime (an average of 15 hours each year and growing), demonstrating the existence of a backup-availability gap recognized by 84 percent of CIOs [2].

By getting the C-suite, application owners, line of business owners, and IT together to outline, on a system-by-system, application-by-application basis, what the availability expectation is using the terms above, IT can determine how big the backup-availability gap is per system or application.

ACHIEVING AVAILABILITY

As we all know, determining the problem doesn't help without a solution. In some cases, the answer to availability may be more frequent backups. But that may not be feasible, given the size of the data set. So you're going to need to take the availability definitions and look to a number of technologies to make availability happen.
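To make the backup-availability gap concrete, the comparison between what the business requires and what IT can currently deliver can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the Workload class, its field names, and every "achievable" and RPO figure are assumptions made up for the example. Only the 15-minute Exchange window, the four-hour file server window, and the one-hour tier 1 requirement come from the text above.

```python
from dataclasses import dataclass
from datetime import timedelta

ZERO = timedelta(0)


@dataclass
class Workload:
    """One system or application with its agreed and currently achievable objectives."""
    name: str
    required_rto: timedelta    # recovery time the business will accept
    required_rpo: timedelta    # window of data loss the business will accept
    achievable_rto: timedelta  # recovery time IT can deliver today (hypothetical)
    achievable_rpo: timedelta  # data-loss window IT can deliver today (hypothetical)

    def rto_gap(self) -> timedelta:
        # Time by which current recovery exceeds the objective; never negative.
        return max(self.achievable_rto - self.required_rto, ZERO)

    def rpo_gap(self) -> timedelta:
        # Data loss beyond the objective; never negative.
        return max(self.achievable_rpo - self.required_rpo, ZERO)

    def has_gap(self) -> bool:
        return self.rto_gap() > ZERO or self.rpo_gap() > ZERO


# Required RTOs follow the examples in the text; all other values are illustrative.
workloads = [
    Workload("Exchange", timedelta(minutes=15), timedelta(minutes=15),
             timedelta(hours=3), timedelta(hours=24)),
    Workload("File servers", timedelta(hours=4), timedelta(hours=24),
             timedelta(hours=3), timedelta(hours=24)),
    Workload("Tier 1 application", timedelta(hours=1), timedelta(hours=1),
             timedelta(hours=3), timedelta(hours=1)),
]

for w in workloads:
    status = "GAP" if w.has_gap() else "ok"
    print(f"{w.name}: {status} (RTO gap {w.rto_gap()}, RPO gap {w.rpo_gap()})")
```

Sorting such a list by the size of each gap would give a rough order in which to apply the technologies discussed next, though the real priorities are for the business to set.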
The goal is to identify the capabilities and limitations of each technology in an effort to match the right technology with the specific availability need.

- Backups: It's important to first acknowledge you're not going to throw out your existing backups. They are a part of the puzzle. What you will need to do is identify all the backup (and recovery) methods your current backup solution supports (e.g., file-level, image-level, bare-metal restores, etc.).
- Virtualization: While somewhat a given at this point, the use of virtualization as both a production and recovery environment standard gives you more availability options. For example, performing image-level continuous recovery (where, as an image is backed up, that same data set is recovered to an alternate server) facilitates the ability to fail over almost immediately to the alternate server with almost no loss of data.
- Replication: As with continuous recovery (via your backup solution), you also have replication of images built into your virtualization platform. Identifying the limitations of each (e.g., how often images are copied in each case, how much data is lost, etc.) will help determine when this is a more appropriate option.
- Snapshots: Being able to go back to a specific point in time may also be an option. In some cases, it may be faster than restoring a VM, and then an application database, for instance. Snapshots are a key factor when considering converged or hyperconverged infrastructures as part of the solution.
- Cloud: Sure, it seems obvious, but the cloud can be leveraged in a number of ways as part of your availability strategy, including storage, replication, redundancy, and recovery.

It's important to realize there is no single right answer here. It will truly depend on how the organization defines availability for a given system or application, and you'll need to decide which technology is going to best help you achieve the availability you need.

[2] Veeam, Availability Report (2016)

PLANNING FOR AVAILABILITY

It's assumed many of you already have a BC/DR plan of some kind. But if you were to be honest and answer the question "what's it worth?", you may answer that it's not even worth the paper it's written on. Why?
Because that plan is likely more along the lines of what you're going to do as opposed to what you are doing.

Here's the issue: backup planning tends to be "if X happens, we'll do Y to recover." It's a plan that requires a recovery event in order to be put into action. Availability planning is more "let's do these things every day/week/month to ensure that, should something happen, we are ready." You've got to begin to think about planning as more of an execution strategy than a plan for some timeframe in the future.
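The shift from "if X happens" to "do these things every day/week/month" can be sketched as a small recurring-task schedule. This is an illustrative sketch only: the RecurringTask class, the task descriptions, and the choice of Mondays and the first of the month as run days are all assumptions for the example, not part of any particular product or standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RecurringTask:
    """One availability activity that is carried out on a fixed cadence."""
    description: str
    cadence: str  # "daily", "weekly", or "monthly"

    def is_due(self, day: date) -> bool:
        if self.cadence == "daily":
            return True
        if self.cadence == "weekly":
            return day.weekday() == 0  # Mondays (assumed run day)
        if self.cadence == "monthly":
            return day.day == 1        # first of the month (assumed run day)
        raise ValueError(f"unknown cadence: {self.cadence}")


# Hypothetical plan entries; a real plan would list the organization's own tasks.
PLAN = [
    RecurringTask("Check backup job success/failure notifications", "daily"),
    RecurringTask("Test-restore one image-level backup", "weekly"),
    RecurringTask("Rehearse failover and failback", "monthly"),
]


def tasks_due(plan, day):
    """Return the descriptions of all tasks that should be run on the given day."""
    return [task.description for task in plan if task.is_due(day)]


if __name__ == "__main__":
    for description in tasks_due(PLAN, date.today()):
        print(description)
```

Run each morning, a list like this turns the plan into a checklist of work being done now, rather than a document that waits for a recovery event.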

BUILDING THE PLAN

Because your existing plan probably fits the description above, you're going to need to build out a new availability plan that incorporates the following:

- The organization's definitions of availability for each application/system/data set.
- Fitting technologies that overcome the backup-availability gap you currently have for each application/system/data set to be made available.
- Considerations around the scalability of your plan, since the organization plans to grow.

Your plan should focus more on the steps IT needs to take today, and then incorporate steps needed when a recovery event occurs.

For example, let's consider a simple availability plan for a server responsible for processing credit card transactions from your ecommerce website. The organization says that system can be down no more than 10 minutes and lose no more than 10 minutes of data. Your plan will define the recovery objectives, the daily method of availability used (in this case, you will probably rely on either continuous recovery or replication), to what degree this meets the recovery objectives, the daily/weekly/monthly testing or validation of the standby system, and what the failover and failback plan is at the time of a recovery event.

TESTING THE PLAN

Another reason some IT organizations' plans are not worth much is that they've never been tested. Keeping with the theme that your plan is likely more a "should something bad happen" set of steps, if the badness never happens, you probably haven't put the plan to the test.

Given that an availability plan is far more daily in nature, there are a few ways you can test out your plan. (And if you're still in legacy BC/DR plan mode, you can use these same testing methods as well!)
- Table-top testing: Sit down with all the relevant application and line of business owners, along with pertinent members of IT, and walk through the steps you'll be taking, allowing for questions and objections along the way. This "talking it out" method will, at the very least, reveal any glaring issues with the plan.
- Backup and replica testing: It's one thing to perform a backup or create a replica. It's another to know if it's actually good. Performing a test restore will proactively let you know if you are ready. Some backup solutions also include this capability, where image-level backups are restored to a system and automatically tested to see if the Windows logon screen comes up. It's yet another way of validating that a backup is good.
- Virtual lab: Now we're getting into some real testing, where your concern is greater than just a given backup and focuses on whether you can get an entire application, along with its dependencies, back up and running. An added benefit of a virtual lab is its use in a number of other parts of your business, including the development of applications and the testing of patches just prior to releasing them to production.
- Failover and failback: While not a testing method, it's important to make sure your test includes both steps. A failover is triggered by a compelling event (a system failure), so it's acceptable to have some level of outage. But with a failback, you need to plan, since there is no acceptable outage timeframe (because everything is working). Whether done manually, using advanced DR orchestration services, or something in between, the process of testing both failover and failback should be a part of the testing plan.

EXECUTING AVAILABILITY

At this point, you already realize execution of availability is not about the moment of recovery. Availability is about a consistent state of being operational. So, your execution of the plan also needs to be done in a consistent manner. Execution of the plan should include:

- Visibility into backups: Ensuring you have notifications around the success or failure of backup jobs is the first step.
- Daily/weekly/monthly testing: Using any of the methods mentioned previously, test IT's ability to deliver against the objectives.
Even if you've tested the recovery of a given application once to your cloud-based recovery location, remember that as the company grows, so do your data and the complexity of your system integrations, making it necessary to continually test.
- Ensuring ability to meet SLAs: The testing will also prove out whether IT can meet availability SLAs or not. It also provides a way to communicate this to key stakeholders.
- Documentation to prove compliance: Many compliance standards have a contingency aspect built in. By documenting the daily availability work being done, you will easily be able to demonstrate to auditors that you have DR plans in place and that they are updated, tested, and documented regularly.

These steps all ensure you have an ability to respond appropriately, according to your SLA and recovery objectives, when the moment of recovery happens, meeting your organization's definition of availability.

EXECUTING A SUCCESSFUL BC/DR PLAN

Up until now, you've spent your planning focused on what needed to be backed up, which also determines what can be recovered (and how, and in what timeframe). But it's evident that, in the midst of a digital transformation, the transformed business needs its BC/DR sights set on maintaining availability over just having an ability to recover. By achieving availability, the transformed business will, in turn, achieve an ability to recover from a disaster and maintain business continuity.

By focusing first on the organization's definition of availability for each critical system or application, IT shifts its planning from one of "what application needs to be backed up" to one of "how do I keep this application running continuously." Multi-faceted testing of backups and the planned recovery process should be performed frequently, ensuring IT's readiness and increasing confidence in the plan's ability to succeed.

Availability is about the organization running every minute of every day. By executing a plan to achieve availability, IT shifts its thinking from being about a specific moment of turmoil to an ongoing state of readiness.
When you achieve this state of readiness, complete with visibility into backups and testing, you will know you have executed a successful availability plan and are ready for what may come.

Nick Cavalancia is founder & chief techvangelist at Techvangelism. Nick has 20 years of enterprise IT experience and is an accomplished consultant, speaker, trainer, writer, and columnist. He has several certifications, including MCSE, MCT, Master CNE, and Master CNI. He has authored, co-authored, and contributed to over a dozen books on Microsoft technologies.