Data Center Disaster Recovery Planning: A Comprehensive Guide

Data centers are often at the core of business operations, especially business-critical ones. If a disaster strikes them, the consequences can be severe. For that reason, it’s important to incorporate data center disaster recovery planning into your IT management. Here is a comprehensive guide to what you need to know.

Risk assessment

A comprehensive risk assessment lays the foundation for an effective strategy to safeguard your data center’s operations. The aim of the risk assessment is to identify risks and to assess both their statistical likelihood and their potential impact.

Ideally, this impact should be stated in measurable terms. In the context of business, this typically means financial terms. A business is therefore likely to need a way to assign financial value to intangible concepts such as industry reputation. One way to do this is to estimate how damage to them would affect your sales and hence your revenue and bottom line.

Once you have identified the risks you are facing, you can then look for ways to eliminate or mitigate them. You will need to evaluate the cost of these measures. This will include the cost of any disruption they cause in themselves. For example, security measures will typically slow down operations. This will reduce productivity.
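As a rough illustration of weighing a risk against the cost of mitigating it, the sketch below multiplies an estimated annual likelihood by an estimated financial impact, then subtracts the yearly cost of a mitigation measure. The Risk class, the function names, and all figures are hypothetical and chosen only for illustration. A real assessment would use your own estimates and possibly a more detailed model.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    annual_probability: float  # estimated chance of occurring in a given year (0 to 1)
    impact_cost: float         # estimated financial loss if it does occur


def expected_annual_loss(risk: Risk) -> float:
    """Likelihood multiplied by impact, expressed as a yearly cost."""
    return risk.annual_probability * risk.impact_cost


def mitigation_net_benefit(before: Risk, after: Risk, annual_mitigation_cost: float) -> float:
    """Reduction in expected loss minus the yearly cost of the measure.

    The mitigation cost should include any disruption the measure itself
    causes, such as lost productivity.
    """
    return expected_annual_loss(before) - expected_annual_loss(after) - annual_mitigation_cost


# Hypothetical figures for illustration only.
outage = Risk("extended power outage", annual_probability=0.25, impact_cost=400_000)
outage_with_generator = Risk("extended power outage (with generator)",
                             annual_probability=0.25, impact_cost=40_000)

print(expected_annual_loss(outage))                                   # 100000.0
print(mitigation_net_benefit(outage, outage_with_generator, 30_000))  # 60000.0
```

With these made-up numbers, the measure reduces the expected loss by more than it costs, so the net benefit is positive. A negative result would suggest the measure costs more than the risk it removes.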

Continuity planning

The next step in data center disaster recovery planning is continuity planning. This is based on a Business Impact Analysis (BIA). As the name suggests, a BIA identifies and defines business functions and processes. It then assesses how critical they are to business operations.

This assessment lays the groundwork for defining Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs). RPOs set out the point in time to which data must be recoverable. In effect, they define the maximum acceptable data loss, measured as a period of time before the disaster. For example, a 4-hour RPO means losing no more than the last 4 hours of data. RTOs specify the maximum allowable downtime for data center operations post-disaster.
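To make the two objectives concrete, here is a minimal sketch that checks a disaster scenario against an RPO and an RTO. The function names and the timestamps are hypothetical. The sketch assumes that the data lost equals everything written since the last successful backup.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, disaster: datetime, rpo: timedelta) -> bool:
    """Data written after the last backup is lost, so that gap must not exceed the RPO."""
    return disaster - last_backup <= rpo


def meets_rto(disaster: datetime, restored: datetime, rto: timedelta) -> bool:
    """The time from the disaster to restored service must not exceed the RTO."""
    return restored - disaster <= rto


# Hypothetical scenario: nightly backup at 02:00, disaster at 09:30, service back at 11:00.
last_backup = datetime(2024, 1, 1, 2, 0)
disaster = datetime(2024, 1, 1, 9, 30)
restored = datetime(2024, 1, 1, 11, 0)

print(meets_rpo(last_backup, disaster, rpo=timedelta(hours=4)))  # False: 7.5 hours of data lost
print(meets_rto(disaster, restored, rto=timedelta(hours=2)))     # True: 1.5 hours of downtime
```

In this scenario the recovery is fast enough, but a nightly backup cannot satisfy a 4-hour RPO, which suggests backing up more frequently.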

Once these have been defined, the business can then move on to create a Disaster Recovery Plan (DRP). This is essentially a manual for responding to a disaster. As such, it should contain everything anyone could need to know in a disaster-recovery scenario. It should also be available in a disaster situation. This is likely to mean having offline copies of it.

Data backup solutions

The fundamental rule of data backups is known as the 3-2-1 rule. It means that there need to be at least 3 copies of your data, held on at least 2 different types of storage media, of which at least 1 copy must be offsite. Offsite does not have to mean offline. It is permissible to keep a data backup in the cloud. It just needs to be a different cloud from the one you use for regular operations.
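The rule is simple enough to check mechanically. The sketch below assumes the widely used formulation (at least three copies, on at least two different storage media, with at least one offsite) and counts the production copy as one of the three. The BackupCopy class and the example inventory are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    label: str
    medium: str    # for example "disk", "tape", or "object-storage"
    offsite: bool  # True if held away from the primary site or primary cloud


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """At least 3 copies, on at least 2 distinct media, with at least 1 offsite."""
    enough_copies = len(copies) >= 3
    enough_media = len({copy.medium for copy in copies}) >= 2
    has_offsite = any(copy.offsite for copy in copies)
    return enough_copies and enough_media and has_offsite


# Hypothetical inventory for illustration only.
inventory = [
    BackupCopy("production database", medium="disk", offsite=False),
    BackupCopy("local backup server", medium="disk", offsite=False),
    BackupCopy("second cloud provider", medium="object-storage", offsite=True),
]

print(satisfies_3_2_1(inventory))      # True
print(satisfies_3_2_1(inventory[:2]))  # False: two copies, one medium, nothing offsite
```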

This leaves two key points to address. These are your data backup strategies and your data backup locations. These two points will often influence each other.

There are three main types of data backup strategies. These are full, incremental, and differential. Full backups back up all data. Incremental backups back up only the data that has changed since the last backup of any type. Differential backups back up all data that has changed since the last full backup. Full backups are the most resource-intensive and incremental backups are the least resource-intensive. Differential backups are in between the two.
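The difference between the three strategies comes down to which cutoff time is used when deciding what has changed. The following sketch illustrates this with file modification times. The function name, the file names, and the numeric timestamps are hypothetical, and real backup tools typically track changes in more robust ways than modification times alone.

```python
def files_to_back_up(mtimes: dict[str, float], kind: str,
                     last_full: float, last_backup: float) -> list[str]:
    """Return the files a backup of the given kind would copy.

    mtimes maps each file path to its last-modified timestamp.
    last_full is the time of the last full backup.
    last_backup is the time of the most recent backup of any type.
    """
    if kind == "full":
        cutoff = float("-inf")  # everything, regardless of when it changed
    elif kind == "differential":
        cutoff = last_full      # everything changed since the last full backup
    elif kind == "incremental":
        cutoff = last_backup    # only what changed since the most recent backup
    else:
        raise ValueError(f"unknown backup kind: {kind!r}")
    return sorted(path for path, mtime in mtimes.items() if mtime > cutoff)


# Hypothetical timeline: full backup at t=10, incremental backup at t=20.
mtimes = {"orders.db": 5.0, "app.log": 15.0, "server.cfg": 25.0}

print(files_to_back_up(mtimes, "full", last_full=10, last_backup=20))
# ['app.log', 'orders.db', 'server.cfg']
print(files_to_back_up(mtimes, "differential", last_full=10, last_backup=20))
# ['app.log', 'server.cfg']
print(files_to_back_up(mtimes, "incremental", last_full=10, last_backup=20))
# ['server.cfg']
```

The trade-off shows up at restore time: an incremental scheme copies the least on each run but needs the last full backup plus every incremental since, whereas a differential scheme needs only the last full backup plus the latest differential.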

For completeness, it’s worth highlighting that the term “data” extends beyond the data you need for business operations. It also includes system data such as server configurations. If you’re using virtual machines then it’s advisable to take regular snapshots of your configurations and back these up.

Your options for data storage locations are onsite (including in your regular cloud), offsite and offline (for example, in a secure storage facility), and offsite and online (for example, in another cloud). Some businesses may choose to use all three.

Redundancy and high availability

In addition to backing up data, you should also build redundancy into your key infrastructure. This would typically mean power and cooling, network connections, and core hardware such as servers. To make the most of this redundancy and achieve high availability, you need mechanisms for distributing workloads effectively (such as load balancing) and for failing over to the redundant components when a primary one fails.
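At its simplest, a failover mechanism is a rule for choosing which redundant component should carry the workload given the current health of each. The sketch below shows one such rule, assuming a priority-ordered set of nodes whose health is determined elsewhere (for example, by periodic health checks). The Node class, the function name, and the node names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    priority: int  # lower number means more preferred
    healthy: bool  # assumed to be set by an external health check


def pick_active(nodes: list[Node]) -> Optional[Node]:
    """Return the most preferred healthy node, or None if every node is down."""
    healthy = [node for node in nodes if node.healthy]
    if not healthy:
        return None
    return min(healthy, key=lambda node: node.priority)


# Hypothetical primary and standby servers.
primary = Node("primary-server", priority=0, healthy=True)
standby = Node("standby-server", priority=1, healthy=True)

print(pick_active([primary, standby]).name)  # primary-server

primary.healthy = False                      # simulate a primary failure
print(pick_active([primary, standby]).name)  # standby-server
```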

You should also consider whether you need or would benefit from distributing your operations across geographically diverse locations. This approach can feed into other areas of your business. For example, content delivery networks (CDNs) can speed up content delivery during business-as-usual operations and can also serve as a backup option in disaster situations.

Testing your disaster recovery plan

No matter how good a disaster recovery plan looks on paper (or on screen), it needs to be tested in the real world. In fact, it needs to be regularly tested in the real world because it will almost certainly need to be updated from time to time (if not regularly).

Common types of disaster recovery planning tests include the following.

Disaster recovery plan checklist tests: These tests involve a review of your plan to ensure all critical steps are documented and organized.

Parallel tests: In a parallel test, you run your primary and backup systems side by side to see if they perform as expected.

Simulation tests: You simulate a specific disaster event to see how your team and systems respond.

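Full interruption tests: You shut down your primary systems and switch entirely to your backup systems to mimic a real disaster and assess how well your plan works. This is the most realistic type of test but also the most disruptive.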
