Our Chief Technology Architect, Rus Healy, was one of the cloud computing experts featured in a recent CloudEndure blog post exploring the pros and cons of cloud-based disaster recovery and best practices moving into 2018.
Here is what Rus had to say:
1. What are your disaster recovery predictions for 2018?
Demand for DR has grown consistently over the past several years, and I think that trend will continue. Based on the rate of growth we’ve seen, I expect that we will double the number of cloud DR customers in 2018. More of our larger customers will protect a larger percentage of their workloads in the public cloud, and those customers will also seek a multi-cloud approach to DR and production. By demonstrating the value of the public cloud, DR will continue to pull through production workloads as well as dev and QA workloads.
2. What are the advantages and disadvantages of using the cloud for disaster recovery?
Cloud disaster recovery has several key advantages. Probably the most important one is much lower cost than on-premises DR. You eliminate a whole set of infrastructure that is expensive to procure, the recurring hardware and software maintenance costs, the physical space, power, cooling, and the hours of maintenance and support for all of that infrastructure. You also gain compliance and elasticity. You can test DR at will, paying only for the run time of the machines you bring up. And you can use those DR instances for more than just DR and DR testing — you can test in-place upgrades, provide isolated access to a vendor with production data (but not in a production-impacting way), and perform many other valuable business operations.
Disadvantages, for most end users, include the effort and time needed to get started with the public cloud, to learn the security, configuration, and management methods that are unique to the cloud environment, and to make the business case for using the public cloud. Many companies still have not taken these steps.
That’s where good partners come in: to help customers along in that journey. Understanding how the cloud differs from on-premises infrastructure, and designing for those differences, is another challenge that many companies face. Finally, understanding cloud networking and connectivity to on-premises data centers is a new challenge for many customers, even those who have been running various workloads in the public cloud for some time.
3. Is the public cloud “tough enough” (to be used as a target) for disaster recovery?
Absolutely, yes, but you must choose cloud providers carefully. We feel that AWS has a clear and substantial advantage in that regard: its scale, architecture, and durability set it apart from other cloud providers, especially in the US. You have to be diligent in evaluating cloud providers to understand where their weaknesses are in performance, downtime, cost, and other factors before making a decision.
4. What are the biggest disaster recovery challenges for enterprises and large organizations?
The complexity of on-premises workloads is significant even in mid-size organizations. Enterprises face many challenges, including prioritizing protection for workloads and understanding all of the dependencies between their workloads. DR involves many teams, as well as an evaluation and design process that those teams are probably not familiar with. The good news is that you can start small, get protection in place, and work through the challenging areas while still being able to recover to the public cloud in a DR event.
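As a rough illustration of what mapping workload dependencies can look like in practice, here is a minimal Python sketch. It is not taken from the interview: the workload names and the dependency map are hypothetical, and a real inventory would come from your own discovery and documentation work. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) to produce an order in which each workload comes after everything it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: each workload maps to the set of workloads
# it depends on. These names are illustrative assumptions only.
dependencies = {
    "web-frontend": {"app-server"},
    "app-server": {"database", "auth-service"},
    "auth-service": {"database"},
    "database": set(),
}


def recovery_order(deps):
    """Return workloads ordered so each one follows its dependencies.

    Raises graphlib.CycleError if the dependency map contains a cycle.
    """
    return list(TopologicalSorter(deps).static_order())


order = recovery_order(dependencies)
print(order)
```

An ordering like this is only a starting point. Business priority, RPO and RTO targets, and workloads that cannot be protected to the cloud would still need to be layered on top of it.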
5. What would you include in a disaster recovery plan checklist?
A good evaluation of on-premises workloads and their dependencies, as well as an understanding of what cannot be protected easily (or at all) to the cloud.
A successful pilot project, run in cooperation with a good services partner.
Replication software that allows for very low RPO and low RTO, and that supports failback to on-premises or to another cloud provider.
A solid set of partners with great support and performance history.
Good documentation — a real DR run book for your DR environment, which also captures the configuration of your environment so that it can serve you best in a DR event.
A team of people who can support you, internally and through managed services, when you need them (DR events and DR testing).
6. What is your best (or worst) IT disaster recovery story?
I was working just outside a data center once when a component in one of the UPS units exploded. The fire suppression system was triggered and would have destroyed millions of dollars’ worth of equipment if we had not been there to disable it before it went off. We were fortunate, but we also realized how crippling a complete outage of a data center can be, and how important it is to get your machines and data protected off-premises.