Today we have all the technology needed to achieve very high availability rates. Cloud computing removes much of the complexity of maintaining highly available workloads, so we no longer have to understand every intricate detail behind the curtain.
Nevertheless, companies are often unsure where to start when they need to invest in Disaster Recovery and Business Continuity. Here’s a quick framework that surveys the process.
Business Impact Analysis
Start by looking at what you currently have. Depending on the size of your organization, this may be a long-term project, since the multitude of applications a large company has can be overwhelming. However, it is an important task. You need to know what you have, how important those workloads are, and what impact their unavailability would have on your business. Evaluate things like:
|Financial impact||Lost sales or delayed income, increased expenses, regulatory fines, penalties or customer dissatisfaction|
|Disruption duration and timing||A short disruption may be easy to mitigate. On the other hand, a consumer-facing store that suffers a disruption in the weeks before the holidays can incur serious losses.|
|Questionnaire||Use a questionnaire to survey the people who know the business best, asking them to identify the impact of, for example, a disruption in the manufacturing process. This will allow you to identify the critical processes needed for the continuity of the business.|
|Reporting||Make sure to report the detailed impact resulting from the disruption of the processes. The costs of mitigations should be compared with the assessed recovery costs.|
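As a rough illustration of the comparison described above, the sketch below ranks workloads by estimated loss and checks whether mitigation costs less than the expected loss. The workload names, the figures and the helpers `rank_by_impact` and `worth_mitigating` are all hypothetical; a real analysis would use the numbers gathered from your questionnaire.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Workload:
    name: str
    loss_per_hour: float          # assumed financial impact per hour of downtime
    expected_outage_hours: float  # assumed downtime per year without DR
    mitigation_cost: float        # assumed yearly cost of protecting the workload

    @property
    def expected_loss(self) -> float:
        # Simple estimate: hourly loss multiplied by expected hours of outage.
        return self.loss_per_hour * self.expected_outage_hours


def rank_by_impact(workloads: List[Workload]) -> List[Workload]:
    """Order workloads from highest to lowest expected loss."""
    return sorted(workloads, key=lambda w: w.expected_loss, reverse=True)


def worth_mitigating(workload: Workload) -> bool:
    """Mitigation pays off when it costs less than the expected loss."""
    return workload.mitigation_cost < workload.expected_loss


# Hypothetical inventory, for illustration only.
workloads = [
    Workload("intranet-wiki", 50, 8, 3000),
    Workload("web-store", 5000, 8, 12000),
    Workload("erp", 2000, 8, 10000),
]

ranked_names = [w.name for w in rank_by_impact(workloads)]
decisions = {w.name: worth_mitigating(w) for w in workloads}
```

With these made-up figures, the web store and the ERP system justify the investment, while the wiki does not.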
Security and Compliance Assessment
Once you have determined the main priorities among your applications, a good next step is to assess the security impact of keeping your data somewhere other than your own premises. What are the risks? Is it even possible? Is this going to affect your compliance? Are you allowed to store your data in the locations your provider will use?
Most disaster recovery solutions will allow you to continue normal operations should an event occur. For that to succeed, certain things need to be in place before you engage the solution.
|To allow for the configuration of your Cloud disaster recovery solution, you need to have your tenancy established, with aspects like subscriptions, governance and access control in place.|
|Once you can use your cloud resources, basic networking needs to be in place, such as a compatible virtual network (and test networks) as well as connectivity to your on-premises infrastructure.|
|With both items above established, you need to review whether services like Active Directory and DNS (and maybe others) are available in your cloud/hybrid environment.|
With all the prerequisites above in place, there’s one final step before you can actually deploy your solution: a compatibility check. If your solution won’t support your hypervisor platform or won’t be able to replicate the OS version you currently have, the process cannot continue. At this point, it is important to review all applications, hypervisor layers, and operating systems, as well as the specific configuration of basic services like SQL, since certain cluster configurations won’t be compatible or will require changes before replication can occur.
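A compatibility check like this can be scripted against your inventory. The sketch below is only an illustration: the support matrix and the machine records are hypothetical, and the real lists must come from your DR provider’s documentation.

```python
from typing import Dict, List

# Hypothetical support matrix; replace with what your DR provider documents.
SUPPORTED_HYPERVISORS = {"hyper-v", "vmware"}
SUPPORTED_OS = {"windows server 2019", "ubuntu 20.04"}


def compatibility_issues(machine: Dict[str, str]) -> List[str]:
    """Return the reasons why a machine cannot be replicated as-is."""
    issues = []
    if machine["hypervisor"].lower() not in SUPPORTED_HYPERVISORS:
        issues.append(f"unsupported hypervisor: {machine['hypervisor']}")
    if machine["os"].lower() not in SUPPORTED_OS:
        issues.append(f"unsupported OS: {machine['os']}")
    return issues


# Hypothetical inventory, for illustration only.
inventory = [
    {"name": "web01", "hypervisor": "VMware", "os": "Ubuntu 20.04"},
    {"name": "legacy01", "hypervisor": "Xen", "os": "Windows Server 2003"},
]

# Map each machine name to its list of blocking issues (empty means compatible).
report = {m["name"]: compatibility_issues(m) for m in inventory}
```

Machines with an empty list can move on to replication; the others need an upgrade, a migration or a different protection strategy first.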
Another aspect you need to review regarding workloads is current usage. Some applications will require very fast storage to work properly, or some specific networking feature, like load balancers or external IPs. More basic items like the current CPU load and memory need to be documented and assessed to properly configure the recovery environment.
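To show how documented usage can drive the recovery configuration, here is a minimal sketch that picks the smallest machine size covering the observed peak load plus some headroom. The size catalogue, the 20% headroom and the `pick_size` helper are assumptions made for the example, not any provider’s actual offering.

```python
from typing import Optional

# Hypothetical size catalogue, ordered smallest first: (name, vCPUs, memory in GiB).
SIZES = [
    ("small", 2, 8),
    ("medium", 4, 16),
    ("large", 8, 32),
]


def pick_size(peak_cores_used: float, peak_memory_gib: float,
              headroom: float = 0.2) -> Optional[str]:
    """Return the smallest size that fits the peak usage plus headroom.

    Returns None when nothing in the catalogue is big enough.
    """
    need_cpu = peak_cores_used * (1 + headroom)
    need_mem = peak_memory_gib * (1 + headroom)
    for name, vcpus, mem_gib in SIZES:
        if vcpus >= need_cpu and mem_gib >= need_mem:
            return name
    return None


light = pick_size(1.5, 6)    # fits the smallest size
heavy = pick_size(3.0, 14)   # memory plus headroom exceeds 16 GiB
too_big = pick_size(10, 10)  # more cores than the catalogue offers
```

Storage performance and networking features such as load balancers would need similar checks, which are left out here to keep the sketch short.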
Once you get to this point, proper setup of the DR solution is paramount. Using the right storage types, machine sizes and networking configuration will provide optimal operation in case of a failover. Make sure you research all the possible ways to implement the solution (the classic ASM vs the newer ARM deployment model, types of storage, VNET configuration, machine size configuration, page file drives) or hire a specialist to do the configuration.
When all technical aspects are prepared, a strategy is vital. Your application may be very simple: a single self-contained server, perhaps. But more often than not, applications are composed of multiple servers and layers. The proper configuration of how those layers will fail over in Azure is critical for the success of your tests and of an actual failover.
Each workload should be mapped, detailed and documented. For each one of them, a detailed plan needs to be built. Each plan consists of groups of servers, the order in which they fail over, and the steps to run before, during (in between groups) and after the failover. These steps can be manual or automated and will typically prepare services like DNS and load balancers for the new environment configuration in the Cloud.
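The structure of such a plan can be sketched as ordered groups, each with its own steps before and after. The example below only builds a dry-run list of actions; the group names, servers and steps are hypothetical, and a real DR tool would supply its own plan format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Group:
    name: str
    servers: List[str]
    before: List[str] = field(default_factory=list)  # steps run before the group
    after: List[str] = field(default_factory=list)   # steps run after the group


def build_runbook(groups: List[Group]) -> List[str]:
    """Flatten a plan into the ordered list of actions it would perform."""
    actions: List[str] = []
    for group in groups:  # groups fail over in the order they are listed
        actions.extend(f"step: {s}" for s in group.before)
        actions.extend(f"fail over: {srv}" for srv in group.servers)
        actions.extend(f"step: {s}" for s in group.after)
    return actions


# Hypothetical two-tier application: the database must be up before the web tier.
plan = [
    Group("database", ["sql01"], before=["notify on-call team"]),
    Group("web", ["web01", "web02"], after=["update DNS", "update load balancer"]),
]

runbook = build_runbook(plan)
```

Listing the database group first makes sure the data tier is available before the web servers start, and the DNS and load balancer updates run only once every server is over.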
Operating and Testing
Everything is in place and your application is ready. Testing is an important step to guarantee future success in case a failover is required. The tools available today will allow you to test the workloads running in the cloud with little or no impact on your existing infrastructure. This allows a full verification of the plan without having to actually fail over the environment and disrupt production.
Another key aspect is to maintain the health of the replication. Machines that lag too much will require re-synchronization, implying more cost and more time to get operations back to normal. Also, machines that change too much in a short period of time may need to be replicated more often to avoid a re-sync. Monitoring the status of the replication is very important.
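A basic monitor for replication health could classify each machine by how far behind it is. This is a minimal sketch: the thresholds, the machine names and the `replication_status` helper are assumptions, and the lag values would come from whatever your replication tool reports.

```python
from typing import Dict


def replication_status(lag_minutes: float,
                       warn_after: float = 15,
                       resync_after: float = 60) -> str:
    """Classify a machine by replication lag (thresholds are assumptions)."""
    if lag_minutes >= resync_after:
        return "resync-required"
    if lag_minutes >= warn_after:
        return "warning"
    return "healthy"


# Hypothetical lag readings in minutes, keyed by machine name.
lags: Dict[str, float] = {"sql01": 3, "web01": 22, "file01": 95}

statuses = {name: replication_status(lag) for name, lag in lags.items()}

# Machines that need attention before they fall far enough behind to force a re-sync.
needs_attention = sorted(n for n, s in statuses.items() if s != "healthy")
```

Alerting at the warning stage gives you time to act, for example by replicating a busy machine more often, before a full re-synchronization becomes necessary.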
All done! Your enterprise is ready for the worst, and you are free to hope for the best. You have a plan, so you know where you want to get with your replication and can sleep tight at night, knowing your environment is ready.