References & useful resources
Additional information on multi-region strategies for disaster recovery (DR)
Recovery Time Objective (RTO) and Recovery Point Objective (RPO)
These terms are most often associated with disaster recovery (DR), a set of objectives and strategies for restoring workload availability after a disaster.
- Recovery time objective (RTO) is the overall length of time that a workload’s components can be in the recovery phase, and therefore not available, before negatively impacting the organization’s mission or mission/business processes.
- Recovery point objective (RPO) is the maximum acceptable amount of time since the last data recovery point. It determines how much data loss, measured in time, is acceptable between the last recovery point and the interruption of service.
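As a worked illustration of the two objectives, the sketch below measures one incident against them. It assumes the common reading of RPO as the largest acceptable gap between the last recovery point and the start of the outage, and RTO as the largest acceptable downtime. The function names, timestamps, and targets are all hypothetical.

```python
from datetime import datetime, timedelta


def measure_recovery(last_recovery_point, outage_start, service_restored):
    """Return (data_loss_window, downtime) for a single incident."""
    # Data written after the last recovery point is lost in the outage.
    data_loss_window = outage_start - last_recovery_point
    # Downtime runs from the start of the outage until service is back.
    downtime = service_restored - outage_start
    return data_loss_window, downtime


def meets_objectives(data_loss_window, downtime, rpo, rto):
    """True when the incident stayed within both objectives."""
    return data_loss_window <= rpo and downtime <= rto


# Hypothetical incident: last backup at 02:00, outage at 02:45, restored at 05:15.
last_backup = datetime(2024, 1, 1, 2, 0)
outage = datetime(2024, 1, 1, 2, 45)
restored = datetime(2024, 1, 1, 5, 15)

loss, down = measure_recovery(last_backup, outage, restored)
print("data loss window:", loss)
print("downtime:", down)
print("within RPO=1h, RTO=4h:",
      meets_objectives(loss, down, rpo=timedelta(hours=1), rto=timedelta(hours=4)))
```

Here the incident loses 45 minutes of data and causes 2.5 hours of downtime, so it fits a 1-hour RPO and a 4-hour RTO but would miss a 30-minute RPO.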
Use defined recovery strategies to meet defined recovery objectives
If your workload needs a multi-region strategy, choose one of the following. They are listed in order of increasing complexity and decreasing RTO and RPO. DR Region refers to an AWS Region other than the one used for your workload (or any AWS Region if your workload is on premises).
- Backup and restore (RPO in hours, RTO in 24 hours or less): Back up your data and applications into the DR Region. Restore this data when necessary to recover from a disaster.
- Pilot light (RPO in minutes, RTO in hours): Maintain a minimal version of an environment always running the most critical core elements of your system in the DR Region. When the time comes for recovery, you can rapidly provision a full-scale production environment around the critical core.
- Warm standby (RPO in seconds, RTO in minutes): Maintain a scaled-down version of a fully functional environment always running in the DR Region. Business-critical systems are fully duplicated and always on, but with a scaled-down fleet. When the time comes for recovery, the system is scaled up quickly to handle the production load.
- Multi-region active-active (RPO near zero or possibly seconds, RTO in seconds): Your workload is deployed to, and actively serving traffic from, multiple AWS Regions. This strategy requires you to synchronize users and data across the Regions that you are using. When the time comes for recovery, use services like Amazon Route 53 or AWS Global Accelerator to route your user traffic to where your workload is healthy.
The bi-directional cross-region replication that you created in this lab is helpful for Pilot light, Warm standby, and Multi-region active-active strategies.
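The trade-off between the four strategies can be sketched as a simple lookup: pick the least complex strategy whose achievable RPO and RTO fit your requirements. The numeric bounds below are illustrative assumptions that turn the "hours / minutes / seconds" wording of the list into comparable values; they are not AWS-published figures, and `simplest_strategy` is a hypothetical helper.

```python
from datetime import timedelta

# (name, achievable RPO, achievable RTO), ordered from least to most complex.
# The bounds are assumed values chosen only to make the comparison concrete.
STRATEGIES = [
    ("Backup and restore", timedelta(hours=12), timedelta(hours=24)),
    ("Pilot light", timedelta(minutes=30), timedelta(hours=4)),
    ("Warm standby", timedelta(seconds=30), timedelta(minutes=30)),
    ("Multi-region active-active", timedelta(seconds=5), timedelta(seconds=30)),
]


def simplest_strategy(required_rpo, required_rto):
    """Return the least complex strategy that satisfies both objectives.

    Returns None when no listed strategy is fast enough.
    """
    for name, achievable_rpo, achievable_rto in STRATEGIES:
        if achievable_rpo <= required_rpo and achievable_rto <= required_rto:
            return name
    return None


# Example: a workload that tolerates 5 minutes of data loss
# and 45 minutes of downtime.
print(simplest_strategy(timedelta(minutes=5), timedelta(minutes=45)))
```

Because the list is ordered by complexity, the first match is also the cheapest option to operate under these assumptions; tighter objectives push the choice further down the list.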
Now that you have completed the lab, if you have applied what you learned in your own environment, re-evaluate the questions in the AWS Well-Architected Tool. This lab specifically helps you with REL 13: "How do you plan for disaster recovery (DR)?"