Lab complete!
Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.
Click here to access the Well-Architected Tool
Failure injection (also known as chaos testing) is an effective and essential method to validate and understand the resiliency of your workload and is a recommended practice of the AWS Well-Architected Reliability Pillar. Here you will initiate various failure scenarios and assess how your system reacts.
Before testing, please prepare the following:
Region must be the one you selected when you deployed your WebApp
We will be using the AWS Console to assess the impact of our testing
Throughout this lab, make sure you are in the correct region. For example, the following screenshot shows the desired region, assuming your WebApp was deployed to the Ohio region.
Get VPC ID
Use this value wherever <vpc-id> is indicated in a command
Get familiar with the service website
This failure injection will simulate a critical problem with one of the three web servers used by your service.
Navigate to the EC2 console at http://console.aws.amazon.com/ec2 and click Instances in the left pane.
There are three EC2 instances with a name beginning with WebApp1. For these EC2 instances note:
Open two more consoles in separate tabs or windows: from the left pane, open Target Groups in one and Auto Scaling Groups in the other. You now have three console views open.
To fail one of the EC2 instances, run one (and only one) of the scripts/programs below, passing the VPC ID as the command line argument in place of <vpc-id>. Choose the language that you set up your environment for.
Language | Command |
---|---|
Bash | ./fail_instance.sh <vpc-id> |
Python | python fail_instance.py <vpc-id> |
Java | java -jar app-resiliency-1.0.jar EC2 <vpc-id> |
C# | .\AppResiliency EC2 <vpc-id> |
PowerShell | .\fail_instance.ps1 <vpc-id> |
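The lab's scripts are not reproduced here, so the following is only a minimal sketch of what a script of this kind might do, assuming it selects one running instance in the VPC and terminates it. The function names `find_running_instances` and `fail_one_instance` are hypothetical; the client methods `describe_instances` and `terminate_instances` are the standard boto3 EC2 calls.

```python
import random


def find_running_instances(ec2_client, vpc_id):
    """Return the IDs of running instances in the given VPC.

    Reads a single page of results only, which is assumed to be
    enough for the three web servers used in this lab.
    """
    response = ec2_client.describe_instances(
        Filters=[
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    return [
        instance["InstanceId"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    ]


def fail_one_instance(ec2_client, vpc_id, choose=random.choice):
    """Terminate one running instance in the VPC and return the API response."""
    instance_ids = find_running_instances(ec2_client, vpc_id)
    if not instance_ids:
        raise RuntimeError(f"No running instances found in {vpc_id}")
    target = choose(instance_ids)  # hypothetical selection rule: pick at random
    print(f"Terminating {target}")
    return ec2_client.terminate_instances(InstanceIds=[target])
```

With boto3 installed and credentials configured, this could be called as `fail_one_instance(boto3.client("ec2"), "<vpc-id>")`. Passing the client in as an argument keeps the logic testable without touching a real account.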
The specific output will vary based on the command used, but will include a reference to the ID of the EC2 instance and an indicator of success. Here is the output for the Bash command. Note that the CurrentState is shutting-down.
$ ./fail_instance.sh vpc-04f8541d10ed81c80
Terminating i-0710435abc631eab3
{
    "TerminatingInstances": [
        {
            "CurrentState": {
                "Code": 32,
                "Name": "shutting-down"
            },
            "InstanceId": "i-0710435abc631eab3",
            "PreviousState": {
                "Code": 16,
                "Name": "running"
            }
        }
    ]
}
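If you want to check the result programmatically rather than by eye, the JSON portion of the output can be parsed directly. The sketch below uses the sample output shown above; the helper name `summarize_termination` is an assumption made for illustration.

```python
import json

# JSON portion of the sample output shown above.
SAMPLE_OUTPUT = """
{
    "TerminatingInstances": [
        {
            "CurrentState": {"Code": 32, "Name": "shutting-down"},
            "InstanceId": "i-0710435abc631eab3",
            "PreviousState": {"Code": 16, "Name": "running"}
        }
    ]
}
"""


def summarize_termination(response):
    """Return (instance_id, previous_state, current_state) for each instance."""
    return [
        (
            item["InstanceId"],
            item["PreviousState"]["Name"],
            item["CurrentState"]["Name"],
        )
        for item in response["TerminatingInstances"]
    ]


for instance_id, previous, current in summarize_termination(json.loads(SAMPLE_OUTPUT)):
    # prints "i-0710435abc631eab3: running -> shutting-down"
    print(f"{instance_id}: {previous} -> {current}")
```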
Go to the EC2 Instances console which you already have open (or click here to open a new one)
Refresh it. (Note: it is usually more efficient to use the refresh button in the console than to refresh the browser.)
Observe the status of the instance reported by the script. In the screen cap below it is shutting down as reported by the script and will ultimately transition to terminated.
Watch how the service responds, and note how AWS systems help maintain service availability. Test whether the service becomes unavailable at any point and, if so, for how long.
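One way to put a number on any unavailability is to poll the service website at a fixed interval while the failure plays out. This is a rough sketch and not part of the lab tooling: the helper names, the one-second interval, and the two-minute duration are all assumptions. The measured outage is only as precise as the polling interval.

```python
import time
import urllib.request


def is_available(url, timeout=2.0):
    """Return True if the URL answers with an HTTP 2xx status before the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:  # covers URLError, HTTPError, and socket timeouts
        return False


def probe(url, duration=120.0, interval=1.0, check=is_available,
          clock=time.monotonic, sleep=time.sleep):
    """Poll the URL for `duration` seconds; return (elapsed_seconds, ok) samples."""
    samples = []
    start = clock()
    while clock() - start < duration:
        samples.append((clock() - start, check(url)))
        sleep(interval)
    return samples


def longest_outage(samples):
    """Return the longest gap in seconds from a first failed probe to the next success.

    An outage still in progress at the last sample is measured up to that sample.
    """
    longest = 0.0
    outage_start = None
    for timestamp, ok in samples:
        if not ok and outage_start is None:
            outage_start = timestamp
        elif ok and outage_start is not None:
            longest = max(longest, timestamp - outage_start)
            outage_start = None
    if outage_start is not None:
        longest = max(longest, samples[-1][0] - outage_start)
    return longest
```

To use it, start `probe` against your service website URL just before running the failure script, then pass the returned samples to `longest_outage`. A result of `0.0` means every probe succeeded.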
Refresh the service website several times. Note the following:
Load balancing ensures service requests are not routed to unhealthy resources, such as the failed EC2 instance.
Go to the Target Groups console you already have open (or click here to open a new one)
Click on the Targets tab and observe:
Status of the instances in the group. The load balancer will only send traffic to healthy instances.
When Auto Scaling launches a new instance, it is automatically added to the load balancer target group.
In the screen cap below the unhealthy instance is the newly added one. The load balancer will not send traffic to it until it has finished initializing. It will ultimately transition to healthy and then start receiving traffic.
Note the new instance was started in the same Availability Zone as the failed one. Amazon EC2 Auto Scaling automatically maintains balance across all of the Availability Zones that you specify.
From the same console, now click on the Monitoring tab and view metrics such as Unhealthy hosts and Healthy hosts
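The same target status can be read outside the console. The sketch below summarizes a response shaped like the one returned by the boto3 `elbv2` client's `describe_target_health` call. The sample data and instance IDs are invented for illustration, and the helper names are hypothetical.

```python
from collections import Counter

# Hypothetical response, shaped like describe_target_health output.
SAMPLE_RESPONSE = {
    "TargetHealthDescriptions": [
        {"Target": {"Id": "i-0aaaaaaaaaaaaaaa1", "Port": 80},
         "TargetHealth": {"State": "healthy"}},
        {"Target": {"Id": "i-0aaaaaaaaaaaaaaa2", "Port": 80},
         "TargetHealth": {"State": "healthy"}},
        {"Target": {"Id": "i-0aaaaaaaaaaaaaaa3", "Port": 80},
         "TargetHealth": {"State": "draining"}},
        {"Target": {"Id": "i-0aaaaaaaaaaaaaaa4", "Port": 80},
         "TargetHealth": {"State": "unhealthy"}},
    ]
}


def summarize_target_health(response):
    """Count targets per health state."""
    return Counter(
        description["TargetHealth"]["State"]
        for description in response["TargetHealthDescriptions"]
    )


def targets_in_state(response, state):
    """Return the IDs of targets currently in the given health state."""
    return [
        description["Target"]["Id"]
        for description in response["TargetHealthDescriptions"]
        if description["TargetHealth"]["State"] == state
    ]


print(dict(summarize_target_health(SAMPLE_RESPONSE)))
print(targets_in_state(SAMPLE_RESPONSE, "healthy"))
```

Against a real workload, the response would come from `boto3.client("elbv2").describe_target_health(TargetGroupArn=...)`, using the ARN of your own target group.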
Auto Scaling ensures we have the capacity necessary to meet customer demand. The Auto Scaling setup for this service is a simple configuration that ensures at least three EC2 instances are running. More complex configurations that respond to CPU or network load are also possible using AWS.
Go to the Auto Scaling Groups console you already have open (or click here to open a new one)
Click on the Activity History tab and observe:
The screen cap below shows that instances were successfully started at 17:25
At 19:29 the instance targeted by the script was put in draining state and a new instance ending in …62640 was started, but was still initializing. The new instance will ultimately transition to Successful status
Draining allows existing, in-flight requests made to an instance to complete, but the load balancer will not send any new requests to the instance. Learn more: After the lab see this blog post for more information on draining.
Learn more: After the lab see Auto Scaling Groups to learn more about how Auto Scaling groups are set up and how they distribute instances, and Dynamic Scaling for Amazon EC2 Auto Scaling for more details on setting up Auto Scaling that responds to demand.
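The activity history can also be inspected in code. As a sketch, the function below filters a response shaped like the boto3 `autoscaling` client's `describe_scaling_activities` output, returning the activities that have not yet reached a terminal status. The sample activities and instance IDs are invented, and `pending_activities` is a hypothetical helper name.

```python
# Status codes treated as terminal in this sketch.
TERMINAL_STATUSES = {"Successful", "Failed", "Cancelled"}

# Hypothetical response, shaped like describe_scaling_activities output.
SAMPLE_ACTIVITIES = {
    "Activities": [
        {"Description": "Launching a new EC2 instance: i-0bbbbbbbbbbbbbbb1",
         "StatusCode": "PreInService"},
        {"Description": "Terminating EC2 instance: i-0bbbbbbbbbbbbbbb2",
         "StatusCode": "WaitingForELBConnectionDraining"},
        {"Description": "Launching a new EC2 instance: i-0bbbbbbbbbbbbbbb3",
         "StatusCode": "Successful"},
    ]
}


def pending_activities(response):
    """Return (description, status) for activities that are still in progress."""
    return [
        (activity["Description"], activity["StatusCode"])
        for activity in response["Activities"]
        if activity["StatusCode"] not in TERMINAL_STATUSES
    ]


for description, status in pending_activities(SAMPLE_ACTIVITIES):
    print(f"{status}: {description}")
```

Once the replacement instance is in service and draining has finished, a call like this on the real response would be expected to return an empty list.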
Deploying multiple servers behind Elastic Load Balancing enables a service to suffer the loss of a server with no availability disruptions, because user traffic is automatically routed to the healthy servers. Amazon EC2 Auto Scaling ensures unhealthy hosts are removed and replaced with healthy ones to maintain high availability.
Availability Zones (AZs) are isolated sets of resources within a region, each with redundant power, networking, and connectivity, housed in separate facilities. Each Availability Zone is isolated, but the Availability Zones in a Region are connected through low-latency links. AWS provides you with the flexibility to place instances and store data across multiple Availability Zones within each AWS Region for high resiliency.
Learn more: After the lab see this whitepaper on regions and availability zones
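To confirm that instances are spread across Availability Zones, you can count them per zone. The sketch below works on a response shaped like the boto3 EC2 client's `describe_instances` output. The sample data is invented (the zone names assume the Ohio region), and the "differ by at most one" balance rule is a simplifying assumption for illustration, not a statement of how Amazon EC2 Auto Scaling defines balance.

```python
from collections import Counter

# Hypothetical response, shaped like describe_instances output.
SAMPLE_INSTANCES = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-0ccccccccccccccc1",
             "Placement": {"AvailabilityZone": "us-east-2a"}},
            {"InstanceId": "i-0ccccccccccccccc2",
             "Placement": {"AvailabilityZone": "us-east-2b"}},
        ]},
        {"Instances": [
            {"InstanceId": "i-0ccccccccccccccc3",
             "Placement": {"AvailabilityZone": "us-east-2c"}},
        ]},
    ]
}


def instances_per_zone(response):
    """Count instances in each Availability Zone."""
    return Counter(
        instance["Placement"]["AvailabilityZone"]
        for reservation in response["Reservations"]
        for instance in reservation["Instances"]
    )


def is_balanced(counts, zones):
    """Return True if per-zone instance counts differ by at most one."""
    per_zone = [counts.get(zone, 0) for zone in zones]
    return max(per_zone) - min(per_zone) <= 1


zones = ["us-east-2a", "us-east-2b", "us-east-2c"]
counts = instances_per_zone(SAMPLE_INSTANCES)
print(dict(counts), is_balanced(counts, zones))
```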