The purpose of this lab is to teach you the fundamentals of using tests to ensure that your implementation is resilient to failure, by injecting failure modes into your application. This may be a familiar concept to companies that practice Failure Mode and Effects Analysis (FMEA). It is also a key component of Chaos Engineering, which uses such failure injection to test hypotheses about workload resiliency. One primary capability that AWS provides is the ability to test your systems at production scale, under load.
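A chaos experiment usually starts from a steady-state hypothesis, for example "at least 99% of health checks against the load balancer succeed, and the workload recovers within 60 seconds of a failure". The following Python sketch evaluates such a hypothesis from a series of health-check results. The `Probe` type, the function names, and the 99% and 60-second thresholds are illustrative assumptions, not part of the lab, and how the probes are collected (for example, by polling the load balancer URL) is left out. Averaging the recovery time over repeated experiments gives an estimate of mean time to recovery (MTTR).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Probe:
    """One health-check result: seconds since the experiment started, and success."""
    t: float
    ok: bool


def availability(probes: List[Probe]) -> float:
    """Fraction of probes that succeeded."""
    if not probes:
        raise ValueError("at least one probe is required")
    return sum(p.ok for p in probes) / len(probes)


def time_to_recover(probes: List[Probe]) -> Optional[float]:
    """Seconds from the first failed probe to the next successful one.

    Returns 0.0 if nothing failed and None if the workload never recovered.
    Only the first outage in the series is measured.
    """
    ordered = sorted(probes, key=lambda p: p.t)
    first_failure = next((p.t for p in ordered if not p.ok), None)
    if first_failure is None:
        return 0.0
    recovered_at = next((p.t for p in ordered if p.ok and p.t > first_failure), None)
    return None if recovered_at is None else recovered_at - first_failure


def hypothesis_holds(probes: List[Probe],
                     min_availability: float = 0.99,
                     max_recovery_s: float = 60.0) -> bool:
    """Hypothetical steady-state check: enough probes succeed and recovery is fast."""
    ttr = time_to_recover(probes)
    return (availability(probes) >= min_availability
            and ttr is not None
            and ttr <= max_recovery_s)
```

Keeping the hypothesis as code makes it easy to run the same check after every experiment and compare results over time.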
It is not sufficient to only design for failure; you must also test to understand how your systems behave when a failure occurs. Conducting these tests also lets you create playbooks for investigating failures and identifying their root causes. If you conduct these tests regularly, you will identify changes to your application that are not resilient to failure, and you will build the skills to react to unexpected failures in a calm and predictable manner.
In this lab, you will deploy a three-tier application consisting of a reverse proxy (Application Load Balancer), a web application running on Amazon Elastic Compute Cloud (EC2), and a MySQL database on Amazon Relational Database Service (RDS). You also have the option to deploy the same stack into a second AWS Region, which lets you progress from simpler component failure testing to failure testing under a simulated AWS Regional failure.
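A minimal sketch of EC2 failure injection against this stack, assuming the web tier runs in an Auto Scaling group behind the Application Load Balancer so that a terminated instance is replaced. The helper names are hypothetical; they expect a boto3 EC2 client (for example `boto3.client("ec2")`) and use only the `describe_instances` and `terminate_instances` calls. Because they pick any running instance in the VPC, you may want to add a tag filter if the VPC contains instances you do not want to terminate.

```python
import random
from typing import Any, Dict, List, Optional


def running_instances_in_vpc(ec2: Any, vpc_id: str) -> List[str]:
    """Return the IDs of all running EC2 instances in `vpc_id`, following pagination."""
    kwargs: Dict[str, Any] = {
        "Filters": [
            {"Name": "vpc-id", "Values": [vpc_id]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    }
    ids: List[str] = []
    while True:
        resp = ec2.describe_instances(**kwargs)
        for reservation in resp["Reservations"]:
            ids.extend(inst["InstanceId"] for inst in reservation["Instances"])
        token = resp.get("NextToken")
        if not token:
            return ids
        kwargs["NextToken"] = token


def fail_random_instance(ec2: Any, vpc_id: str,
                         rng: Optional[random.Random] = None) -> str:
    """Terminate one randomly chosen running instance in the VPC; return its ID."""
    candidates = running_instances_in_vpc(ec2, vpc_id)
    if not candidates:
        raise RuntimeError(f"no running instances found in {vpc_id}")
    victim = (rng or random.Random()).choice(candidates)
    ec2.terminate_instances(InstanceIds=[victim])
    return victim
```

While the failure is injected, keep probing the load balancer URL. With at least two healthy instances, requests should keep succeeding while the Auto Scaling group launches a replacement.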
Reduce fear of implementing resiliency testing by providing examples in common development and scripting languages
Illustrate how AWS Fault Injection Simulator (FIS) can implement chaos testing using AWS native tooling and integrations
Show how failure injection fits into the context of Chaos Engineering
Demonstrate resilience testing of EC2 instances
Demonstrate resilience testing of RDS Multi-AZ instances
Demonstrate resilience testing using Availability Zone failures
Demonstrate resilience testing of application failures
Demonstrate resilience testing of Amazon S3 objects
Learn how to implement resiliency using those tests
Learn how to reason about the effects a failure will have within your infrastructure
Learn how common AWS services can reduce mean time to recovery (MTTR)
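For the RDS tier, one way to inject a failure is to reboot the Multi-AZ DB instance with `ForceFailover=True`, which causes RDS to fail over to the standby in the other Availability Zone. The sketch below assumes a boto3 RDS client (for example `boto3.client("rds")`); the helper names are hypothetical. It keeps only instances with `MultiAZ` set because a forced failover is valid only for Multi-AZ deployments, and it matches the VPC through each instance's DB subnet group.

```python
from typing import Any, Dict, List


def multi_az_instances_in_vpc(rds: Any, vpc_id: str) -> List[str]:
    """Return identifiers of Multi-AZ DB instances whose subnet group is in `vpc_id`."""
    found: List[str] = []
    kwargs: Dict[str, str] = {}
    while True:
        resp = rds.describe_db_instances(**kwargs)
        for db in resp["DBInstances"]:
            in_vpc = db.get("DBSubnetGroup", {}).get("VpcId") == vpc_id
            if in_vpc and db.get("MultiAZ"):
                found.append(db["DBInstanceIdentifier"])
        marker = resp.get("Marker")
        if not marker:
            return found
        kwargs["Marker"] = marker


def force_failover(rds: Any, db_instance_id: str) -> str:
    """Reboot with failover so the standby becomes primary; return the new status."""
    resp = rds.reboot_db_instance(DBInstanceIdentifier=db_instance_id,
                                  ForceFailover=True)
    return resp["DBInstance"]["DBInstanceStatus"]
```

During the failover, the application loses database connections briefly; how quickly it reconnects depends on connection retry logic and DNS caching in the web tier.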
An AWS account that you can use for testing and that is not used for production or other purposes.
An Identity and Access Management (IAM) user or federated credentials into that account with permissions to create Amazon Virtual Private Clouds (VPCs), including subnets, security groups, internet gateways, NAT gateways, Elastic IP addresses, and route tables. The credentials must also be able to create the database subnet group needed for a Multi-AZ RDS instance, and must have permissions to create IAM roles, instance profiles, Auto Scaling launch configurations, Application Load Balancers, Auto Scaling groups, and EC2 instances.
An IAM user or federated credentials into that account with permissions to deploy the deployment automation, which consists of IAM service-linked roles, AWS Lambda functions, and an AWS Step Functions state machine that executes the deployment.
An IAM user or federated credentials into that account that has permissions to create experiment templates and run experiments using FIS.
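With FIS, the EC2 failure can be expressed as an experiment template rather than a script. The sketch below builds arguments for the boto3 FIS client's `create_experiment_template` call using the `aws:ec2:terminate-instances` action, then starts an experiment from the template. The target tag, the target and action names (`WebInstances`, `TerminateOne`), and the IAM role ARN are placeholders chosen for illustration; the role must allow FIS to terminate the selected instances.

```python
import uuid
from typing import Any, Dict


def ec2_terminate_template(role_arn: str, tag_key: str, tag_value: str,
                           description: str = "Terminate one tagged EC2 instance"
                           ) -> Dict[str, Any]:
    """Build create_experiment_template arguments (names here are hypothetical)."""
    return {
        "clientToken": str(uuid.uuid4()),
        "description": description,
        "roleArn": role_arn,
        # No automatic stop condition. For production-like runs, use
        # {"source": "aws:cloudwatch:alarm", "value": <alarm ARN>} instead.
        "stopConditions": [{"source": "none"}],
        "targets": {
            "WebInstances": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "filters": [{"path": "State.Name", "values": ["running"]}],
                "selectionMode": "COUNT(1)",  # pick one matching instance
            }
        },
        "actions": {
            "TerminateOne": {
                "actionId": "aws:ec2:terminate-instances",
                "targets": {"Instances": "WebInstances"},
            }
        },
    }


def create_and_start(fis: Any, template_args: Dict[str, Any]) -> str:
    """Create the experiment template, start an experiment, and return its ID."""
    template_id = fis.create_experiment_template(**template_args)["experimentTemplate"]["id"]
    experiment = fis.start_experiment(clientToken=str(uuid.uuid4()),
                                      experimentTemplateId=template_id)
    return experiment["experiment"]["id"]
```

Keeping the template separate from the start call lets you review or version the template once and rerun it as often as needed.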
This 300-level lab covers multiple failure injection scenarios. If you would prefer a simpler 200-level lab that demonstrates only EC2 failure injection, see Level 200: Testing for Resiliency of EC2 instances. This lab includes everything in the 200-level lab, plus additional failure simulations.
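One common way to approximate an Availability Zone failure, assumed here rather than taken from the lab, is to cut all network traffic to every subnet in one AZ by associating those subnets with a new network ACL that has no allow rules. The sketch uses a boto3 EC2 client and hypothetical helper names, and returns what it needs to roll the change back. Because it isolates every subnet in the AZ, it also affects any other resources in those subnets, such as database or load balancer nodes.

```python
from typing import Any, Dict, List, Tuple


def subnets_in_az(ec2: Any, vpc_id: str, az: str) -> List[str]:
    """Return the IDs of the VPC's subnets in Availability Zone `az`."""
    resp = ec2.describe_subnets(Filters=[
        {"Name": "vpc-id", "Values": [vpc_id]},
        {"Name": "availability-zone", "Values": [az]},
    ])
    return [subnet["SubnetId"] for subnet in resp["Subnets"]]


def isolate_az(ec2: Any, vpc_id: str, az: str) -> Tuple[str, Dict[str, str]]:
    """Associate every subnet in `az` with a new, rule-less (deny-all) network ACL.

    Returns the deny-all ACL ID and a map of {new association ID: original ACL ID}
    that restore_az() uses to undo the change.
    """
    subnet_ids = subnets_in_az(ec2, vpc_id, az)
    if not subnet_ids:
        raise RuntimeError(f"no subnets in {az} for {vpc_id}")
    # A new custom network ACL denies all inbound and outbound traffic until
    # allow rules are added, so associating it blackholes the subnets.
    deny_all = ec2.create_network_acl(VpcId=vpc_id)["NetworkAcl"]["NetworkAclId"]
    acls = ec2.describe_network_acls(Filters=[
        {"Name": "association.subnet-id", "Values": subnet_ids},
    ])["NetworkAcls"]
    rollback: Dict[str, str] = {}
    for acl in acls:
        for assoc in acl["Associations"]:
            # An ACL can also be associated with subnets in other AZs; skip those.
            if assoc["SubnetId"] not in subnet_ids:
                continue
            new_id = ec2.replace_network_acl_association(
                AssociationId=assoc["NetworkAclAssociationId"],
                NetworkAclId=deny_all,
            )["NewAssociationId"]
            rollback[new_id] = acl["NetworkAclId"]
    return deny_all, rollback


def restore_az(ec2: Any, deny_all: str, rollback: Dict[str, str]) -> None:
    """Re-associate the subnets with their original ACLs, then delete the deny-all ACL."""
    for association_id, original_acl in rollback.items():
        ec2.replace_network_acl_association(
            AssociationId=association_id, NetworkAclId=original_acl)
    ec2.delete_network_acl(NetworkAclId=deny_all)
```

Call `restore_az` after the experiment, for example from a `finally` block, so the AZ is not left isolated if something else fails.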
Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.