You already observed that all three EC2 instances are successfully serving requests
In a new tab, navigate to the ELB Target Groups console
Click on the Targets tab (bottom half of screen)
Under Registered Targets, observe the three EC2 instances serving your web service
Note that they are all healthy (see Status and Description)
From the Target Groups console, now click on the Health checks tab
Copy the URL of the web service to a new tab and append /healthcheck to the end of the URL
The new URL should look like:
Refresh several times and observe the health check on the three servers
Note the check is successful
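Instead of refreshing by hand, the same check can be repeated with a short script. This is a minimal sketch using only the Python standard library; the function name probe_healthcheck is hypothetical, and the base URL you pass in is a placeholder for your own load balancer's DNS name.

```python
import urllib.error
import urllib.request
from typing import List


def probe_healthcheck(base_url: str, attempts: int = 5, timeout: float = 2.0) -> List[int]:
    """Request <base_url>/healthcheck several times and return the HTTP status codes seen."""
    url = base_url.rstrip('/') + '/healthcheck'
    codes = []
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                codes.append(resp.status)
        except urllib.error.HTTPError as err:
            # Non-2xx responses raise HTTPError; record the code instead of stopping
            codes.append(err.code)
    return codes
```

A list of all 200s corresponds to the "check is successful" result above. Because the load balancer distributes requests, successive probes may be answered by different instances.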
The EC2 servers receive user requests (for a TV show recommendation) on the path /, and they receive health check requests from the Elastic Load Balancer on the path /healthcheck:
    # Healthcheck request - will be used by the Elastic Load Balancer
    elif self.path == '/healthcheck':
        # Return a healthy code
        self.send_response(200)
        self.send_header('Content-type', 'text/html')
        self.end_headers()
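To see where that branch sits, here is a self-contained sketch of a request handler built on Python's http.server. It is not the lab's server code: the class name LabHandler and the placeholder page body are assumptions. Note that the /healthcheck branch returns 200 without consulting any dependency, so the instance reports healthy whatever the state of RecommendationService.

```python
from http.server import BaseHTTPRequestHandler


class LabHandler(BaseHTTPRequestHandler):
    """Hypothetical handler separating user traffic from ELB health checks."""

    def do_GET(self):
        if self.path == '/':
            # User request: the lab builds a TV show recommendation page here
            body = b'<html><body>Recommendation page (placeholder)</body></html>'
            self.send_response(200)
            self.send_header('Content-type', 'text/html')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        elif self.path == '/healthcheck':
            # Health check from the Elastic Load Balancer: 200 marks this target healthy.
            # No dependency is checked, so this is a shallow health check.
            self.send_response(200)
            self.send_header('Content-type', 'text/html')
            self.send_header('Content-Length', '0')
            self.end_headers()
        else:
            self.send_error(404)
```

To run it locally, something like `HTTPServer(('', 8080), LabHandler).serve_forever()` would work; port 8080 is an assumption and should match whatever port your target group checks.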
You will now simulate a complete failure of the RecommendationService. Every request in turn makes a (simulated) call to the getRecommendation API on this service. These will all fail for every request on every server.
The RecommendationServiceEnabled parameter is used only for this lab. The server code reads its value, and simulates a failure in RecommendationService (all reads to the DynamoDB table simulating the service will fail) when it is false.
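The toggle can be pictured as a flag check in front of the dependency call. This is a hypothetical sketch, not the lab's implementation: the exception class, the function names, and the assumption that the value arrives as the string "true" or "false" are all illustrative. Left unhandled, an exception like this aborts the request, which is consistent with the failures you are about to observe.

```python
class RecommendationServiceUnavailable(Exception):
    """Hypothetical error standing in for a failed read of the RecommendationService table."""


def is_service_enabled(raw_value: str) -> bool:
    """Interpret a RecommendationServiceEnabled value; only 'true' (any case) enables the service."""
    return raw_value.strip().lower() == 'true'


def simulated_get_recommendation(raw_flag: str, user_id: str) -> str:
    """Return a placeholder recommendation, or fail every call when the flag is false."""
    if not is_service_enabled(raw_flag):
        # Simulated outage: every request on every server fails the same way
        raise RecommendationServiceUnavailable(
            f'getRecommendation failed for user {user_id}')
    return f'recommended show for user {user_id}'  # placeholder result
```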
Refresh the test web service multiple times
You can observe this by opening a new tab and navigating to the ELB Load Balancers console:
Click on the Monitoring tab (bottom half of screen)
Compare these metrics to those for the target group (the EC2 servers themselves)
The getRecommendation API is actually a get_item call on a DynamoDB table. Examine the server code to see how errors are currently handled
The server code running on each EC2 instance can be viewed here
Search for the call to the RecommendationService. It looks like this:
response = call_getRecommendation(self.region, user_id)
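A get_item call of this kind might look roughly like the following with boto3. The function name call_get_recommendation, the table name RecommendationService, and the key attributes ServiceAPI and UserID are hypothetical, not taken from the lab's server code; the client is passed in so the function can be exercised with a stub. There is deliberately no try/except, so any DynamoDB error propagates straight to the caller.

```python
from typing import Any, Dict, Optional


def call_get_recommendation(client: Any, user_id: Any,
                            table_name: str = 'RecommendationService') -> Optional[Dict[str, Any]]:
    """Fetch one item from the (hypothetical) table simulating RecommendationService.

    `client` is expected to behave like boto3.client('dynamodb', region_name=region).
    """
    response = client.get_item(
        TableName=table_name,
        Key={
            'ServiceAPI': {'S': 'getRecommendation'},  # hypothetical partition key
            'UserID': {'N': str(user_id)},             # hypothetical sort key; N values are strings
        },
    )
    # get_item omits 'Item' when nothing matches; service errors are raised as exceptions
    return response.get('Item')
```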
Choose one of the options below (Option 1 - Expert or Option 2 - Assisted) to improve the code and handle the failure
You may choose this option, or skip to Option 2 - Assisted option
This option requires that you have access to place a file in a location reachable over http/https via a URL, for example a publicly readable S3 bucket, a gist (use the raw option to get the URL), or your private web server.
If you completed the Option 1 - Expert option, then skip the Option 2 - Assisted option section and continue with 2.3.3 Error handling code
Error handling in the comments (occurs twice). What will this code do now if the dependency call fails?
Navigate to the AWS CloudFormation console
Click on the HealthCheckLab stack
Leave Use current template selected and click Next
Find the ServerCodeUrl parameter and enter the following:
Click Next until the last page
At the bottom of the page, select I acknowledge that AWS CloudFormation might create IAM resources with custom names
Click Update stack
Click on Events, and click the refresh icon to observe the stack progress
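The console steps above correspond to a single CloudFormation UpdateStack call. This sketch only builds the request arguments, so it can be checked without AWS access; the helper name build_update_stack_args is hypothetical. "Use current template" maps to UsePreviousTemplate, and the IAM acknowledgment maps to the CAPABILITY_NAMED_IAM capability. Parameters you leave out of an update may fall back to template defaults, so the sketch marks every other existing parameter with UsePreviousValue.

```python
from typing import Any, Dict, Iterable


def build_update_stack_args(stack_name: str, server_code_url: str,
                            existing_param_keys: Iterable[str]) -> Dict[str, Any]:
    """Build kwargs for cloudformation.update_stack that change only ServerCodeUrl."""
    params = [{'ParameterKey': 'ServerCodeUrl', 'ParameterValue': server_code_url}]
    # Keep every other existing parameter at its current value
    params += [{'ParameterKey': key, 'UsePreviousValue': True}
               for key in existing_param_keys if key != 'ServerCodeUrl']
    return {
        'StackName': stack_name,
        'UsePreviousTemplate': True,               # console: "Use current template"
        'Parameters': params,
        'Capabilities': ['CAPABILITY_NAMED_IAM'],  # console: IAM acknowledgment checkbox
    }
```

With boto3, the existing keys could come from `describe_stacks(StackName='HealthCheckLab')['Stacks'][0]['Parameters']`, the result would be passed as `cfn.update_stack(**args)`, and `cfn.get_waiter('stack_update_complete').wait(StackName='HealthCheckLab')` could stand in for watching the Events tab.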
This is the error handling code from server_errorhandling.py. The Option 2 - Assisted option uses this code. If you used the Option 1 - Expert option, you can consult this code as a guide.
After the new error-handling code has successfully deployed, refresh the test web service page multiple times. Observe:
Refer back to the newly deployed code to understand why the website behaves this way now
The website is working again, but in a degraded capacity, since it is no longer serving personalized recommendations. While this is less than ideal, it is much better than when it was failing with HTTP 502 errors. Because the RecommendationService is not available, the app returns a static response (the default recommendation) in place of the data it would have obtained from RecommendationService.
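The fallback behavior described above can be sketched as a wrapper around the dependency call. This is not the lab's server_errorhandling.py; it is a minimal illustration assuming a hypothetical fetch callable that performs the getRecommendation call and a hypothetical DEFAULT_RECOMMENDATION standing in for the static response.

```python
import logging
from typing import Any, Callable, Dict, Optional

# Hypothetical predetermined static response served when the dependency fails
DEFAULT_RECOMMENDATION: Dict[str, Any] = {
    'show': 'a default, non-personalized pick',
    'personalized': False,
}


def get_recommendation_with_fallback(
        fetch: Callable[[str], Optional[str]], user_id: str) -> Dict[str, Any]:
    """Treat the recommendation call as a soft dependency: degrade to a static response on failure."""
    try:
        show = fetch(user_id)
    except Exception:
        # Error handling: record the failure, then degrade instead of failing the request
        logging.exception('getRecommendation failed; serving default recommendation')
        return dict(DEFAULT_RECOMMENDATION)
    if show is None:
        # No personalized result available; the static default is used here as well
        return dict(DEFAULT_RECOMMENDATION)
    return {'show': show, 'personalized': True}
```

Catching Exception broadly matches "when a dependency call fails"; a production version might narrow it (for example to botocore's ClientError plus timeouts) so that programming errors are not silently masked.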
|Well-Architected for Reliability: Best practice|
|Implement graceful degradation to transform applicable hard dependencies into soft dependencies: When a component’s dependencies are unhealthy, the component itself can still function, although in a degraded manner. For example, when a dependency call fails, instead use a predetermined static response.|