Simulate an Application Issue

Understanding the health of your workload is an essential component of Operational Excellence. Defining metrics and thresholds, together with appropriate alerts, ensures that issues can be acknowledged and remediated within an appropriate timeframe.

In this section of the lab, you will simulate a performance issue within the API. The API is monitored by an Amazon CloudWatch Synthetics canary, which continuously checks the API response time to detect issues.

In this example, if the API takes longer than 6 seconds to respond, an alarm is raised and a notification email is sent.
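To make the threshold concrete, the sketch below times a single request with curl and compares the result against the 6-second limit. It is only an illustration of the idea: the lab's actual check is performed by the CloudWatch Synthetics canary, and the function names, the 30-second curl timeout, and the URL you pass in are assumptions made for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the lab's code.

THRESHOLD_SECONDS=6   # alert threshold stated in this lab

# Succeed (exit 0) only when the given response time in seconds
# is strictly greater than the threshold.
exceeds_threshold() {
  local seconds="$1"
  awk -v t="$seconds" -v limit="$THRESHOLD_SECONDS" 'BEGIN { exit !(t > limit) }'
}

# Time one request with curl and report whether it breaches the threshold.
# The URL argument is an assumption: pass your own endpoint.
check_response_time() {
  local url="$1"
  local elapsed
  # -w '%{time_total}' prints the total request time in seconds.
  elapsed=$(curl -s -o /dev/null --max-time 30 -w '%{time_total}' "$url")
  if exceeds_threshold "$elapsed"; then
    echo "SLOW: ${elapsed}s"
  else
    echo "OK: ${elapsed}s"
  fi
}
```

For example, `check_response_time "http://<your-endpoint>/"` would print either an `OK` or a `SLOW` line, depending on how long the request took.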

Action items in this section:

  1. You will run a script that will send a large amount of traffic to the API.
  2. You will observe and confirm the issue through AWS monitoring tools.

The following resources have been deployed to perform these actions.

Section3 Base Architecture

2.0 Sending traffic to the application

In this section, you will send multiple concurrent requests to the application, simulating a large surge of incoming traffic. This will overwhelm the API and gradually increase the response time of the application. As a result, the response time measured by the canary will exceed the configured threshold, triggering the CloudWatch alarm to send a notification.

Follow the steps below to continue:

  1. From the Cloud9 terminal, run the command shown below to change directory to the working script folder:

    cd ~/environment/aws-well-architected-labs/static/Operations/200_Automating_operations_with_playbooks_and_runbooks/Code/scripts/
  2. Confirm that the folder contains the file test.json and that the file contains the following text:

    {"Name":"Test User","Text":"This Message is a Test!"}
  3. Go to the CloudFormation console and take note of the OutputApplicationEndpoint value under the Outputs tab of the walab-ops-sample-application stack. This is the DNS endpoint of the Application Load Balancer.

    Section3 Success Screenshot

  4. Execute the command below, replacing ‘OutputApplicationEndpoint’ with the DNS endpoint value you recorded in the previous step:

    bash OutputApplicationEndpoint

    This script uses Apache Benchmark to send 60,000,000 requests, 3,000 concurrent requests at a time.

    When you run the command, you will see the output gradually change from consistently successful 200 responses to a mix that includes 504 time-out responses.

    The requests generated by the script overwhelm the application API, causing the load balancer to return occasional time-outs.

    Keep the command running in the background as you proceed through the lab.

    Section3 Success Screenshot

    Section3 Failure Screenshot
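Steps 3 and 4 can also be approached from the terminal. The sketch below is not the lab's script, whose contents are not shown here. It assumes that the AWS CLI is configured in your Cloud9 environment, that the stack output holds a bare DNS name, and that the API is served over plain HTTP at a path you supply. The function names are hypothetical. The second function only prints an Apache Benchmark command with the request count and concurrency described above, so that you can inspect it before deciding to run anything.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not the lab's own script.

# Read the OutputApplicationEndpoint value from the stack outputs (step 3).
get_endpoint() {
  aws cloudformation describe-stacks \
    --stack-name walab-ops-sample-application \
    --query "Stacks[0].Outputs[?OutputKey=='OutputApplicationEndpoint'].OutputValue" \
    --output text
}

# Print (do not run) an Apache Benchmark command that POSTs test.json:
#   -n total number of requests, -c number of concurrent requests,
#   -p file to POST,             -T Content-Type header for the POST body.
# The scheme (http) and the path argument are assumptions.
build_ab_command() {
  local endpoint="$1"
  local path="${2:-/}"
  echo "ab -p test.json -T application/json -c 3000 -n 60000000 http://${endpoint}${path}"
}
```

Running `build_ab_command "$(get_endpoint)"` prints the command for your own endpoint. Because it is only printed, you can compare it with what the lab's script does before generating any load.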

2.1 Observing the alarm being triggered

  1. After approximately 6 minutes, the generated traffic will trigger a CloudWatch alarm, and you will receive an email notifying you that the alarm has been triggered.

    Section3 Email

  2. Check and confirm the alarm by going to the CloudWatch console.

  3. Click on the Alarms section on the left menu.

  4. Locate the alarm called mysecretword-canary-duration-alarm, which should be in an alarm state.

    Section3 Failure Screenshot

  5. Click on the alarm to display the CloudWatch metrics that the alarm is based on.

  6. The alarm is based on the Duration metric emitted by the mysecretword-canary CloudWatch Synthetics canary. The Duration metric measures how long it takes for the canary requests to receive a response from the application.

  7. The alarm is triggered whenever the value of the Duration metric is above 6 seconds within a 1-minute period.

    Section3 Failure Screenshot

  8. On the left menu, click on Synthetics and locate the canary named mysecretword-canary.

    Section3 Canary

  9. Click on the canary and then select the Configuration tab.

  10. From here you will see the canary configuration and a snippet of the canary script.

  11. In the canary script section, scroll down to the section that contains let requestOptionStep1 as shown in the screenshot below. This is the configuration that controls the destination of the request (hostname, path and payload body).

    Section3 Canary

  12. Click on the Monitoring tab.

  13. From here you will see the visualization of the metrics that the canary monitor generates.

  14. Locate the ‘Duration’ metric that is being used to trigger the CloudWatch alarm.

  15. You will see the average duration of the canary requests, which represents the time each request takes to complete. A value above 6000 ms signifies that the request has taken more than 6 seconds to receive a response from the application, indicating a performance issue in the API.

    Section3 Canary
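The console checks above can also be made with the AWS CLI. The following is a minimal sketch, assuming the AWS CLI is configured in your Cloud9 environment and that GNU date is available (as it is on Amazon Linux). The function names and the 15-minute look-back window are choices made for this example, not part of the lab. It reads the Duration metric from the CloudWatchSynthetics namespace, using the canary name as the CanaryName dimension.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not part of the lab's code.

ALARM_NAME="mysecretword-canary-duration-alarm"
CANARY_NAME="mysecretword-canary"

# Print the current alarm state: OK, ALARM, or INSUFFICIENT_DATA.
alarm_state() {
  aws cloudwatch describe-alarms \
    --alarm-names "$ALARM_NAME" \
    --query 'MetricAlarms[0].StateValue' \
    --output text
}

# Print one line per minute for the last 15 minutes (an arbitrary window):
# the timestamp and the average canary Duration in milliseconds.
recent_canary_duration() {
  aws cloudwatch get-metric-statistics \
    --namespace CloudWatchSynthetics \
    --metric-name Duration \
    --dimensions "Name=CanaryName,Value=${CANARY_NAME}" \
    --start-time "$(date -u -d '15 minutes ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 60 \
    --statistics Average \
    --query 'sort_by(Datapoints, &Timestamp)[].[Timestamp, Average]' \
    --output text
}
```

While the load is running, `alarm_state` would be expected to report `ALARM`, and values above 6000 in the `recent_canary_duration` output correspond to responses slower than 6 seconds.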

You have now completed the second section of the lab.

You should still have the script running in the background, simulating a large influx of traffic to your API. This causes the application to respond slowly and time out periodically. The CloudWatch alarm will keep triggering, and performance issue notifications will be sent to your System Operator to prompt them into action.

This concludes Section 2 of this lab. Click ‘Next step’ to continue to the next section of the lab where we will build an automated playbook to assist investigation of the issue.