Implement shuffle sharding

In this section we will update the architectural design of the workload and implement shuffle sharding. Shuffle sharding is a combinatorial implementation of a sharded architecture. With shuffle sharding we create virtual shards with a subset of the capacity of the workload ensuring that the virtual shards are mapped to a unique subset of customers with no overlap. By minimizing the number of Workers a single customer is able to interact with within the workload, and spreading resources in a combinatorial way, we will be able to further reduce the impact of a potential posion pill. In a shuffle sharded system, the scope of impact of failures can be calculated using the following formula:

ScopeShuffleSharding-1

The formula can be expanded to calculate the number of unique combinations that can exist given the number of workers and the number of workers per shard, also referred to as shard size. The calculation is performed using factorials.

ScopeShuffleSharding-2

ScopeShuffleSharding-3

For example if there were 100 workers, and we assign a unique combination of 5 workers to a shard, then the failure of any 1 shard will only impact 0.0000013% of customers.

ScopeShuffleSharding-4

ScopeShuffleSharding-5

ArchitectureShuffleSharding

Update the workload architecture

  1. Go to the AWS CloudFormation console at https://console.aws.amazon.com/cloudformation and select the stack that was created as part of this lab - Shuffle-sharding-lab

  2. Click on Update

    CFNUpdateStack

  3. Under Prerequisite - Prepare template, select Replace current template

    • For Template source select Amazon S3 URL
    • In the text box under Amazon S3 URL specify https://aws-well-architected-labs-virginia.s3.amazonaws.com/Reliability/300_Fault_Isolation_with_Shuffle_Sharding/shuffle-sharding.yaml

    CFNReplaceTemplateShuffleSharding

  4. Click Next

  5. No changes are required for Parameters. Click Next

  6. For Configure stack options click Next

  7. On the Review page:

    • Scroll to the end of the page and select I acknowledge that AWS CloudFormation might create IAM resources with custom names. This ensures CloudFormation has permission to create resources related to IAM. Additional information can be found here .

    Note: The template creates an IAM role and Instance Profile for EC2. These are the minimum permissions necessary for the instances to be managed by AWS Systems Manager. These permissions can be reviewed in the CloudFormation template under the “Resources” section - InstanceRole.

    • Click Update stack

    CFNIamCapabilities

This will take you to the CloudFormation stack status page, showing the stack update in progress. The stack takes about 1 minute to go through all the updates. Periodically refresh the page until you see that the Stack Status is in UPDATE_COMPLETE.

With this stack update, the architecture of the workload has been updated by introducing 6 Application Load Balancer listener rules and Target Groups. These listener rules have been configured to inspect the incoming request for a query-string name. Depending on the value provided, the request is routed to one of six target groups where each target group consists of 2 EC2 instances.

Test the shuffle sharded application

Now that the application has been deployed, it is time to test it to understand how it works. The sample application used in this lab is the same as before, a simple web application that returns a message with the Worker that responded to the request. Customers pass in a query string with the request to identify themselves. The query string used here is name.

  1. Copy the URL provided in the Outputs section of the CloudFormation stack created in the previous string. Append the query string /?name=Alpha to the URL and paste it into a web browser. The full string should look similar to this - http://shuffle-alb-1p2xbmzo541rr-1602891463.us-east-1.elb.amazonaws.com/?name=Alpha

    CFNOutputs

  2. Refresh the web browser a few times to see that responses are being returned from different EC2 instances on the back-end. Notice that after implementing shuffle sharding, you are seeing responses being returned from only 2 instances for customer Alpha’s requests. No matter how many times you refresh the page or try a different browser, customer Alpha will only receive responses from 2 EC2 instances. This is because we have created Application Load Balancer listener rules that divert traffic to a specific subset of the overall capacity of the workload, also known as a shard. Each customer has a unique combination of EC2 instances that will respond to requests with no 2 customers having the same combination. The following diagram provides a breakdown of how customers are mapped to EC2 instances.

    ShuffleShardedFlow

  3. Update the value for the query string to one of the other customers, the possible values are - Alpha, Bravo, Charlie, Delta, Echo, and Foxtrot

  4. Refresh the web browser multiple times to verify that customers are receiving responses only from EC2 instances in the shard they are mapped to

    Customer NameWorkers
    AlphaWorker-1 and Worker-2
    BravoWorker-1 and Worker-3
    CharlieWorker-1 and Worker-4
    DeltaWorker-2 and Worker-3
    EchoWorker-2 and Worker-4
    FoxtrotWorker-3 and Worker-4