Lab complete!
Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.
Click here to access the Well-Architected Tool
This lab focuses on optimizing data patterns for sustainability, specifically on removing unneeded or redundant data and minimizing data movement across networks.
At the end of this lab you will:
us-east-1 and AWS us-west-1 regions (referred to as the us-east-1 region and the us-west-1 region throughout the lab) using Redshift ra3 nodes
us-east-1 and us-west-1 regions. You may be able to run this in other AWS regions of your choice where Redshift ra3 nodes are available, but results may vary.
NOTE: You will be billed for any applicable AWS resources used in this lab that are not covered by the AWS Free Tier. Amazon Redshift ra3 nodes are not part of the Amazon Redshift free trial or the AWS Free Tier. If you decide to stop the lab at any point, please revisit the clean up instructions at the end so that you stop incurring cost (e.g. for storage in Amazon S3).
Estimated time required to complete this lab is 90 minutes.
AnyCompany (a fictional event management organization) runs a central data warehouse environment on Amazon Redshift in the us-east-1 region, which various departments in the organization use for their respective storage, analytical processing, and reporting. The marketing department is the top consumer of the data warehouse, and its data engineers, analysts, and scientists are based on the US west coast. The marketing team has implemented its own Amazon Redshift cluster in the us-west-1 (consumer) region, which is refreshed nightly by taking an Amazon Redshift snapshot from the us-east-1 (producer) region and loading it into the Amazon Redshift cluster in the us-west-1 (consumer) region. Because the marketing team's analytical processing consumes a lot of resources and is integrated with downstream applications hosted on premises on the west coast, the team performs its analytical processing in the us-west-1 region, while other departments use the data warehouse hosted in the us-east-1 region. This requires storing a redundant dataset in the us-west-1 region and transferring large amounts of data (via the nightly ETL feed) over the network between AWS regions.
This implementation is not sustainability-friendly and can be optimized using the AWS Well-Architected Sustainability Pillar best practices for data patterns. In addition, with this approach the insights generated by the marketing department are not based on live data.
In this case, optimization areas include:
us-west-1.
us-east-1 and us-west-1 regions, this reduces network traffic.
By introducing the Amazon Redshift Data Sharing feature, the marketing department can optimize its implementation for sustainability, avoiding redundant storage and reducing data transfer between AWS regions. Data sharing enables instant, granular, and fast data access across Amazon Redshift clusters without the need to copy or move the data. With data sharing, you have live access to data, so your users can see the most up-to-date and consistent information as it is updated in Amazon Redshift clusters.
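To make the data sharing idea more concrete, here is a minimal sketch of the SQL a producer and a consumer cluster would typically run, assuming both clusters belong to the same AWS account. The helper functions, the share name `marketing_share`, the schema `public`, the database name `marketing_db`, and the namespace placeholders are all hypothetical and chosen only for illustration; the exact steps used later in the lab may differ. The code only builds the statements as strings and does not connect to Redshift.

```python
from typing import List


def producer_statements(share: str, schema: str, consumer_namespace: str) -> List[str]:
    """Build the SQL a producer cluster would run to expose a schema via a datashare."""
    return [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
        # Grant access to the consumer cluster, identified by its namespace ID.
        f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{consumer_namespace}';",
    ]


def consumer_statements(share: str, producer_namespace: str, local_db: str) -> List[str]:
    """Build the SQL a consumer cluster would run to query the shared data in place."""
    return [
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';",
    ]


if __name__ == "__main__":
    # Placeholder namespace IDs (hypothetical); replace with the real cluster namespaces.
    for stmt in producer_statements("marketing_share", "public", "<consumer-namespace-id>"):
        print(stmt)
    for stmt in consumer_statements("marketing_share", "<producer-namespace-id>", "marketing_db"):
        print(stmt)
```

The consumer creates a database that points at the producer's datashare rather than loading its own copy of the tables, which is why the redundant storage and the nightly snapshot transfer are no longer needed.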
Redshift environment before implementing Data Sharing feature
Both the producer and the consumer cluster hold 640 MB each, so the total storage consumed is 1280 MB:
Redshift environment after implementing Data Sharing feature
The producer cluster holds 640 MB, whereas the consumer cluster holds 0 MB, so the total storage consumed is 640 MB:
The improvement goals of this lab are to:
This lab use case focuses on removing unneeded or redundant data and minimizing data movement across the network. For more details, refer to the Sustainability Pillar Whitepaper, which explains the iterative process that evaluates, prioritizes, tests, and deploys sustainability-focused improvements for cloud workloads.
To evaluate specific improvements, understand the resources provisioned by your workload to complete a unit of work. Evaluate potential improvements, and estimate their potential impact, the cost to implement, and the associated risks. To measure improvements over time, first understand what you have provisioned in AWS and how those resources are being consumed.
Refer to the Sustainability Pillar Whitepaper for a detailed understanding of how to evaluate specific improvements. At a high level:
Use proxy metrics to measure the resources provisioned to achieve business outcomes. (To derive metrics from AWS Cost and Usage reports, check out this Well-Architected Lab.)
For this lab, we will use these proxy metrics:
us-west-1 region Redshift cluster, and how much data is transferred over the network between producer (us-east-1) and consumer (us-west-1) clusters across regions for data replication:
Select business metrics to quantify the achievement of business outcomes. Your business metrics should reflect the value provided by your workload, for example, the number of simultaneous active users, API calls served, or the number of transactions completed. For this lab, we will use the total number of events held (business outcome) as the business metric.
To calculate a sustainability key performance indicator (KPI), we divide the provisioned resources by the business outcomes achieved, which gives the provisioned resources per unit of work:
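As a small illustration of this calculation, the sketch below divides a proxy metric by a business metric. The storage figures (1280 MB before and 640 MB after data sharing) come from the comparison earlier in this lab; the number of events held is a made-up value used only to show the arithmetic, and the function name `kpi_per_unit_of_work` is likewise hypothetical.

```python
def kpi_per_unit_of_work(provisioned_resources: float, business_outcomes: float) -> float:
    """Return provisioned resources per unit of work (proxy metric / business metric)."""
    if business_outcomes <= 0:
        raise ValueError("business_outcomes must be greater than zero")
    return provisioned_resources / business_outcomes


# Hypothetical business metric: total number of events held (not a figure from the lab).
EVENTS_HELD = 8000

# Proxy metric: total storage consumed across both clusters, in MB (from the lab text).
STORAGE_BEFORE_MB = 1280  # producer 640 MB + consumer 640 MB
STORAGE_AFTER_MB = 640    # producer 640 MB + consumer 0 MB

kpi_before = kpi_per_unit_of_work(STORAGE_BEFORE_MB, EVENTS_HELD)
kpi_after = kpi_per_unit_of_work(STORAGE_AFTER_MB, EVENTS_HELD)

if __name__ == "__main__":
    print(f"Storage per event before data sharing: {kpi_before:.2f} MB")
    print(f"Storage per event after data sharing:  {kpi_after:.2f} MB")
```

A lower value means fewer resources are provisioned for the same business outcome. The same function can be applied to the data transfer proxy metric by passing the amount of data transferred between regions instead of the storage consumed.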
Our improvement goal is to: