Prepare Amazon Redshift Producer Cluster

Lab 2

We will first create the producer Amazon Redshift cluster (we will refer this as the producer cluster throughout the lab) in us-east-1 region, and will also load sample dataset which we will use for our sustainability use case.

Step-1: Create Redshift Producer Cluster

  1. Login into the AWS Console (make sure us-east-1 region is selected in top right corner), and click Create Cluster.

  2. Provide Cluster name as redshift-cluster-east, and select ra3.xlplus node type -

NOTE: If you get access error launching cluster with ra3.xlplus node type, then select ra3.4xlarge node type. Please note, Amazon Redshift Data Sharing feature is not supported for previous generation dc2 node types, and Amazon Redshift only supports data sharing on the ra3.16xlarge, ra3.4xlarge, and ra3.xlplus instance types for producer and consumer clusters. Amazon Redshift ra3 nodes incurs cost as these nodes are not part of the Amazon Redshift free trial, or AWS Free Tier.

Create Cluster

  1. Select “Load Sample data”.

  2. Supply a password for Admin user.

Sample Data

Other configuration settings can be left as default.

  1. Click the Create Cluster button – it will take few minutes to create cluster, and load sample data into database.

Step-2: Connect to database using query editor

Once the cluster is created (Status = Available), using one of the Amazon Redshift query editors is the easiest way to query the Amazon Redshift database. After creating your cluster, use the query editor v2 to connect to newly created database.

Query editor

Step-3: Validate database

  1. In the query editor, click on the newly created cluster, and it will establish connection to the database. You will then see two databases created automatically – dev, sample_data_dev. The dev database has one schema called public, which holds the 7 sample tables loaded during the cluster creation. Expand the public schema under dev database, and you will see list of tables. We will refer to this as producer database throughout the lab.

Query editor 2

  1. These tables were bootstrapped during cluster creation, and can’t be shared using Amazon Redshift Data Sharing feature. For this lab, we will use these bootstrapped tables to create our own tables to test the Amazon Redshift Data Sharing feature. Go to the query editor and execute these SQL commands:
CREATE TABLE lab_users AS SELECT * FROM users;
CREATE TABLE lab_venue AS SELECT * FROM venue;
CREATE TABLE lab_category AS SELECT * FROM category;
CREATE TABLE lab_date AS SELECT * FROM date;
CREATE TABLE lab_event AS SELECT * FROM event;
CREATE TABLE lab_sales AS SELECT * FROM sales;
CREATE TABLE lab_listing AS SELECT * FROM listing;

Query editor 3

  1. Once above CREATE TABLE commands are successfully completed, then drop the bootstrapped tables using below SQL commands. This will help with estimating data storage consumed, and comparison between producer and consumer databases.
DROP TABLE users;
DROP TABLE venue;
DROP TABLE category;
DROP TABLE date;
DROP TABLE event;
DROP TABLE sales;
DROP TABLE listing;

Query editor 4

So far, we have installed & configured the producer cluster, and loaded a sample dataset into the producer database in us-east-1 region. Next, we will install and configure the Amazon Redshift consumer cluster in us-west-1 region.