Catalog the workload data

AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy to prepare and load your data for analytics. AWS Glue provides crawlers that determine the schema of your data and store the metadata in the AWS Glue Data Catalog.

Create the Crawler

  1. Open the AWS Glue console, and from the left navigation pane, choose Crawlers.
  2. Select Add crawler, name the crawler well-architected-reporting, and select Next.
  3. Select Next to accept the defaults for Specify crawler source type.
  4. Add the S3 path where you will store the extracted AWS Well-Architected data, e.g. s3://well-architected-reporting-blog. Select Next.
  5. Select No and then Next on the Add another data store step.
  6. Select Create an IAM role and provide a name, e.g. well-architected-reporting, then select Next.
  7. Select Run on demand as the schedule frequency. Select Next.
  8. Next, select Add database and enter a name, e.g. war-reports. Select Create and then Next.
  9. Review the configuration and select Finish to create the Crawler.

Image of Crawler configuration.
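
The console steps above can also be scripted. The following is a minimal sketch, not the method used in this post, assuming the AWS SDK for Python (boto3) and an IAM role that already exists (the API does not create the role for you the way the console wizard does). The helper names build_crawler_request and create_reporting_crawler are hypothetical; the crawler name, database name, and S3 path are the example values from the steps above.

```python
def build_crawler_request(name, role, database, s3_path):
    """Return keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role,  # assumption: name or ARN of an existing IAM role
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # No "Schedule" key is set, so the crawler runs only on demand.
    }


def create_reporting_crawler(glue, role):
    """Create the database and the crawler.

    `glue` is expected to be a boto3 Glue client, e.g. boto3.client("glue").
    """
    request = build_crawler_request(
        name="well-architected-reporting",
        role=role,
        database="war-reports",
        s3_path="s3://well-architected-reporting-blog",
    )
    # Mirrors the "Add database" step in the console wizard.
    glue.create_database(DatabaseInput={"Name": request["DatabaseName"]})
    glue.create_crawler(**request)
    return request
```

To use it, pass a client and your own role, for example create_reporting_crawler(boto3.client("glue"), "<your-glue-role>"). Keeping the request in a separate function makes it easy to inspect the parameters before anything is created.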

Run the Crawler

  1. Find the crawler that was just created, select it, and then choose Run Crawler.
  2. Wait for the crawler to complete running, which should take approximately one minute.
  3. From the left navigation pane, choose Databases.
  4. Find the database that was created during the Crawler creation, select it and choose View Tables.
  5. In the Name field, you should see “workloadreports”. Select this and examine the metadata that was discovered during the crawler run, as shown in Figure 6. The workloadreports table details include fields for database, classification, location, last updated, input format, table properties, and more. The Schema section of the page displays columns for column name, data type, partition key, and comment.
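
Running the crawler and inspecting the resulting table can be sketched in code as well. This example assumes a boto3 Glue client is passed in as glue; the function names, polling interval, and timeout are hypothetical choices, and the sketch treats a return to the READY state as completion without checking whether the crawl itself succeeded.

```python
import time


def run_crawler_and_wait(glue, name, poll_seconds=10, timeout_seconds=600):
    """Start the crawler, then poll until it returns to the READY state."""
    glue.start_crawler(Name=name)
    deadline = time.monotonic() + timeout_seconds
    while True:
        time.sleep(poll_seconds)
        state = glue.get_crawler(Name=name)["Crawler"]["State"]
        if state == "READY":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"crawler {name} did not finish in {timeout_seconds}s")


def describe_table(glue, database, table):
    """Summarize the table details and schema stored in the Data Catalog."""
    t = glue.get_table(DatabaseName=database, Name=table)["Table"]
    storage = t.get("StorageDescriptor", {})

    def column(c, is_partition_key):
        return {
            "name": c["Name"],
            "type": c.get("Type"),
            "partition_key": is_partition_key,
            "comment": c.get("Comment", ""),
        }

    # Regular columns first, then partition keys, as two separate lists in the API.
    columns = [column(c, False) for c in storage.get("Columns", [])]
    columns += [column(c, True) for c in t.get("PartitionKeys", [])]
    return {
        "database": t.get("DatabaseName"),
        "classification": t.get("Parameters", {}).get("classification"),
        "location": storage.get("Location"),
        "input_format": storage.get("InputFormat"),
        "columns": columns,
    }
```

For example, call run_crawler_and_wait(glue, "well-architected-reporting") and then describe_table(glue, "war-reports", "workloadreports") to retrieve the same details the console shows for the table.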