We will use AWS Glue and set up a scheduled crawler that runs each day. The crawler will scan the CUR files and create a database and tables for the delivered files. If new versions of a CUR or new months are delivered, they will be included automatically.
We will use Athena to access and view our CUR files via SQL. Athena is a serverless service that can execute SQL queries across very large amounts of data. You are charged only for the data each query scans, and there are no ongoing costs when data is not being queried, unlike a traditional database solution.
Go to the Glue console:
Click on Get started if you have not used Glue before
Ensure you are in the region where your CUR files are delivered, click on Crawlers and click Add crawler:
Enter a Crawler name starting with Cost, and click Next:
Select Data stores, and click Next:
Ensure you select Specified path in another account, enter the S3 path of your bucket s3://(CUR bucket), expand Exclude patterns, enter the following patterns one line at a time, and click Next:
**.json, **.yml, **.sql, **.csv, **.gz, **.zip, **/cost_and_usage_data_status/*, aws-programmatic-access-test-object
If you replicated the objects to the Cost Management Account, or if you are using the same account, select Specified path in my account instead of Specified path in another account.
Do not add another data store; click Next:
Select Create an IAM role, enter a role name of Cost_Crawler, and click Next:
Click the Down arrow, and select a Daily Frequency:
The CUR refreshes multiple times a day. You can adjust your frequency if you would like to run it more than once a day.
Enter in a Start Hour and Start Minute, then click Next:
Click Add database:
Enter a Database name of cost, and click Create:
NOTE: Your Database name cannot contain a hyphen (-)
Configure the Crawler’s output by updating the following fields, then select Next:
Review the crawler and click Finish:
Select the checkbox next to the crawler, click Run crawler:
You will see the Crawler was successful and created a table:
Click Databases
Select the cost database that Glue created:
Click Tables in cost:
Click the table name:
Verify the recordCount is not zero; if it is, go back and verify the steps above:
Go to the Athena Console:
Select the drop down arrow, and click on the new database:
A new table will have been created (named after the CUR), we will now load the partitions. Click on the 3 dot menu and select Load partitions:
You will see it execute the command MSCK REPAIR TABLE, and in the results it may add partitions to the metastore for each month that has a billing file:
NOTE: It may or may not add partitions and show the messages above.
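If you prefer to load the partitions from the query editor instead of the menu, you can run the same command yourself. This is a minimal sketch; cur_table is a placeholder for the table the crawler created in the cost database (it is named after your CUR).

```sql
-- Load any new partitions (for example, new billing months) into the metastore.
-- Replace cur_table with the name of the table the crawler created.
MSCK REPAIR TABLE cost.cur_table;
```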
If you are using the supplied files for this lab, check:
We will now preview the data. Click on the 3 dot menu and select Preview table:
It will execute a SELECT * query limited to 10 rows, and in the results you will see the first 10 lines of your CUR file:
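The Preview table action is equivalent to running a query like the one below yourself; cur_table is again a placeholder for the table the crawler created.

```sql
-- Preview the first 10 rows of the CUR data.
-- Replace cur_table with your actual table name.
SELECT * FROM cost.cur_table LIMIT 10;
```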
You have successfully set up your CUR file to be analyzed. You can now query your usage and costs via SQL.
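As a starting point, a query like the sketch below sums unblended cost by service for a single month. The table name and the year/month partition values are placeholders, and the column names assume a typical Parquet CUR schema (line_item_product_code, line_item_unblended_cost); adjust them to match your table.

```sql
-- Example: total unblended cost per service for one month.
-- cost.cur_table, year, and month values are assumptions; adjust to your environment.
SELECT
  line_item_product_code,
  SUM(line_item_unblended_cost) AS unblended_cost
FROM cost.cur_table
WHERE year = '2020'
  AND month = '1'
GROUP BY line_item_product_code
ORDER BY unblended_cost DESC;
```

Because Athena charges per byte scanned, selecting only the columns you need and filtering on the partition columns keeps query costs low.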
Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.
Click here to access the Well-Architected Tool