Use AWS Glue to enable access to CUR files via Amazon Athena

We will use AWS Glue and setup a scheduled Crawler, which will run each day. This crawler will scan the CUR files and create a database and tables for the delivered files. If there are new versions of a CUR, or new months delivered - they will be automatically included.

We will use Athena to access and view our CUR files via SQL. Athena is a serverless solution to be able to execute SQL queries across very large amounts of data. Athena is only charged for data that is scanned, and there are no ongoing costs if data is not being queried, unlike a traditional database solution.

  1. Go to the Glue console: Images/Glue0.png

  2. Click on Get started if you have not used Glue before

  3. Ensure you are in the region where your CUR files are delivered, click on Crawlers and click Add crawler: Images/Glue1.png

  4. Enter a Crawler name starting with Cost, and click Next: Images/Glue2.png

  5. Select Data stores, and click Next: Images/Glue3.png

  6. Ensure you select Specified path in another account, and enter the S3 path of your bucket s3://(CUR bucket), expand Exclude patterns, enter the following patterns one line at a time and click next:

         **.json, **.yml, **.sql, **.csv, **.gz, **.zip
    

Images/Glue4.png

  1. Add another data store, click Next: Images/Glue7.png

  2. Select Create an IAM role, enter a role name of Cost_MasterCrawler, and click Next: Images/Glue8.png

  3. Click the Down arrow, and select a Daily Frequency: Images/Glue9.png

  4. Enter in a Start Hour and Start Minute, then click Next: Images/Glue10.png

  5. Click Add database: Images/Glue11.png

  6. Enter a Database name of costmaster, and click Create: Images/Glue12.png

  7. Click Next: Images/Glue13.png

  8. Review the crawler and click Finish: Images/Glue14.png

  9. Select the checkbox next to the crawler, click Run crawler: Images/Glue15.png

  10. You will see the Crawler was successful and created a table: Images/Glue16.png

  11. Click Databases Images/Glue20.png

  12. Select the costmaster database that Glue created: Images/Glue21.png

  13. Click Tables in costmaster: Images/Glue22.png

  14. Click the table name: Images/Glue23.png

  15. Verify the recordCount is not zero, if it is - go back and verify the steps above: Images/Glue24.png

  16. Go to the Athena Console: Images/Glue17.png

  17. Select the drop down arrow, and click on the new database: Images/Glue18.png

  18. A new table will have been created (named after the CUR), we will now load the partitions. Click on the 3 dot menu and select Load partitions: Images/AWSBillingAnalysis_14.png

  19. You will see it execute the command MSCK REPAIR TABLE, and in the results it may add partitions to the metastore for each month that has a billing file: Images/AWSBillingAnalysis_15.png

NOTE: It may or may not add partitions and show the messages above.

If you are using the supplied files for this lab, check:

  • The folder names year and month are in S3 and the case matches
  • There are parquet files in each of the month folders
  1. We will now preview the data. Click on the 3 dot menu and select Preview table: Images/AWSBillingAnalysis_16.png

  2. It will execute a Select * from query, and in the results you will see the first 10 lines of your CUR file: Images/AWSBillingAnalysis_17.png

You have successfully setup your CUR file to be analyzed. You can now query your usage and costs via SQL.