With your log data now stored in S3, you will use Amazon Athena, a serverless interactive query service, to run SQL queries against your log files and extract information from them. In this section, we will focus on the Apache access logs, although Athena can be used to query any of your log files. You could also query your log data with CloudWatch Logs Insights; however, Insights can only query data stored in CloudWatch, whereas Athena can query files stored in S3 as well as other sources. Athena supports SQL, an industry-standard query language.
Open up the Athena console.

Whether or not this is the first time you are using Athena, you will need to set a query result location. Enter the following into the Query result location field, replacing REPLACE_ME_BUCKETNAME with the name of the S3 bucket you created, likely wa-lab-<your-account-id>-<date>:

```
s3://REPLACE_ME_BUCKETNAME/athenaqueries/
```
Next, create a database named security_lab_logs by running the following query:

```sql
CREATE database security_lab_logs
```

You should see Query successful. displayed in the results box. Make sure the security_lab_logs database is selected before moving on.
Next, you will create a table for your Apache access logs. In the query below, replace REPLACE_ME_BUCKET with the name of the bucket you created to store your logs in S3, likely wa-lab-<your-last-name>-<date>. You also need to replace REPLACE_ME_STRING. Follow these steps to identify the path: in the S3 console, open the wa-lab-<your-last-name>-<date> bucket, then open the lablogs folder. Inside it you will find a folder with a randomly generated name (for example, c848ff11-df30-481c-8d9f-5805741606d3). This string is what you should use for REPLACE_ME_STRING.
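If you prefer scripting over clicking through the console, object keys under the bucket follow the `lablogs/<random-string>/apache-access-logs/...` layout shown in the table's LOCATION below. A small (hypothetical) helper like this can pull the random string out of a key listed with the AWS CLI or boto3:

```python
# Hypothetical helper: extract the random folder name (the value for
# REPLACE_ME_STRING) from an S3 object key of the form
# lablogs/<random-string>/apache-access-logs/<file>
def extract_log_prefix(key: str, root: str = "lablogs") -> str:
    parts = key.split("/")
    if len(parts) < 2 or parts[0] != root:
        raise ValueError(f"unexpected key layout: {key!r}")
    return parts[1]

# Example with a made-up key; your random string will differ.
key = "lablogs/c848ff11-df30-481c-8d9f-5805741606d3/apache-access-logs/access.log.gz"
print(extract_log_prefix(key))  # c848ff11-df30-481c-8d9f-5805741606d3
```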
Run the following query to create the table:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS `security_lab_apache_access_logs` (
  request_date string,
  request_timestamp string,
  request_ip string,
  request_method string,
  request string,
  response_code int,
  response_size int,
  user_client_data string,
  garbage string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*)T([^ ]*)Z ([^ ]*) (?:[^ ]* [^ ]*) (?:[^\"]*) \"([^ ]*) ([^\"]*)\" ([^ ]*) ([^ ]*) (?:\"[^\"]*\") (\"[^\"]*\")([^\n]*)"
)
LOCATION 's3://REPLACE_ME_BUCKET/lablogs/REPLACE_ME_STRING/apache-access-logs/'
TBLPROPERTIES (
  'compressionType' = 'gzip'
);
```
Let's break this down a little.

- The `CREATE EXTERNAL TABLE...` statement creates your new table and defines its columns, such as `request_date`, `request_timestamp`, and so on.
- The `ROW FORMAT SERDE` statement specifies that the table rows are parsed using the RegEx SerDe (serializer/deserializer).
- The `WITH SERDEPROPERTIES` statement specifies the regular expression that matches the input format of your log files. This is how your raw log data is converted into columns.
- The `LOCATION` statement specifies the source of your table data, which is the S3 path containing your log files.
- The `TBLPROPERTIES` statement specifies that your log files are initially compressed using the GZIP format.

After running the query, you should see the `security_lab_apache_access_logs` table in the left side menu under Tables. To preview your data, run the following query:

```sql
SELECT *
FROM security_lab_logs.security_lab_apache_access_logs limit 15
```
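To see how the `input.regex` pattern maps a raw log line onto the table's columns, you can test it locally. The sketch below uses Python's `re` module with the same pattern (the SQL-level `\"` escapes reduced to plain quotes) and a made-up sample log line; the line is an assumption for illustration, not actual lab output:

```python
import re

# The regular expression from the table's SERDEPROPERTIES.
LOG_REGEX = (
    r'([^ ]*)T([^ ]*)Z ([^ ]*) (?:[^ ]* [^ ]*) (?:[^"]*) '
    r'"([^ ]*) ([^"]*)" ([^ ]*) ([^ ]*) (?:"[^"]*") ("[^"]*")([^\n]*)'
)

# Made-up sample: CloudWatch-style timestamp followed by an
# Apache access log entry.
line = ('2020-01-01T12:00:00.000Z 192.168.1.10 - - '
        '[01/Jan/2020:12:00:00 +0000] "GET /index.html HTTP/1.1" '
        '200 5432 "-" "Mozilla/5.0"')

m = re.match(LOG_REGEX, line)
# The capture groups line up with the table columns, in order:
# request_date, request_timestamp, request_ip, request_method,
# request, response_code, response_size, user_client_data, garbage
print(m.groups())
```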
Next, run the following query to count how often each response code appears in your logs:

```sql
SELECT response_code,
  count(response_code) AS count
FROM security_lab_logs.security_lab_apache_access_logs
WHERE response_code IS NOT NULL
GROUP BY response_code
ORDER BY count desc
```
This isolates the response_code column from your table and creates a new column called count, which holds the frequency of each response code.
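Conceptually, the GROUP BY and COUNT in the query above just tally a column of values and sort the tallies. A quick Python sketch of the equivalent aggregation, using made-up response codes:

```python
from collections import Counter

# Made-up response_code values standing in for the query's input column.
response_codes = [200, 200, 404, 200, 500, 404, 200]

# GROUP BY response_code + COUNT(response_code), ORDER BY count desc:
counts = Counter(response_codes).most_common()
print(counts)  # [(200, 4), (404, 2), (500, 1)]
```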
If you need to track different metrics for your workload, you can always write additional Athena queries to do so, but for the purposes of this lab, we will just be focusing on response code frequency.
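The console is all this lab requires, but the same query can also be submitted programmatically. The sketch below builds the lab's response-code query and shows how it might be submitted with boto3's Athena client; the output location is a placeholder and the submission function is not executed here, so treat this as an illustration under those assumptions rather than part of the lab:

```python
# Builds the lab's response-code query. The database and table names
# come from this lab; everything else in this sketch is a placeholder.
def build_count_query(database: str = "security_lab_logs",
                      table: str = "security_lab_apache_access_logs") -> str:
    return (
        f"SELECT response_code, count(response_code) AS count "
        f"FROM {database}.{table} "
        f"WHERE response_code IS NOT NULL "
        f"GROUP BY response_code ORDER BY count desc"
    )

def run_athena_query(output_location: str) -> str:
    """Submit the query via Athena and return the query execution id.

    Requires AWS credentials and boto3; not called in this sketch.
    """
    import boto3  # imported here so the sketch stays runnable without AWS
    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=build_count_query(),
        QueryExecutionContext={"Database": "security_lab_logs"},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]

print(build_count_query())
```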
Recap: In this section, you analyzed information from your workload's log files using Amazon Athena. Although you only focused on the response codes and sizes in this lab, Athena can be used to query any data in S3, making it a powerful tool for analyzing log files without accessing them directly. This demonstrates the best practices of "enabling people to perform actions at a distance" and "keeping people away from data". You have been able to minimize direct interaction with your data and instances: first by using Systems Manager for configuration, and now through S3 and Athena for log analysis.
Lab complete!

Now that you have completed this lab, make sure to update your Well-Architected review if you have implemented these changes in your workload.

Click here to access the Well-Architected Tool