PMsquare

Andreea Stanovici, February 7, 2022

Alarms are very helpful in determining the state of processes such as Amazon Web Services (AWS) Glue workflows and jobs in an ETL pipeline. Ideally, a team would monitor the AWS Glue workflows regularly, but this is rarely practical because many workflows are scheduled to run overnight or during off-hours. Setting up alarms to confirm that these workflows finish successfully and don’t error out is key to troubleshooting issues early and maintaining a smooth-running pipeline.

Creating S3 Files to Identify Completion of Workflow

One common method of tracking the status of workflows is to automatically create an S3 file for each completed workflow. In AWS Glue, you can create a job configured to create an S3 file with a particular name, and the last trigger in the workflow should automatically start this job. Once every workflow has finished successfully, the S3 folder contains one file named after each workflow. If the file for the last workflow is missing from S3, the ETL pipeline has not completed. If the workflows have not finished by the start of the workday, the team must be alerted so that they can notify the client.
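As a rough illustration of the marker job described above, the sketch below writes a small object named after the workflow. The function name, its parameters, and the idea of passing the client in are assumptions made for this example; `put_object` is the standard boto3 S3 client call.

```python
from datetime import datetime, timezone


def write_completion_marker(s3_client, bucket, prefix, workflow_name):
    """Write a marker object named after a finished workflow.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    The bucket, prefix, and workflow name are placeholders chosen by the caller.
    """
    key = f"{prefix}{workflow_name}"
    # The body is informational only; the later check relies on the key existing.
    body = f"completed {datetime.now(timezone.utc).isoformat()}".encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)
    return key
```

In a Glue Python shell job, this could be called as the final step with `boto3.client("s3")` as the first argument.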

AWS has a number of helpful tools for tracking changes in S3 buckets, including CloudWatch alarms, EventBridge, and S3 Event Notifications. Each of these services can be configured to set an alarm or to trigger a Lambda function based on a certain pattern, such as the creation of X number of files or the amount of storage used in S3. One limitation of these services is that they cannot detect that a specific file is missing or that it has not been created by a certain time.

Creating an Alert When S3 Files Are Missing

An EventBridge rule can be driven by an event pattern or by a schedule, but not both. A check such as “have 5 different files been created in a particular bucket before the start of the workday” is therefore not currently possible with EventBridge alone. EventBridge can raise an alert when an AWS Glue job changes its state to failed, stopped, or timed out, but it cannot raise one when an AWS Glue workflow has not finished by a certain time. If you want to run a Lambda function when an object is created or changed, you can take advantage of Amazon S3 Event Notifications. However, if you are expecting a file to be created and you want an alert when the file is NOT created, S3 Event Notifications will not be of much help.
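For reference, the Glue job-state alert that EventBridge does support can be expressed as an event pattern. The sketch below builds one as a Python dictionary; the variable names are illustrative, and the state names are the ones AWS Glue reports in its "Glue Job State Change" events.

```python
import json

# Event pattern matching Glue jobs that end in a failed, stopped, or timed-out state.
GLUE_FAILURE_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["FAILED", "STOPPED", "TIMEOUT"]},
}

# EventBridge expects the pattern as a JSON string when a rule is created.
GLUE_FAILURE_PATTERN_JSON = json.dumps(GLUE_FAILURE_PATTERN)
```

A pattern like this reacts to something that happened. It has no way to express "nothing happened by 7 AM", which is the gap the scheduled Lambda function fills.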

Instead, you can create a Lambda function to retrieve all S3 files within a bucket, search for a file with the name that matches your target, and take action based on the file’s absence. The primary goal we want to achieve is to set up a scheduled task that checks before the start of every workday if a specific file indicative of a successful workflow has been created in S3. If the file exists, no actions should be taken. If the file has not been created, indicating the workflow has not been completed, team members or the client should be notified by email.
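The decision itself is simple enough to sketch without any AWS calls. The example below assumes one marker file per workflow, as described earlier, and reports which expected files are absent; the function names and message wording are hypothetical.

```python
def find_missing(existing_keys, expected_keys):
    """Return the expected S3 keys that are not present, in sorted order."""
    existing = set(existing_keys)
    return sorted(key for key in expected_keys if key not in existing)


def build_alert(missing_keys):
    """Return an email message for missing files, or None if nothing is missing."""
    if not missing_keys:
        return None  # Every file exists, so no action should be taken.
    listing = "\n".join(f"- {key}" for key in missing_keys)
    return f"The following expected S3 files have not been created:\n{listing}"
```

Keeping the comparison separate from the S3 and SNS calls makes it easy to test the logic locally before deploying it.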

Here are the steps involved in creating a Lambda function that publishes to an Amazon Simple Notification Service (SNS) topic if a specific S3 file is NOT created.

1) Create an SNS topic. Go to the SNS console and click the create topic button. Be sure to choose the Standard type and not FIFO, because FIFO topics do not support email subscriptions.

Creating an SNS Topic in Amazon

2) Enter a name for your topic and hit the “create topic” button in the bottom right-hand corner. If required, you can also enable encryption and define an access policy for the topic. You must now create subscriptions to your topic.

Entering a topic name in Amazon

3) Click on the button to create a subscription, and select a protocol. In this example, we are using email, so the endpoint is the email address the notification should be sent to. IMPORTANT: a subscriber must confirm their subscription. If they don’t, they will not receive any emails. Make sure to log in to your email and confirm the subscription before going on to the next step.

Selecting a protocol for alerting about S3 Files
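Steps 1 through 3 can also be scripted. The following is a minimal sketch that assumes a boto3 SNS client is passed in; the function name and arguments are illustrative, while `create_topic` and `subscribe` are the standard boto3 calls.

```python
def create_email_topic(sns_client, topic_name, email_addresses):
    """Create a Standard SNS topic and subscribe each address by email.

    sns_client is expected to be a boto3 SNS client, e.g. boto3.client("sns").
    """
    # Without FIFO attributes, create_topic produces a Standard topic.
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]
    for address in email_addresses:
        # Each recipient still has to confirm via the email SNS sends them.
        sns_client.subscribe(TopicArn=topic_arn, Protocol="email", Endpoint=address)
    return topic_arn
```

The returned ARN is the value the Lambda function later passes as `TopicArn` when publishing.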

4) Now we are ready to create the Lambda function. Go to the Lambda service menu and click the button to create a function. You should see the following:

Creating a Lambda Function in Amazon

5) The permissions section is critical when creating a Lambda function. If you don’t attach the proper permissions, the Lambda function will be denied access regardless of the role you are signed in with. It is strongly recommended that you go to the Identity and Access Management (IAM) console and create a role for Lambda that allows S3 access, SNS publishing, and CloudWatch logging.

Creating a role in Amazon using IAM
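One possible shape for the role's permissions policy is sketched below. It is an assumption about the minimum the function in this article needs (listing the bucket, publishing to the topic, and writing logs), not the exact policy used by the author; the function name is hypothetical, and you may want to scope the logs resource more tightly.

```python
import json


def build_lambda_policy(bucket_name, topic_arn):
    """Return an IAM policy document covering S3 listing, SNS publish, and logging."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            # Listing objects under a prefix requires s3:ListBucket on the bucket itself.
            {"Effect": "Allow", "Action": ["s3:ListBucket"],
             "Resource": f"arn:aws:s3:::{bucket_name}"},
            {"Effect": "Allow", "Action": ["sns:Publish"], "Resource": topic_arn},
            {"Effect": "Allow",
             "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
             "Resource": "arn:aws:logs:*:*:*"},
        ],
    }


policy_json = json.dumps(
    build_lambda_policy(
        "test-bucket-for-lambda-function",
        "arn:aws:sns:us-east-1:your_account_number:your-sns-topic",
    ),
    indent=2,
)
```

The role also needs a trust relationship that lets the Lambda service (`lambda.amazonaws.com`) assume it, which the IAM console sets up when you choose Lambda as the use case.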

6) Back in the Lambda console, select the role you have just created with all the required permissions. If you do not select this role, Lambda will create and attach a basic execution role that only allows writing logs to CloudWatch. This means the code won’t be able to read from S3 or publish to SNS. Make sure to refresh the page if you don’t see the IAM role you just created in the dropdown menu.

Setting permission in Lambda Console

7) Select the create function button at the bottom of the page, and it will take you to the code as seen below:

8) The picture below displays the code, with an explanation for each piece of the code in green comments (lines beginning with #). Indentation is important for the code to run correctly, so make sure the code is properly indented.

Code for creating S3 file alerts

Here is the raw code:

import boto3

# Bucket, folder prefix, and full key of the file the workflow should have created.
S3_BUCKET_NAME = 'test-bucket-for-lambda-function'
S3_BUCKET_PREFIX = 'first_folder/second_folder/'
WORKLOAD_FILENAME = 'first_folder/second_folder/desired_file.py'
SNS_TOPIC_ARN = 'arn:aws:sns:us-east-1:your_account_number:your-sns-topic'

s3 = boto3.resource('s3')

def lambda_handler(event, context):
    # Collect the key of every object stored under the prefix.
    my_bucket = s3.Bucket(S3_BUCKET_NAME)
    file_list = [obj.key for obj in my_bucket.objects.filter(Prefix=S3_BUCKET_PREFIX)]
    print(file_list)

    if WORKLOAD_FILENAME in file_list:
        print('The file exists in the bucket.')
    else:
        # The expected file is missing, so email the topic's subscribers.
        sns = boto3.client('sns')
        sns.publish(
            TopicArn=SNS_TOPIC_ARN,
            Message='Hello, you are receiving this email because the file you are expecting in S3 has not yet been created.',
        )
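If the folder holds many objects, listing all of them just to find one key can be slow. An alternative, shown here as a sketch rather than the method used above, asks S3 for at most one key that starts with the exact file name. It assumes keys are returned in lexicographic order, as they are for general purpose buckets, so an exact match would always be the first result. The function name is hypothetical.

```python
def key_exists(s3_client, bucket, key):
    """Return True if an object with exactly this key exists in the bucket.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    response = s3_client.list_objects_v2(Bucket=bucket, Prefix=key, MaxKeys=1)
    # "Contents" is absent from the response when nothing matches the prefix.
    return any(obj["Key"] == key for obj in response.get("Contents", []))
```

This variant needs the same `s3:ListBucket` permission as the listing approach, so the IAM role does not change.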

9) To test the Lambda function, you can click on the orange “Test” button.

Testing a Lambda function in AWS

The following screen will pop up. The purpose of a Lambda test is to imitate a sample event that would trigger your Lambda function. For the purposes of just testing the code, you can keep the sample event. Give the test an “Event name” and hit the “Create” button.

10) Once you have your sample event, hit the orange test button again. This will execute the Lambda test and run the function code. Note, if the permissions are misconfigured, you will get an error that looks like this:

Error code on Lambda sample event

If you followed all of the steps correctly, your subscribers should receive an email that looks like this:

Receiving email about S3 File Missing

Additionally, the code execution should display a list of the files in your S3 bucket. The status of the function says “Succeeded,” indicating the Lambda is functional.

Status of files in Amazon S3 bucket

In our primary goal, we stated that we wanted the team notified ONLY if the file in S3 was not created by a certain time. The way to achieve that is to set up an EventBridge rule that triggers the Lambda function. From the console, click the “create a rule” button. Name the rule and set up a recurring schedule. In this example, the rule is set for every weekday at 7 AM CST. Note that cron expressions in EventBridge rules are evaluated in UTC, so 7 AM CST corresponds to 13:00 UTC.

Notified only if Amazon S3 Bucket was not created

Select the newly created Lambda function as a target, and then hit create in the bottom right-hand corner.

Selecting targets in AWS Lambda
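The same rule and target can be created with boto3. The sketch below assumes EventBridge and Lambda clients are passed in; the function names, rule name, and target Id are illustrative. Rule schedules are evaluated in UTC, so 7 AM CST (UTC-6) becomes hour 13, and the hour would need adjusting during daylight saving time.

```python
def weekday_cron(hour_utc, minute=0):
    """Build an EventBridge cron expression for Monday through Friday (UTC)."""
    return f"cron({minute} {hour_utc} ? * MON-FRI *)"


def schedule_lambda(events_client, lambda_client, rule_name, function_arn, schedule):
    """Create a scheduled rule and point it at the Lambda function."""
    rule_arn = events_client.put_rule(
        Name=rule_name, ScheduleExpression=schedule, State="ENABLED"
    )["RuleArn"]
    # Outside the console, EventBridge needs explicit permission to invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    events_client.put_targets(
        Rule=rule_name, Targets=[{"Id": "missing-file-check", "Arn": function_arn}]
    )
    return rule_arn
```

The console adds the invoke permission automatically when you select the function as a target, which is why the manual steps do not mention it.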

To verify that the Lambda function is set as the target, return to the Lambda console and click on the function we created. You should see that EventBridge is set as a trigger for the Lambda function.

EventBridge is set as a trigger for the Lambda function.

There you have it. Following all of the steps will create 3 resources: an SNS topic, a Lambda function, and an EventBridge rule. These 3 services allow you to get an email if a specific S3 file has not been created within a specific bucket by a certain time.

Next Steps

We hope you found this article informative. Be sure to subscribe to our newsletter for AWS technical articles, updates, and insights delivered directly to your inbox.

If you have any questions or would like PMsquare to provide guidance and support for your analytics solution, contact us today.