Scraping Google Search Results with Python and AWS Part II - Logging and Alerting
In my previous blog post, I talked about scraping Google Search results with Python in the AWS ecosystem, using AWS Lambda and storing the results in DynamoDB. This was accomplished using SerpApi to scrape the results and get the data in JSON format.
In this blog post, let's dive one level deeper and explore how you can implement logging for the Lambda scraper function we created, and add alerting based on the results you obtain from SerpApi within AWS.
Logging The Response
To log the response from the SerpApi scraper Lambda function we wrote in the previous blog post, we will use AWS CloudWatch.
AWS CloudWatch provides an integrated solution to monitor and log Lambda function executions, offering valuable insights into performance metrics, errors, and overall function health.
To gain visibility into the function's execution, we can use CloudWatch to log all the events related to it in a central place. We can capture logs, set up custom metrics, track errors, and monitor execution duration. These logs will be invaluable in troubleshooting, understanding performance bottlenecks, and optimizing the function's behavior.
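For example, beyond plain log lines, the function can publish custom metrics of its own. Here's a minimal sketch of that idea, not part of the scraper from the previous post: the "SerpScraper" namespace and "SearchDurationSeconds" metric name are hypothetical placeholders, and the function's role would also need the cloudwatch:PutMetricData permission.

import boto3

cloudwatch = boto3.client('cloudwatch')

def record_search_duration(seconds):
    # Publish a custom metric that can be graphed and alarmed on in CloudWatch.
    # "SerpScraper" and "SearchDurationSeconds" are example names chosen for this sketch.
    cloudwatch.put_metric_data(
        Namespace='SerpScraper',
        MetricData=[{
            'MetricName': 'SearchDurationSeconds',
            'Value': seconds,
            'Unit': 'Seconds'
        }]
    )

# e.g. record_search_duration(results["search_metadata"]["total_time_taken"])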
After creating the Lambda function and testing it as we did in the previous blog post, a corresponding log group is created automatically in AWS CloudWatch. It appears like this:
When you click on the log group, you can access the log streams for all your previous Lambda runs. They look like this:
Without any additional configuration and changes, this will tell you if the function ran successfully. If it produced an error, you'll be able to see the error message here as well.
However, to make this more useful, you can add response metadata logging to your Lambda function. This will enable you to debug easily when needed. The search_metadata for a search conducted using our APIs looks something like this:
"search_metadata":
{
"id": "67bf89ae717ded2b923bf512",
"status": "Success",
"json_endpoint": "https://serpapi.com/searches/f8e83f39923b5492/67bf89ae717ded2b923bf512.json",
"created_at": "2025-02-26 21:37:50 UTC",
"processed_at": "2025-02-26 21:37:50 UTC",
"google_url": "https://www.google.com/search?q=Coffee&hl=en&gl=us&sourceid=chrome&ie=UTF-8",
"raw_html_file": "https://serpapi.com/searches/f8e83f39923b5492/67bf89ae717ded2b923bf512.html",
"total_time_taken": 1.32
}
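If you only want to keep a couple of these fields rather than the whole block, you can pull them out of the response dictionary individually. A quick sketch, assuming results already holds the parsed SerpApi response and logger is configured as in the function further below:

search_metadata = results["search_metadata"]

search_id = search_metadata["id"]                  # unique ID of this search
status = search_metadata["status"]                 # "Success", "Error", or "Processing"
time_taken = search_metadata["total_time_taken"]   # seconds SerpApi spent on the search

logger.info(f"Search {search_id} finished with status {status} in {time_taken}s")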
To include this in the logs for your searches, you'd need to add some basic logging steps to the existing Lambda function we wrote in the previous blog post.
Let's use the logging library to accomplish this. You can set the log level as you wish. It supports 6 log levels: NOTSET, DEBUG, INFO, WARNING, ERROR, and CRITICAL. Messages below the configured level are filtered out, as the short example below shows.
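This is standard Python logging behavior, independent of Lambda. A quick illustration:

import logging

logger = logging.getLogger()
logger.setLevel(logging.WARNING)

logger.debug("Not emitted: DEBUG is below WARNING")
logger.info("Not emitted: INFO is below WARNING")
logger.warning("Emitted: WARNING meets the configured level")
logger.error("Emitted: ERROR is above the configured level")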
Using this library to log the search metadata looks like this:
import json, os
from serpapi import GoogleSearch
import boto3
from datetime import datetime
import logging

# Get the value of the LAMBDA_LOG_LEVEL environment variable
log_level = os.environ.get('LAMBDA_LOG_LEVEL', 'INFO')

# Configure the logger
logger = logging.getLogger()
logger.setLevel(log_level)

def lambda_handler(event, context):
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('SearchResults')

    params = {
        "q": "coffee",
        "api_key": os.environ.get('SERPAPI_API_KEY')
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    # Log the search metadata so every run is traceable in CloudWatch
    logger.info(f'Search Metadata: {results["search_metadata"]}')

    organic_results = results["organic_results"]
    links = []
    for result in organic_results:
        links.append(result["link"])

    # Store the scraped links in DynamoDB, keyed by the SerpApi search ID
    table.put_item(
        Item={
            'search_id': results["search_metadata"]["id"],
            'timestamp': datetime.now().isoformat(timespec='seconds'),
            'links': json.dumps(links)
        }
    )

    return {
        'statusCode': 200,
        'body': links
    }
Add a LAMBDA_LOG_LEVEL environment variable to the Lambda function to configure the log level. In this case, I've set the default to "INFO". This will ensure I see all the informational logs that I send from this function. You can add the variable from the Lambda console, or programmatically, as in the sketch below.
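A minimal boto3 sketch for setting the variable from code. The function name here is a hypothetical placeholder, and note that this call replaces the function's entire environment variable map, so include every variable you need:

import boto3

lambda_client = boto3.client('lambda')

lambda_client.update_function_configuration(
    FunctionName='serp-scraper',  # hypothetical name; use your own function's name
    Environment={
        'Variables': {
            'LAMBDA_LOG_LEVEL': 'INFO',
            'SERPAPI_API_KEY': '<your SerpApi key>'  # re-include existing variables, as this replaces them all
        }
    }
)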
If we test the Lambda function now, we should see the search metadata logged in CloudWatch as well.
Here are the relevant logs in CloudWatch after I run the function above:
This will track the search status as well. I've explained more about what to expect for search statuses and error codes below.
You can add any other relevant fields to your logs as well if you want to keep a record of each search run. These logs are easily searchable, and you can even use features like CloudWatch Logs Anomaly Detection and Live Trail to catch particular errors when they happen.
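For instance, you can query the log group with CloudWatch Logs Insights from Python. A rough sketch, where the log group follows the standard /aws/lambda/<function name> convention and the function name is a hypothetical placeholder:

import time
import boto3

logs = boto3.client('logs')

# Find log lines containing the search metadata from the last hour
query = logs.start_query(
    logGroupName='/aws/lambda/serp-scraper',  # hypothetical function name
    startTime=int(time.time()) - 3600,
    endTime=int(time.time()),
    queryString='fields @timestamp, @message | filter @message like /Search Metadata/ | sort @timestamp desc | limit 20'
)

# Logs Insights queries run asynchronously, so poll until the query completes
results = logs.get_query_results(queryId=query['queryId'])
while results['status'] in ('Scheduled', 'Running'):
    time.sleep(1)
    results = logs.get_query_results(queryId=query['queryId'])

for row in results['results']:
    print(row)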
Search Status and Error Codes
All of our Search APIs use the same error response structure. This applies to all of our APIs except the Extra APIs (Location API, Account API, Search Archive API, etc.).
A search status is accessible through the search_metadata.status key. A search status begins as Processing, then resolves to either Success or Error. If a search has failed or contains empty results, the top-level error key will contain an error message.
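Based on this structure, a minimal defensive check could be dropped into the Lambda handler right after results = search.get_dict(). This is an illustrative sketch, not part of the original function:

status = results["search_metadata"]["status"]

if status == "Error" or "error" in results:
    # The top-level "error" key carries the human-readable message
    logger.error(f'Search failed: {results.get("error", "unknown error")}')
    return {
        'statusCode': 500,
        'body': 'Search failed'
    }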
You can find more details about this on our Search API Status and Errors page.
If you're using our APIs via an HTTP GET request, you may encounter some numbered error codes. SerpApi uses conventional HTTP response codes to indicate the success or failure of an API request. In general, a 200 code indicates success. Codes in the 4xx range usually indicate a request that failed given the information provided (e.g., a required parameter was omitted, you ran out of searches, etc.). Codes in the 5xx range usually indicate an error with SerpApi's servers.
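For illustration, here's a rough sketch of the same search made as a plain HTTP GET with the requests library, branching on the status code ranges described above:

import os
import requests

response = requests.get(
    "https://serpapi.com/search",
    params={
        "engine": "google",
        "q": "coffee",
        "api_key": os.environ.get("SERPAPI_API_KEY"),
    },
)

if response.status_code == 200:
    results = response.json()                         # success: parse the JSON body
elif 400 <= response.status_code < 500:
    print("Client-side problem:", response.text)      # e.g. bad parameter, out of searches
else:
    print("SerpApi server error:", response.status_code)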
You can find more details about this on our Status and Error Codes page.
Alerting on Results
If you're looking to alert based on results from our API, you can use AWS tools like Simple Notification Service (SNS). I'll be using SNS to send a notification to my email.
Here I'm going to demonstrate a simple example of setting up an alert to check whether a particular website appears in the top 10 organic search results from our Google Search API.
For this, I will modify the Lambda function to accept a domain from the user and look for that domain in the top 10 organic results we obtain. Here is what this looks like:
import json, os
from serpapi import GoogleSearch
import boto3
from datetime import datetime
import logging

# Get the value of the LAMBDA_LOG_LEVEL environment variable
log_level = os.environ.get('LAMBDA_LOG_LEVEL', 'INFO')

# Configure the logger
logger = logging.getLogger()
logger.setLevel(log_level)

sns_client = boto3.client('sns')

# A helper function to send the alert to SNS
def send_alert(sns_topic_arn, message):
    response = sns_client.publish(
        TopicArn=sns_topic_arn,
        Message=message,
        Subject="SERP Alert: Domain Not in Top 10"
    )
    return response

def lambda_handler(event, context):
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('SearchResults')

    domain_to_search_for = event['domain_to_search_for']  # Accepted from the user

    params = {
        "q": event['q'],  # Accepted from the user
        "api_key": os.environ.get('SERPAPI_API_KEY')
    }

    search = GoogleSearch(params)
    results = search.get_dict()

    # Log the search metadata so every run is traceable in CloudWatch
    logger.info(f'Search Metadata: {results["search_metadata"]}')

    organic_results = results["organic_results"]
    links = []
    for result in organic_results:
        links.append(result["link"])

    # Check whether the user's domain appears in the top organic links
    domain_found_in_top_10 = False
    for link in links:
        if domain_to_search_for in link:
            domain_found_in_top_10 = True

    # Store the scraped links in DynamoDB, keyed by the SerpApi search ID
    table.put_item(
        Item={
            'search_id': results["search_metadata"]["id"],
            'timestamp': datetime.now().isoformat(timespec='seconds'),
            'links': json.dumps(links)
        }
    )

    # Notify via SNS if the domain didn't make the top 10
    if not domain_found_in_top_10:
        sns_topic_arn = 'arn:aws:sns:us-east-2:<Account ID>:<SNS Topic>'
        message = f"Alert: The domain {domain_to_search_for} is not in the top 10 results for the searched keyword."
        send_alert(sns_topic_arn, message)

    return {
        'statusCode': 200,
        'body': links,
        'domain_found': domain_found_in_top_10
    }
Before this can work, we need to create an SNS topic in the AWS account and give the Lambda function permission to access it.
Let's create the SNS topic:
Following that, we can add a "Subscription" for this SNS topic and select a preferred method of notification, such as email:
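Both steps can also be scripted with boto3. A quick sketch, where the topic name and email address are placeholders; the email subscription still has to be confirmed from your inbox:

import boto3

sns = boto3.client('sns')

# Create the topic (idempotent: returns the existing ARN if it already exists)
topic = sns.create_topic(Name='serp-alerts')  # hypothetical topic name

# Subscribe an email address; AWS sends a confirmation email that must be accepted
sns.subscribe(
    TopicArn=topic['TopicArn'],
    Protocol='email',
    Endpoint='you@example.com'
)

print(topic['TopicArn'])  # use this ARN in the Lambda function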
Let's now give the Lambda function permission to access this topic. You can do this by adding the AmazonSNSFullAccess policy to your Lambda's execution role on the AWS IAM page. This allows your Lambda to publish to the SNS topic and actually send the notification.
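If you'd rather attach the policy from code than from the IAM console, a minimal sketch looks like this, where the role name is a hypothetical placeholder for your Lambda's execution role:

import boto3

iam = boto3.client('iam')

iam.attach_role_policy(
    RoleName='serp-scraper-role',  # hypothetical: use your Lambda's execution role name
    PolicyArn='arn:aws:iam::aws:policy/AmazonSNSFullAccess'
)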
Following this, replace the <SNS Topic> placeholder in the Lambda code with the name of the topic you created and <Account ID> with your AWS account ID, and we're ready to test.
For the test, we can add the two fields that we are accepting from the user: the query q and the domain domain_to_search_for. Then click Test.
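If you prefer to trigger the test from code instead of the console, here's a rough equivalent using boto3, with the function name as a hypothetical placeholder:

import json
import boto3

lambda_client = boto3.client('lambda')

response = lambda_client.invoke(
    FunctionName='serp-scraper',  # hypothetical name; use your own function's name
    Payload=json.dumps({
        "q": "coffee",
        "domain_to_search_for": "example.com"
    })
)

# The response payload is a stream; read and parse it to see the links and domain_found flag
print(json.loads(response['Payload'].read()))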
Here I deliberately chose a domain which wouldn't be in the top 10 results for the query so I could test the email notification:
This adds the list of links to the DynamoDB table and also sends you an email notification like the one below if your domain is not in the top 10:
DynamoDB table entry created:
Notification sent via email:
Conclusion
You've successfully set up your Lambda function to scrape data from Google Search results and store the results in a DynamoDB table. You've also added relevant search logging, and sent an email notification at the click of a button!
I hope this blog post was helpful in understanding how to use AWS's powerful features in combination with our exceptional search APIs. If you have any questions, don't hesitate to reach out to me at sonika@serpapi.com.
Relevant Posts
You may be interested in reading more about our Google Search API, and other integrations we have.