Google Ads Transparency Center is a valuable resource for anyone looking to understand the advertising landscape. Ads in the Transparency Center often contain valuable insights that brands can study before launching their own campaigns.
However, the challenge is that most of these ads are provided in image format. This means you only get a link to the ad image, not the underlying text content when you scrape them using SerpApi. For manual viewing, that’s fine - but if you want to analyze the messaging at scale, it’s a major limitation. Being able to extract text directly from those ads can unlock new opportunities to study how competitors structure their messaging and how to refine your own ad strategy.
This blog post will guide you through using SerpApi to scrape ads and then leverage an image-to-text Python library (pytesseract) to extract text from ad images.
Why Extract Text From Ad Images
If you're a market researcher, a competitor analysis expert, or even an ad copywriter, the ability to programmatically pull text from image ads opens up a world of possibilities:
- Competitive Analysis: Track how competitors are phrasing their ad copy in image-based campaigns.
- Trend Identification: Analyze common keywords, calls to action, and messaging themes across a large dataset of image ads.
- Historical Data: Build a historical archive of ad creatives and their embedded text for long-term trend analysis.
- Ad Copy Inspiration: Discover new and effective ways to phrase your own ad copy by studying successful examples.
Tools We Will Use
- SerpApi's Google Ads Transparency Center API: This API allows us to programmatically query the Transparency Center and retrieve structured data about ads.
- Pillow (PIL): A powerful image processing library for Python, useful for handling image data. We'll use it to open the image file and have it ready for text extraction.
- Tesseract OCR (via pytesseract): An open-source Optical Character Recognition (OCR) engine that can extract text from images.
Steps To Extract Ad Text
Step 1: Setup your environment
Ensure you have the necessary libraries installed.
pip install google-search-results Pillow pytersseract
We'll also need to install Tesseract OCR itself. The installation process will depend on your operating system. For MacOS, you can use brew install tesseract
.
google-search-results
is our Python library. You can use this library to scrape search results from any of SerpApi's APIs.
More About Our Python Libraries
We have two separate Python libraries serpapi
and google-search-results
, and both work perfectly fine. However, serpapi
is a new one, and all the examples you can find on our website are from the old one google-search-results
. If you'd like to use our Python library with all the examples from our website, you should install the google-search-results
module instead of serpapi
.
For this blog post, I am using google-search-results
because all of our documentation references this one.
You may encounter issues if you have both libraries installed at the same time. If you have the old library installed and want to proceed with using our new library, please follow these steps:
- Uninstall
google-search-results
module from your environment. - Make sure that neither
serpapi
norgoogle-search-results
are installed at that stage. - Install
serpapi
module, for example with the following command if you're usingpip
:pip install serpapi
Step 2: Get your SerpApi API key
To begin scraping data, first, create a free account on serpapi.com. You'll receive 250 free search credits each month to explore the API.
- Get your SerpApi API Key from this page.
- [Optional but Recommended] Set your API key in an environment variable, instead of directly pasting it in the code. Refer here to understand more about using environment variables. For this tutorial, I have saved the API key in an environment variable named "SERPAPI_API_KEY" in my .env file.
Step 3: Fetch Ads Data from Google Ads Transparency Center
Let's set up the imports we'll need and load our .env file which contains our environment variable with the API key.
import csv
from serpapi import GoogleSearch
from dotenv import load_dotenv
import os
import requests, json
from PIL import Image
import pytesseract
load_dotenv()
Some add some basic configuration steps:
serpapi_api_key = os.environ["SERPAPI_API_KEY"]
search_query = "cloud hosting"
output_image_filename = "ad_creative.png"
pytesseract.pytesseract.tesseract_cmd = r'/opt/homebrew/bin/tesseract' # Example for macOS
def create_csv():
header = ["Advertiser", "Advertiser ID", "Details Link", "Image URL", "Extracted Text"] # Specify a list of the fields you are interested in
with open("text_from_ads.csv", "w", encoding="UTF8", newline="") as f:
writer = csv.writer(f)
writer.writerow(header)
return
pytesseract.pytesseract.tesseract_cmd
variable may need to be different based on where the folder is stored. That may look like: pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
I added a simplecreate_csv()
function to create the CSV file we want to write all the details from the images to. I also chose to record the Advertiser name, ID, details link, image URL along with the extracted test from the images.
Then let's write a function to use SerpApi to get ad results.
def get_ads_from_transparency_center(query):
params = {
"api_key": serpapi_api_key,
"engine": "google_ads_transparency_center",
"advertiser_id": "AR07223290584121737217", # replace with ID of your choice
"region": "2840" # replace with region of your choice
}
search = GoogleSearch(params)
results = search.get_dict()
return results.get("ad_creatives", [])
Here is what the response looks like if you run this function:
[
{
"advertiser_id": "AR07223290584121737217",
"advertiser": "Bloomberg L.P.",
"ad_creative_id": "CR08363136799530287105",
"format": "text",
"image": "https://tpc.googlesyndication.com/archive/simgad/12342601236957871341",
"width": 380,
"height": 219,
"total_days_shown": 7,
"first_shown": 1756816460,
"last_shown": 1757374223,
"details_link": "https://adstransparency.google.com/advertiser/AR07223290584121737217/creative/CR08363136799530287105?region=US"
},
{
"advertiser_id": "AR07223290584121737217",
"advertiser": "Bloomberg L.P.",
"ad_creative_id": "CR15208021437521592321",
"format": "image",
"image": "https://tpc.googlesyndication.com/archive/simgad/1920549703762573409",
"width": 300,
"height": 250,
"total_days_shown": 28,
"first_shown": 1755006166,
"last_shown": 1757373520,
"details_link": "https://adstransparency.google.com/advertiser/AR07223290584121737217/creative/CR15208021437521592321?region=US"
},
{
"advertiser_id": "AR07223290584121737217",
"advertiser": "Bloomberg L.P.",
"ad_creative_id": "CR05075108557259538433",
"format": "text",
"image": "https://tpc.googlesyndication.com/archive/simgad/9320300039173543597",
"width": 380,
"height": 222,
"total_days_shown": 521,
"first_shown": 1710885206,
"last_shown": 1757372828,
"details_link": "https://adstransparency.google.com/advertiser/AR07223290584121737217/creative/CR05075108557259538433?region=US"
},
{
"advertiser_id": "AR07223290584121737217",
"advertiser": "Bloomberg L.P.",
"ad_creative_id": "CR07144663631446147073",
"format": "text",
"image": "https://tpc.googlesyndication.com/archive/simgad/7088743589330901667",
"width": 380,
"height": 484,
"total_days_shown": 521,
"first_shown": 1710906186,
"last_shown": 1757369131,
"details_link": "https://adstransparency.google.com/advertiser/AR07223290584121737217/creative/CR07144663631446147073?region=US"
},
...
...
...
]
Step 4: Download the Ad Images
Now let's write a function we can use to download the ad images.
def download_image(url, filename):
try:
response = requests.get(url, stream=True)
response.raise_for_status()
with open(filename, 'wb') as f:
for chunk in response.iter_content(chunk_size=8192):
f.write(chunk)
return filename
except requests.exceptions.RequestException as e:
print(f"Error downloading image from {url}: {e}")
return None
This will download each image for us and throw an error if it's unable to process it.
Step 5: Extract Text from Images
Now let's write a function to extract text from the image using Tesseract OCR
def extract_text_from_image(image_path):
try:
img = Image.open(image_path)
extracted_text = pytesseract.image_to_string(img)
return extracted_text
except Exception as e:
print(f"Error during OCR processing of {image_path}: {e}")
return None
This will give us the text from the ad images we download.
Here's an example:
[The Ad image is on the left and the generated output is on the right]


This capability unlocks new avenues for competitive analysis, market research and understanding the evolving landscape of online advertising.
Step 6: Write a main function to use the above functions and get the compiled CSV with data from all ad images
if __name__ == "__main__":
if not serpapi_api_key:
print("Please set your SERPAPI_API_KEY environment variable or replace 'YOUR_SERPAPI_API_KEY' in the script.")
else:
print(f"Searching Google Ads Transparency Center for: '{search_query}'")
ads_data = get_ads_from_transparency_center(search_query)
if not ads_data:
print("No ads found for the query.")
else:
create_csv()
print(f"Found {len(ads_data)} ads. Processing ads.")
found_image_ad = False
for ad in ads_data:
if "image" in ad and ad["image"]:
image_url = ad["image"]
print(f"\n--- Processing Ad ---")
advertiser = ad.get('advertiser', 'N/A')
advertiser_id = ad.get("advertiser_id", "N/A")
details_link = ad.get('details_link', 'N/A')
downloaded_path = download_image(image_url, output_image_filename)
if downloaded_path:
# Extract text from the image
extracted_text = extract_text_from_image(downloaded_path)
if extracted_text:
print("\n--- Extracted Text from Ad Image ---")
# Write to CSV
with open("text_from_ads.csv", "a", encoding="UTF8", newline="") as f:
print("\n--- Writing to CSV ---")
writer = csv.writer(f)
writer.writerow([advertiser, advertiser_id, details_link, image_url, extracted_text])
else:
print("Could not extract text from the image.")
found_image_ad = True
if not found_image_ad:
print("No ads with an image creative were found in the results.")
This will create a file text_from_ads.csv
and you'll see all of the data, including text extracted from the images, within it.
Here's what the output file looks like with all the ad data:

Limitations and Considerations
OCR Accuracy: Tesseract OCR is powerful, but it's accuracy can depend on the image quality, font styles and the text orientation. Highly stylized ad images or low resolution ad images might yield less accurate results.
Ad Creative Diversity: Not all Ads will be image based. Many may be video or text based. This script purely focuses on ads that have an image URL.
Image Processing: For more advances scenarios, you may want to consider adding image pre-processing steps like resizing, contrast enhancements etc to improve OCR accuracy.
Conclusion
By combining SerpApi's powerful Google Ads Transparency Center API with pytesseract
for OCR, you can programmatically extract valuable text information from image based ads. I hope this tutorial was helpful in understanding how you can use this capability.
You can find all the code in this post on my Github here:
If you have any questions, don't hesitate to reach out to me at sonika@serpapi.com.
Relevant Links
- Try out the Google Ads Transparency Center API in our playground
- Google Ads Transparency Center API Documentation
- Status page
- Plans and Pricing
Related Posts

