Extract Text from Ad Images in Google Ad Transparency Center using Python

Google Ads Transparency Center is a valuable resource for anyone looking to understand the advertising landscape. The ads it lists often contain insights that brands can study before launching their own campaigns.

However, the challenge is that most of these ads are provided in image format. This means you only get a link to the ad image, not the underlying text content when you scrape them using SerpApi. For manual viewing, that’s fine - but if you want to analyze the messaging at scale, it’s a major limitation. Being able to extract text directly from those ads can unlock new opportunities to study how competitors structure their messaging and how to refine your own ad strategy.

This blog post will guide you through using SerpApi to scrape ads and then leverage an image-to-text Python library (pytesseract) to extract text from ad images.

Why Extract Text From Ad Images

If you're a market researcher, a competitor analysis expert, or even an ad copywriter, the ability to programmatically pull text from image ads opens up a world of possibilities:

Competitive Analysis: Track how competitors are phrasing their ad copy in image-based campaigns.
Trend Identification: Analyze common keywords, calls to action, and messaging themes across a large dataset of image ads.
Historical Data: Build a historical archive of ad creatives and their embedded text for long-term trend analysis.
Ad Copy Inspiration: Discover new and effective ways to phrase your own ad copy by studying successful examples.

Tools We Will Use

SerpApi's Google Ads Transparency Center API: This API allows us to programmatically query the Transparency Center and retrieve structured data about ads.
Pillow (PIL): A powerful image processing library for Python, useful for handling image data. We'll use it to open the image file and have it ready for text extraction.
Tesseract OCR (via pytesseract): An open-source Optical Character Recognition (OCR) engine that can extract text from images.

Steps To Extract Ad Text

Step 1: Setup your environment

Ensure you have the necessary libraries installed.

pip install google-search-results Pillow pytesseract python-dotenv requests

We'll also need to install Tesseract OCR itself. The installation process depends on your operating system. On macOS, you can use brew install tesseract.

google-search-results is our Python library. You can use this library to scrape search results from any of SerpApi's APIs.

More About Our Python Libraries

We have two separate Python libraries, serpapi and google-search-results, and both work perfectly fine. However, serpapi is the newer one, and all the examples you can find on our website use the older one, google-search-results. If you'd like to use our Python library with the examples from our website, you should install the google-search-results module instead of serpapi.

For this blog post, I am using google-search-results because all of our documentation references this one.

You may encounter issues if you have both libraries installed at the same time. If you have the old library installed and want to proceed with using our new library, please follow these steps:

Uninstall the google-search-results module from your environment.
Make sure that neither serpapi nor google-search-results is installed at that stage.
Install the serpapi module, for example with the following command if you're using pip: pip install serpapi
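If you are not sure which of the two libraries is present in your environment, a small check like the one below can help. This is only a sketch: the helper name installed_serpapi_packages is hypothetical, and it assumes the packages were installed under the distribution names mentioned above.

```python
from importlib import metadata


def installed_serpapi_packages():
    """Return {distribution name: version} for the SerpApi clients that are installed."""
    found = {}
    for name in ("google-search-results", "serpapi"):
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # this distribution is not installed
    return found


packages = installed_serpapi_packages()
if len(packages) > 1:
    print("Both clients are installed; consider uninstalling one:", packages)
else:
    print("Installed SerpApi client(s):", packages)
```

If both names show up in the output, that is the situation where the uninstall steps above apply.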

Step 2: Get your SerpApi API key

To begin scraping data, first, create a free account on serpapi.com. You'll receive 250 free search credits each month to explore the API.

Get your SerpApi API Key from this page.
[Optional but Recommended] Set your API key in an environment variable, instead of directly pasting it in the code. Refer here to understand more about using environment variables. For this tutorial, I have saved the API key in an environment variable named "SERPAPI_API_KEY" in my .env file.
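As a rough illustration of reading the key safely, the sketch below looks up SERPAPI_API_KEY and fails with a readable message when it is missing. The helper name load_api_key is hypothetical. It assumes the .env file contains a line of the form SERPAPI_API_KEY=your_key and that load_dotenv() has already been called before the lookup.

```python
import os


def load_api_key(env=None, name="SERPAPI_API_KEY"):
    """Return the API key from the environment, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()  # tolerate stray whitespace around the value
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or shell environment."
        )
    return key


# Usage with an explicit mapping (handy for trying it out without touching os.environ):
print(load_api_key({"SERPAPI_API_KEY": "demo-key"}))
```

Failing early like this avoids sending a request with an empty key and then having to interpret the API's error response.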

Step 3: Fetch Ads Data from Google Ads Transparency Center

Let's set up the imports we'll need and load our .env file which contains our environment variable with the API key.

import csv
from serpapi import GoogleSearch
from dotenv import load_dotenv
import os
import requests, json
from PIL import Image
import pytesseract
load_dotenv()

Next, add some basic configuration:

serpapi_api_key = os.getenv("SERPAPI_API_KEY")  # None if unset, so the check in the main block can report it
search_query = "cloud hosting" 
output_image_filename = "ad_creative.png"
pytesseract.pytesseract.tesseract_cmd = r'/opt/homebrew/bin/tesseract' # Example for macOS

def create_csv():
    header = ["Advertiser", "Advertiser ID", "Details Link", "Image URL", "Extracted Text"] # Specify a list of the fields you are interested in
    with open("text_from_ads.csv", "w", encoding="UTF8", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
    return

💡

If you're using Windows, your pytesseract.pytesseract.tesseract_cmd variable may need to be different, depending on where Tesseract is installed. That may look like: pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
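Rather than hardcoding one path per operating system, you could try to locate the binary at runtime. The sketch below is one way to do that, under the assumption that Tesseract is either on your PATH or at one of the two example locations mentioned in this post. The helper name resolve_tesseract_cmd is hypothetical.

```python
import os
import shutil


def resolve_tesseract_cmd(fallbacks=(
    "/opt/homebrew/bin/tesseract",                      # example macOS (Homebrew) path
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",    # example Windows path
)):
    """Return a path to the tesseract binary, or None if it cannot be found."""
    found = shutil.which("tesseract")  # search the PATH first
    if found:
        return found
    for path in fallbacks:
        if os.path.isfile(path):
            return path
    return None


print("Tesseract binary:", resolve_tesseract_cmd())
```

You could then assign the result to pytesseract.pytesseract.tesseract_cmd when it is not None, and show an installation hint otherwise.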

I added a simple create_csv() function to create the CSV file that will hold the details from each ad. I chose to record the advertiser name, advertiser ID, details link, and image URL along with the text extracted from the images.

💡

Feel free to change the header list to specify a list of the fields you are interested in having in the resulting CSV.

Then let's write a function to use SerpApi to get ad results.

def get_ads_from_transparency_center(query):
    # Note: `query` is not used below; results are selected by advertiser_id and region.
    params = {
        "api_key": serpapi_api_key,
        "engine": "google_ads_transparency_center",
        "advertiser_id": "AR07223290584121737217", # replace with ID of your choice
        "region": "2840" # replace with region of your choice
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    return results.get("ad_creatives", [])

Here is what the response looks like if you run this function:

[
    {
      "advertiser_id": "AR07223290584121737217",
      "advertiser": "Bloomberg L.P.",
      "ad_creative_id": "CR08363136799530287105",
      "format": "text",
      "image": "https://tpc.googlesyndication.com/archive/simgad/12342601236957871341",
      "width": 380,
      "height": 219,
      "total_days_shown": 7,
      "first_shown": 1756816460,
      "last_shown": 1757374223,
      "details_link": "https://adstransparency.google.com/advertiser/AR07223290584121737217/creative/CR08363136799530287105?region=US"
    },
    {
      "advertiser_id": "AR07223290584121737217",
      "advertiser": "Bloomberg L.P.",
      "ad_creative_id": "CR15208021437521592321",
      "format": "image",
      "image": "https://tpc.googlesyndication.com/archive/simgad/1920549703762573409",
      "width": 300,
      "height": 250,
      "total_days_shown": 28,
      "first_shown": 1755006166,
      "last_shown": 1757373520,
      "details_link": "https://adstransparency.google.com/advertiser/AR07223290584121737217/creative/CR15208021437521592321?region=US"
    },
    {
      "advertiser_id": "AR07223290584121737217",
      "advertiser": "Bloomberg L.P.",
      "ad_creative_id": "CR05075108557259538433",
      "format": "text",
      "image": "https://tpc.googlesyndication.com/archive/simgad/9320300039173543597",
      "width": 380,
      "height": 222,
      "total_days_shown": 521,
      "first_shown": 1710885206,
      "last_shown": 1757372828,
      "details_link": "https://adstransparency.google.com/advertiser/AR07223290584121737217/creative/CR05075108557259538433?region=US"
    },
    {
      "advertiser_id": "AR07223290584121737217",
      "advertiser": "Bloomberg L.P.",
      "ad_creative_id": "CR07144663631446147073",
      "format": "text",
      "image": "https://tpc.googlesyndication.com/archive/simgad/7088743589330901667",
      "width": 380,
      "height": 484,
      "total_days_shown": 521,
      "first_shown": 1710906186,
      "last_shown": 1757369131,
      "details_link": "https://adstransparency.google.com/advertiser/AR07223290584121737217/creative/CR07144663631446147073?region=US"
    },
    ...
    ...
    ...
]
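To make the response easier to work with, you might narrow it down to one creative format and convert the Unix timestamps into dates. The following is a minimal sketch that uses two trimmed entries from the response above as sample data. The helper names summarize_creatives and to_utc_date are hypothetical, and it assumes first_shown and last_shown are Unix timestamps in seconds.

```python
from datetime import datetime, timezone


def to_utc_date(timestamp):
    """Convert a Unix timestamp (seconds) to an ISO date string in UTC."""
    if timestamp is None:
        return None
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).date().isoformat()


def summarize_creatives(ad_creatives, ad_format="image"):
    """Keep creatives of one format and make their dates human-readable."""
    summary = []
    for ad in ad_creatives:
        if ad.get("format") != ad_format:
            continue
        summary.append({
            "ad_creative_id": ad.get("ad_creative_id"),
            "image": ad.get("image"),
            "first_shown": to_utc_date(ad.get("first_shown")),
            "last_shown": to_utc_date(ad.get("last_shown")),
        })
    return summary


# Two entries trimmed from the sample response above.
sample = [
    {"ad_creative_id": "CR08363136799530287105", "format": "text",
     "image": "https://tpc.googlesyndication.com/archive/simgad/12342601236957871341",
     "first_shown": 1756816460, "last_shown": 1757374223},
    {"ad_creative_id": "CR15208021437521592321", "format": "image",
     "image": "https://tpc.googlesyndication.com/archive/simgad/1920549703762573409",
     "first_shown": 1755006166, "last_shown": 1757373520},
]

for row in summarize_creatives(sample):
    print(row)
```

Note that in the sample response even creatives with the format "text" carry an image URL, so filtering by format is optional; the script in this post processes every creative that has an image.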

Step 4: Download the Ad Images

Now let's write a function we can use to download the ad images.

def download_image(url, filename):
    try:
        response = requests.get(url, stream=True)
        response.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        return filename
    except requests.exceptions.RequestException as e:
        print(f"Error downloading image from {url}: {e}")
        return None

This downloads an image to the given filename. If the request fails, the function prints an error message and returns None instead of raising an exception.
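Because the script reuses a single output filename, each download overwrites the previous image. If you would rather keep every image, one option is to derive a filename from the creative ID. This is a sketch only: image_filename_for and the ad_images directory are hypothetical names, and the .png extension is an assumption (Pillow identifies the image format from the file contents, not the extension).

```python
import os
import re


def image_filename_for(ad, directory="ad_images", extension=".png"):
    """Build a per-creative file path such as ad_images/<ad_creative_id>.png."""
    creative_id = str(ad.get("ad_creative_id") or "unknown")
    # Replace anything that is not safe in a filename.
    safe_id = re.sub(r"[^A-Za-z0-9_-]", "_", creative_id)
    return os.path.join(directory, f"{safe_id}{extension}")


print(image_filename_for({"ad_creative_id": "CR15208021437521592321"}))
```

The target directory has to exist before you write to it, for example by calling os.makedirs("ad_images", exist_ok=True) once at startup.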

Step 5: Extract Text from Images

Now let's write a function to extract text from the image using Tesseract OCR:

def extract_text_from_image(image_path):
    try:
        img = Image.open(image_path)
        extracted_text = pytesseract.image_to_string(img)
        return extracted_text
    except Exception as e:
        print(f"Error during OCR processing of {image_path}: {e}")
        return None

This will give us the text from the ad images we download.
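Raw OCR output often contains blank lines, repeated spaces, and a trailing form-feed character, which can make the CSV harder to read. A small cleanup step like the sketch below may help. The helper name clean_ocr_text is hypothetical and is not part of pytesseract.

```python
def clean_ocr_text(raw_text):
    """Collapse repeated whitespace and drop empty lines from OCR output."""
    if not raw_text:
        return ""
    # Normalize whitespace inside each line, then discard lines left empty.
    lines = (" ".join(line.split()) for line in raw_text.splitlines())
    return "\n".join(line for line in lines if line)


print(clean_ocr_text("  Cloud   Hosting \n\n\n Start  free\x0c"))
```

You could apply it to the value returned by extract_text_from_image() before writing the row to the CSV.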

Here's an example:

[The Ad image is on the left and the generated output is on the right]

This capability unlocks new avenues for competitive analysis, market research and understanding the evolving landscape of online advertising.

Step 6: Write a main function to use the above functions and get the compiled CSV with data from all ad images

if __name__ == "__main__":
    if not serpapi_api_key:
        print("Please set your SERPAPI_API_KEY environment variable or replace 'YOUR_SERPAPI_API_KEY' in the script.")
    else:
        print(f"Searching Google Ads Transparency Center for: '{search_query}'")
        ads_data = get_ads_from_transparency_center(search_query)
        if not ads_data:
            print("No ads found for the query.")
        else:
            create_csv()
            print(f"Found {len(ads_data)} ads. Processing ads.")
            found_image_ad = False
            for ad in ads_data:
                if "image" in ad and ad["image"]:
                    image_url = ad["image"]
                    print(f"\n--- Processing Ad ---")
                    advertiser = ad.get('advertiser', 'N/A')
                    advertiser_id = ad.get("advertiser_id", "N/A")
                    details_link = ad.get('details_link', 'N/A')
                    downloaded_path = download_image(image_url, output_image_filename)
                    if downloaded_path:
                        # Extract text from the image
                        extracted_text = extract_text_from_image(downloaded_path)
                        if extracted_text:
                            print("\n--- Extracted Text from Ad Image ---")
                            # Write to CSV
                            with open("text_from_ads.csv", "a", encoding="UTF8", newline="") as f:
                                print("\n--- Writing to CSV ---")
                                writer = csv.writer(f)
                                writer.writerow([advertiser, advertiser_id, details_link, image_url, extracted_text])
                        else:
                            print("Could not extract text from the image.")
                    found_image_ad = True
            if not found_image_ad:
                print("No ads with an image creative were found in the results.")

This will create a file text_from_ads.csv and you'll see all of the data, including text extracted from the images, within it.

Here's what the output file looks like with all the ad data:
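With the CSV in place, one possible next step is a quick look at which words appear most often in the extracted text. The sketch below assumes the CSV uses the "Extracted Text" header defined in create_csv(). The helper name top_words and the minimum word length of four characters are arbitrary choices for illustration.

```python
import csv
import re
from collections import Counter


def top_words(csv_path, column="Extracted Text", limit=10, min_length=4):
    """Count the most frequent words in one column of the CSV."""
    counts = Counter()
    with open(csv_path, encoding="UTF8", newline="") as f:
        for row in csv.DictReader(f):
            words = re.findall(r"[a-z']+", (row.get(column) or "").lower())
            counts.update(w for w in words if len(w) >= min_length)
    return counts.most_common(limit)


# Example usage, once text_from_ads.csv has been generated:
# for word, count in top_words("text_from_ads.csv"):
#     print(word, count)
```

This is a very simple frequency count with no stop-word handling, so treat it as a starting point rather than a finished analysis.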

Limitations and Considerations

OCR Accuracy: Tesseract OCR is powerful, but its accuracy depends on image quality, font styles, and text orientation. Highly stylized or low-resolution ad images might yield less accurate results.

Ad Creative Diversity: Not all ads are image-based. Many may be video or text-based. This script focuses purely on ads that have an image URL.

Image Processing: For more advanced scenarios, you may want to consider adding image pre-processing steps such as resizing or contrast enhancement to improve OCR accuracy.
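As an illustration of what such pre-processing could look like, here is a minimal sketch using Pillow. The helper name preprocess_for_ocr and the 2x scale factor are assumptions, not tuned values; whether these steps actually improve results depends on the images, so it is worth comparing the OCR output with and without them.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(img, scale=2):
    """Convert to grayscale, stretch the contrast, and upscale the image."""
    gray = ImageOps.grayscale(img)
    boosted = ImageOps.autocontrast(gray)
    new_size = (boosted.width * scale, boosted.height * scale)
    return boosted.resize(new_size, Image.LANCZOS)


# Demo on a blank in-memory image; in the script you would pass Image.open(image_path).
demo = preprocess_for_ocr(Image.new("RGB", (300, 250), "white"))
print(demo.size, demo.mode)
```

In the script, the returned image could be passed to pytesseract.image_to_string() in place of the original.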

Conclusion

By combining SerpApi's Google Ads Transparency Center API with pytesseract for OCR, you can programmatically extract valuable text information from image-based ads. I hope this tutorial was helpful in understanding how you can use this capability.

You can find all the code in this post on my GitHub here:

If you have any questions, don't hesitate to reach out to me at sonika@serpapi.com.

Sonika Arora
