Product listings are a treasure trove of data. While title, price and description are easily accessible, a significant amount of information is locked away in product images. Understanding product packaging and details in the product images for your category of interest can help inform your product strategy. This is especially valuable if you are planning to launch a competing product in your category.

In this blog post, I'll cover how you can scrape images for any product using SerpApi's Google Immersive Product API, and then use an Optical Character Recognition (OCR) library to extract the text from these images. Data from images can be used in many ways. For this tutorial, we'll create a word cloud of the words that appear most often in product images.

Why Extract Text From Product Images

Extracting text from product images on Google Immersive Product API serves many purposes:

  1. Uncover hidden product features and benefits: Sellers often mention specific keywords on product packaging, and feature callouts in pictures. You can also identify unique selling propositions included in images.
  2. Gain a marketing advantage: Understand product messaging by extracting and analyzing text from product images.
  3. Customer pain point discovery: Uncover common problems that manufacturers are visually addressing with their product features.

Tools We Will Use

  1. Google Shopping API: We can use it to get the page_token linked to the product (used for Google Immersive Product API) from Google Shopping
  2. Google Immersive Product API: Scrape product images using SerpApi's Google Immersive Product API
  3. Pillow (PIL): A powerful image processing library for Python, useful for handling image data. We'll use it to open the image file and have it ready for text extraction.
  4. Tesseract OCR (via pytesseract): An open-source Optical Character Recognition (OCR) engine that can extract text from images.
  5. nltk library: Used to remove stop words like "the" and "a" to focus on meaningful words in the extracted text.
  6. WordCloud library: Used to generate a visual representation of the most frequent words in the extracted text.
  7. matplotlib library: The script uses matplotlib to display the word cloud, with axis labels removed for a cleaner look. You can customize the word cloud’s appearance, including the color scheme, max word count, and image size as you require it.

Steps To Extract And Analyze Image Text

Step 1: Set up your environment

Ensure you have the necessary libraries installed.

pip install google-search-results Pillow pytesseract nltk wordcloud matplotlib python-dotenv opencv-python requests

We'll also need to install Tesseract OCR itself. The installation process depends on your operating system. On macOS, you can use brew install tesseract.

google-search-results is our Python library. You can use this library to scrape search results from any of SerpApi's APIs.

More About Our Python Libraries

We have two separate Python libraries, serpapi and google-search-results, and both work fine. However, serpapi is the newer one, and all the examples on our website use the older google-search-results. If you'd like to follow those examples as written, install the google-search-results module instead of serpapi.

For this blog post, I am using google-search-results because all of our documentation references this one.

You may encounter issues if you have both libraries installed at the same time. If you have the old library installed and want to proceed with using our new library, please follow these steps:

  1. Uninstall google-search-results module from your environment.
  2. Make sure that neither serpapi nor google-search-results is installed at that stage.
  3. Install serpapi module, for example with the following command if you're using pip: pip install serpapi
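
To check which of the two packages is present before you start, a small sketch like the following may help. It only uses the standard library's importlib.metadata. The helper name installed_serpapi_packages is my own for illustration and is not part of either library.

```python
from importlib import metadata

# Distribution names as they are installed with pip.
CANDIDATES = ("google-search-results", "serpapi")


def installed_serpapi_packages(candidates=CANDIDATES):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in candidates:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in this environment; skip it.
            continue
    return found


if __name__ == "__main__":
    found = installed_serpapi_packages()
    if len(found) > 1:
        print("Both packages are installed; consider keeping only one:", found)
    else:
        print("Installed:", found or "neither package")
```

If both names show up, follow the uninstall steps above before continuing.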

Step 2: Get your SerpApi API key

To begin scraping data, first, create a free account on serpapi.com. You'll receive 250 free search credits each month to explore the API.

  • Get your SerpApi API Key from this page.
  • [Optional but Recommended] Set your API key in an environment variable, instead of directly pasting it in the code. Refer here to understand more about using environment variables. For this tutorial, I have saved the API key in an environment variable named "SERPAPI_API_KEY" in my .env file.
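
As a rough sketch of reading that variable with a clearer failure message than a bare KeyError, something like this could work. The function name read_api_key and the wording of the error are assumptions for illustration; only the variable name SERPAPI_API_KEY comes from this post.

```python
import os


def read_api_key(env=None, name="SERPAPI_API_KEY"):
    """Return the API key stored under `name`, or raise with a readable message."""
    # Default to the real process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set. Add it to your .env file or export it in your shell."
        )
    return key
```

If you use a .env file, call load_dotenv() before read_api_key() so the value is available in os.environ.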

Step 3: Scrape Product Images from Google Immersive Product API

Let's set up the imports we'll need and load our .env file which contains our environment variable with the API key.

import csv
from serpapi import GoogleSearch
from dotenv import load_dotenv
import os
import requests, json
from PIL import Image
import pytesseract
import cv2 
import matplotlib.pyplot as plt
from wordcloud import WordCloud
import nltk
from nltk.corpus import stopwords
nltk.download('stopwords')
load_dotenv()

Then add some basic configuration steps:

serpapi_api_key = os.environ["SERPAPI_API_KEY"]
output_image_filename = "downloaded_image.png"
pytesseract.pytesseract.tesseract_cmd = r'/opt/homebrew/bin/tesseract' # Example for macOS
immersive_product_page_token = "<PAGE TOKEN FOR A PRODUCT - NEEDED FOR GOOGLE IMMERSIVE PRODUCT API>"  # replace with a real page token
💡
If you're using Windows, your pytesseract.pytesseract.tesseract_cmd value may need to be different, depending on where Tesseract is installed. That may look like: pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
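
If you'd rather not hard-code the path, one option is to look the binary up at runtime. The sketch below uses the standard library's shutil.which and falls back to the two example paths from this post. The helper find_tesseract and the FALLBACK_PATHS tuple are hypothetical names, and your install location may differ.

```python
import os
import shutil

# Example install locations mentioned in this post; adjust for your machine.
FALLBACK_PATHS = (
    "/opt/homebrew/bin/tesseract",
    r"C:\Program Files\Tesseract-OCR\tesseract.exe",
)


def find_tesseract(fallbacks=FALLBACK_PATHS):
    """Return a path to the tesseract binary, or None if it cannot be found."""
    # Prefer whatever is on the PATH, since that works on every operating system.
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in fallbacks:
        if os.path.isfile(candidate):
            return candidate
    return None
```

You could then set pytesseract.pytesseract.tesseract_cmd to the returned value when it is not None.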

Now, let's write a function to use SerpApi to get product images from Google Immersive Product API. First, we need a page_token, which we can get from Google Shopping results.

I made this Google Shopping request in our playground and I got the page_token from the immersive_product_page_token field in the shopping_results part of the response:

You can set the value eyJlaSI6IkpqZmNhTFduQl9tWXdia1A3ZGFvdUFRIiwicHJvZHVjdGlkIjoiIiwiY2F0YWxvZ2lkIjoiMTM4MTQwNzg5ODUwMzQyNTYyNiIsImhlYWRsaW5lT2ZmZXJEb2NpZCI6IjgzMTQ2NTIyMzUwMzE1ODg5MjYiLCJpbWFnZURvY2lkIjoiNjUwODQ1MDc2OTk0NDU4MDM1MSIsInJkcyI6IlBDXzM0ODgwMTQxODc4ODE3Nzk2NTR8UFJPRF9QQ18zNDg4MDE0MTg3ODgxNzc5NjU0IiwicXVlcnkiOiJMRytPTEVEK2V2bytHNCtTZXJpZXMrU21hcnQrVFYrNEsiLCJncGNpZCI6IjM0ODgwMTQxODc4ODE3Nzk2NTQiLCJtaWQiOiI1NzY0NjI3ODM3Nzc5MTUzMTMiLCJwdnQiOiJoZyIsInV1bGUiOm51bGwsImdsIjoidXMiLCJobCI6ImVuIiwiZW5naW5lIjoiZ29vZ2xlX3Nob3BwaW5nIn0= as the page_token parameter in the configuration above.
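
If you'd prefer to fetch a token from code rather than the playground, here is a minimal sketch. It assumes the Google Shopping response has the shape described above (a shopping_results list whose items may carry an immersive_product_page_token field). The function names first_page_token and fetch_page_token are my own, and the query string is whatever product you want to search for.

```python
def first_page_token(shopping_response):
    """Return the first immersive_product_page_token in a Google Shopping response, or None."""
    for item in shopping_response.get("shopping_results", []):
        token = item.get("immersive_product_page_token")
        if token:
            return token
    return None


def fetch_page_token(query, api_key):
    """Run a Google Shopping search through SerpApi and return the first page token."""
    # Imported here so first_page_token can be used without the SerpApi client installed.
    from serpapi import GoogleSearch

    search = GoogleSearch({"engine": "google_shopping", "q": query, "api_key": api_key})
    return first_page_token(search.get_dict())
```

Note that fetch_page_token makes a real request and uses one search credit each time it runs.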

Then we can use this token as a parameter and make a request to the Google Immersive Product API to get product details.

def get_data_from_google_immersive_product(immersive_product_page_token):
    params = {
        "api_key": serpapi_api_key,
        "engine": "google_immersive_product",
        "page_token": immersive_product_page_token
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    return results.get("product_results", [])
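
The rest of this post reads image URLs from the thumbnails key of the returned product data. As a hedged sketch, a small helper can pull out only the entries that look like downloadable URLs. The name thumbnail_urls is hypothetical, and the filtering rules (strings starting with http or https, no duplicates) are my assumptions rather than anything the API guarantees.

```python
def thumbnail_urls(product_results):
    """Return the http(s) image URLs from a product_results dict, without duplicates."""
    urls = []
    # `or {}` also covers the empty default returned when no product data is found.
    for item in (product_results or {}).get("thumbnails", []):
        is_url = isinstance(item, str) and item.startswith(("http://", "https://"))
        if is_url and item not in urls:
            urls.append(item)
    return urls
```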

Step 4: Download the Product Images

Now let's write a function we can use to download the product images.

def download_image(url, filename):
    try:
        response = requests.get(url, stream=True)
        response.raise_for_status()
        with open(filename, 'wb') as f:
            for chunk in response.iter_content(chunk_size=8192):
                f.write(chunk)
        return filename
    except requests.exceptions.RequestException as e:
        print(f"Error downloading image from {url}: {e}")
        return None

This downloads a single image to the given filename. If the request fails, it prints the error and returns None rather than raising an exception.
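
The script in this post reuses one output filename, so each download overwrites the previous image. If you want to keep every image for later inspection, a sketch like this can generate one filename per image. The helper image_filename, the product_image prefix, and the list of accepted extensions are all assumptions for illustration.

```python
import os
from urllib.parse import urlparse

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".webp", ".gif"}


def image_filename(index, url, prefix="product_image"):
    """Build a filename such as product_image_03.jpg from a position and an image URL."""
    # Only the URL path is inspected, so query strings do not affect the extension.
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        ext = ".png"  # assumption: fall back to the extension the script already uses
    return f"{prefix}_{index:02d}{ext}"
```

You could pass the result as the filename argument of download_image inside a loop over the image URLs.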

Step 5: Extract Text from Images

Now let's write a function to extract text from an image using Tesseract OCR.

def extract_text_from_image(image_path):
    try:
        img = cv2.imread(image_path)
        # cv2.imread returns None (instead of raising) when the file cannot be read
        if img is None:
            print(f"Could not read image file: {image_path}")
            return None
        img = cv2.resize(img, None, fx=1.2, fy=1.2, interpolation=cv2.INTER_CUBIC)
        extracted_text = pytesseract.image_to_string(img, lang='eng')
        return extracted_text
    except Exception as e:
        print(f"Error during OCR processing of {image_path}: {e}")
        return None

  • cv2.imread(image_path) → loads the image using OpenCV.
  • cv2.resize(...) → slightly enlarges the image (1.2×) to improve text clarity for OCR.
  • pytesseract.image_to_string(...) → runs Tesseract OCR on the image and extracts any text it finds.

This will give us the text from the product images we download.

Here's an example of what it looks like:

[The Product image is on the right and the generated output is on the left]
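
OCR quality depends heavily on the input image. As a rough sketch of one preprocessing option (an alternative to the OpenCV resize used above, not what the script in this post does), you could convert the image to grayscale, stretch its contrast, and upscale it with Pillow before handing it to Tesseract. The function name preprocess_for_ocr and the default scale of 2 are assumptions; whether this helps will vary from image to image.

```python
from PIL import Image, ImageOps


def preprocess_for_ocr(img, scale=2):
    """Return a grayscale, contrast-stretched, enlarged copy of a Pillow image."""
    gray = ImageOps.grayscale(img)      # drop color information
    gray = ImageOps.autocontrast(gray)  # stretch the darkest/lightest pixels to black/white
    new_size = (gray.width * scale, gray.height * scale)
    return gray.resize(new_size, Image.LANCZOS)
```

pytesseract.image_to_string also accepts Pillow images, so the result can be passed to it directly, for example after opening a downloaded file with Image.open.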

The extracted text can feed many kinds of analysis. For this tutorial, we'll build a word cloud of the most frequent words, so we can see what sellers prioritize in their product images.

Step 6: Create a Word Cloud With Text Extracted From All Available Images

You can research common words in images manually, or you can write a small script like this one to automate it and display a word cloud of the most frequent keywords in the images.

def create_wordcloud(all_extracted_text):
    # Combine all descriptions into one text
    all_text = " ".join(all_extracted_text)

    # Remove stopwords like "the", "a", "an", etc.
    stop_words = set(stopwords.words('english'))

    # Create a word cloud object with custom stopwords and settings
    wordcloud = WordCloud(
        stopwords=stop_words,
        background_color='white',
        width=800,
        height=600,
        max_words=100,
        colormap='coolwarm'  # Customize color scheme here
    ).generate(all_text)

    # Plot the word cloud
    plt.figure(figsize=(10, 8))
    plt.imshow(wordcloud, interpolation='bilinear')
    plt.axis('off')  # Turn off axis labels
    plt.title("Word Cloud of Most Occurring Words in Product Images", fontsize=16)
    plt.show()

Step 7: Write a main function to use the above functions and get a word cloud with data from all product images

if __name__ == "__main__":
    if not serpapi_api_key:
        print("Please set your SERPAPI_API_KEY environment variable or replace 'YOUR_SERPAPI_API_KEY' in the script.")
    else:
        all_extracted_text = []
        print(f"Searching Google Immersive Product for: '{immersive_product_page_token}'")
        product_data = get_data_from_google_immersive_product(immersive_product_page_token)
        if not product_data:
            print("No data found for the query.")
        else:
            found_image_ad = False
            if "thumbnails" in product_data:
                product_images = product_data["thumbnails"]
                print(f"Found {len(product_images)} images.")
                for image in product_images:
                    print(f"\n--- Processing Image ---")
                    downloaded_path = download_image(image, output_image_filename)
                    if downloaded_path:
                        print(f"Image downloaded to: {downloaded_path}")
                        extracted_text = extract_text_from_image(downloaded_path)
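                        if extracted_text and extracted_text.strip():
                            print("\n--- Extracted Text from Product Image ---")
                            print(extracted_text)
                            all_extracted_text.append(extracted_text)
                        else:
                            print("Could not extract text from the image.")
                # WordCloud.generate raises ValueError on empty input, so only plot when OCR found text
                if all_extracted_text:
                    create_wordcloud(all_extracted_text)
                else:
                    print("No text was extracted from any image.")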
            else:
                print("No images found in the product data.")

This will create a word cloud and you'll see all of the commonly used words in the text extracted from the images.

💡
Note: Once in a while you may see a message like "Could not extract text from the image." Please keep in mind that Tesseract OCR is not perfect. It is powerful, but its accuracy depends on image quality, font styles and text orientation. Highly stylized or low-resolution images might yield less accurate results.

Here's what the output looks like:

This can help inform strategy around the specific keywords sellers use on product packaging and in feature callouts. In this case, it applies to an LG OLED Smart TV, but you can use this approach on whichever product you want data on.
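
If you want numbers alongside the picture, a plain frequency count is a simple complement to the word cloud. The sketch below uses only the standard library. The helper top_words and its tiny BASIC_STOP_WORDS set are illustrative assumptions; in the real script you could pass the nltk stop word set instead.

```python
import re
from collections import Counter

# A tiny illustrative stop word list; the nltk list used earlier is far more complete.
BASIC_STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "for", "with", "on", "is"}


def top_words(texts, n=10, stop_words=BASIC_STOP_WORDS, min_length=3):
    """Return the n most frequent words across all texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        # Keep runs of letters only, which drops digits, punctuation and OCR symbols.
        for word in re.findall(r"[a-z]+", text.lower()):
            if len(word) >= min_length and word not in stop_words:
                counts[word] += 1
    return counts.most_common(n)
```

Calling top_words(all_extracted_text) on the list built in the main function would return the most frequent words with their counts.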

Conclusion

Scraping product images using SerpApi's Google Immersive Product API, and then extracting the text from these images gives you a unique edge, helping uncover marketing strategies, common features mentioned on packaging, and hidden insights that simple text scraping would miss.

You can find all the code in this post on my GitHub here:

GitHub - sonika-serpapi/extract-text-from-google-immersive-product-api-images: Extract Text from Images in Google Immersive Product API using Python

If you have any questions, don't hesitate to reach out to me at sonika@serpapi.com.
