Visualizing Amazon Search Data Using Python

The Amazon marketplace is colossal and ever-changing. For anyone selling, sourcing, or simply researching products, the sheer volume of data is overwhelming. Manual browsing can give you basic insights into what's popular, but it's biased by your search history and doesn't surface deeper competitive insights about products and sellers.

So what if we could programmatically extract this data and transform it into compelling visuals that reveal insights into pricing strategies and emerging products?

SerpApi offers an Amazon Search API which, when combined with Python's visualization libraries like Matplotlib and Seaborn, enables us to turn raw JSON data into actionable, visual insights.

Why Use SerpApi

SerpApi manages the intricacies of scraping and returns structured JSON results, which allows you to save time and effort. We take care of proxies and any CAPTCHAs that might be encountered, so that you don't have to worry about your searches being blocked.

We also do all the work to maintain our parsers. This is important, as Amazon is constantly experimenting with new layouts, new elements, and other changes. By taking care of this on our side, we eliminate a lot of time and complexity from your workflow.

SerpApi's Amazon Search API

SerpApi's /search?engine=amazon API endpoint allows you to scrape the results from Amazon Search with a simple GET request:

Amazon Search API - SerpApi
Scrape Amazon search engine products and categories with SerpApi’s Amazon Search API. Product links, prices, ratings, delivery information, sponsored products, and more.
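As a quick illustration, here's what that GET request can look like in Python using the requests library. This is a minimal sketch; the query and domain values are just examples:

import os
import requests

# Call SerpApi's Amazon Search API endpoint with a plain GET request.
# "air fryer" and "amazon.com" are example values.
params = {
    "engine": "amazon",
    "k": "air fryer",
    "amazon_domain": "amazon.com",
    "api_key": os.environ["SERPAPI_API_KEY"],
}

response = requests.get("https://serpapi.com/search", params=params)
results = response.json()

# Print the title and extracted price of each organic result
for result in results.get("organic_results", []):
    print(result.get("title"), result.get("extracted_price"))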

It's useful for gathering lists of products, their basic details, and seller information across the different parts of the Amazon search page.

If you'd like to see a live and interactive demo of what results from the API look like, you can head over to our playground:

SerpApi Playground - SerpApi
Test SerpApi’s Google Search, Google Maps, YouTube, Bing, Walmart, Ebay, Baidu, Yandex and more APIs for free in the interactive playground!

Getting Started

You can use our APIs in multiple languages, but for the purposes of this blog post, I'm going to be using Python.

To begin scraping data, first, create a free account on serpapi.com. You'll receive one hundred free search credits each month to explore the API.

  • Get your SerpApi API Key from this page.
  • [Optional but Recommended] Set your API key in an environment variable instead of pasting it directly into the code. Refer here to understand more about using environment variables. For this tutorial, I have saved the API key in an environment variable named "SERPAPI_API_KEY" in my .env file (see the snippet after this list).
  • Next, on your local computer, install the google-search-results Python library: pip install google-search-results
You can use this library to scrape search results from any of SerpApi's APIs.
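As a quick sketch of the environment-variable setup, assuming you keep the key in a .env file and have python-dotenv installed (pip install python-dotenv):

import os
from dotenv import load_dotenv

# Read key=value pairs from the .env file into the process environment
load_dotenv()

# The key is now available to the rest of the script
api_key = os.environ["SERPAPI_API_KEY"]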

More About Our Python Libraries

We have two separate Python libraries, serpapi and google-search-results, and both work perfectly fine. However, serpapi is the newer one, while all the examples you can find on our website use the older google-search-results. If you'd like to follow the examples from our website, install the google-search-results module instead of serpapi.

For this blog post, I am using google-search-results because all of our documentation references this one.

You may encounter issues if you have both libraries installed at the same time. If you have the old library installed and want to switch to the new one, please follow these steps:

  1. Uninstall the google-search-results module from your environment.
  2. Make sure that neither serpapi nor google-search-results is installed at this stage.
  3. Install the serpapi module, for example with the following command if you're using pip: pip install serpapi
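If you do switch to the newer library, the equivalent of the search used in this post looks roughly like this. This is a sketch based on the newer client's interface, so double-check against that library's README:

import os
import serpapi

# The newer library exposes a single Client instead of per-engine classes
client = serpapi.Client(api_key=os.environ["SERPAPI_API_KEY"])

results = client.search(
    engine="amazon",
    k="air fryer",
    amazon_domain="amazon.com",
)

# Results behave like a dictionary, mirroring the JSON response
print(results["organic_results"][0]["title"])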

In addition to SerpApi, we'll need to install a few more libraries:

  1. Pandas: Useful for organizing data. Install it with: pip install pandas
  2. Matplotlib: Useful for building static plots. Install it with: pip install matplotlib
  3. Seaborn: Built on Matplotlib, great for statistical plots and aesthetic details. Install it with: pip install seaborn

Use Case and Goals

Let's establish some goals based on the data we want to analyze and the insights we want to gather from it. For this blog post, the goals for a product category are to:

  • Understand the pricing distribution
  • Understand which products are selling, and analyze ratings
  • Understand the sponsored vs. organic search landscape

We'll plot these in different ways based on the data, which will help in visualizing the insights.

Let's Begin By Extracting Data From SerpApi's APIs

Let's use the google-search-results library to fetch the first 10 pages of Amazon search results for air fryers from SerpApi:

from serpapi import GoogleSearch
import os

max_page = 10

for page in range(1, max_page + 1):
    params = {
        "engine": "amazon",
        "k": "air fryer",
        "amazon_domain": "amazon.com",
        "api_key": os.environ["SERPAPI_API_KEY"],
        "page": page,
    }

    # Fetch one page of Amazon search results and print the raw JSON
    search = GoogleSearch(params)
    results = search.get_dict()
    print(results)

This should pull the results for the query "air fryer", page by page, and print them in JSON format. Following this, we can organize and use this data for visualizations.
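Each page consumes a search credit, so while you're iterating on the visualizations, it can be worth caching the raw responses locally. Here's a minimal sketch; the pages list and file name are my own placeholders:

import json

# Suppose `pages` collects the per-page dictionaries from the loop above,
# e.g. by calling pages.append(results) after each request
pages = []

# Write the raw responses to disk once...
with open("air_fryer_results.json", "w") as f:
    json.dump(pages, f)

# ...and reload them later without spending more search credits
with open("air_fryer_results.json") as f:
    pages = json.load(f)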

Price Distribution Analysis

Pricing is one of the most important levers for success on Amazon. With this analysis, we want to answer the fundamental question: "Where do I fit in?" Pricing above the mean may signal premium quality, but it may also result in fewer sales unless your branding and reviews are otherwise exceptional.

So, let's extract all the product prices from the scraped data and plot their frequency (We'll use the first 10 pages of results in this case):

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from serpapi import GoogleSearch
import os

max_page = 10
prices = []

for page in range(1, max_page + 1):
    params = {
        "engine": "amazon",
        "k": "air fryer",
        "amazon_domain": "amazon.com",
        "api_key": os.environ["SERPAPI_API_KEY"],
        "page": page,
    }

    search = GoogleSearch(params)
    results = search.get_dict()
    for result in results.get("organic_results", []):
        price = result.get("extracted_price")
        if price:
            prices.append(price)

# DATA STRUCTURING: Create the DataFrame 
df = pd.DataFrame(prices, columns=["Price"])

# PLOT THE DATA
plt.figure(figsize=(10, 6))
sns.histplot(df["Price"], bins=15, kde=True)
plt.title("Distribution of Air Fryer Prices on Amazon")
plt.xlabel("Price (USD)")
plt.ylabel("Number of Products")
plt.grid(True)
plt.show()

Here's the resulting visualization, which helps shed light on pricing:

This visualization clearly shows that most listings cluster in the $50-$150 range, with some premium outliers.

Markets change. Running this analysis regularly helps you react to new competitors and overall shifts in market prices, ensuring you remain competitive over time.
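One way to track those shifts is to reduce each run to a handful of summary statistics. A small sketch; the sample values below are placeholders for the prices list scraped above:

import pandas as pd

# Placeholder values; in practice, use the `prices` list scraped above
prices = [49.99, 79.99, 99.95, 129.99, 229.99]
df = pd.DataFrame(prices, columns=["Price"])

# Mean, spread, and quartiles of the scraped prices
print(df["Price"].describe())

# Percentile cut points, useful as rough pricing tiers to track over time
print(df["Price"].quantile([0.25, 0.50, 0.75, 0.90]))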

Product Demand and Ratings Analysis

While high sales are great, they often don't tell the whole story. A product might sell well due to aggressive marketing but have poor ratings, which indicate future problems with returns and brand reputation. Conversely, a highly rated product with low sales might be a hidden gem. This chart helps you find the sweet spot.

We can divide the products we found above into four quadrants to understand product demand and ratings:

  • High Sales, High Ratings
  • High Sales, Low Ratings
  • Low Sales, Low Ratings
  • Low Sales, High Ratings

This will help in finding products with untapped potential, as well as identifying the market leaders.

To do this, we can use the reviews field and the bought_last_month field in the results, and plot a scatterplot of the resulting data for all products on the first page.

💡
One of the challenges here is that the units sold data in the bought_last_month field comes as text (e.g., "1K+ bought in past month"). The code must first parse this text into a number before it can be plotted.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from serpapi import GoogleSearch
import os
import re
from matplotlib.lines import Line2D

max_page = 1
organic_results = []

for page in range(1, max_page + 1):
    params = {
        "engine": "amazon",
        "k": "air fryer",
        "amazon_domain": "amazon.com",
        "api_key": os.environ["SERPAPI_API_KEY"],
        "page": page,
    }

    search = GoogleSearch(params)
    results = search.get_dict()
    for organic_result in results.get("organic_results", []):
        # Truncate long titles for the legend and default missing sales data to 0
        organic_result["title"] = organic_result.get("title", "")[:50] + "..."
        if not organic_result.get("bought_last_month"):
            organic_result["bought_last_month"] = 0
        organic_results.append(organic_result)

# DATA PARSING FUNCTION: This function cleans the text data.
def parse_units_sold(text):
    if not isinstance(text, str):
        return 0
    # Use regular expressions to find numbers
    numbers = re.findall(r'\d+\.?\d*', text)
    if not numbers:
        return 0
    num = float(numbers[0])
    if 'K+' in text:
        num *= 1000
    elif 'M+' in text:
        num *= 1000000
    return int(num)
    
# DATA STRUCTURING & PARSING: Create the DataFrame and apply the parsing function.
df = pd.DataFrame(organic_results)
df['bought_last_month'] = df['bought_last_month'].apply(parse_units_sold)

# PLOT THE DATA ALONG WITH A LEGEND
plt.figure(figsize=(12, 8))
plot = sns.scatterplot(data=df, x='bought_last_month', y='rating', s=200, alpha=0.7)

for i, row in df.iterrows():
    label = chr(65+i)
    plot.text(row['bought_last_month']+30, row['rating'], label, fontsize=12, fontweight='bold')

# Build a letter-keyed legend so long product titles don't crowd the chart
legend_elements = []
for i, row in df.iterrows():
    label = chr(65 + i)
    legend_label = f"{label}: {row['title']}"
    legend_elements.append(Line2D([0], [0], marker='o', color='w', label=legend_label, markerfacecolor='grey', markersize=0))

# Draw the legend once, outside the loop, anchored beside the plot
plt.legend(handles=legend_elements, title='Product Key', loc='upper left', bbox_to_anchor=(1.05, 1))


plt.title('Product Demand vs. Customer Satisfaction', fontsize=16)
plt.xlabel('Estimated Units Sold in Past Month', fontsize=12)
plt.ylabel('Average Customer Rating', fontsize=12)
plt.grid(True)
plt.show()
💡
Some products may not have the bought_last_month field present. In that case, I decided to set it to 0 for now, as this generally means there isn't enough data about the product. You can modify the code to ignore those products if you wish, or list them separately with just their reviews.
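If you'd rather exclude those products than plot them at zero, a one-line filter on the DataFrame does the job. This is a variation on the code above, not part of it:

# Keep only products with actual sales data; resetting the index keeps
# the chr(65 + i) letter labels aligned after rows are dropped
df = df[df["bought_last_month"] > 0].reset_index(drop=True)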

Here's the resulting visualization, which helps shed light on product units sold vs average customer rating:

Based on this, we can understand a few things:

  • The products in the top left are hidden gems: likely high-quality products that customers love, but that lack market penetration, possibly due to poor marketing or being new to the market.
  • The products in the bottom right may be popular now, but their low ratings put them at risk. The poor reviews may signal an opportunity to launch a higher-quality alternative.
  • The products in the top right are the undisputed market leaders: products that have successfully balanced high demand with high quality.
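To make these quadrants explicit on the chart itself, you can draw reference lines at the medians. A sketch, assuming the df DataFrame from the scatterplot code above; add these lines just before plt.show():

import matplotlib.pyplot as plt

# Dashed lines at the medians split the scatterplot into four quadrants
median_sold = df["bought_last_month"].median()
median_rating = df["rating"].median()

plt.axvline(median_sold, color="grey", linestyle="--", linewidth=1)
plt.axhline(median_rating, color="grey", linestyle="--", linewidth=1)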

Sponsored vs. Organic Results Analysis

Understanding the number of sponsored results vs. organic results tells us how heavily we'd need to rely on ads in a given category. This visualization is crucial for budgeting and channel strategy.

This can also serve as evidence to stakeholders about the competitive advertising landscape, justifying the need for a pay-per-click (PPC) advertising budget.

To create the visualization, we'll count the number of sponsored and organic results from the first two pages, then visualize those counts as proportions of a whole.

import pandas as pd
import matplotlib.pyplot as plt
from serpapi import GoogleSearch
import os

max_page = 2
all_results = []

for page in range(1, max_page + 1):
    params = {
        "engine": "amazon",
        "k": "air fryer",
        "amazon_domain": "amazon.com",
        "api_key": os.environ["SERPAPI_API_KEY"],
        "page": page,
    }

    search = GoogleSearch(params)
    results = search.get_dict()
    for organic_result in results.get("organic_results", []):
        all_results.append(organic_result)
    for sponsored_result in results.get("product_ads", {}).get("products", []):
        all_results.append(sponsored_result)
    # featured_products and video_results are lists of blocks,
    # each with its own "products" list
    for block in results.get("featured_products", []):
        for featured_result in block.get("products", []):
            all_results.append(featured_result)
    for block in results.get("video_results", []):
        for video_result in block.get("products", []):
            all_results.append(video_result)

df = pd.DataFrame(all_results)

# DATA AGGREGATION: We count the number of sponsored and organic listings.
sponsored_counts = df['sponsored'].value_counts().get(True, 0)
organic_counts = len(df) - sponsored_counts
all_counts = [organic_counts, sponsored_counts]

# PLOTTING: We create the pie chart.
plt.figure(figsize=(8, 8))
plt.pie(
    all_counts,
    labels=['Organic', 'Sponsored'],
    autopct='%1.1f%%',
    startangle=90,
    colors=['#66b3ff','#ff9999']
)
plt.title('Sponsored vs. Organic Listings on First Two Pages', fontsize=16)
plt.show()

Here's the resulting visualization:

A high percentage of sponsored results (48.4% in this example) means the category is highly competitive. Relying on organic SEO alone will be difficult, and you'll likely need a significant advertising budget to be seen by customers on the first few pages of Amazon.

Conclusion

As we've explored, data visualization can be key to unlocking a distinct competitive advantage: spotting trends before competitors do and optimizing ad spend. You can find all the code for the examples on GitHub:

GitHub - sonika-serpapi/visualizing-amazon-search-data-using-python-examples: Code examples for blog post: Visualizing Amazon Search Data Using Python
Code examples for blog post: Visualizing Amazon Search Data Using Python - sonika-serpapi/visualizing-amazon-search-data-using-python-examples

I hope you found this tutorial helpful for understanding how to visualize data from SerpApi's Amazon Search API using Python. If you have any questions, don't hesitate to reach out to me at sonika@serpapi.com.

We're also working on an Amazon Product API. If you're interested in it, feel free to check out the open Public Roadmap issue, and let us know of your interest:

[New API] Amazon Product Detail API · Issue #2844 · serpapi/public-roadmap