Movie Ratings Data Visualization with Google SERP API in Python

Google is a great source of valuable data. There are numerous use cases for scraping Google data, such as SEO, finance, local results, and more. This blog post walks through scraping movie ratings from platforms like IMDb and Rotten Tomatoes as they are presented on Google. We will also plot the scraped data on a graph for better visualization.

We will be using SerpApi's Google Search API to scrape the data from Google. It is the simplest, fastest, and most complete SERP API.

Why use an API?

  • No need to create a parser from scratch and maintain it.
  • Bypass blocks from Google: no need to solve CAPTCHAs or work around IP blocks.
  • No need to pay for proxies or CAPTCHA solvers.
  • No need to use browser automation.

SerpApi takes care of everything mentioned above, with response times of about 2.47 seconds (about 1.33 seconds with Ludicrous speed) per request. The result is well-structured JSON data from a single API call.

Response times and success rates are shown on the SerpApi Status page.

Build

Prerequisite

Install the library:

pip install serpapi

serpapi is the official SerpApi Python package.

You also need an API key. You get 100 free searches when you sign up.
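Rather than pasting the key directly into the source, one option is to read it from an environment variable. The sketch below assumes a variable named SERPAPI_KEY, which is a name chosen here for illustration and not something SerpApi requires; load_api_key is likewise a hypothetical helper.

```python
import os

def load_api_key(env_var="SERPAPI_KEY"):
    # SERPAPI_KEY is an assumed variable name, not one mandated by SerpApi
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your SerpApi API key")
    return key
```

The returned value could then be passed as the api_key argument wherever the placeholder "<Paste your API key>" appears.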

Define movies to scrape

movies = [
  "Iron Man",
  "The Incredible Hulk",
  "Iron Man 2",
  "Thor",
  "Captain America: The First Avenger",
  "The Avengers",
  "Iron Man 3",
  "Thor: The Dark World",
  "Captain America: The Winter Soldier",
  "Guardians of the Galaxy",
  "Avengers: Age of Ultron",
  "Ant-Man",
  "Captain America: Civil War",
  "Doctor Strange",
  "Guardians of the Galaxy Vol. 2",
  "Spider-Man: Homecoming",
  "Thor: Ragnarok",
  "Black Panther",
  "Avengers: Infinity War",
  "Ant-Man and the Wasp",
  "Captain Marvel",
  "Avengers: Endgame",
  "Spider-Man: Far From Home",
  "Black Widow",
  "Shang-Chi and the Legend of the Ten Rings",
  "Eternals",
  "Spider-Man: No Way Home",
  "Doctor Strange in the Multiverse of Madness",
  "Thor: Love and Thunder",
  "Black Panther: Wakanda Forever",
  "Ant-Man and the Wasp: Quantumania",
  "Guardians of the Galaxy Vol. 3",
  "The Marvels",
]

This is a list of movies that have already been released by Marvel. It would be interesting to see how they are doing from the first to the most recent movie.

Define get_rating function

from functools import reduce
import serpapi

def get_rating(title):
  response = serpapi.search(
    engine="google", 
    q=f"{title} movie rating", 
    hl="en", 
    gl="us", 
    api_key="<Paste your API key>"
  )

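  # Ratings appear under knowledge_graph -> editorial_reviews when Google shows them
  if "knowledge_graph" in response and "editorial_reviews" in response["knowledge_graph"]:
    return reduce(lambda result, platform: {**result, platform["title"]: platform["rating"]}, response["knowledge_graph"]["editorial_reviews"], {})

  # No ratings found: return an empty dictionary instead of None
  return {}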

We define a function called get_rating that accepts a title. It returns a dictionary mapping each platform name to its rating. For example,

{
  "IMDb": "8.9/10",
  "Rotten Tomatoes": "89%",
  ...
}

Feel free to check out SerpApi's playground to understand the structure of the API response.
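To make the reduce call in get_rating easier to follow, here is a small offline sketch that applies the same transformation to a hand-written sample. The sample_reviews list is made up for illustration and only mimics the two fields the function reads (title and rating); a real API response contains more fields.

```python
from functools import reduce

# Hypothetical sample shaped like the editorial_reviews entries that get_rating reads
sample_reviews = [
    {"title": "IMDb", "rating": "8.9/10"},
    {"title": "Rotten Tomatoes", "rating": "89%"},
]

# Same reduce as in get_rating: fold the list into {platform: rating}
with_reduce = reduce(
    lambda result, platform: {**result, platform["title"]: platform["rating"]},
    sample_reviews,
    {},
)

# An equivalent dictionary comprehension, which some readers may find clearer
with_comprehension = {p["title"]: p["rating"] for p in sample_reviews}

print(with_reduce)         # {'IMDb': '8.9/10', 'Rotten Tomatoes': '89%'}
print(with_comprehension)  # same dictionary
```

Both forms produce the same dictionary, so the choice between them is a matter of taste.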

Scrape the data

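With movies and get_rating defined, we can now loop through the movie titles and scrape the ratings.

result = []
for title in movies:
  # get_rating may find no ratings, so fall back to an empty dictionary
  ratings = get_rating(title) or {}
  # Skip movies that are missing a rating on either platform
  if "IMDb" not in ratings or "Rotten Tomatoes" not in ratings:
    continue
  result.append({
    "title": title,
    "imdb": ratings["IMDb"],
    "rotten": ratings["Rotten Tomatoes"]
  })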

Plot graph

Matplotlib is a great Python library for data visualization.

import matplotlib.pyplot as plt

title = list(map(lambda x: x["title"], result))
imdb = list(map(lambda x: float(x["imdb"].split("/")[0]), result))
rotten = list(map(lambda x: int(x["rotten"].strip("%")), result))

plt.figure(figsize=(12, 8))

# IMDb Ratings
plt.scatter(title, imdb, color='blue', label='IMDb Ratings')

# Rotten Tomatoes Ratings
# plt.scatter(title, rotten, color='red', label='Rotten Tomatoes Ratings')

plt.xticks(rotation=90)
plt.ylabel('Rating')
plt.title('Movie Ratings Comparison')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

Results

Figure: IMDb ratings
Figure: Rotten Tomatoes ratings

Combining both platforms after adjusting Rotten Tomatoes to a base 10 rating.
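The adjustment described above amounts to parsing each rating string and dividing the Rotten Tomatoes percentage by 10. Here is a minimal sketch, assuming ratings arrive as strings such as "8.9/10" and "89%"; the helper names imdb_to_float and rotten_to_base10 are hypothetical and used only for illustration.

```python
def imdb_to_float(rating):
    # Keep the part before the slash: "8.9/10" -> 8.9
    return float(rating.split("/")[0])

def rotten_to_base10(rating):
    # Drop the percent sign and rescale from 0-100 to 0-10: "89%" -> 8.9
    return int(rating.strip("%")) / 10

print(imdb_to_float("8.9/10"))   # 8.9
print(rotten_to_base10("89%"))   # 8.9
```

With both platforms on the same 0 to 10 scale, the two series can share one y-axis.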

Full code

from functools import reduce
import serpapi

def get_rating(title):
  response = serpapi.search(
    engine="google", 
    q=f"{title} movie rating", 
    hl="en", 
    gl="us", 
    api_key="<Paste your API key>"
  )

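  # Ratings appear under knowledge_graph -> editorial_reviews when Google shows them
  if "knowledge_graph" in response and "editorial_reviews" in response["knowledge_graph"]:
    return reduce(lambda result, platform: {**result, platform["title"]: platform["rating"]}, response["knowledge_graph"]["editorial_reviews"], {})

  # No ratings found: return an empty dictionary instead of None
  return {}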

movies = [
  "Iron Man",
  "The Incredible Hulk",
  "Iron Man 2",
  "Thor",
  "Captain America: The First Avenger",
  "The Avengers",
  "Iron Man 3",
  "Thor: The Dark World",
  "Captain America: The Winter Soldier",
  "Guardians of the Galaxy",
  "Avengers: Age of Ultron",
  "Ant-Man",
  "Captain America: Civil War",
  "Doctor Strange",
  "Guardians of the Galaxy Vol. 2",
  "Spider-Man: Homecoming",
  "Thor: Ragnarok",
  "Black Panther",
  "Avengers: Infinity War",
  "Ant-Man and the Wasp",
  "Captain Marvel",
  "Avengers: Endgame",
  "Spider-Man: Far From Home",
  "Black Widow",
  "Shang-Chi and the Legend of the Ten Rings",
  "Eternals",
  "Spider-Man: No Way Home",
  "Doctor Strange in the Multiverse of Madness",
  "Thor: Love and Thunder",
  "Black Panther: Wakanda Forever",
  "Ant-Man and the Wasp: Quantumania",
  "Guardians of the Galaxy Vol. 3",
  "The Marvels",
]

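result = []
for title in movies:
  # get_rating may find no ratings, so fall back to an empty dictionary
  ratings = get_rating(title) or {}
  # Skip movies that are missing a rating on either platform
  if "IMDb" not in ratings or "Rotten Tomatoes" not in ratings:
    continue
  result.append({
    "title": title,
    "imdb": ratings["IMDb"],
    "rotten": ratings["Rotten Tomatoes"]
  })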

import matplotlib.pyplot as plt

title = list(map(lambda x: x["title"], result))
imdb = list(map(lambda x: float(x["imdb"].split("/")[0]), result))
rotten = list(map(lambda x: int(x["rotten"].strip("%")) / 10, result))

plt.figure(figsize=(12, 8))

# IMDb Ratings
plt.scatter(title, imdb, color='blue', label='IMDb Ratings')

# Rotten Tomatoes Ratings
plt.scatter(title, rotten, color='red', label='Rotten Tomatoes Ratings')

plt.xticks(rotation=90)
plt.ylabel('Rating')
plt.title('Movie Ratings Comparison')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()

Conclusion

Google can be a great data source for your application, and a SERP API can help you gather the data you need quickly and easily. At SerpApi, we abstract away the complexity of scraping search engines so that you can focus on your core business offering.

If you have any questions, please feel free to reach out to me.

