Scrape Yahoo Shopping with Python

Through Yahoo Shopping, users can search for a wide array of items, from electronics and clothing to home goods and health products. Its interface offers users the ability to easily compare prices, product specifications, and seller ratings across multiple vendors, making it a convenient tool for online shoppers looking for the best deals.

Additionally, Yahoo Shopping incorporates reviews and ratings, both for individual products and for the vendors selling them, ensuring users can make informed decisions about their purchases.

Introduction

In recent years, Yahoo Shopping has leveraged advanced technologies, such as GraphQL for efficient data retrieval, which has resulted in faster, more responsive user experiences.

This makes Yahoo Shopping harder to scrape for some users, because the product data comes from a GraphQL endpoint rather than from the HTML of the page.

Getting Started

First, here is what we need:

  • Python: You should have Python 3.6 or above installed on your machine.
  • Libraries: We'll use the requests library to make HTTP requests and the json module, which ships with Python's standard library, to work with JSON data.

You can install the requests library using pip:

pip install requests

Overview of Yahoo Shopping's GraphQL API

Yahoo Shopping employs a GraphQL API for its data, which differs from typical REST APIs. GraphQL APIs allow us to specify exactly what data we need, leading to more efficient data retrieval.

We'll use a specific GraphQL query called searchProduct to fetch product data. In this case, we are searching for "coffee."
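To make the "ask only for what you need" idea concrete, here is a minimal sketch that builds the request body for an arbitrary keyword and field list. The helper name build_search_payload is hypothetical. The field names and the searchRequest values are the ones used later in this article; the schema is not documented here, so treat them as assumptions.

```python
import json

# Field names taken from the query used in this article (assumed, not documented).
DEFAULT_FIELDS = ("gtin", "title", "price", "salePrice", "image", "vendor")


def build_search_payload(keyword, fields=DEFAULT_FIELDS):
    """Build a GraphQL request body for the searchProduct operation (hypothetical helper)."""
    selection = " ".join(fields)
    # Only the fields listed in `selection` are requested from the server.
    query = (
        "query searchProduct($searchRequest: SearchRequest) { "
        "search(searchRequest: $searchRequest) { "
        f"totalCount items {{ {selection} }} }} }}"
    )
    return {
        "operationName": "searchProduct",
        "variables": {
            "searchRequest": {
                "keyword": keyword,
                "sourceTypes": ["PRODUCT"],
                "fieldSets": ["ITEMS"],
                "imageSize": "400x400",
                "pageId": "affiliate-shop-srp",
                "siteId": "us-shopping",
                "countryCode": "US",
                "lang": "en",
            },
        },
        "query": query,
    }


if __name__ == "__main__":
    # Print the body that would be sent; no network request is made here.
    print(json.dumps(build_search_payload("coffee", ["title", "price"]), indent=2))
```

Trimming the field list shrinks the response, which is the main practical benefit of GraphQL for scraping.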

Scripting

We create a function, shopping_yahoo_gql(), which prepares the GraphQL request we will send to the Yahoo Shopping API. The request is a dictionary containing information like the operation name (searchProduct), our search keyword (coffee), and other necessary details.

import json

import requests


def shopping_yahoo_gql():
    headers = {
        'Content-Type': 'application/json',
    }
    # GraphQL request body: operation name, variables, and the query itself
    post = {
        "operationName": "searchProduct",
        "variables": {
            "searchRequest": {
                "keyword": "coffee",  # the search keyword
                "sourceTypes": ["PRODUCT"],
                "fieldSets": ["ITEMS"],
                "imageSize": '400x400',
                "pageId": 'affiliate-shop-srp',
                "siteId": 'us-shopping',
                "countryCode": 'US',
                "lang": 'en',
            },
        },
        "query": "query searchProduct($searchRequest: SearchRequest) { search(searchRequest: $searchRequest) { totalCount items { provider gtin itemId title price salePrice currency image vendor vendorId } }}"
    }

    url = 'https://shopping.yahoo.com/graphql'
    # a timeout keeps the script from hanging if the server does not respond
    response = requests.post(url, headers=headers, data=json.dumps(post), timeout=10)
    # raise an exception on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.json()
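GraphQL servers report failures in a top-level errors list, often alongside an HTTP 200 status, and data may be null in that case. Whether Yahoo's endpoint behaves exactly this way is an assumption, but a small check on the decoded response is cheap. The sketch below uses a hypothetical helper, extract_items, that works on the dictionary returned by shopping_yahoo_gql().

```python
def extract_items(payload):
    """Return the item list from a decoded GraphQL response (hypothetical helper).

    Raises RuntimeError if the response carries a non-empty "errors" list.
    """
    if not isinstance(payload, dict):
        raise ValueError("expected the decoded response to be a JSON object")

    errors = payload.get("errors")
    if errors:
        # Each error is usually an object with a "message" key; fall back to str().
        messages = "; ".join(
            str(err.get("message", err)) if isinstance(err, dict) else str(err)
            for err in errors
        )
        raise RuntimeError(f"GraphQL errors: {messages}")

    # "data" or "search" may be null, so fall back to empty containers.
    search = (payload.get("data") or {}).get("search") or {}
    return search.get("items") or []


if __name__ == "__main__":
    sample = {"data": {"search": {"totalCount": 1, "items": [{"title": "Coffee"}]}}}
    print(extract_items(sample))
```

Raising on errors makes a blocked or malformed request fail loudly instead of silently producing an empty result list.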

The function scrape_yahoo_shopping() is where we parse the response from the API. We loop through the items under data.search.items in the response, extract the relevant fields, and collect them in the shopping_results list.

def scrape_yahoo_shopping():
    data_gql = shopping_yahoo_gql()

    if data_gql:
        # "data" and "search" can be null in a GraphQL response, so fall back to empty containers
        search = (data_gql.get('data') or {}).get('search') or {}
        items = search.get('items') or []

        shopping_results = [{
            "position": position,
            "product_id": item.get('gtin'),
            "link": f"https://shopping.yahoo.com/product/{item.get('gtin')}",
            "title": item.get('title'),
            "seller": item.get('vendor'),
            # price may be missing or null; treat that as 0 instead of raising
            "price": float(item.get('price') or 0),
            # report a sale price only when one exists and differs from the regular price
            "sale_price": float(item['salePrice']) if item.get('salePrice') not in (None, '', item.get('price')) else None,
            "thumbnail": item.get('image'),
        } for position, item in enumerate(items, start=1)]

        print(json.dumps({"shopping_results": shopping_results}, indent=2))
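The price handling can also be pulled into a small helper, which keeps the list comprehension readable and makes the edge cases easy to test. This is a minimal sketch, assuming price and salePrice may arrive as numbers, numeric strings, null, or be missing altogether (the response format is not documented here). The names to_float and normalize_prices are hypothetical. Unlike the function above, the sketch returns None rather than 0 for a missing price.

```python
def to_float(value):
    """Convert a raw price value to float, or return None if it is missing or invalid."""
    if value is None or value == "":
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def normalize_prices(item):
    """Return the regular price and, only if it differs, the sale price of an item."""
    price = to_float(item.get("price"))
    sale_price = to_float(item.get("salePrice"))
    # A sale price equal to the regular price carries no information.
    if sale_price == price:
        sale_price = None
    return {"price": price, "sale_price": sale_price}


if __name__ == "__main__":
    print(normalize_prices({"price": "12.50", "salePrice": "9.99"}))
```

Returning None for a missing price lets downstream code tell "unknown" apart from "free"; use 0 instead if your consumers expect a number in every row.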

Finally, we call our scrape_yahoo_shopping() function in the script's main entry point. This will execute our scraping function and print the results in the console:

if __name__ == "__main__":
    scrape_yahoo_shopping()

Full and Final Script

import json

import requests


def shopping_yahoo_gql():
    headers = {
        'Content-Type': 'application/json',
    }
    # GraphQL request body: operation name, variables, and the query itself
    post = {
        "operationName": "searchProduct",
        "variables": {
            "searchRequest": {
                "keyword": "coffee",  # we're setting the search keyword to "coffee"
                "sourceTypes": ["PRODUCT"],
                "fieldSets": ["ITEMS"],
                "imageSize": '400x400',
                "pageId": 'affiliate-shop-srp',
                "siteId": 'us-shopping',
                "countryCode": 'US',
                "lang": 'en',
            },
        },
        "query": "query searchProduct($searchRequest: SearchRequest) { search(searchRequest: $searchRequest) { totalCount items { provider gtin itemId title price salePrice currency image vendor vendorId } }}"
    }

    url = 'https://shopping.yahoo.com/graphql'
    # a timeout keeps the script from hanging if the server does not respond
    response = requests.post(url, headers=headers, data=json.dumps(post), timeout=10)
    # raise an exception on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.json()


def scrape_yahoo_shopping():
    data_gql = shopping_yahoo_gql()

    if data_gql:
        # "data" and "search" can be null in a GraphQL response, so fall back to empty containers
        search = (data_gql.get('data') or {}).get('search') or {}
        items = search.get('items') or []

        shopping_results = [{
            "position": position,
            "product_id": item.get('gtin'),
            "link": f"https://shopping.yahoo.com/product/{item.get('gtin')}",
            "title": item.get('title'),
            "seller": item.get('vendor'),
            # price may be missing or null; treat that as 0 instead of raising
            "price": float(item.get('price') or 0),
            # report a sale price only when one exists and differs from the regular price
            "sale_price": float(item['salePrice']) if item.get('salePrice') not in (None, '', item.get('price')) else None,
            "thumbnail": item.get('image'),
        } for position, item in enumerate(items, start=1)]

        print(json.dumps({"shopping_results": shopping_results}, indent=2))


if __name__ == "__main__":
    scrape_yahoo_shopping()

Running the script prints a JSON object with a shopping_results list to the console.

The results are identical to what we provide at SerpApi; see our Yahoo Shopping API documentation. The difference is that SerpApi is faster, handles CAPTCHAs for you, supports all the filters that Yahoo Shopping has, and offers easily controllable pagination.

Ending

Do you think Yahoo implementing GraphQL could make it a suitable replacement for Google Shopping? Let us know what you think on Twitter at @serp_api.

Don't miss our other blog post, Scraping Naver Video Search Results using Python.

Happy scraping!