Intro

In this blog post, we'll go through the process of extracting filters, featured items, related queries and organic results plus pagination using the Walmart Search Engine Results API and the Python programming language.

You can look at the complete code in the online IDE (Replit).

What will be scraped

[Image: Walmart search results]

📌Note: By default, Walmart returns 40 results. In this case, 8 results are displayed to make the image more compact.

Why use an API?

There are a couple of reasons to use an API, ours in particular:

  • No need to create a parser from scratch and maintain it.
  • Bypass blocks from Walmart: CAPTCHAs and IP blocks are handled for you.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation.

SerpApi handles everything on the backend, with response times of about 2.5 seconds per request (about 1.2 seconds with Ludicrous Speed), and without browser automation, which makes it much faster. Response times and status rates are shown on the SerpApi Status page.

[Image: SerpApi Status page]

Full Code

This code retrieves all the data with pagination:

from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json

params = {
    'api_key': '...',           # https://serpapi.com/manage-api-key
    'engine': 'walmart',        # SerpApi search engine	
    'query': 'coffee marker',   # the search query
    'spelling': True,           # activate spelling fix
    'sort': 'best_match',       # sorted by different options
    'min_price': 100,           # minimum price
    'max_price': 150,           # maximum price
}

search = GoogleSearch(params)   # where data extraction happens on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

walmart_results = {
    'search_information': results.get('search_information'),
    'filters': results.get('filters'),
    'organic_results': [],
    'featured_item': results.get('featured_item'),
    'related_queries': results.get('related_queries'),
}

while True:
    # add data from current page (the last page is included too)
    walmart_results['organic_results'].extend(results.get('organic_results', []))

    # stop when there is no next page
    next_page = results.get('serpapi_pagination', {}).get('next')
    if not next_page:
        break

    # update search object with the parameters of the next page
    search.params_dict.update(dict(parse_qsl(urlsplit(next_page).query)))

    # get updated information from next page
    results = search.get_dict()

print(json.dumps(walmart_results, indent=2, ensure_ascii=False))
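
If you want to try the pagination logic without an API key, here is a minimal offline sketch. The FAKE_PAGES data, the fetch_page function and the collect_organic_results function are hypothetical stand-ins, not part of SerpApi; only urlsplit and parse_qsl are real library calls. The sketch reads each page before checking for a next link, so the last page is collected as well:

```python
from urllib.parse import urlsplit, parse_qsl

# Stub pages standing in for SerpApi responses; the URLs and titles are made up.
FAKE_PAGES = {
    '1': {
        'organic_results': [{'title': 'Product A'}, {'title': 'Product B'}],
        'serpapi_pagination': {'next': 'https://serpapi.com/search.json?engine=walmart&page=2'},
    },
    '2': {
        'organic_results': [{'title': 'Product C'}],
        'serpapi_pagination': {},  # no 'next' key on the last page
    },
}


def fetch_page(params):
    """Hypothetical stand-in for search.get_dict()."""
    return FAKE_PAGES[params.get('page', '1')]


def collect_organic_results(params):
    """Walk through all pages and gather their organic results."""
    collected = []
    params = dict(params)  # avoid mutating the caller's dict
    while True:
        results = fetch_page(params)
        collected.extend(results.get('organic_results', []))

        next_url = results.get('serpapi_pagination', {}).get('next')
        if not next_url:
            break
        # copy the query parameters of the next-page URL into params
        params.update(dict(parse_qsl(urlsplit(next_url).query)))
    return collected


titles = [item['title'] for item in collect_organic_results({'engine': 'walmart'})]
print(titles)
```

Swapping fetch_page for a real request should leave the loop itself unchanged.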

Preparation

Install library:

pip install google-search-results

google-search-results is a SerpApi API package.

Code Explanation

Import libraries:

from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import json

Library and purpose:

  • GoogleSearch: to send the request to SerpApi and get the parsed results (used here with the Walmart engine).
  • urlsplit: to split a URL into its components (scheme, network location, path, query and fragment).
  • parse_qsl: to parse a query string given as a string argument into a list of name/value pairs.
  • json: to convert the extracted data to a JSON string.

The parameters are defined for generating the URL. If you want to pass other parameters to the URL, you can do so using the params dictionary:

params = {
    'api_key': '...',           # https://serpapi.com/manage-api-key
    'engine': 'walmart',        # SerpApi search engine	
    'query': 'coffee marker',   # the search query
    'spelling': True,           # activate spelling fix
    'sort': 'best_match',       # sorted by different options
    'min_price': 100,           # minimum price
    'max_price': 150,           # maximum price
}

Parameters and explanation:

  • api_key: defines the SerpApi private key to use. You can find it under your account -> API key.
  • engine: set this parameter to walmart to use the Walmart API engine.
  • query: defines the search query. You can use anything that you would use in a regular Walmart search.
  • spelling: activates the spelling fix. True (default) includes the spelling fix, False searches without it.
  • sort: defines sorting (e.g. price_low, price_high, best_seller, best_match, rating_high, new).
  • min_price: lower bound of the price range query.
  • max_price: upper bound of the price range query.
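
If you build the params dictionary in several places, a small helper can catch obvious mistakes before a request is sent. This is only a sketch: build_walmart_params and ALLOWED_SORTS are hypothetical names, and treating the sort values listed above as the complete set is an assumption.

```python
# Sort values taken from the parameter explanation; assumed (not confirmed) to be the full set.
ALLOWED_SORTS = {'price_low', 'price_high', 'best_seller', 'best_match', 'rating_high', 'new'}


def build_walmart_params(api_key, query, sort='best_match',
                         min_price=None, max_price=None, spelling=True):
    """Hypothetical helper: validate the inputs and return a params dict."""
    if sort not in ALLOWED_SORTS:
        raise ValueError(f'unsupported sort value: {sort!r}')
    if min_price is not None and max_price is not None and min_price > max_price:
        raise ValueError('min_price must not exceed max_price')

    params = {
        'api_key': api_key,
        'engine': 'walmart',
        'query': query,
        'spelling': spelling,
        'sort': sort,
    }
    # only include the price bounds that were actually provided
    if min_price is not None:
        params['min_price'] = min_price
    if max_price is not None:
        params['max_price'] = max_price
    return params


params = build_walmart_params('...', 'coffee marker', min_price=100, max_price=150)
print(params)
```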

📌Note: You can also add other API Parameters.

Then, we create a search object; the data itself is retrieved on the SerpApi backend. The get_dict() call returns the JSON response as a Python dictionary, which we store in results:

search = GoogleSearch(params)   # data extraction on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict
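
Before using results, it can help to check whether the request failed. The sketch below assumes that a failed request shows up as a top-level 'error' key in the returned dictionary; verify this against the API documentation before relying on it. The ensure_ok function and both sample responses are hypothetical.

```python
def ensure_ok(results):
    """Hypothetical guard: raise if the response carries an 'error' message."""
    error = results.get('error')
    if error:
        raise RuntimeError(f'SerpApi returned an error: {error}')
    return results


# made-up responses for illustration only
ok_response = {'organic_results': []}
bad_response = {'error': 'example error message'}

ensure_ok(ok_response)  # passes through unchanged
try:
    ensure_ok(bad_response)
except RuntimeError as exc:
    print(exc)
```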

You may have noticed that I made a mistake when passing the value to the query parameter. This was done on purpose to demonstrate that SerpApi's Walmart Spell Check API allows you to extract the corrected search term and search with it:

print(results['search_information']['spelling_fix'])    # coffee maker
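
The spelling_fix key may not be present for every query (for example, when the query is spelled correctly), so a defensive lookup is a reasonable precaution. This is a sketch under that assumption: corrected_query is a hypothetical helper, and sample_results is a trimmed copy of the output shown at the end of this post.

```python
def corrected_query(results):
    """Return the corrected search term, or None when no fix is present."""
    return (results.get('search_information') or {}).get('spelling_fix')


# trimmed sample shaped like the output at the end of this post
sample_results = {
    'search_information': {
        'query_displayed': 'coffee marker',
        'spelling_fix': 'coffee maker',
    }
}

print(corrected_query(sample_results))
print(corrected_query({}))
```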

At the moment, the results dictionary only stores data from 1 page. Before extracting data, the walmart_results dictionary is created where this data will be added later. Since the search_information, filters, featured_item and related_queries are repeated on each subsequent page, you can extract them immediately:

walmart_results = {
    'search_information': results.get('search_information'),
    'filters': results.get('filters'),
    'organic_results': [],
    'featured_item': results.get('featured_item'),
    'related_queries': results.get('related_queries'),
}

To get all organic results, you need to apply Walmart Pagination API. This is done in a loop: we add the data from the current page and then check whether the next page exists in the serpapi_pagination dictionary. If it does, we update the parameters of the search object from the next page URL and fetch that page. If it does not, the loop stops, so the last page is included too:

while True:
    # add data from current page
    # ...

    # stop when there is no next page
    next_page = results.get('serpapi_pagination', {}).get('next')
    if not next_page:
        break

    # update search object with the parameters of the next page
    search.params_dict.update(dict(parse_qsl(urlsplit(next_page).query)))

    # get updated information from next page
    results = search.get_dict()
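
To see what the urlsplit and parse_qsl combination produces, here is a small standalone sketch. The next_url value is made up for illustration; the real one comes from results['serpapi_pagination']['next'] and may carry different parameters.

```python
from urllib.parse import urlsplit, parse_qsl

# hypothetical 'next' link, for illustration only
next_url = 'https://serpapi.com/search.json?engine=walmart&page=2&query=coffee+maker'

query_string = urlsplit(next_url).query      # everything after the '?'
next_params = dict(parse_qsl(query_string))  # name/value pairs -> dict

print(next_params)
```

All values come back as strings, and the '+' in the query is decoded to a space. Updating the search parameters with this dictionary overwrites the keys that changed (such as page) and keeps the rest.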

The walmart_results['organic_results'] list is extended with the data from the current page:

# add data from current page
walmart_results['organic_results'].extend(results.get('organic_results', []))

# title = results['organic_results'][0]['title']
# thumbnail = results['organic_results'][0]['thumbnail']
# rating = results['organic_results'][0]['rating']
# reviews = results['organic_results'][0]['reviews']
# price = results['organic_results'][0]['primary_offer']['offer_price']

📌Note: In the comments above, I showed how to extract specific fields. You may have noticed the results['organic_results'][0]. This is the index of a product, which means that we are extracting data from the first product. The results['organic_results'][1] is from the second product and so on.
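
Instead of indexing products one by one, you can loop over the whole list. The sketch below uses a hypothetical summarize_products helper and assumes that some fields may be missing from a product, which is why it uses .get() rather than square brackets. The first sample item is trimmed from the output in this post; the second is made up.

```python
def summarize_products(organic_results):
    """Pick a few fields from each product, tolerating missing keys."""
    summary = []
    for product in organic_results:
        summary.append({
            'title': product.get('title'),
            'rating': product.get('rating'),
            'reviews': product.get('reviews'),
            'price': (product.get('primary_offer') or {}).get('offer_price'),
        })
    return summary


sample_products = [
    {
        'title': "Nespresso Vertuo Plus Coffee and Espresso Maker by De'Longhi, Black",
        'rating': 4.7,
        'reviews': 1603,
        'primary_offer': {'offer_price': 127},
    },
    {'title': 'Made-up product without rating or offer'},
]

for row in summarize_products(sample_products):
    print(row)
```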

After all the data is retrieved, it is output in JSON format:

print(json.dumps(walmart_results, indent=2, ensure_ascii=False))
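
The ensure_ascii=False argument matters when product titles contain non-ASCII characters. A short sketch of the difference, using a made-up record:

```python
import json

# made-up record with a non-ASCII character, for illustration only
record = {'title': 'Café blend coffee maker'}

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keeps the 'é' as-is

print(escaped)
print(readable)
```

Both strings decode back to the same dictionary; the second is simply easier to read in a terminal or a file.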

Output

{
  "search_information": {
    "location": {
      "postal_code": "60602",
      "province_code": "IL",
      "city": "Chicago",
      "store_id": "5402"
    },
    "total_results": 152051,
    "query_displayed": "coffee marker",
    "organic_results_state": "Results for exact spelling",
    "spelling_fix": "coffee maker"
  },
  "filters": null,
  "organic_results": [
    {
      "us_item_id": "622343372",
      "product_id": "363IFK4JZENM",
      "title": "Nespresso Vertuo Plus Coffee and Espresso Maker by De'Longhi, Black",
      "thumbnail": "https://i5.walmartimages.com/asr/b80b2bf3-f47c-494d-be9c-bd5b548760f9.b4bcbb88b02aaef77b5df4c697c22ab4.jpeg?odnHeight=180&odnWidth=180&odnBg=FFFFFF",
      "rating": 4.7,
      "reviews": 1603,
      "seller_id": "F55CDC31AB754BB68FE0B39041159D63",
      "seller_name": "Walmart.com",
      "fulfillment_badges": [
        "3+ day shipping"
      ],
      "two_day_shipping": false,
      "out_of_stock": false,
      "sponsored": true,
      "muliple_options_available": false,
      "primary_offer": {
        "offer_id": "8952A2034C634B9C9166D9A720E1DC5B",
        "offer_price": 127,
        "min_price": 0
      },
      "price_per_unit": {
        "unit": "each",
        "amount": ""
      },
      "product_page_url": "https://www.walmart.com/ip/Nespresso-Vertuo-Plus-Coffee-and-Espresso-Maker-by-De-Longhi-Black/622343372?athbdg=L1800",
      "serpapi_product_page_url": "https://serpapi.com/search.json?device=desktop&engine=walmart_product&product_id=622343372"
    },
    ... other results
  ],
  "featured_item": null,
  "related_queries": null
}

📌Note: Head to the playground for a live and interactive demo.

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞