Web scraping Walmart Search with Node.js
What will be scraped
Full code
If you don't need an explanation, have a look at the full code example in the online IDE.
import dotenv from "dotenv";
import { config, getJson } from "serpapi";

dotenv.config();
config.api_key = process.env.API_KEY; // your API key from serpapi.com

const resultsLimit = 40; // hardcoded limit for demonstration purpose

const engine = "walmart"; // search engine
const params = {
  query: "macnook pro", // Parameter defines the search query
  page: 1, // Value is used to get the items on a specific page
  device: "desktop", // Parameter defines the device to use to get the results
  store_id: "2280", // Store ID to filter the products by the specific store only
  // other parameters: https://serpapi.com/walmart-search-api#api-parameters
};

const getResults = async () => {
  const results = {
    fixedQuery: null,
    organicResults: [],
  };
  while (results.organicResults.length < resultsLimit) {
    const json = await getJson(engine, params);
    if (!results.fixedQuery) results.fixedQuery = json.search_information?.spelling_fix;
    if (json.organic_results) {
      results.organicResults.push(...json.organic_results);
      params.page += 1;
    } else break;
  }
  return results;
};

getResults().then((result) => console.dir(result, { depth: null }));
Why use Walmart Search Engine Results API from SerpApi?
Using an API generally solves all or most of the problems you might encounter while creating your own parser or crawler. From a web scraping perspective, our API can help with the most painful ones:
- Bypass blocks from supported search engines by solving CAPTCHAs and avoiding IP blocks.
- No need to create a parser from scratch and maintain it.
- No need to pay for proxies and CAPTCHA solvers.
- No need to use browser automation when you need to extract large amounts of data quickly.
Head to the Playground for a live and interactive demo.
Preparation
First, we need to create a Node.js* project and add the npm packages serpapi and dotenv.
To do this, in the directory with our project, open the command line and enter:
$ npm init -y
And then:
$ npm i serpapi dotenv
*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.
- The SerpApi package is used to scrape and parse search engine results using SerpApi. Get search results from Google, Bing, Baidu, Yandex, Yahoo, Home Depot, eBay, and more.
- The dotenv package is a zero-dependency module that loads environment variables from a .env file into process.env.
Next, we need to add a top-level "type" field with a value of "module" to our package.json file to allow using ES6 modules in Node.js.
We have now completed the Node.js environment setup for our project and can move on to the step-by-step code explanation.
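As a rough illustration, the relevant part of package.json might look like the fragment below. This is only a sketch: the name and version values are placeholders, and a file generated by npm init -y will contain additional fields (scripts, dependencies, and so on) that are omitted here.

```json
{
  "name": "your-project-name",
  "version": "1.0.0",
  "type": "module"
}
```

Without the "type": "module" entry, Node.js treats .js files as CommonJS and the import statements used in this post fail to load.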
Code explanation
First, we need to import dotenv from the dotenv library, and config and getJson from the serpapi library:
import dotenv from "dotenv";
import { config, getJson } from "serpapi";
Then, we apply some configuration: call the dotenv config() method, set your SerpApi private API key on the global config object, and define how many results we want to receive (the resultsLimit constant).
dotenv.config();
config.api_key = process.env.API_KEY; //your API key from serpapi.com
const resultsLimit = 40; // hardcoded limit for demonstration purpose
- dotenv.config() will read your .env file, parse the contents, assign it to process.env, and return an object with a parsed key containing the loaded content or an error key if it failed.
- config.api_key allows you to declare a global api_key value by modifying the config object.
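Because a missing variable simply leaves process.env.API_KEY undefined, it can be useful to fail early with a clear message. The following is a minimal sketch, not part of dotenv or serpapi: requireEnv is a hypothetical helper, and it assumes your .env file contains a line of the form API_KEY=your_key.

```javascript
// Hypothetical helper (not part of dotenv or serpapi): returns the value of an
// environment variable, or throws a descriptive error when it is missing or empty.
// `env` defaults to process.env, which dotenv.config() fills from the .env file.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing environment variable ${name}; add ${name}=... to your .env file`);
  }
  return value;
}

// Possible usage in the script from this post, after dotenv.config():
// config.api_key = requireEnv("API_KEY");
```

Passing the environment as a second argument is only a convenience for testing the helper without touching the real process.env.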
Next, we set the search engine and write the necessary search parameters for making a request (a full JSON list of supported Walmart stores is available):
Note: I deliberately made a mistake in the search query to demonstrate how the Walmart Spell Check API works.
const engine = "walmart"; // search engine
const params = {
  query: "macnook pro", // Parameter defines the search query
  page: 1, // Value is used to get the items on a specific page
  device: "desktop", // Parameter defines the device to use to get the results
  store_id: "2280", // Store ID to filter the products by the specific store only
};
Note: If you want to know more about scraping Walmart, also see the SerpApi Python demo project that extracts data from 500 Walmart stores and analyzes it.
You can see all available parameters in the API documentation.
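The params object is shared and its page value is changed later while paginating. If you prefer to derive a fresh object for each page instead, a small helper can do it. This is only a sketch: buildParams and baseParams are hypothetical names, not part of the serpapi package, and the check assumes pages are numbered from 1 as in this post.

```javascript
// Base parameters taken from this post; other supported parameters are listed
// in the Walmart Search API documentation.
const baseParams = {
  query: "macnook pro", // intentionally misspelled, as in the post
  device: "desktop",
  store_id: "2280",
};

// Hypothetical helper: returns a new params object for the given page
// without modifying `base`. Assumes page numbering starts at 1.
function buildParams(base, page = 1) {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError(`page must be a positive integer, got ${page}`);
  }
  return { ...base, page };
}

// Possible usage: getJson("walmart", buildParams(baseParams, 2));
```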
Next, we declare the function getResults that gets data from the page and returns it:
const getResults = async () => {
...
};
In this function we declare an object with two keys: fixedQuery, which is initially null, and an empty organicResults array. Then, using a while loop, we get json with the results, store spelling_fix in fixedQuery if it has not been set yet, add organic_results from each page to the organicResults array (using the push() method), and set the next page index (the params.page value).
If there are no more results on the page, we stop the loop (using break); the loop also ends once the number of received results reaches resultsLimit. Then we return the object with the results:
  const results = {
    fixedQuery: null,
    organicResults: [],
  };
  while (results.organicResults.length < resultsLimit) {
    const json = await getJson(engine, params);
    if (!results.fixedQuery) results.fixedQuery = json.search_information?.spelling_fix;
    if (json.organic_results) {
      results.organicResults.push(...json.organic_results);
      params.page += 1;
    } else break;
  }
  return results;
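Note that this loop appends whole pages, so the final array can hold more than resultsLimit items, and it changes the shared params object as it goes. A variation that trims to the limit and leaves params untouched could look like the sketch below. The names collectResults and fetchPage are hypothetical: fetchPage stands in for any async function that takes a params object and resolves to the parsed response, such as (p) => getJson(engine, p).

```javascript
// Hypothetical variation of getResults. `fetchPage` is any async function that
// takes a params object and resolves to the parsed JSON response, for example
// (p) => getJson(engine, p) from the code in this post.
async function collectResults(fetchPage, params, limit) {
  const results = { fixedQuery: null, organicResults: [] };
  let page = params.page ?? 1;

  while (results.organicResults.length < limit) {
    // Build a fresh params object per request instead of mutating `params`.
    const json = await fetchPage({ ...params, page });

    // Keep the first spelling fix we see, if any.
    if (results.fixedQuery === null) {
      results.fixedQuery = json.search_information?.spelling_fix ?? null;
    }

    const items = json.organic_results;
    if (!Array.isArray(items) || items.length === 0) break; // no more results

    // Append only as many items as are still needed to reach the limit.
    const remaining = limit - results.organicResults.length;
    results.organicResults.push(...items.slice(0, remaining));
    page += 1;
  }
  return results;
}

// Possible usage with the names from this post:
// collectResults((p) => getJson(engine, p), params, resultsLimit)
//   .then((result) => console.dir(result, { depth: null }));
```

Injecting fetchPage also makes the pagination logic easy to exercise with a fake function, without spending API credits.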
And finally, we run the getResults function and print all the received information in the console with the console.dir method, which allows you to use an object with the necessary parameters to change default output options:
getResults().then((result) => console.dir(result, { depth: null }));
Output
{
"fixedQuery":"macbook pro",
"organicResults":[
{
"us_item_id":"121393924",
"product_id":"18F5MJ3R95JG",
"title":"Apple MacBook Air, 13.3-inch, Intel Core i5, 4GB RAM, Mac OS, 128GB SSD, Bundle: Black Case, Wireless Mouse, Bluetooth Headset - Silver",
"thumbnail":"https://i5.walmartimages.com/asr/60e5ea72-ac15-4bdc-b112-572f76776e83.77df8900f9478a7a581dad9a6698ecd5.jpeg?odnHeight=180&odnWidth=180&odnBg=FFFFFF",
"rating":3.5,
"reviews":100,
"seller_id":"F86EF73A620D4265AEE28E9FD77A4ED1",
"seller_name":"Certified 2 Day Express",
"fulfillment_badges":[
"2-day shipping"
],
"two_day_shipping":false,
"out_of_stock":false,
"sponsored":true,
"muliple_options_available":false,
"primary_offer":{
"offer_id":"53A0316C100D4D1EBD2AD8753FC4FE25",
"offer_price":349,
"min_price":0
},
"price_per_unit":{
"unit":"each",
"amount":""
},
"product_page_url":"https://www.walmart.com/ip/Apple-MacBook-Air-13-3-inch-Intel-Core-i5-4GB-RAM-Mac-OS-128GB-SSD-Bundle-Black-Case-Wireless-Mouse-Bluetooth-Headset-Silver/121393924",
"serpapi_product_page_url":"https://serpapi.com/search.json?device=desktop&engine=walmart_product&product_id=121393924"
},
... and other results
]
}
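Each entry in organicResults carries many fields, and for a quick report it is often enough to keep a few of them. The sketch below assumes the field names shown in the sample output in this post (title, rating, reviews, primary_offer.offer_price, product_page_url); summarizeItem is a hypothetical helper, not part of the serpapi package.

```javascript
// Hypothetical helper: reduce one organic result to a compact summary.
// Field names follow the sample output in this post; missing fields become null.
function summarizeItem(item) {
  return {
    title: item.title ?? null,
    price: item.primary_offer?.offer_price ?? null,
    rating: item.rating ?? null,
    reviews: item.reviews ?? null,
    url: item.product_page_url ?? null,
  };
}

// Possible usage with getResults from this post:
// getResults().then((result) => console.table(result.organicResults.map(summarizeItem)));
```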
Links
- Code in the online IDE
- Walmart Search Engine Results API Documentation
- Walmart Search Engine Results API Playground
If you want other functionality added to this blog post or if you want to see some projects made with SerpApi, write me a message.
Add a Feature Request or a Bug