What will be scraped

[Image: the parts of the product page that will be scraped]

Full code

If you don't need an explanation, have a look at the full code example in the online IDE

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");

puppeteer.use(StealthPlugin());

const searchParams = {
  id: "8757849604759505625", // Parameter defines the ID of a product you want to get the results for
  hl: "en", // Parameter defines the language to use for the Google search
  gl: "us", // parameter defines the country to use for the Google search
};

const URL = `https://www.google.com/shopping/product/${searchParams.id}?hl=${searchParams.hl}&gl=${searchParams.gl}`;

async function getMainProductInfo(page) {
  return await page.evaluate(() => ({
    title: document.querySelector(".LDQll")?.textContent.trim(),
    prices: Array.from(document.querySelectorAll(".jMmCy .MLYgAb")).map((el) => el.querySelector(".g9WBQb")?.textContent.trim()),
    conditions: Array.from(document.querySelectorAll(".jMmCy .MLYgAb")).map((el) => el.querySelector(".Yy9sbf")?.textContent.trim() || "New"),
    typicalPrices: {
      low: document.querySelector(".tH3hpc .KaGvqb")?.textContent.trim(),
      high: document.querySelector(".tH3hpc .xyYTQb")?.textContent.trim(),
      shownPrice: document.querySelector(".tH3hpc .FYiaub")?.textContent.trim(),
    },
    reviews: parseInt(document.querySelector(".YVQvvd .HiT7Id > span")?.getAttribute("aria-label").replaceAll(",", "")),
    rating: parseFloat(document.querySelector(".YVQvvd .UzThIf")?.getAttribute("aria-label")),
    extensions: Array.from(document.querySelectorAll(".Qo4JI .OA4wid")).map((el) => el.textContent.replaceAll("·", "").trim()),
    description: document.querySelector(".Zh8lCd")?.textContent.trim(),
    media: Array.from(document.querySelectorAll(".TiQ3Vc img")).map((el) => el.getAttribute("src")),
    highlights: Array.from(document.querySelectorAll(".xpDPYb .KgL16d")).map((el) => el.textContent.trim()),
  }));
}

async function getProductInfo() {
  const browser = await puppeteer.launch({
    headless: true, // if you want to see what the browser is doing, set this option to false
    args: ["--no-sandbox", "--disable-setuid-sandbox"],
  });

  const page = await browser.newPage();

  await page.setDefaultNavigationTimeout(60000);
  await page.goto(URL);

  await page.waitForSelector(".Zh8lCd");

  const product = { productId: searchParams.id, ...(await getMainProductInfo(page)) };

  await browser.close();

  return product;
}

getProductInfo().then((result) => console.dir(result, { depth: null }));

Preparation

First, we need to create a Node.js* project and add the npm packages puppeteer, puppeteer-extra and puppeteer-extra-plugin-stealth. Puppeteer controls Chromium (it can also drive Chrome or Firefox, but here we work only with Chromium, which is used by default) over the DevTools Protocol in headless or non-headless mode.

To do this, open the command line in the project directory and enter:

$ npm init -y

And then:

$ npm i puppeteer puppeteer-extra puppeteer-extra-plugin-stealth

*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.

📌Note: you can also use puppeteer without any extensions, but I strongly recommend using it with puppeteer-extra and puppeteer-extra-plugin-stealth, which make it harder for websites to detect that you are using headless Chromium or a web driver. You can check this on the Chrome headless tests website. The screenshot below shows the difference.

[Screenshot: Chrome headless test results with and without the stealth plugin]

Process

We need to extract data from HTML elements. Finding the right CSS selectors is fairly easy with the SelectorGadget Chrome extension, which lets us grab CSS selectors by clicking on the desired element in the browser. However, it does not always work perfectly, especially when the website relies heavily on JavaScript.

We have a dedicated Web Scraping with CSS Selectors blog post at SerpApi if you want to know a little bit more about them.

The GIF below illustrates the approach of selecting different parts of the results using SelectorGadget.

[GIF: selecting page elements with SelectorGadget]

Code explanation

Declare puppeteer from the puppeteer-extra library to control the Chromium browser, and StealthPlugin from the puppeteer-extra-plugin-stealth library to prevent websites from detecting that you are using a web driver:

const puppeteer = require("puppeteer-extra");
const StealthPlugin = require("puppeteer-extra-plugin-stealth");

Next, we tell puppeteer to use StealthPlugin, then define the request parameters and the product page URL:

puppeteer.use(StealthPlugin());

const searchParams = {
  id: "8757849604759505625", // Parameter defines the ID of a product you want to get the results for
  hl: "en", // Parameter defines the language to use for the Google search
  gl: "us", // parameter defines the country to use for the Google search
};

const URL = 
    `https://www.google.com/shopping/product/${searchParams.id}?hl=${searchParams.hl}&gl=${searchParams.gl}`;
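As an alternative to the template literal, the same address can be assembled with the built-in URL and URLSearchParams classes, which encode parameter values for you. This is only a minimal sketch: buildProductUrl is a hypothetical helper name, not part of the code above. Note that a constant named URL shadows the built-in class, so this sketch assumes it lives in a file where the constant has another name (for example productUrl).

```javascript
// Hypothetical helper (not in the original code): builds the product page URL
// from the same id, hl and gl parameters.
function buildProductUrl({ id, hl, gl }) {
  const url = new URL(`https://www.google.com/shopping/product/${encodeURIComponent(id)}`);
  // URLSearchParams takes care of escaping the query values.
  url.search = new URLSearchParams({ hl, gl }).toString();
  return url.toString();
}

console.log(buildProductUrl({ id: "8757849604759505625", hl: "en", gl: "us" }));
// https://www.google.com/shopping/product/8757849604759505625?hl=en&gl=us
```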

Next, we write a function to get product info from the page:

async function getMainProductInfo(page) {
  ...
}

In this function, we get information from the page context (using the evaluate() method) and return it as an object:

return await page.evaluate(() => ({
    ...
}));

Inside the callback, we get the different parts of the page using the following selectors:

    title: document.querySelector(".LDQll")?.textContent.trim(),
    prices: Array.from(document.querySelectorAll(".jMmCy .MLYgAb"))
        .map((el) => el.querySelector(".g9WBQb")?.textContent.trim()),
    conditions: Array.from(document.querySelectorAll(".jMmCy .MLYgAb"))
        .map((el) => el.querySelector(".Yy9sbf")?.textContent.trim() || "New"),
    typicalPrices: {
      low: document.querySelector(".tH3hpc .KaGvqb")?.textContent.trim(),
      high: document.querySelector(".tH3hpc .xyYTQb")?.textContent.trim(),
      shownPrice: document.querySelector(".tH3hpc .FYiaub")?.textContent.trim(),
    },
    reviews: parseInt(document.querySelector(".YVQvvd .HiT7Id > span")
        ?.getAttribute("aria-label").replaceAll(",", "")),
    rating: parseFloat(document.querySelector(".YVQvvd .UzThIf")
        ?.getAttribute("aria-label")),
    extensions: Array.from(document.querySelectorAll(".Qo4JI .OA4wid"))
        .map((el) => el.textContent.replaceAll("·", "").trim()),
    description: document.querySelector(".Zh8lCd")?.textContent.trim(),
    media: Array.from(document.querySelectorAll(".TiQ3Vc img"))
        .map((el) => el.getAttribute("src")),
    highlights: Array.from(document.querySelectorAll(".xpDPYb .KgL16d"))
        .map((el) => el.textContent.trim()),
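Two details of this extraction may be worth a small sketch. String.prototype.replace() with a string pattern replaces only the first match, so removing every thousands separator needs replaceAll() or a global regex. Also, prices and conditions are separate arrays that belong together by index. The helpers below are hypothetical (they are not in the original code) and assume the aria-label starts with the number, as parseInt() already requires.

```javascript
// Hypothetical helper: parse a label such as "1,234,567 reviews" into a number.
function parseCount(label) {
  if (!label) return null; // element or attribute was missing
  const value = parseInt(label.replaceAll(",", ""), 10);
  return Number.isNaN(value) ? null : value;
}

// Hypothetical helper: pair each price with the condition at the same index.
function toOffers(prices, conditions) {
  return prices.map((price, i) => ({ price, condition: conditions[i] ?? "New" }));
}

console.log(parseCount("1,234,567 reviews")); // 1234567
console.log(toOffers(["$1,099.00", "$1,300.00"], ["New", "Used"]));
```

These helpers run in Node.js on the values returned by page.evaluate(). To use them inside the callback, they would have to be defined there, because the callback executes in the browser page.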

Next, we write a function to control the browser and get the information:

async function getProductInfo() {
  ...
}

In this function, we first define browser using the puppeteer.launch({options}) method with the options headless: true and args: ["--no-sandbox", "--disable-setuid-sandbox"].

These options mean that we run the browser in headless mode and pass an array of arguments that allows the browser process to launch in the online IDE. Then we open a new page:

const browser = await puppeteer.launch({
  headless: true, // if you want to see what the browser is doing, set this option to false
  args: ["--no-sandbox", "--disable-setuid-sandbox"],
});

const page = await browser.newPage();

Next, we raise the default navigation timeout from 30 sec to 60000 ms (1 min) for slow internet connections with the .setDefaultNavigationTimeout() method, go to the URL with the .goto() method, and use the .waitForSelector() method to wait until the selector is loaded:

await page.setDefaultNavigationTimeout(60000);
await page.goto(URL);
await page.waitForSelector(".Zh8lCd");
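In Puppeteer, setDefaultNavigationTimeout() governs navigation methods such as goto(), while waitForSelector() accepts its own timeout option. The sketch below sets both explicitly. openProductPage is a hypothetical helper, and the page argument is assumed to be a Puppeteer Page; the stub at the bottom only stands in for it so the snippet runs without a browser.

```javascript
// Hypothetical helper: `page` is assumed to be a Puppeteer Page object.
async function openProductPage(page, url, selector, timeoutMs = 60000) {
  // Navigation timeout: applies to goto(), reload(), waitForNavigation() and similar.
  page.setDefaultNavigationTimeout(timeoutMs);
  await page.goto(url);
  // The selector wait gets its timeout separately.
  await page.waitForSelector(selector, { timeout: timeoutMs });
}

// Stub that records calls instead of driving a real browser.
const calls = [];
const fakePage = {
  setDefaultNavigationTimeout: (ms) => calls.push(["navigationTimeout", ms]),
  goto: async (url) => calls.push(["goto", url]),
  waitForSelector: async (sel, opts) => calls.push(["waitForSelector", sel, opts.timeout]),
};

openProductPage(fakePage, "https://example.com", ".Zh8lCd").then(() => console.log(calls));
```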

And finally, we save product data from the page in the product constant (using spread syntax), close the browser, and return the received data:

const product = {
  productId: searchParams.id,
  ...(await getMainProductInfo(page)),
};

await browser.close();

return product;
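If goto() or waitForSelector() throws (for example on a timeout), browser.close() is never reached and the Chromium process keeps running. A common pattern is to close the browser in a finally block. A minimal sketch, where withBrowser is a hypothetical helper and launch is assumed to be a function such as () => puppeteer.launch({ ... }):

```javascript
// Hypothetical helper: runs `task` with a browser and always closes it afterwards.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    await browser.close(); // runs on success and on error
  }
}

// Demo with a stub so the snippet runs without launching Chromium.
const fakeBrowser = {
  closed: false,
  async close() {
    this.closed = true;
  },
};

withBrowser(
  async () => fakeBrowser,
  async () => {
    throw new Error("navigation failed");
  }
).catch((err) => console.log(err.message, "| browser closed:", fakeBrowser.closed));
// navigation failed | browser closed: true
```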

Now we can launch our parser:

$ node YOUR_FILE_NAME # YOUR_FILE_NAME is the name of your .js file

Output

{
  "productId": "8757849604759505625",
  "title": "Apple iPhone 14 Pro Max - 128 GB - Space Black - Unlocked",
  "prices": ["$1,099.00", "$0.00 now", "$1,300.00"],
  "conditions": ["New", "New", "New"],
  "typicalPrices": {
    "low": "$1,099.00",
    "high": "$1,798.00",
    "shownPrice": "$1,099.00 at Apple"
  },
  "reviews": 748,
  "rating": 4.5,
  "extensions": [
    "Smartphone",
    "Single SIM",
    "iOS",
    "5G",
    "With Wireless Charging",
    "With Fast Charging",
    "Dual Lens",
    "With OLED Display",
    "Unlocked",
    "2796 x 1290"
  ],
  "description": "WHATS IN THE BOX? Apple iPhone 14 Pro Max USB-C to Lightning Cable Documentation DESCRIPTION iPhone 14 Pro Max. Capture incredible detail with a 48MP Main camera. Experience iPhone in a whole new way with Dynamic Island and Always-On display. And get peace of mind with groundbreaking safety features.",
  "media": [
    "https://encrypted-tbn0.gstatic.com/shopping?q=tbn:ANd9GcRyN1txq1WYcq0kZCQxnW1zBucAmf2HZOD8jgs6Q4LzSYSvDEUTdir39U_GB5vDtR0veksMAfE6Z2nFBcdWIYQLaFG_973DcA&usqp=CAY",
    "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcS-YZR6Pztx9YucZIrT11l_PvbrHfsxipMnm-hxF3KXsFAuJ4hnQPgiVy-0Iwsn8mvsWjCHBi7jaHLKn4Tu8HK3VTlg6hMnQg&usqp=CAY",
    "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcStcLa8JOmqDOz-aonMmdAOJ-AgTxxNcx2wy5C2m3jPottvEIntgqimPtYHVehHxpgmMzKfV5BaBcRDFfyjzZH9opf0pWSRoA&usqp=CAY",
    "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcSar8v-RTvGyepxAZ59sluyaTDCavMNZb-0d2886BIxD_IEWFoWEIgUpjkrnIN3wDSh6Q8oRqWyVzAvnK1aoIjXDb_tUoR-LQ&usqp=CAY",
    "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcTFedaFl4yMCSIHUOJcgGLi_RtNppaAJ5Sg_hCu1XdGLiue6c3KcBgtxSRnBKUAjLWiakKwR5_lSSm-tOkj3nL1KO-FXA_D_w&usqp=CAY",
    "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcQuzxYZTeYcxjeWH18H7Mn-FXXdkX9wWl1aZCEl3N0N78y-jUkn8emUQtt-YkYFo7p-nT6wrwCJHLFoKJGfHzRhs33oiAIi&usqp=CAY",
    "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcRQFsalbyUqrrM7EZwXwSzv1knHGaOYEOSxDRn5Lv5rg9aqbSTjP4DWW3qUjQPFnrJpXZV6mX5dQLtj6HGuLVB9FcRwtKTOxQ&usqp=CAY"
  ],
  "highlights": ["Apple iPhone 14 Pro Max.", "USB-C to Lightning Cable.", "Documentation.", "DESCRIPTION."]
}

Using Google Product Page API from SerpApi

This section compares the DIY solution with our solution.

The biggest difference is that you don't need to create the parser from scratch and maintain it.

There's also a chance that Google will block the request at some point. We handle that on our backend, so there's no need to figure out how to deal with it yourself or to choose a CAPTCHA-solving service or proxy provider.

First, we need to install google-search-results-nodejs:

$ npm i google-search-results-nodejs

Here's the full code example, if you don't need an explanation:

const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch(process.env.API_KEY); //your API key from serpapi.com

const params = {
  product_id: "8757849604759505625", // Parameter defines the ID of a product you want to get the results for.
  engine: "google_product", // search engine
  device: "desktop", //Parameter defines the device to use to get the results. It can be set to "desktop" (default), "tablet", or "mobile"
  hl: "en", // parameter defines the language to use for the Google search
  gl: "us", // parameter defines the country to use for the Google search
};

const getJson = () => {
  return new Promise((resolve) => {
    search.json(params, resolve);
  });
};

const getResults = async () => {
  const json = await getJson();
  const product = json.product_results;
  return product;
};

getResults().then((result) => console.dir(result, { depth: null }));

Code explanation

First, we need to declare SerpApi from the google-search-results-nodejs library and define a new search instance with your API key from SerpApi:

const SerpApi = require("google-search-results-nodejs");
const search = new SerpApi.GoogleSearch(process.env.API_KEY); // your API key from serpapi.com

Next, we write the necessary parameters for making a request:

const params = {
  product_id: "8757849604759505625", // Parameter defines the ID of a product you want to get the results for.
  engine: "google_product", // search engine
  device: "desktop", //Parameter defines the device to use to get the results. It can be set to "desktop" (default), "tablet", or "mobile"
  hl: "en", // parameter defines the language to use for the Google search
  gl: "us", // parameter defines the country to use for the Google search
};

Next, we wrap the search method from the SerpApi library in a promise to further work with the search results:

const getJson = () => {
  return new Promise((resolve) => {
    search.json(params, resolve);
  });
};
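The promise above has no reject path, so if the callback is never invoked the await simply hangs. One possible guard is a timeout. This is a sketch only: getJsonWithTimeout is a hypothetical helper, and search is assumed to be any object with a json(params, callback) method, like the instance created above. The stub below stands in for it so the snippet runs without an API key.

```javascript
// Hypothetical helper: rejects if the callback is not invoked within timeoutMs.
function getJsonWithTimeout(search, params, timeoutMs = 30000) {
  return new Promise((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`No response after ${timeoutMs} ms`)),
      timeoutMs
    );
    search.json(params, (data) => {
      clearTimeout(timer); // response arrived in time
      resolve(data);
    });
  });
}

// Stub with the same json(params, callback) shape; it answers after 10 ms.
const fakeSearch = {
  json: (params, callback) =>
    setTimeout(() => callback({ product_results: { title: "Example" } }), 10),
};

getJsonWithTimeout(fakeSearch, { engine: "google_product" }, 1000).then((json) =>
  console.log(json.product_results.title)
); // Example
```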

And finally, we declare the function getResults that gets the data from the response and returns it:

const getResults = async () => {
  ...
};

In this function, we get the JSON with results, assign the product_results data to the new product constant, and return it:

const json = await getJson();
const product = json.product_results;
return product;
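json.product_results is undefined whenever the response has no such key, for instance when the API reports a problem. The sketch below assumes that a failed request carries an error string in the response body; that is an assumption about the response shape, so check the SerpApi documentation before relying on it. pickProduct is a hypothetical helper name.

```javascript
// Hypothetical helper: returns product_results or throws a descriptive error.
// Assumption: failed requests include an `error` string in the JSON body.
function pickProduct(json) {
  if (json?.error) throw new Error(`SerpApi error: ${json.error}`);
  if (!json?.product_results) throw new Error("Response has no product_results");
  return json.product_results;
}

console.log(pickProduct({ product_results: { title: "Example" } })); // { title: 'Example' }
```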

Finally, we run the getResults function and print all the received information in the console with the console.dir method, which accepts an options object that changes the default output settings:

getResults().then((result) => console.dir(result, { depth: null }));
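To see what depth: null changes, here is a small illustration. By default, Node.js prints nested objects only a couple of levels deep and abbreviates anything deeper as [Object]. The sketch uses util.inspect, which accepts the same depth option; the nested object is made up for the example.

```javascript
const util = require("node:util");

// Made-up object that is nested deeper than the default inspection depth.
const nested = { a: { b: { c: { d: "deep" } } } };

// Default depth: the innermost object is abbreviated as [Object].
console.log(util.inspect(nested));

// depth: null removes the limit, so the whole structure is printed.
console.log(util.inspect(nested, { depth: null }));
```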

Output

{
  "product_id": 8757849604759506000,
  "title": "Apple iPhone 14 Pro Max - 128 GB - Space Black - Unlocked",
  "prices": ["$1,099.00", "$0.00 now", "$1,300.00"],
  "conditions": ["New", "New", "New"],
  "typical_prices": {
    "low": "$1,099.00",
    "high": "$1,798.00",
    "shown_price": "$1,099.00 at Apple"
  },
  "reviews": 748,
  "rating": 4.3,
  "extensions": [
    "Smartphone",
    "Single SIM",
    "iOS",
    "5G",
    "With Wireless Charging",
    "With Fast Charging",
    "Dual Lens",
    "With OLED Display",
    "Unlocked",
    "2796 x 1290"
  ],
  "description": "WHATS IN THE BOX? Apple iPhone 14 Pro Max USB-C to Lightning Cable Documentation DESCRIPTION iPhone 14 Pro Max. Capture incredible detail with a 48MP Main camera. Experience iPhone in a whole new way with Dynamic Island and Always-On display. And get peace of mind with groundbreaking safety features.",
  "media": [
    {
      "type": "image",
      "link": "https://encrypted-tbn0.gstatic.com/shopping?q=tbn:ANd9GcRyN1txq1WYcq0kZCQxnW1zBucAmf2HZOD8jgs6Q4LzSYSvDEUTdir39U_GB5vDtR0veksMAfE6Z2nFBcdWIYQLaFG_973DcA&usqp=CAY"
    },
    {
      "type": "image",
      "link": "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcS-YZR6Pztx9YucZIrT11l_PvbrHfsxipMnm-hxF3KXsFAuJ4hnQPgiVy-0Iwsn8mvsWjCHBi7jaHLKn4Tu8HK3VTlg6hMnQg&usqp=CAY"
    },
    {
      "type": "image",
      "link": "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcStcLa8JOmqDOz-aonMmdAOJ-AgTxxNcx2wy5C2m3jPottvEIntgqimPtYHVehHxpgmMzKfV5BaBcRDFfyjzZH9opf0pWSRoA&usqp=CAY"
    },
    {
      "type": "image",
      "link": "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcSar8v-RTvGyepxAZ59sluyaTDCavMNZb-0d2886BIxD_IEWFoWEIgUpjkrnIN3wDSh6Q8oRqWyVzAvnK1aoIjXDb_tUoR-LQ&usqp=CAY"
    },
    {
      "type": "image",
      "link": "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcTFedaFl4yMCSIHUOJcgGLi_RtNppaAJ5Sg_hCu1XdGLiue6c3KcBgtxSRnBKUAjLWiakKwR5_lSSm-tOkj3nL1KO-FXA_D_w&usqp=CAY"
    },
    {
      "type": "image",
      "link": "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcQuzxYZTeYcxjeWH18H7Mn-FXXdkX9wWl1aZCEl3N0N78y-jUkn8emUQtt-YkYFo7p-nT6wrwCJHLFoKJGfHzRhs33oiAIi&usqp=CAY"
    },
    {
      "type": "image",
      "link": "https://encrypted-tbn2.gstatic.com/shopping?q=tbn:ANd9GcRQFsalbyUqrrM7EZwXwSzv1knHGaOYEOSxDRn5Lv5rg9aqbSTjP4DWW3qUjQPFnrJpXZV6mX5dQLtj6HGuLVB9FcRwtKTOxQ&usqp=CAY"
    }
  ],
  "highlights": ["Apple iPhone 14 Pro Max.", "USB-C to Lightning Cable.", "Documentation.", "DESCRIPTION."]
}
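Note that product_id in this output ends in 506000, while the requested ID was 8757849604759505625. The ID is larger than Number.MAX_SAFE_INTEGER, so it cannot be represented exactly once it is handled as a JavaScript number. If the exact ID matters, keeping it as a string (as in params.product_id) or converting it to a BigInt avoids the rounding. A short illustration:

```javascript
const id = "8757849604759505625";

console.log(Number.MAX_SAFE_INTEGER); // 9007199254740991
console.log(Number.isSafeInteger(Number(id))); // false
console.log(String(Number(id)) === id); // false: digits were lost in the conversion
console.log(BigInt(id).toString() === id); // true: BigInt keeps every digit
```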

If you want other functionality added to this blog post (e.g. extracting additional categories) or if you want to see some projects made with SerpApi, write me a message.

