Scrape Google Search in NodeJS
You can use Google Search results to track a website's SEO rank, gather information, automate tasks, and more. Regardless of the task, this post will guide you through the basics of setting up a Google Search scraper using JavaScript.
We will focus only on the "People also ask" section of the results. The same principle applies to other sections.
"People also ask" shows the top questions people are asking, which can be good sources for article ideas.
1. Using Axios + Cheerio
We will use axios to download the HTML document and pass that data into cheerio. Cheerio gives us the ability to traverse through the HTML document. With that, we can use it to find the element that contains the information we want.
If you are looking for a quick solution, here is the full working version in replit. Otherwise, you can follow along with the step-by-step guide.
First of all, let's set up the basic building block: using axios to download the HTML result page from Google.
import axios from "axios";
import * as cheerio from "cheerio";

// Pick a random user agent so that requests don't all look identical.
const getUserAgent = () => {
  const agents = [
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36",
    // . . . (more user agent strings)
  ];
  const index = Math.floor(Math.random() * agents.length);
  return agents[index];
};

const searchTerm = "history about cheese";

// Encode the term so spaces and special characters are valid in the URL.
axios
  .get(`https://www.google.com/search?q=${encodeURIComponent(searchTerm)}`, {
    headers: {
      "User-Agent": getUserAgent(),
    },
  })
  .then((response) => {
    // Load the HTML into cheerio so we can query it with CSS selectors.
    const $ = cheerio.load(response.data);
  })
  .catch((error) => {
    console.log(error);
  });
The User-Agent header instructs Google to give us the desktop version of the page. Likewise, using a mobile user agent will give you the mobile page.
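The request setup can also be factored into a small helper that does not send anything by itself. The sketch below is only an illustration: `buildSearchRequest`, its `device` parameter, and the mobile user agent string are assumptions that do not appear in the original script. It builds the URL with `URL` and `searchParams`, so the search term is encoded automatically.

```javascript
// Hypothetical helper (not part of the original script): builds the URL and
// headers for a Google Search request without sending anything.
const USER_AGENTS = {
  desktop: [
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36",
  ],
  // Example mobile user agent, added for illustration only.
  mobile: [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 13_2_3 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.3 Mobile/15E148 Safari/604.1",
  ],
};

function buildSearchRequest(searchTerm, device = "desktop") {
  const agents = USER_AGENTS[device];
  if (!agents) {
    throw new Error(`Unknown device type: ${device}`);
  }
  const url = new URL("https://www.google.com/search");
  // searchParams encodes the term for us (spaces become "+").
  url.searchParams.set("q", searchTerm);
  const userAgent = agents[Math.floor(Math.random() * agents.length)];
  return { url: url.toString(), headers: { "User-Agent": userAgent } };
}
```

The result can be passed straight to axios, for example `const { url, headers } = buildSearchRequest("history about cheese", "mobile")` followed by `axios.get(url, { headers })`.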
The next step is to find an identifier (a class, an id, or another attribute of the tag). In this case we will target the related-question-pair class. It is better to use a class with a meaningful name like related-question-pair because Google is less likely to change it. Otherwise, any class is fine as long as it doesn't also match other sections.
...
const $ = cheerio.load(response.data);
const questionElements = $('.related-question-pair');
...
questionElements now contains the matched question elements as a Cheerio collection (not a plain array). We have to parse each one further to get the title that we are looking for.
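Since Google can change its class names at any time, one defensive option is to try several candidate selectors in order of preference. This is a sketch rather than the method used in this post: `selectFirstMatch` is a hypothetical helper, and `$` is assumed to behave like the function returned by `cheerio.load`.

```javascript
// Hypothetical helper: returns the first non-empty selection from a list of
// candidate selectors, or null when none of them match anything.
// `$` is expected to behave like the function returned by cheerio.load().
function selectFirstMatch($, selectors) {
  for (const selector of selectors) {
    const matches = $(selector);
    if (matches.length > 0) {
      return { selector, matches };
    }
  }
  return null;
}
```

A call could look like `selectFirstMatch($, ['.related-question-pair', '.some-fallback-class'])`, where the second selector is a placeholder for whatever alternative you find in the page. A `null` result is a hint that the markup has changed and the selectors need updating.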
We can see that the title is inside a tag with a few classes. We will use the first class, iDjcJe, and with that we can retrieve the title using the code below.
...
const questionElements = $('.related-question-pair');
const questions = [];
questionElements.each((i, element) => {
  questions.push($(element).find('.iDjcJe').text());
});
console.log(questions);
...
Here is the output:
[
'Who first invented cheese?',
'What was cheese originally invented for?',
'When was the first cheese invented?'
]
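If the inner selector stops matching, `.find('.iDjcJe').text()` returns an empty string, so the list can end up with blanks or repeats. A small clean-up step can guard against that. `cleanQuestions` below is a hypothetical helper, not part of the original script.

```javascript
// Hypothetical helper: normalizes whitespace, then drops empty strings and
// duplicates while keeping the original order.
function cleanQuestions(rawQuestions) {
  const seen = new Set();
  const cleaned = [];
  for (const raw of rawQuestions) {
    const question = raw.replace(/\s+/g, " ").trim();
    if (question !== "" && !seen.has(question)) {
      seen.add(question);
      cleaned.push(question);
    }
  }
  return cleaned;
}
```

You would call it on the scraped array before using it, for example `console.log(cleanQuestions(questions))`.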
That's how you can scrape Google Search results. You can apply the same steps to scrape any other section you need.
Try a fun exploration: Wrap the function in a loop and replace each search term with questions gathered from previous results for a chain of questions 😊.
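One possible shape for that loop is a breadth-first walk over the questions. In this sketch, `collectQuestionChain` and `maxSearches` are hypothetical names, and `fetchQuestions` is an assumed async function that returns an array of question strings for a search term (for example, the axios and cheerio code from this section wrapped in a function).

```javascript
// Hypothetical sketch: breadth-first "chain of questions".
// `fetchQuestions(term)` is an assumed async function that resolves to an
// array of question strings for the given search term.
async function collectQuestionChain(startTerm, fetchQuestions, maxSearches = 5) {
  const queue = [startTerm];
  const visited = new Set([startTerm]);
  const collected = [];
  let searches = 0;

  // Stop when there is nothing left to search or the search budget is used up.
  while (queue.length > 0 && searches < maxSearches) {
    const term = queue.shift();
    searches += 1;
    const questions = await fetchQuestions(term);
    for (const question of questions) {
      // Skip questions we have already seen to avoid looping forever.
      if (!visited.has(question)) {
        visited.add(question);
        collected.push(question);
        queue.push(question);
      }
    }
  }
  return collected;
}
```

The `maxSearches` cap keeps the number of requests small, which matters because every question found triggers another search.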
But we have to be aware that Google has security measures in place to keep us from requesting a lot of results in a short time frame. One way around this is to rotate the IP address using proxies, but that is not easy to set up.
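As a rough idea of what rotation involves, the sketch below hands out proxies in round-robin order and adds a helper for spacing out requests. Everything here is an assumption for illustration: the proxy hosts are placeholders, and `createProxyRotator` and `sleep` are hypothetical helpers. The proxy objects follow the `protocol`, `host`, `port` shape that axios accepts in its `proxy` request option.

```javascript
// Placeholder proxy list (hypothetical hosts): replace with proxies you
// actually have access to.
const PROXIES = [
  { protocol: "http", host: "proxy-1.example.com", port: 8080 },
  { protocol: "http", host: "proxy-2.example.com", port: 8080 },
];

// Returns a function that hands out proxies in round-robin order.
function createProxyRotator(proxies) {
  if (proxies.length === 0) {
    throw new Error("At least one proxy is required");
  }
  let next = 0;
  return () => {
    const proxy = proxies[next];
    next = (next + 1) % proxies.length;
    return proxy;
  };
}

// Resolves after `ms` milliseconds; useful for spacing out requests.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```

With `const nextProxy = createProxyRotator(PROXIES)`, each request could pass `proxy: nextProxy()` in the axios config and call `await sleep(2000)` between requests. Finding reliable proxies and handling blocked ones is the hard part, and this sketch does not cover it.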
That's all you need to get started. If you would like to go in depth on web scraping in JavaScript, we have a comprehensive tutorial for you - Web Scraping with Javascript and Nodejs.
2. Using SerpApi
SerpApi makes the whole process much easier: with just a few lines of code, you can get "People also ask" data along with other extra data.
Here is the full working version in replit (you have to supply an API key, which you can get for free when you register), or you can head to the Playground for a live and interactive demo.
import { getJson } from "serpapi";
const searchTerm = 'history about cheese';
const response = await getJson("google", {
api_key: process.env['API_KEY'], // Get your API_KEY from https://serpapi.com/manage-api-key
q: searchTerm,
});
console.log(response.related_questions);
Output:
[
{
question: 'Who first invented cheese?',
snippet: "No one really knows who made the first cheese. According to an ancient legend, it was made accidentally by an Arabian merchant who put his supply of milk into a pouch made from a sheep's stomach, as he set out on a day's journey across the desert.",
title: 'History of Cheese - IDFA - International Dairy Foods Association',
link: 'https://www.idfa.org/history-of-cheese',
displayed_link: 'https://www.idfa.org › history-of-cheese',
next_page_token: '...',
serpapi_link: '...'
},
{
question: 'What was cheese originally invented for?',
snippet: 'The production of cheese predates recorded history and was most likely discovered by accident during the transport of fresh milk in the organs of ruminants such as sheep, goats, cows, and buffalo. In the millennia before refrigeration, cheese became a way to preserve milk.',
title: 'The History of Cheese - The Spruce Eats',
date: 'Aug 9, 2019',
link: 'https://www.thespruceeats.com/the-history-of-cheese-1328765',
displayed_link: 'https://www.thespruceeats.com › ... › Food History',
thumbnail: 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTUyzvORJtS6qmfgRlKeQNYntVPsu2Qd1sv273o2rBoisVl9UVRdQQ7N9Ey5A&s',
next_page_token: '...',
serpapi_link: '...'
},
{
question: 'When was the first cheese invented?',
snippet: "Historians haven't nailed down an exact date when cheese was invented, but jars from the First Dynasty of Egypt were found to contain cheese dating back to 3000 BCE, and Egyptian tomb murals from 2000 BCE depict cheese manufacturing.",
title: 'When Was Cheese Invented, Where & By Who? - HelloFresh',
link: 'https://www.hellofresh.com/eat/history-of-food/the-invention-of-cheese',
displayed_link: 'https://www.hellofresh.com › eat › the-invention-of-cheese',
thumbnail: 'https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTJ1p1auYzNJm6nQ2aKoqsuEW117jYSOVdyAroZbx3e&s',
next_page_token: '...',
serpapi_link: '...'
}
]
Along with the question, SerpApi returns more data such as title, snippet, link, and other fields, which you can see in the output above.
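If you only need a few of those fields, the response can be trimmed down with a small mapping step. `summarizeRelatedQuestions` is a hypothetical helper, and it assumes the `related_questions` shape from the sample output in this section.

```javascript
// Hypothetical helper: keeps only the fields needed for article ideas.
// Field names follow the sample SerpApi output in this section.
function summarizeRelatedQuestions(response) {
  // Fall back to an empty list when the section is missing from the response.
  const related = response.related_questions ?? [];
  return related.map(({ question, title, link }) => ({ question, title, link }));
}
```

For example, `console.log(summarizeRelatedQuestions(response))` prints one compact object per question, and an empty array when the response has no "People also ask" section.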
If you have any questions, please feel free to reach out to me.