How to scrape Google search results asynchronously with Node.js
Google search results are valuable to a wide range of users. This data powers many applications: SEO, data analysis, content creation, staying up-to-date with the latest news and trends, competitor research, online reputation management, or even making voice-activated devices smarter.
In this post, we will talk about scraping Google search results asynchronously using Node.js. Whether you're trying to improve your website's search ranking, train AI models, or analyze data trends, this guide is here to help.
Using SerpApi's Google Search API
Say goodbye to proxy management, captchas, and user agents, and say hello to a simplified and robust solution for your web scraping needs.
Benefits
Building and maintaining your own web scraper can be a complex and time-consuming endeavor. You must consider various factors such as handling proxies, solving captchas, managing user agents, and implementing delays to prevent getting blocked by Google. With SerpApi, all of these complexities are abstracted away. You can avoid the headache of developing and maintaining a scraper and instead concentrate on your core application logic.
SerpApi provides a Ready-to-Use Node.js library specifically tailored for effortless integration into your applications. This library is designed to work seamlessly with Node.js, allowing developers to harness the power of SerpApi with minimal setup and configuration. You can get up and running quickly, saving precious development time.
When you use SerpApi, you don't just receive raw HTML pages that you need to parse and structure. Instead, it directly delivers structured JSON data, making it easy to access and utilize the search results in your application. This streamlined approach enables you to focus on using the data rather than wasting time retrieving and formatting it.
Setup
If you haven't already signed up for a free SerpApi account, go ahead and do that at serpapi.com. Once you complete the process, you can retrieve your API key from your account's Dashboard at https://serpapi.com/manage-api-key.
You're now ready to install our Node.js package and start using it:
```shell
npm install serpapi
```
```javascript
const { getJson } = require("serpapi");

getJson({
  engine: "google",
  api_key: API_KEY, // Get your API_KEY from https://serpapi.com/manage-api-key
  q: "coffee",
  location: "Austin, Texas",
}, (json) => {
  console.log(json["organic_results"]);
});
```
What we'll be scraping
We'll be scraping the organic results returned from Google for 200 topics. We'll collect the position, title, source, and actual link for each result. Please check our official Google Search API documentation for a detailed list of the available data returned in its response.
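For reference, each entry in `organic_results` is a plain JSON object, so picking out the fields we want takes only a small helper. The sample object below is illustrative (its values are made up, not an actual API response), but the field names match the Google Search API documentation:

```javascript
// Illustrative shape of a single organic result. Field names match the
// Google Search API documentation; the values are made up for this example.
const sampleResult = {
  position: 1,
  title: "Coffee - Wikipedia",
  source: "Wikipedia",
  link: "https://en.wikipedia.org/wiki/Coffee",
  snippet: "Coffee is a beverage brewed from roasted coffee beans...",
};

// Keep only the fields we plan to collect for each result.
function pickFields(result) {
  const { position, title, source, link } = result;
  return { position, title, source, link };
}

console.log(pickFields(sampleResult));
```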
async parameter
We provide an `async` parameter that can be used with all of our APIs, including the Google Search API. This parameter allows you to send a request to our API without waiting for a response to be returned. Your requests will be processed in parallel on our backend, and you can retrieve the data later.
This approach allows you to first send all of your requests, and then retrieve the data using the `getJsonBySearchId()` method, which is a wrapper around our Search Archive API.
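The submit-then-poll pattern behind this can be sketched generically. In the sketch below, `fetchStatus` is a stand-in for a call like `getJsonBySearchId(id, { api_key: API_KEY })` (injected here as a parameter so the pattern can be demonstrated without a live API key):

```javascript
// Generic submit-then-poll sketch. `fetchStatus(id)` is a stand-in for a call
// like getJsonBySearchId(id, { api_key: API_KEY }); it should resolve to an
// object whose search_metadata.status is "Processing", "Success", or "Error".
async function pollUntilDone(id, fetchStatus, maxAttempts = 10) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const result = await fetchStatus(id);
    if (result.search_metadata.status !== "Processing") return result;
  }
  throw new Error(`Search ${id} still processing after ${maxAttempts} attempts`);
}

// Demo with a fake fetchStatus that reports "Processing" twice, then "Success".
let calls = 0;
const fakeFetchStatus = async () => ({
  search_metadata: { status: ++calls < 3 ? "Processing" : "Success" },
});

pollUntilDone("demo-id", fakeFetchStatus).then((result) => {
  console.log(result.search_metadata.status); // prints "Success"
});
```

The code we walk through below follows this same idea, but instead of polling one search in a loop, it pushes still-processing searches back onto a shared queue.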
Code
Alright, let's write some code!
Below is the complete code snippet for scraping Google Search results asynchronously. If you don't need any additional explanations about how it works, feel free to grab it and modify it based on your needs.
```javascript
// Import necessary methods from the serpapi library
const { getJson, getJsonBySearchId } = require("serpapi");

// Your SerpApi API key obtained from https://serpapi.com/manage-api-key
const API_KEY = "YOUR_ACTUAL_API_KEY";

// Summarized list of search topics for readability
const topics = [
  "Artificial intelligence", "Climate change", "Space exploration",
  "Healthy recipes", "Virtual reality", "Cryptocurrency", "Photography tips",
  "Indoor plants care", "Mindfulness meditation", "Travel destinations",
  "DIY home decor", "Financial planning", "Time management techniques",
  "Historical events", "Fitness workouts", "Mobile photography"
];

// Initialize an array to store the queued search requests
const searchQueue = [];

// Function to execute requests to the Google Search API asynchronously
async function getResults(topic) {
  try {
    // Make a request to the Google Search API and push the response to the searchQueue
    const response = await getJson({
      engine: "google",
      q: topic,
      location: "Denver, Colorado, United States",
      google_domain: "google.com",
      hl: "en",
      gl: "us",
      async: true,
      api_key: API_KEY
    });
    searchQueue.push(response);
  } catch (error) {
    console.error("Error fetching data:", error);
    throw error;
  }
}

// Initialize an object to store the final results
const data = {};

// An array to store promises returned by getResults() for each topic
const searchPromises = [];

// Iterate through each topic and initiate the getResults() function for each
topics.forEach(topic => {
  // Create an empty array in the data object for each topic
  data[topic] = [];
  // Get the promise returned by getResults() and push it to the searchPromises array
  const promise = getResults(topic);
  searchPromises.push(promise);
});

// Function to process organic results from a search query
function processOrganicResults(searchQuery, searchData) {
  searchData.organic_results.forEach(result => {
    data[searchQuery].push({
      position: result.position,
      title: result.title,
      source: result.source,
      link: result.link
    });
  });
}

// Function to process the searchQueue and retrieve detailed data for each search
async function processSearchQueue() {
  // Wait for all promises in the searchPromises array to be resolved
  await Promise.all(searchPromises);

  // Process each search item in the searchQueue
  while (searchQueue.length > 0) {
    const searchItem = searchQueue.shift();
    try {
      // Retrieve detailed data for a search using its ID
      const searchItemData = await getJsonBySearchId(searchItem.search_metadata.id, { api_key: API_KEY });
      const searchId = searchItemData.search_metadata.id;
      const searchStatus = searchItemData.search_metadata.status;
      const searchQuery = searchItemData.search_parameters.q;

      // Handle different search statuses
      if (searchStatus === "Error") {
        console.log("#ERROR", searchItemData);
      } else if (searchStatus === "Processing") {
        // Requeue the search if it's still processing
        searchQueue.push(searchItemData);
        console.log(`Requeued Search with ID: ${searchId}`);
      } else {
        // Process the organic results for a successful search
        processOrganicResults(searchQuery, searchItemData);
      }
    } catch (error) {
      console.error("Error fetching data:", error);
      throw error;
    }
  }

  // Log the final data in a readable JSON format
  console.log(JSON.stringify(data, null, 2));
}

// Execute the processSearchQueue() function to initiate the processing of the queue
processSearchQueue();
```
Code breakdown
First, we need to import the `getJson` and `getJsonBySearchId` methods from the `serpapi` library.
```javascript
// Import necessary methods from the serpapi library
const { getJson, getJsonBySearchId } = require("serpapi");

// Your SerpApi API key obtained from https://serpapi.com/manage-api-key
const API_KEY = "YOUR_ACTUAL_API_KEY";

// Summarized list of search topics for readability
const topics = [
  "Artificial intelligence", "Climate change", "Space exploration",
  "Healthy recipes", "Virtual reality", "Cryptocurrency", "Photography tips",
  "Indoor plants care", "Mindfulness meditation", "Travel destinations",
  "DIY home decor", "Financial planning", "Time management techniques",
  "Historical events", "Fitness workouts", "Mobile photography"
];
```
We then create a `getResults()` function that will execute the requests to the Google Search API using the `async` parameter. It accepts a topic as a parameter, which will be used as the search query.
We also initialize a `searchQueue` array, into which we'll push all of our queued requests. Later on, we'll use the information from this array to retrieve the actual data returned from our requests.
```javascript
// Initialize an array to store the queued search requests
const searchQueue = [];

// Function to execute requests to the Google Search API asynchronously
async function getResults(topic) {
  try {
    // Make a request to the Google Search API and push the response to the searchQueue
    const response = await getJson({
      engine: "google",
      q: topic,
      location: "Denver, Colorado, United States",
      google_domain: "google.com",
      hl: "en",
      gl: "us",
      async: true,
      api_key: API_KEY
    });
    searchQueue.push(response);
  } catch (error) {
    console.error("Error fetching data:", error);
    throw error;
  }
}
```
We continue by defining a `data` object, which will hold the actual data we get from the results.
It's important to note that the `getResults()` function returns a promise. We create a `searchPromises` array that will hold all of the promises returned from `getResults()`. We need to resolve all of those promises before we start processing the queued requests in the `searchQueue`.
We also define a `processOrganicResults()` function, which will be used for processing the organic results from the response once the requests in the queue are processed.
```javascript
// Initialize an object to store the final results
const data = {};

// An array to store promises returned by getResults() for each topic
const searchPromises = [];

// Iterate through each topic and initiate the getResults() function for each
topics.forEach(topic => {
  // Create an empty array in the data object for each topic
  data[topic] = [];
  // Get the promise returned by getResults() and push it to the searchPromises array
  const promise = getResults(topic);
  searchPromises.push(promise);
});

// Function to process organic results from a search query
function processOrganicResults(searchQuery, searchData) {
  searchData.organic_results.forEach(result => {
    data[searchQuery].push({
      position: result.position,
      title: result.title,
      source: result.source,
      link: result.link
    });
  });
}
```
Now, we're ready to start processing the queued requests. The `processSearchQueue()` function will do this job for us.
We start by waiting for all of the promises in the `searchPromises` array to resolve. Then, we process each item in the `searchQueue` by shifting it from the array and retrieving its data with `getJsonBySearchId()`.
If the request has been processed successfully (its status is `"Success"` rather than `"Processing"` or `"Error"`), we execute `processOrganicResults()` with it to retrieve the actual data.
We then log the serialized JSON data.
```javascript
// Function to process the searchQueue and retrieve detailed data for each search
async function processSearchQueue() {
  // Wait for all promises in the searchPromises array to be resolved
  await Promise.all(searchPromises);

  // Process each search item in the searchQueue
  while (searchQueue.length > 0) {
    const searchItem = searchQueue.shift();
    try {
      // Retrieve detailed data for a search using its ID
      const searchItemData = await getJsonBySearchId(searchItem.search_metadata.id, { api_key: API_KEY });
      const searchId = searchItemData.search_metadata.id;
      const searchStatus = searchItemData.search_metadata.status;
      const searchQuery = searchItemData.search_parameters.q;

      // Handle different search statuses
      if (searchStatus === "Error") {
        console.log("#ERROR", searchItemData);
      } else if (searchStatus === "Processing") {
        // Requeue the search if it's still processing
        searchQueue.push(searchItemData);
        console.log(`Requeued Search with ID: ${searchId}`);
      } else {
        // Process the organic results for a successful search
        processOrganicResults(searchQuery, searchItemData);
      }
    } catch (error) {
      console.error("Error fetching data:", error);
      throw error;
    }
  }

  // Log the final data in a readable JSON format
  console.log(JSON.stringify(data, null, 2));
}
```
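One caveat worth noting: a search that is still `Processing` gets requeued and may be polled again almost immediately, so with many slow searches this loop can hit the API in a tight cycle. A small delay helper (my addition, not part of the original snippet) can be awaited before the requeue to space the polls out:

```javascript
// Minimal delay helper (not part of the original snippet): resolves after
// `ms` milliseconds, so `await sleep(1000)` pauses before the next poll.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));
```

With this in place, adding `await sleep(1000);` just before `searchQueue.push(searchItemData)` gives still-processing searches a moment to finish before they are checked again.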
Finally, we execute `processSearchQueue()` to get everything going.
```javascript
// Execute the processSearchQueue() function to initiate the processing of the queue
processSearchQueue();
```
Conclusion
We have seen how to scrape organic results from the Google Search Engine using the Google Search API. You can modify this template in any way you need to based on your use case.
I hope this tutorial was helpful and easy to follow. If you have any questions, feel free to contact me at martin@serpapi.com.