Introduction to scraping Google News using SerpApi and Node.js
Introduction
It is hardly an overstatement to say that news.google.com is one of the best places to look for news articles. The pool of data it collects from different sources is enormous. The platform lets you find news on topics ranging from business and technology to entertainment and sports.
Although Google's news results are available under the "News" tab from a regular search and can be used to collect news data effectively, Google's News platform - news.google.com - is specifically designed with aggregating news articles from different sources in mind. This brings some advantages, such as categorizing and grouping the results under various topics, sections, and stories, allowing access to news results from a specific publisher, and more. In this blog post, we'll explore how to retrieve this data using SerpApi's Google News API.
Why Scrape News Results in the First Place?
Scraping data from news.google.com can significantly benefit an organization's growth regardless of its industry. Let's take companies in the travel or hospitality industries, for example. Collecting information regarding travel restrictions, safety measures, and tourism trends can help them predict changes in occupancy rates and plan their actions accordingly. They can also leverage this information to adapt their marketing strategies, making them more effective and attracting new clients to their business.
On the other hand, an investment company can use financial news to gather data about market developments, regulatory changes, and economic forecasts. Utilized correctly, this data can help them manage risk more effectively and thus provide more accurate advice to their clients. This can enhance their portfolio performance and client satisfaction.
Another thing to consider is the rise of AI technologies and their role in an organization's growth. Utilizing AI technologies to process large amounts of data opens many possibilities for gaining a competitive advantage in the market. It can enhance a company's marketing strategy by detecting patterns and predicting future trends. Natural language processing (NLP) can be used to extract critical information from the tons of news article data you feed to it. An AI-driven alert system can monitor for negative coverage and send appropriate notifications so that prevention measures can be taken in a timely manner. The possibilities and use cases of utilizing this data in combination with AI-enhanced systems go on and on.
Setup
First, you need an API key to use the Google News API. Obtaining one is as easy as registering a free account using the link below:
https://serpapi.com/users/sign_up
Then, head to your account's dashboard and copy it from there:
https://serpapi.com/manage-api-key
Now you're ready to use any of your favorite npm HTTP libraries to execute the requests. However, SerpApi also provides a handy npm package for this: serpapi. In the examples below, we'll use it instead, as it allows for a more straightforward and cleaner way to send requests to the Google News API. You can install this package using the command below:
npm install serpapi
Detailed information about the package is available in our official documentation:
https://serpapi.com/integrations/javascript
And this is it! We can now proceed with scraping actual Google News results.
Executing a request
Glancing at the Google News API documentation, we see that only two parameters are required to execute a request: api_key and engine. Let's also add a basic search query with the q parameter and define the country to use for the Google News search with the gl parameter:
const { getJson } = require("serpapi");

getJson({
  api_key: "YOUR_API_KEY",
  engine: "google_news",
  q: "coffee",
  gl: "us"
}, (json) => {
  console.log(json);
});
With this little snippet of code, you already have access to data in a structured JSON format retrieved directly from news.google.com. That's how easy it is!
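In practice, you may prefer not to hard-code the API key in your source files. The sketch below shows one possible way to build the same request parameters while reading the key from an environment variable. The variable name SERPAPI_API_KEY and the buildNewsParams helper are assumptions made for this illustration and are not part of the serpapi package.

```javascript
// Name of the environment variable is an assumption made for this sketch.
const API_KEY_ENV_VAR = "SERPAPI_API_KEY";

// Hypothetical helper: builds the parameters used in the example above,
// taking the API key from the environment instead of the source code.
function buildNewsParams(query, country = "us") {
  const apiKey = process.env[API_KEY_ENV_VAR];
  if (!apiKey) {
    throw new Error(`Set the ${API_KEY_ENV_VAR} environment variable first`);
  }
  return {
    api_key: apiKey,
    engine: "google_news",
    q: query,
    gl: country,
  };
}

// Usage with the serpapi package (needs network access and a valid key):
// const { getJson } = require("serpapi");
// getJson(buildNewsParams("coffee"), (json) => console.log(json));
```

Keeping the parameter construction in its own function also makes it easy to reuse across several searches.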
Response breakdown
The response from the Google News API contains two essential sections: news_results and menu_links. The first one contains the actual news article results, while the second one provides broader topics that you can use to filter the results. Let’s take a closer look at each one.
news_results
The bread and butter of the Google News API, news_results, contains the actual data for the news articles. Each entry in this list contains information such as its position in the results, title, source, link to the actual article, thumbnail image, and publication date. This section is crucial for data extraction, as it provides content that can be further analyzed for insights.
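As a rough illustration of working with these fields, the sketch below reduces each entry to a compact summary. The key names (position, title, source.name, link, date) are assumed to mirror the fields described above, and sampleResponse is a hand-written stand-in rather than real API output.

```javascript
// Hand-written sample shaped like the fields described above; not real API output.
const sampleResponse = {
  news_results: [
    {
      position: 1,
      title: "Example coffee headline",
      source: { name: "Example Times" },
      link: "https://example.com/coffee",
      date: "2024-05-01",
    },
    // An entry with some fields missing, to show the fallbacks.
    { position: 2, title: "Second headline", link: "https://example.com/second" },
  ],
};

// Hypothetical helper: keeps only the fields needed for further analysis.
function summarizeArticles(json) {
  const results = Array.isArray(json?.news_results) ? json.news_results : [];
  return results.map((article) => ({
    position: article.position,
    title: article.title,
    source: article.source?.name ?? null, // null when no source is present
    link: article.link,
    date: article.date ?? null,
  }));
}

console.log(summarizeArticles(sampleResponse));
```

In a real script, you would pass the json object received in the getJson callback instead of the sample.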
Grouped results
For some search queries, Google News can group the first few results by a topic, where the title of the grouped articles block represents the actual topic. The individual articles within this group are listed in the stories array.
Let's search for basketball-related news using q="NBA Playoffs":
const { getJson } = require("serpapi");

getJson({
  api_key: "YOUR_API_KEY",
  engine: "google_news",
  gl: "us",
  q: "NBA Playoffs"
}, (json) => {
  console.log(json);
});
We see that Google News has grouped some results related to a recent Los Angeles Clippers vs. Dallas Mavericks game. This grouping is reflected in the response from the Google News API, where the grouped articles are listed in the stories array of the corresponding entry.
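If you want to treat grouped and ungrouped articles the same way, one option is to flatten the list. The sketch below assumes that a grouped entry carries a title and a stories array, as described above. The flattenNewsResults helper, the group_title key it adds, and the sample data are all hypothetical.

```javascript
// Hand-written sample: one grouped entry followed by one regular article.
const sampleResults = [
  {
    title: "Clippers vs. Mavericks",
    stories: [
      { title: "Game recap", link: "https://example.com/recap" },
      { title: "Player ratings", link: "https://example.com/ratings" },
    ],
  },
  { title: "Standalone article", link: "https://example.com/standalone" },
];

// Hypothetical helper: expands grouped entries into their individual stories
// and remembers the group's title under a made-up group_title key.
function flattenNewsResults(newsResults) {
  return newsResults.flatMap((entry) =>
    Array.isArray(entry.stories)
      ? entry.stories.map((story) => ({ ...story, group_title: entry.title }))
      : [entry]
  );
}

console.log(flattenNewsResults(sampleResults).map((article) => article.title));
```

Regular entries pass through unchanged, so the order of the original results is preserved.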
Source
One of the most essential traits of a news piece is its credibility. Analyzing its source can reveal potential biases in the article. The name of the source is available in the name field of the source object. When the actual authors of the article are available, you can retrieve them from the authors list inside the same object.
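As a simple starting point for source analysis, the sketch below counts how many articles each publisher contributed and reads the authors when present. It assumes that source.name and source.authors are laid out as described above. Both helpers and the sample articles are hypothetical.

```javascript
// Hand-written sample articles; not real API output.
const sampleArticles = [
  { title: "A", source: { name: "Example Times", authors: ["Jane Doe"] } },
  { title: "B", source: { name: "Example Times" } },
  { title: "C", source: { name: "Sample Post" } },
  { title: "D" }, // entry without a source object
];

// Hypothetical helper: counts articles per source name.
function countBySource(articles) {
  const counts = new Map();
  for (const article of articles) {
    const name = article.source?.name;
    if (!name) continue; // skip entries that carry no source name
    counts.set(name, (counts.get(name) ?? 0) + 1);
  }
  return counts;
}

// Hypothetical helper: returns the authors list, or an empty array if absent.
function listAuthors(article) {
  return article.source?.authors ?? [];
}

console.log(countBySource(sampleArticles));
console.log(listAuthors(sampleArticles[0]));
```

A Map keeps the sources in the order they first appear, which is convenient when the results are already sorted by position.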
Maximum number of results
Google News returns up to 100 news results for a search query. This cap helps you focus on the most relevant articles for the particular search query and drill down from there. All these results are available with a single request to our Google News API.
menu_links
Google News shines in its organization of articles by topic, section, and story. This lets you quickly find relevant articles aggregated together, removing the burden of implementing this on your end.
The broader topics that you can filter your results by, such as Business, Technology, and Entertainment, are available in the menu_links section of the response. This section corresponds to the main navigation menu present in the actual Google News engine.
Each entry in the menu_links list contains title, topic_token, and serpapi_link fields. You can use these topic_token values to execute a separate request to the Google News API to further expand on these topics.
const { getJson } = require("serpapi");

getJson({
  api_key: "YOUR_API_KEY",
  engine: "google_news",
  gl: "us",
  topic_token: "CAAqJggKIiBDQkFTRWdvSUwyMHZNRGRqTVhZU0FtVnVHZ0pWVXlnQVAB" // Technology topic
}, (json) => {
  console.log(json);
});
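Rather than copying a token by hand, you could look it up by its title in the menu_links list of an earlier response. The sketch below assumes the title and topic_token fields described above. The helper names and the placeholder tokens in the sample are made up for illustration.

```javascript
// Hand-written sample with placeholder tokens; real tokens come from the API response.
const sampleMenuLinks = [
  { title: "Business", topic_token: "PLACEHOLDER_BUSINESS_TOKEN" },
  { title: "Technology", topic_token: "PLACEHOLDER_TECHNOLOGY_TOKEN" },
];

// Hypothetical helper: finds a topic token by menu title (case-insensitive).
// Returns null when the title is not in the list or has no token.
function findTopicToken(menuLinks, title) {
  const wanted = title.trim().toLowerCase();
  const match = menuLinks.find((link) => link.title?.toLowerCase() === wanted);
  return match?.topic_token ?? null;
}

// Hypothetical helper: builds the same parameters as the topic request above.
function buildTopicParams(apiKey, topicToken, country = "us") {
  return {
    api_key: apiKey,
    engine: "google_news",
    gl: country,
    topic_token: topicToken,
  };
}

const token = findTopicToken(sampleMenuLinks, "technology");
console.log(buildTopicParams("YOUR_API_KEY", token));
```

The resulting object can then be passed to getJson in the same way as in the example above.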
Wrapping Up
With this, we covered the basics of utilizing our Google News API to dive into the large pool of news data the news.google.com platform provides. Whether you're interested in analyzing news articles to boost the performance of your marketing campaigns or collecting data to train your AI models, our API provides the means for retrieving this data with just a few lines of code.
Don't hesitate to contact our team at contact@serpapi.com if you have any questions!