Introduction to scraping Google Flights using Node.js and SerpApi
Use Cases and Applications
Data scraped from Google Flights results is useful to both travelers and businesses. It supports better-informed decisions, more personalized travel planning, and more efficient operations.
For travel agencies and other businesses in the travel sector, scraped Google Flights data is a tool for market analysis and competitive intelligence. Up-to-date information on pricing, route popularity, and emerging travel trends lets them adjust prices as the market moves and tailor their services to changing customer demand.
The data also supports new travel applications and services. Developers can use it to build flight comparison platforms or travel assistants that take weather, events, and user preferences into account.
Getting Started: Setting Up the Environment
If you haven't already signed up for a free SerpApi account, go ahead and do that here. Once you complete the process, you can retrieve your API key from your account's Dashboard.
You're now ready to install our Node.js package and start using it:
npm install serpapi
const { getJson } = require("serpapi");
const util = require("util");
const API_KEY = "ACTUAL_API_KEY";
We use util to pretty-print the output of our queries. The require statements are omitted from the code examples in the following sections of this article, so make sure to include them in your scripts when running the code locally.
Understanding Google Flights Data Structure
Understanding the Google Flights data structure is crucial for extracting valuable information about available flights, prices, and additional features.
Flights Results
The primary structure includes two key sections: best_flights and other_flights.
The best_flights section provides detailed information about the top flight options that meet the specified criteria. Each entry within best_flights contains a list of individual flights, including departure and arrival details, duration, airplane model, airline information, travel class, flight number, and other relevant features. Layover information, such as the duration and whether it is overnight, is also included in this section. Each entry also carries the total duration of the entire journey, carbon emissions data, and the ticket price in the selected currency.
Additionally, the extensions field provides an array of flight features, and the ticket_also_sold_by field lists other sellers offering the same ticket. Notably, the departure_token serves as a token for retrieving returning flights when the flight type is specified as 'Round trip.' The other_flights section mirrors the structure of best_flights and encompasses the remaining flight options when the results are not separated.
Detailed information about the data structure in the response and examples are available in the documentation.
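To make the structure above more concrete, here is a minimal sketch that flattens a response into one summary row per itinerary. The summarizeItineraries helper is hypothetical, and the sample object is made up for illustration (it is not real API output); it only relies on field names that appear elsewhere in this article.

```javascript
// Hypothetical helper: flatten best_flights and other_flights into summary rows.
// Either section may be absent from a response, so both default to an empty array.
function summarizeItineraries(results) {
  const sections = [
    ["best", results.best_flights || []],
    ["other", results.other_flights || []],
  ];
  return sections.flatMap(([section, itineraries]) =>
    itineraries.map((item) => {
      const legs = item.flights || [];
      // Build "AAA -> BBB -> CCC" from the first departure and every arrival.
      const airports =
        legs.length === 0
          ? []
          : [legs[0].departure_airport.id, ...legs.map((leg) => leg.arrival_airport.id)];
      return {
        section,
        route: airports.join(" -> "),
        stops: Math.max(legs.length - 1, 0),
        totalDuration: item.total_duration,
        price: item.price,
        departureToken: item.departure_token,
      };
    })
  );
}

// Made-up sample shaped like the structure described above (illustrative values only).
const sampleResults = {
  best_flights: [
    {
      flights: [
        { departure_airport: { id: "LHR" }, arrival_airport: { id: "WAW" } },
        { departure_airport: { id: "WAW" }, arrival_airport: { id: "NRT" } },
      ],
      total_duration: 960,
      price: 850,
      departure_token: "EXAMPLE_TOKEN",
    },
  ],
};

const rows = summarizeItineraries(sampleResults);
console.table(rows);
```

Keeping the departure_token in each row is a deliberate choice: it is the value needed later to request the returning flights for a round trip.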
Price Insights
The price_insights field within the Google Flights API response describes the pricing of the returned flights. The lowest_price attribute represents the minimum ticket cost among the available flight options, providing a clear benchmark for cost-effective choices. Complementing this, the price_level attribute indicates the pricing tier associated with the lowest price, which helps you judge how affordable the returned flights are.
The typical_price_range field, represented as a two-integer array, specifies the expected range of typical prices for the queried flight, which gives context for assessing the current price. Furthermore, the price_history attribute, structured as an array of two-integer arrays that each pair a timestamp with the corresponding price, enables visualization of historical pricing trends, which is useful for making decisions based on past patterns.
You can check the documentation for more information and examples.
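As a rough illustration of how these fields can be combined, the sketch below compares lowest_price with typical_price_range and finds the cheapest entry in price_history. The describePriceInsights helper and the sample values are hypothetical, and the code assumes that each price_history entry is a [timestamp, price] pair with the timestamp in Unix seconds.

```javascript
// Hypothetical helper: interpret a price_insights object.
function describePriceInsights(insights) {
  const [typicalLow, typicalHigh] = insights.typical_price_range;

  // Where does the lowest price sit relative to the typical range?
  let position = "within";
  if (insights.lowest_price < typicalLow) position = "below";
  else if (insights.lowest_price > typicalHigh) position = "above";

  // Assumption: price_history is a list of [unixTimestampInSeconds, price] pairs.
  const history = insights.price_history || [];
  const cheapest = history.reduce(
    (best, entry) => (best === null || entry[1] < best[1] ? entry : best),
    null
  );

  return {
    position,
    priceLevel: insights.price_level,
    cheapestRecorded: cheapest && {
      date: new Date(cheapest[0] * 1000).toISOString().slice(0, 10),
      price: cheapest[1],
    },
  };
}

// Made-up sample values for illustration only.
const sampleInsights = {
  lowest_price: 590,
  price_level: "low",
  typical_price_range: [600, 800],
  price_history: [
    [1700000000, 640],
    [1700086400, 615],
  ],
};

const insightSummary = describePriceInsights(sampleInsights);
console.log(insightSummary);
```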
Basic Google Flights Scraping
Alright, let's get some data flowing!
Search by airport code
Let's perform a basic search for a one-way flight between two airports. You can find the airport codes on the IATA's website or on Google Flights.
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "LHR",
  arrival_id: "NRT",
  hl: "en",
  currency: "USD",
  outbound_date: "2024-01-23",
  type: "2"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
Search by city ID
Another option is to search by using a city ID instead of specific airport codes. This way, we'll get results from any available airport in the particular city. The city ID can be retrieved using our Google Maps API.
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "/m/04jpl",
  arrival_id: "/m/07dfk",
  hl: "en",
  currency: "USD",
  outbound_date: "2024-01-23",
  type: "2"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
Round-Trip vs. One-Way Search
In our previous examples, we used one-way flight searches. Let's see how we can search for a round-trip flight instead. We can do that by setting the type parameter to 1 (this is its default value). We need to specify a return_date in this case as well.
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "LHR",
  arrival_id: "NRT",
  hl: "en",
  currency: "USD",
  outbound_date: "2024-01-23",
  return_date: "2024-01-28",
  type: "1"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
Retrieving the returning flights
As you've probably noticed, we only got results for the outbound flights in the previous example, even though we specified a round-trip search. One important parameter in the response from our previous search is the departure_token. We need to use it to retrieve the returning flights. Please note that each of the outbound flights has a separate departure_token.
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "LHR",
  arrival_id: "NRT",
  hl: "en",
  currency: "USD",
  type: "1",
  outbound_date: "2024-01-23",
  return_date: "2024-01-28",
  departure_token: "W1siTEhSIiwiMjAyNC0wMS0yMyIsIldBVyIsbnVsbCwiTE8iLCIyODAiXSxbIldBVyIsIjIwMjQtMDEtMjMiLCJOUlQiLG51bGwsIkxPIiwiNzkiXV0="
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
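Rather than pasting a token by hand, the follow-up request can be derived from the first search. The sketch below is one possible way to do that; buildReturnRequest is a hypothetical helper, and the chosenOutbound object is a placeholder standing in for one entry of best_flights or other_flights.

```javascript
// Hypothetical helper: build the follow-up request for the returning flights.
// It reuses the original round-trip parameters and adds the departure_token
// of the outbound itinerary the user picked.
function buildReturnRequest(baseParams, outboundItinerary) {
  if (!outboundItinerary || !outboundItinerary.departure_token) {
    throw new Error("The selected outbound itinerary has no departure_token");
  }
  return { ...baseParams, departure_token: outboundItinerary.departure_token };
}

// Same parameters as the round-trip search (the API key is added at request time).
const roundTripParams = {
  engine: "google_flights",
  departure_id: "LHR",
  arrival_id: "NRT",
  hl: "en",
  currency: "USD",
  type: "1",
  outbound_date: "2024-01-23",
  return_date: "2024-01-28",
};

// Placeholder itinerary with a made-up token, for illustration only.
const chosenOutbound = { price: 850, departure_token: "EXAMPLE_TOKEN" };

const returnParams = buildReturnRequest(roundTripParams, chosenOutbound);
console.log(returnParams);
```

The resulting object can then be passed to getJson together with the API key, for example as { api_key: API_KEY, ...returnParams }.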
Travel Class and Flexible Dates
Some corporate clients may request specific travel classes and flexible dates for their travel. Here's how to achieve this.
Flexible Dates Search
Unfortunately, the actual Google Flights engine does not provide an option for a flexible dates search. However, we can easily implement that functionality on our side.
const { getJson } = require("serpapi");
const API_KEY = "ACTUAL_API_KEY";

// Format a Date as "YYYY-MM-DD". The dates below are built in UTC,
// so toISOString() never shifts them to the previous or next day.
const formatDate = (date) => date.toISOString().split("T")[0];

// Print every flight leg of each itinerary in one section of the response
function printFlights(title, itineraries, currency) {
  console.log(`\n${title}\n`);
  // A section can be missing from the response, so fall back to an empty list
  (itineraries || []).forEach((item) => {
    item.flights.forEach((flight) => {
      console.log(
        `Flight Number: ${flight.flight_number} | Departure: ${flight.departure_airport.id} (${flight.departure_airport.time}) -> Arrival: ${flight.arrival_airport.id} (${flight.arrival_airport.time}) | Airline: ${flight.airline} | Duration: ${flight.duration} minutes | Airplane: ${flight.airplane} | Total Duration: ${item.total_duration} minutes | Price: ${item.price} ${currency} | Type: ${item.type}`
      );
    });
  });
}

async function searchFlexibleDates(origin, destination, flexibleDays) {
  // Search N consecutive outbound dates (flexibleDays), starting from 2024-01-23
  for (let i = 0; i < flexibleDays; i++) {
    const startDate = new Date(Date.UTC(2024, 0, 23));
    startDate.setUTCDate(startDate.getUTCDate() + i);
    const endDate = new Date(startDate);
    endDate.setUTCDate(endDate.getUTCDate() + 7); // You can adjust the trip length

    const formattedOutboundDate = formatDate(startDate);
    const formattedReturnDate = formatDate(endDate);

    // Without a callback, getJson returns a promise
    const results = await getJson({
      api_key: API_KEY,
      engine: "google_flights",
      departure_id: origin,
      arrival_id: destination,
      hl: "en",
      currency: "USD",
      outbound_date: formattedOutboundDate,
      return_date: formattedReturnDate,
      type: "1"
    });

    console.log(`\nOutbound date: ${formattedOutboundDate} | Return date: ${formattedReturnDate}\n`);
    const currency = results.search_parameters.currency;
    printFlights("Best Flights", results.best_flights, currency);
    printFlights("Other Flights", results.other_flights, currency);
  }
}

// Example usage
const originCity = "JFK"; // Replace with the actual city or airport code
const destinationCity = "LAX"; // Replace with the actual city or airport code
const flexibleDays = 3; // Adjust the number of flexible days as needed
searchFlexibleDates(originCity, destinationCity, flexibleDays).catch(console.error);
Remember that to get the returning flights, you need to use the departure_token from the outbound flights and execute another request.
The example can be extended to behave more like a real flexible-dates search, but that is beyond the scope of this article. It should be enough to get you started, though.
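One possible extension is to search a window of days around a preferred date instead of a fixed run of consecutive days. The sketch below only generates the date pairs; flexibleDatePairs and its parameters are hypothetical names, and each pair would still need its own request.

```javascript
// Hypothetical helper: list outbound/return date pairs within +/- flexDays
// of a target date, keeping the trip length fixed.
function flexibleDatePairs(targetDate, flexDays, tripLengthDays) {
  const DAY_MS = 24 * 60 * 60 * 1000;
  // A date-only string such as "2024-01-23" is parsed as UTC midnight,
  // so toISOString() returns the same calendar day.
  const base = new Date(targetDate).getTime();
  const toIsoDate = (ms) => new Date(ms).toISOString().slice(0, 10);

  const pairs = [];
  for (let offset = -flexDays; offset <= flexDays; offset++) {
    const outbound = base + offset * DAY_MS;
    pairs.push({
      outbound_date: toIsoDate(outbound),
      return_date: toIsoDate(outbound + tripLengthDays * DAY_MS),
    });
  }
  return pairs;
}

// One day either side of 2024-01-23, with a five-day trip.
const datePairs = flexibleDatePairs("2024-01-23", 1, 5);
console.log(datePairs);
```

Each entry can be spread into the request parameters, since its keys match the outbound_date and return_date parameters used above.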
Filtering by Airlines
The exclude_airlines parameter allows us to specify which airlines should be excluded from the results. Multiple values are supported by separating them with a comma.
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "SYD",
  arrival_id: "LHR",
  hl: "en",
  currency: "USD",
  type: "2",
  outbound_date: "2024-01-23",
  stops: "0",
  exclude_airlines: "MH,EY"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
The example above filters out flights from Malaysia Airlines and Etihad Airways in the response.
As opposed to the exclude_airlines parameter, include_airlines is used to retrieve flights operated by specific airlines.
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "SYD",
  arrival_id: "LHR",
  hl: "en",
  currency: "USD",
  type: "2",
  outbound_date: "2024-01-23",
  include_airlines: "QF"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
The example above will only return flights operated by Qantas Airways.
One important thing to note is that the exclude_airlines and include_airlines parameters cannot be used together.
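Because the two parameters are mutually exclusive, it can help to catch the mistake before a request is sent. The following is a minimal sketch of such a guard; airlineFilter is a hypothetical helper and not part of the serpapi package.

```javascript
// Hypothetical helper: turn airline code lists into a single filter parameter.
// Throws if both lists are given, since the two parameters cannot be combined.
function airlineFilter({ include = [], exclude = [] } = {}) {
  if (include.length > 0 && exclude.length > 0) {
    throw new Error("include_airlines and exclude_airlines cannot be used together");
  }
  // Multiple airline codes are joined with commas.
  if (include.length > 0) return { include_airlines: include.join(",") };
  if (exclude.length > 0) return { exclude_airlines: exclude.join(",") };
  return {};
}

const excludeFilter = airlineFilter({ exclude: ["MH", "EY"] });
const includeFilter = airlineFilter({ include: ["QF"] });
console.log(excludeFilter, includeFilter);
```

The returned object can be spread into the request parameters alongside the other search options.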
Cabin Class
We can use the travel_class parameter to specify the flight class. Here are the potential values for this parameter:
1 - Economy (default)
2 - Premium economy
3 - Business
4 - First
Below is an example of filtering for Business class flights:
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "LHR",
  arrival_id: "NRT",
  hl: "en",
  currency: "USD",
  type: "2",
  outbound_date: "2024-01-23",
  travel_class: "3"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
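Numeric codes are easy to mix up, so a small lookup can make calling code more readable. This is only a sketch: travelClassParam and the class names used as keys are hypothetical, while the codes come from the list above.

```javascript
// Codes from the list above, keyed by a readable (hypothetical) name.
const TRAVEL_CLASS_CODES = new Map([
  ["economy", "1"],
  ["premium_economy", "2"],
  ["business", "3"],
  ["first", "4"],
]);

// Hypothetical helper: convert a readable class name into the travel_class value.
function travelClassParam(name = "economy") {
  // Normalize input such as "Premium economy" or "premium-economy".
  const key = name.trim().toLowerCase().replace(/[\s-]+/g, "_");
  const code = TRAVEL_CLASS_CODES.get(key);
  if (code === undefined) {
    throw new Error(`Unknown travel class: ${name}`);
  }
  return code;
}

console.log(travelClassParam("Business"));
console.log(travelClassParam("Premium economy"));
```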
Advanced Filtering
The advanced filtering options allow for granular filtering of the flight results. This includes filtering by layover duration, maximum total duration, number of stops, connecting airports, and more. Let's explore each option in more detail.
You can check the official documentation for a full list of the supported advanced filtering parameters.
Number of stops
We can use the stops parameter to filter the results by the number of stops for a given flight. When omitted, it defaults to 0, which allows any number of stops. We can pass a value of 1 to filter for non-stop flights only:
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "JFK",
  arrival_id: "LHR",
  hl: "en",
  currency: "USD",
  type: "2",
  outbound_date: "2024-01-23",
  stops: "1"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
To filter for flights with one stop or fewer, we can use stops=2. Passing a value of 3 for the stops parameter will filter for flights with two stops or fewer.
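Since the value is offset by one from the actual number of stops (1 means non-stop, 2 means up to one stop, and so on), a tiny conversion helper can prevent off-by-one mistakes. The stopsParam function below is a hypothetical sketch based only on the values described in this section.

```javascript
// Hypothetical helper: convert "maximum number of stops" into the stops value.
//   no limit -> "0" (any number of stops)
//   0 stops  -> "1" (non-stop only)
//   1 stop   -> "2" (one stop or fewer)
//   2 stops  -> "3" (two stops or fewer)
function stopsParam(maxStops) {
  if (maxStops === undefined || maxStops === null) return "0";
  if (!Number.isInteger(maxStops) || maxStops < 0 || maxStops > 2) {
    throw new RangeError("maxStops must be 0, 1, or 2");
  }
  return String(maxStops + 1);
}

console.log(stopsParam());  // no limit
console.log(stopsParam(0)); // non-stop only
console.log(stopsParam(1)); // one stop or fewer
```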
Maximum duration
To filter by the total flight time duration, we can use the max_duration parameter:
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "SYD",
  arrival_id: "LHR",
  hl: "en",
  currency: "USD",
  type: "2",
  outbound_date: "2024-01-23",
  max_duration: "1400"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
Layover duration
We can filter the results by the layover duration between the flights using the layover_duration parameter. This parameter accepts a string of two comma-separated numbers representing the time range in minutes:
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "SYD",
  arrival_id: "LHR",
  hl: "en",
  currency: "USD",
  type: "2",
  outbound_date: "2024-01-23",
  layover_duration: "250,300"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
Excluding connecting airport
If we want to exclude certain airports for the connecting flights, we can provide the specific airport ID to be excluded in the exclude_conns parameter:
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "SYD",
  arrival_id: "LHR",
  hl: "en",
  currency: "USD",
  type: "2",
  outbound_date: "2024-01-23",
  exclude_conns: "AUH"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
The response from the example above will not include itineraries that connect through Abu Dhabi International Airport.
Outbound and return time ranges
Another pair of useful parameters is outbound_times and return_times. Each accepts a string of two comma-separated numbers that represents the departure time range for the outbound and returning flights, respectively. Each number represents the beginning of an hour.
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "SYD",
  arrival_id: "LHR",
  hl: "en",
  currency: "USD",
  type: "1",
  outbound_date: "2024-01-23",
  return_date: "2024-01-28",
  outbound_times: "14,18",
  return_times: "15,20"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
The response will contain outbound flights with a departure time between 2:00 PM and 7:00 PM. The departure time of the return flights will be between 3:00 PM and 9:00 PM.
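The end of the range is easy to misread, because the second number marks the start of the last allowed hour and the range therefore extends to the end of that hour. The sketch below spells this out for the two-number form used in this article; describeHourRange and formatHour are hypothetical helpers.

```javascript
// Format an hour (0-24) as a 12-hour clock label, e.g. 14 -> "2:00 PM".
function formatHour(hour) {
  const h = hour % 24;
  const suffix = h < 12 ? "AM" : "PM";
  return `${h % 12 || 12}:00 ${suffix}`;
}

// Hypothetical helper: describe a two-number time range such as "14,18".
function describeHourRange(value) {
  if (!/^\d{1,2},\d{1,2}$/.test(value)) {
    throw new Error(`Expected two comma-separated hours, got "${value}"`);
  }
  const [startHour, endHour] = value.split(",").map(Number);
  if (startHour > 23 || endHour > 23) {
    throw new Error("Hours must be between 0 and 23");
  }
  // Each number marks the beginning of an hour, so the range
  // ends one hour after the second number.
  return `${formatHour(startHour)} - ${formatHour(endHour + 1)}`;
}

console.log(describeHourRange("14,18"));
console.log(describeHourRange("15,20"));
```

For the values used in the request above, this yields the same ranges as described: 2:00 PM to 7:00 PM for the outbound flights and 3:00 PM to 9:00 PM for the return flights.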
Price
The max_price parameter sets an upper limit on the total cost of the flights returned in the results.
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "SYD",
  arrival_id: "LHR",
  hl: "en",
  currency: "USD",
  type: "2",
  outbound_date: "2024-01-23",
  max_price: "600"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
Searching for flights for groups of passengers
When searching for a group of travelers, the adults, children, infants_in_seat, and infants_on_lap parameters let us specify the exact composition of the passenger group.
getJson({
  api_key: API_KEY,
  engine: "google_flights",
  departure_id: "LHR",
  arrival_id: "NRT",
  hl: "en",
  currency: "USD",
  type: "2",
  outbound_date: "2024-01-23",
  adults: "3",
  children: "2",
  infants_in_seat: "1",
  infants_on_lap: "1"
}, (results) => {
  console.log(util.inspect(results, { depth: null, colors: true }));
});
Summary
In a nutshell, scraping Google Flights gives you the ability to make well-informed decisions by analyzing the extracted data. As demonstrated in this article, this is easily achieved using our Google Flights API.
If you have any questions or would like to discuss any issues or matters, feel free to contact our team at contact@serpapi.com. We'll be more than happy to assist you and answer all of your questions!