How to Scrape Google Maps Place Photos using SerpApi
Intro
In this blog post, we'll go through the process of extracting data from Google Maps Place results using Python. You can look at the complete code in the online IDE (Replit).
In order to successfully extract Google Maps photos, you need to pass the data_id parameter, which identifies the place whose photos will be returned. You can extract this parameter from local results. Have a look at the Using Google Maps Local Results API from SerpApi blog post, in which I described in detail how to extract all the needed data.
If you prefer video format, we have a dedicated video that shows how to do that:
What will be scraped
Why use an API?
There are a couple of reasons to use an API, ours in particular:
- No need to create a parser from scratch and maintain it.
- No need to bypass blocks from Google by solving CAPTCHAs or rotating IP addresses.
- No need to pay for proxies and CAPTCHA solvers.
- No need to use browser automation.
SerpApi handles everything on the backend, with fast response times of under ~2.5 seconds (~1.2 seconds with Ludicrous speed) per request and without browser automation, which makes things much faster. Response times and status rates are shown on the SerpApi Status page.
Full Code
If you just need to extract all available data about the place, we can create an empty list and then append the extracted data to it:
from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import os, json
params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),        # your SerpApi API key
    'engine': 'google_maps',                # SerpApi search engine
    'q': 'Starbucks',                       # search query
    'll': '@40.7455096,-74.0083012,15.1z',  # GPS coordinates of the search area
    'type': 'search',                       # list of results for the query
    'hl': 'en',                             # language
    'start': 0,                             # pagination offset
}
search = GoogleSearch(params) # where data extraction happens on the backend
results = search.get_dict() # JSON -> Python dict
local_data_results = [
    (result['title'], result['data_id'])
    for result in results['local_results']
]
photo_results = []
for title, data_id in local_data_results:
    params = {
        # https://docs.python.org/3/library/os.html#os.getenv
        'api_key': os.getenv('API_KEY'),  # your SerpApi API key
        'engine': 'google_maps_photos',   # SerpApi search engine
        'hl': 'en',                       # language
        'data_id': data_id                # place result identifier
    }
    search = GoogleSearch(params)

    photos = []

    # pagination
    while True:
        new_page_results = search.get_dict()

        # collect this page's photos before checking for the next page,
        # otherwise the photos on the last page would be lost
        photos.extend(new_page_results.get('photos', []))

        if 'next' in new_page_results.get('serpapi_pagination', {}):
            search.params_dict.update(dict(parse_qsl(urlsplit(new_page_results['serpapi_pagination']['next']).query)))
        else:
            break

    photo_results.append({
        'title': title,
        'photos': photos
    })
print(json.dumps(photo_results, indent=2, ensure_ascii=False))
Preparation
Install library:
pip install google-search-results
google-search-results is a SerpApi API package.
Code Explanation
Import libraries:
from serpapi import GoogleSearch
from urllib.parse import urlsplit, parse_qsl
import os, json
Library | Purpose
---|---
GoogleSearch | to scrape and parse Google results using the SerpApi web scraping library.
urlsplit | to split a URL into its components; here it is used to extract the query string from the pagination URL.
parse_qsl | to parse a query string into a list of (name, value) pairs.
os | to return the value of an environment variable (the SerpApi API key).
json | to convert the extracted data to a JSON object.
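As a small illustration of how urlsplit and parse_qsl work together in this script, here is a self-contained sketch (the pagination URL below is made up; the real "next" URL returned by SerpApi may carry different parameters):

```python
from urllib.parse import urlsplit, parse_qsl

# A made-up example of a "next page" URL like the one SerpApi returns
# under serpapi_pagination -> next (values here are illustrative only).
next_url = "https://serpapi.com/search.json?engine=google_maps_photos&data_id=0x1:0x2&next_page_token=abc123"

# urlsplit breaks the URL into components; .query is the part after "?"
query = urlsplit(next_url).query

# parse_qsl turns the query string into (key, value) pairs
params = dict(parse_qsl(query))

print(params)
# {'engine': 'google_maps_photos', 'data_id': '0x1:0x2', 'next_page_token': 'abc123'}
```

This resulting dictionary is exactly what gets merged into search.params_dict to request the next page.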
At the beginning of the code, you need to make a request in order to get the local results. The place photos will then be extracted from them.
The parameters are defined for generating the URL. If you want to pass other parameters to the URL, you can do so using the params dictionary:
params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),        # your SerpApi API key
    'engine': 'google_maps',                # SerpApi search engine
    'q': 'Starbucks',                       # search query
    'll': '@40.7455096,-74.0083012,15.1z',  # GPS coordinates of the search area
    'type': 'search',                       # list of results for the query
    'hl': 'en',                             # language
    'start': 0,                             # pagination offset
}
Then, we create a search object where the data is retrieved from the SerpApi backend. In the results dictionary, we get the data from JSON:
search = GoogleSearch(params) # where data extraction happens on the backend
results = search.get_dict() # JSON -> Python dict
At the moment, the first 20 local results are stored in the results dictionary. If you are interested in all local results with pagination, then check out the Using Google Maps Local Results API from SerpApi blog post.
Data such as title and data_id are extracted from each local result. This data will be needed later:
local_data_results = [
    (result['title'], result['data_id'])
    for result in results['local_results']
]
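Occasionally a local result may lack a data_id, which would make the comprehension above raise a KeyError. A slightly more defensive version, sketched here with made-up sample data, skips such entries using dict.get():

```python
# Sample data shaped like SerpApi's local_results (the values are made up).
results = {
    'local_results': [
        {'title': 'Starbucks', 'data_id': '0x89c259a61c75684f:0x79d31adb123348d2'},
        {'title': 'Unnamed place'},  # no data_id -> will be skipped
    ]
}

local_data_results = [
    (result['title'], result['data_id'])
    for result in results.get('local_results', [])
    if result.get('data_id')  # keep only entries that have a data_id
]

print(local_data_results)
# [('Starbucks', '0x89c259a61c75684f:0x79d31adb123348d2')]
```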
Declaring the photo_results list where the extracted data will be added:
photo_results = []
Next, you need to access each place's photos separately by iterating over the local_data_results list:
for title, data_id in local_data_results:
    # data extraction will be here
These parameters are defined for generating the URL for place results:
params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),  # your SerpApi API key
    'engine': 'google_maps_photos',   # SerpApi search engine
    'hl': 'en',                       # language
    'data_id': data_id                # place result identifier
}
Parameters | Explanation
---|---
api_key | Parameter defines the SerpApi private key to use.
engine | Set parameter to google_maps_photos to use the Google Maps Photos API engine.
hl | Parameter defines the language to use for the Google Maps Photos search. It's a two-letter language code, for example, en for English (default), es for Spanish, or fr for French. Head to the Google languages page for a full list of supported Google languages.
data_id | Parameter defines the Google Maps data ID. Find the data ID of a place using our Google Maps API.
Then, we create a search object where the data is retrieved from the SerpApi backend:
search = GoogleSearch(params)
Declaring the photos list where the extracted data from the current place will be added:
photos = []
In order to get all photos from a specific place, you need to apply pagination while a next data packet is present. Therefore, a loop is created that runs until no next page remains:
while True:
    # pagination from the current page
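Before walking through each step, here is a minimal, self-contained sketch of this pagination pattern. It uses stubbed pages instead of real SerpApi responses (the field names match SerpApi's, but the data is made up and no requests are sent):

```python
# Two stub "pages" mimicking the shape of SerpApi responses.
pages = [
    {'photos': [{'thumbnail': 'photo_1'}],
     'serpapi_pagination': {'next': 'https://example.com/?page=2'}},
    {'photos': [{'thumbnail': 'photo_2'}]},  # last page: no "next" key
]

photos = []
page_iter = iter(pages)

while True:
    new_page_results = next(page_iter)
    # Collect this page's photos BEFORE checking for a next page,
    # so the last page is not lost when the loop breaks.
    photos.extend(new_page_results.get('photos', []))
    if 'next' in new_page_results.get('serpapi_pagination', {}):
        continue  # in the real code: update search.params_dict and re-request
    break

print(len(photos))  # 2
```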
In the new_page_results dictionary, we get a new packet of data in JSON format:
new_page_results = search.get_dict()
First, the photos list is expanded with the new data from this page:
photos.extend(new_page_results.get('photos', []))
Then comes the condition inside the loop: if the next data packet is present, the search object is updated with the parameters of the next page; otherwise, the loop is stopped. Collecting the photos before this check ensures that the last page is not lost:
if 'next' in new_page_results.get('serpapi_pagination', {}):
    search.params_dict.update(dict(parse_qsl(urlsplit(new_page_results['serpapi_pagination']['next']).query)))
else:
    break
# each element of the photos list is a dictionary, for example:
# for photo in new_page_results['photos']:
#     thumbnail = photo['thumbnail']
#     image = photo['image']
#     user_link = photo['user']['link']
#     user_id = photo['user']['user_id']
#     user_name = photo['user']['name']
📌Note: If you want to extract some specific fields, the comment above gives an example of how this can be implemented. If some data is missing, the extraction will fail with a KeyError. In this case, it is better to use the dict.get() method.
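For example, pulling just a few fields out of each photo with dict.get() means a missing key yields None instead of raising KeyError. The sample data below is made up but follows the field names shown above:

```python
# Sample photo entries shaped like SerpApi's "photos" list (data is made up).
photos = [
    {'thumbnail': 'https://example.com/thumb.jpg',
     'image': 'https://example.com/full.jpg',
     'user': {'name': 'Jane', 'link': 'https://example.com/jane'}},
    {'thumbnail': 'https://example.com/thumb2.jpg'},  # no image or user info
]

extracted = []
for photo in photos:
    extracted.append({
        'thumbnail': photo.get('thumbnail'),
        'image': photo.get('image'),
        # .get('user', {}) keeps the chain safe when 'user' is absent
        'user_name': photo.get('user', {}).get('name'),
    })

print(extracted[1])
# {'thumbnail': 'https://example.com/thumb2.jpg', 'image': None, 'user_name': None}
```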
Append the title and photos from this place to the photo_results list:
photo_results.append({
    'title': title,
    'photos': photos
})
After all the data is retrieved, it is output in JSON format:
print(json.dumps(photo_results, indent=2, ensure_ascii=False))
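If you would rather save the results to disk than print them, json.dump() writes the same structure to a file (the filename here is arbitrary):

```python
import json

# Sample results in the same shape as photo_results (data is made up).
photo_results = [
    {'title': 'Starbucks',
     'photos': [{'thumbnail': 'https://example.com/t.jpg'}]}
]

# Write the list to a JSON file; ensure_ascii=False keeps non-ASCII text readable.
with open('photo_results.json', 'w', encoding='utf-8') as f:
    json.dump(photo_results, f, indent=2, ensure_ascii=False)
```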
Output
[
  {
    "title": "Starbucks",
    "photos": [
      {
        "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipOuxvgz55TI_eqJCrv_vrsZQs-YcRy2tIf_x97l=w203-h114-k-no",
        "image": "https://lh5.googleusercontent.com/p/AF1QipOuxvgz55TI_eqJCrv_vrsZQs-YcRy2tIf_x97l=w5312-h2988-k-no"
      },
      {
        "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipNf9jWD_l0f6AAdhBK6RhzYIxyVFpA2_yOJodGC=w203-h114-k-no",
        "image": "https://lh5.googleusercontent.com/p/AF1QipNf9jWD_l0f6AAdhBK6RhzYIxyVFpA2_yOJodGC=w512-h289-k-no"
      },
      {
        "thumbnail": "https://lh5.googleusercontent.com/p/AF1QipOtdpwp3O8OANB5Jyz8RuGNuYdM1sM4NQqdBrRf=w203-h270-k-no",
        "image": "https://lh5.googleusercontent.com/p/AF1QipOtdpwp3O8OANB5Jyz8RuGNuYdM1sM4NQqdBrRf=w3024-h4032-k-no"
      },
      ... other photos
    ]
  },
  ... other results
]
📌Note: You can view the playground or check the output. This way you will be able to understand which keys you can use in this JSON structure to get the data you need.
Links
Add a Feature Request💫 or a Bug🐞