Have you ever wanted to gather data from Yelp, but found it time-consuming to check every place and review manually? In this blog post, we’ll see how to scrape the Yelp website and extract useful information, such as business details and customer reviews, in an automated way.
We offer three different APIs to scrape Yelp:
- Yelp Search API: Scrape results from the Yelp search page.
- Yelp Place API: Scrape details of a place.
- Yelp Reviews API: Scrape reviews from a place.
You can use a simple GET request or a dedicated SDK for your programming language, including Python, JavaScript, and more. In this post, we'll use the GET request method in Python.
Preparation for accessing SerpApi API
- Register at serpapi.com for free to get your SerpApi API key.
- Create a new main.py file.
Basic setup
Here is what the basic setup looks like:
import requests

SERPAPI_API_KEY = "YOUR_REAL_SERPAPI_API_KEY"

params = {
    "api_key": SERPAPI_API_KEY,
    # engine-specific parameters are added in the sections below
}

search = requests.get("https://serpapi.com/search", params=params)
response = search.json()
print(response)
With these few lines of code, we can access all of the search engines available at SerpApi, including Yelp. We'll need to adjust the parameters based on our needs.
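The basic setup can be wrapped in a small helper so that HTTP failures and API-level errors surface early. This is a minimal sketch, not SerpApi's official client: `serpapi_search` is a hypothetical name, the `get` argument exists only so the function can be exercised without network access, and the check for an `error` key assumes that failed searches are reported that way in the JSON body.

```python
SERPAPI_URL = "https://serpapi.com/search"


def serpapi_search(params, get=None, timeout=30):
    """Send one GET request to SerpApi and return the parsed JSON body."""
    if get is None:
        # Imported lazily so the helper can be tested with a stand-in for requests.get.
        import requests
        get = requests.get
    reply = get(SERPAPI_URL, params=params, timeout=timeout)
    reply.raise_for_status()  # raise on 4xx/5xx HTTP status codes
    data = reply.json()
    # Assumption: a failed search carries an "error" message in the JSON body.
    if "error" in data:
        raise RuntimeError(f"SerpApi error: {data['error']}")
    return data
```

With this in place, `response = serpapi_search(params)` can stand in for the two-line `requests.get(...)` / `.json()` pattern used throughout this post.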
Scrape Yelp Search Result page
First, we can use the Yelp Search API to get search results based on a particular query. It's useful when scraping many places in a specific location.
Documentation: https://serpapi.com/yelp-search-api
The required parameter is find_loc, which defines where you want the search to originate. The following location formats are acceptable:
- 706 Mission St, San Francisco, CA
- San Francisco, CA
- San Francisco, CA 94103
- 94103
The search query can be put in the find_desc parameter. Here is the complete code to search for "coffee" in Austin, Texas.
params = {
    "api_key": SERPAPI_API_KEY,
    "engine": "yelp",
    "find_desc": "coffee",
    "find_loc": "Austin, TX, United States"
}

search = requests.get("https://serpapi.com/search", params=params)
response = search.json()
print(response)
You can grab the organic_results, and the ads_results when they are available.
organic_results = response["organic_results"]
ads_results = response.get("ads_results", [])  # empty list when no ads are returned
The organic_results will include:
- place_id
- title
- link
- categories
- price
- rating
- reviews
- neighborhoods
- phone
- snippet
- service_options
- thumbnail
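To work with these fields, it can help to reduce each result to the few keys you care about. The sketch below is illustrative: `summarize_places` is a hypothetical helper, the sample dictionary is hand-written to match the field list above rather than copied from a real response, and `.get()` is used because some fields may be absent for some places.

```python
def summarize_places(response):
    """Return one small dict per organic result, tolerating missing fields."""
    rows = []
    for place in response.get("organic_results", []):
        rows.append({
            "title": place.get("title"),
            "rating": place.get("rating"),
            "reviews": place.get("reviews"),
            "price": place.get("price"),
            "link": place.get("link"),
        })
    return rows


# Hand-written sample shaped like the field list above (not real API output).
sample = {"organic_results": [{"title": "Example Cafe", "rating": 4.5, "reviews": 120}]}
for row in summarize_places(sample):
    print(row["title"], row["rating"], row["reviews"])
```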
Pagination
By default, Yelp returns 10 results per page. We can paginate the results using the start parameter. It skips the given number of results, e.g., 0 (default) is the first page of results, 10 is the 2nd page of results, 20 is the 3rd page of results, etc.
params = {
    "api_key": SERPAPI_API_KEY,
    "engine": "yelp",
    "find_desc": "coffee",
    "find_loc": "Austin, TX, United States",
    "start": 10  # example value for the 2nd page
}
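To collect several pages in one go, the start offset can be advanced in a loop. This is a sketch under a few assumptions: `collect_search_results` and its `fetch` argument are hypothetical names, the page size of 10 comes from the default mentioned above, and an empty `organic_results` list is treated as the signal that there are no more pages.

```python
def collect_search_results(fetch, base_params, page_size=10, max_pages=5):
    """Gather organic results page by page using the `start` offset.

    `fetch` is any callable that takes a params dict and returns the parsed
    JSON response (for example, a thin wrapper around requests.get).
    """
    collected = []
    for page in range(max_pages):
        params = dict(base_params, start=page * page_size)
        results = fetch(params).get("organic_results", [])
        if not results:  # empty page: assume there is nothing more to fetch
            break
        collected.extend(results)
    return collected
```

With requests, `fetch` could be as simple as `lambda p: requests.get("https://serpapi.com/search", params=p).json()`. The `max_pages` cap keeps the number of requests bounded, since each page is a separate search.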
If you prefer, we also have a video tutorial on this:
Scrape a specific place from Yelp
Next, we can scrape the individual page for a place using the Yelp Place API.
Documentation: https://serpapi.com/yelp-place
We need the place_id as the parameter to run this API. The place_id is available from the previous organic_results, or you can use the last path segment of the original URL when you open a place directly on Yelp. For example, if you're interested in scraping this page https://www.yelp.com/biz/flora-coffee-austin, then we can use flora-coffee-austin as the place_id.
Here is the code sample:
params = {
    "api_key": SERPAPI_API_KEY,
    "engine": "yelp_place",
    "place_id": "flora-coffee-austin"
}

search = requests.get("https://serpapi.com/search", params=params)
response = search.json()
print(response)
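If you start from a Yelp URL rather than from search results, the place_id can be pulled out of the path programmatically. The helper below is a small sketch: `place_id_from_url` is a hypothetical name, and it assumes business pages follow the `/biz/<place_id>` pattern seen in the example URL above.

```python
from urllib.parse import urlparse


def place_id_from_url(url):
    """Return the last path segment of a Yelp business URL."""
    path = urlparse(url).path  # drops any query string or fragment
    segments = [part for part in path.split("/") if part]
    # Assumption: business pages look like /biz/<place_id>.
    if len(segments) < 2 or segments[-2] != "biz":
        raise ValueError(f"not a Yelp business URL: {url}")
    return segments[-1]


print(place_id_from_url("https://www.yelp.com/biz/flora-coffee-austin"))
# prints: flora-coffee-austin
```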
In the place_result response, you can find this information:
- place name
- description
- reviews
- rating
- categories
- images
- address
- directions
- popular items
- review highlights
- business map
- features
- operation hours
- and more!
Scrape place reviews from Yelp
You are probably interested in scraping reviews from a place as well. We've got you covered! You can use the Yelp Reviews API to scrape all the reviews.
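It requires the encoded, random-looking place_id of a place (for example, uyYG-z7yiSPhXmkauzb43Q) rather than the readable one used for the Yelp Place API. Note that you get two different IDs for each place when using our Yelp Search API. We need the first place_id type for this.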
Here is an example:
params = {
    "api_key": SERPAPI_API_KEY,
    "engine": "yelp_reviews",
    "start": 0,
    "num": 49,
    "place_id": "uyYG-z7yiSPhXmkauzb43Q"
}

search = requests.get("https://serpapi.com/search", params=params)
response = search.json()
print(response)
The num parameter defines the number of results per page (maximum is 49). The start parameter defines the result offset: it skips the given number of results and is used for pagination. For example, with num set to 49, 0 (default) is the first page of results, 49 is the 2nd page of results, 98 is the 3rd page of results, etc.
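The offset arithmetic is easy to get wrong by one page, so here is a sketch that builds the parameters for every page up front. `review_page_params` is a hypothetical helper, and `total_reviews` is assumed to be known already (for instance, from the reviews count of a search result); the api_key still needs to be added before sending each request.

```python
def review_page_params(base_params, total_reviews, per_page=49):
    """Build one params dict per page of reviews."""
    if not 1 <= per_page <= 49:
        raise ValueError("per_page must be between 1 and 49")
    return [
        dict(base_params, start=offset, num=per_page)
        for offset in range(0, total_reviews, per_page)
    ]


pages = review_page_params(
    {"engine": "yelp_reviews", "place_id": "uyYG-z7yiSPhXmkauzb43Q"},
    total_reviews=120,
)
print([p["start"] for p in pages])  # prints: [0, 49, 98]
```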
Under the reviews response, you can find the comments and ratings for this place.
Filters for the Reviews API
You can filter the reviews by keyword using the q parameter, or by rating using the rating parameter.
You can also sort the results using the sort_by parameter. Possible values for sorting:
- relevance_desc: Yelp Sort (default)
- date_desc: Newest First
- date_asc: Oldest First
- rating_desc: Highest Rated
- rating_asc: Lowest Rated
- elites_desc: Elites
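The filters and sort options can be combined in one request. The sketch below only assembles the parameters: `review_filter_params` is a hypothetical helper, the sort values are the ones listed above, and the rating value is passed through unchanged because the accepted values are not covered in this post. Add your api_key before sending the request.

```python
SORT_OPTIONS = {
    "relevance_desc", "date_desc", "date_asc",
    "rating_desc", "rating_asc", "elites_desc",
}


def review_filter_params(place_id, q=None, rating=None, sort_by=None):
    """Build Yelp Reviews API parameters, leaving out filters that are not set."""
    if sort_by is not None and sort_by not in SORT_OPTIONS:
        raise ValueError(f"unknown sort_by value: {sort_by}")
    params = {"engine": "yelp_reviews", "place_id": place_id}
    optional = {"q": q, "rating": rating, "sort_by": sort_by}
    # Only include the filters the caller actually provided.
    params.update({key: value for key, value in optional.items() if value is not None})
    return params


print(review_filter_params("uyYG-z7yiSPhXmkauzb43Q", q="latte", sort_by="date_desc"))
```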
I hope it's helpful! Feel free to try!