Intro
This is the first blog post of the DuckDuckGo web scraping series. Here you'll see how to scrape organic search results using Python and the requests_html library. An alternative API solution will also be shown.
In short, it's a good idea not to focus on a single place (Google), because DuckDuckGo users get a higher conversion rate and tend to have a lower bounce rate.
Data from Similarweb shows that the total number of visits in June 2021 was almost 1 billion, with a bounce rate of 14.04%!
What will be scraped
DuckDuckGo Organic Results API
The first difference that you might encounter is that you will get 30 results instead of 10.
The second difference is that you don't have to render JavaScript, which leads to faster program execution.
The third difference is that you immediately get access to a structured JSON string and don't have to figure out how to scrape certain elements.
import json  # for pretty output
from serpapi import GoogleSearch

params = {
    "api_key": "YOUR_API_KEY",  # your SerpApi API key
    "engine": "duckduckgo",     # search engine
    "q": "fus ro dah",          # search query
    "kl": "us-en"               # region/language
}

search = GoogleSearch(params)
results = search.get_dict()

# print each organic result as formatted JSON
for result in results['organic_results']:
    print(json.dumps(result, indent=2))
-------------------
'''
{
  "position": 1,
  "title": "FUS RO DAH!!! - YouTube",
  "link": "https://www.youtube.com/watch?v=Ip7QZPw04Ks",
  "snippet": "Finally found original upload of the prank footage: http://www.youtube.com/watch?v=wmM00L...(video is older but original poster)I am the original poster/crea...",
  "favicon": "https://external-content.duckduckgo.com/ip3/www.youtube.com.ico"
}
...
'''
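Since each organic result is already a dictionary, keeping only the fields you care about takes a few lines. Below is a minimal sketch that works on a sample shaped like the output above (trimmed to a few keys), so it runs without an API key. The helper name `pick_fields` and its default field list are assumptions made for illustration, not part of the SerpApi library.

```python
import json

# Sample shaped like the `organic_results` entry shown above (trimmed);
# a real response would come from search.get_dict().
sample_results = {
    "organic_results": [
        {
            "position": 1,
            "title": "FUS RO DAH!!! - YouTube",
            "link": "https://www.youtube.com/watch?v=Ip7QZPw04Ks",
            "favicon": "https://external-content.duckduckgo.com/ip3/www.youtube.com.ico",
        }
    ]
}


def pick_fields(results, fields=("position", "title", "link")):
    """Hypothetical helper: keep only the chosen keys from each organic result.

    Keys that are missing from a result are returned as None.
    """
    return [
        {field: item.get(field) for field in fields}
        for item in results.get("organic_results", [])
    ]


rows = pick_fields(sample_results)
print(json.dumps(rows, indent=2))
```

Using `.get()` instead of indexing means a response without `organic_results` (for example, an empty search) gives an empty list rather than a `KeyError`.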
DIY Process
The CSS selectors for the container with all the data, the title, link, snippet, and icon were picked with the SelectorGadget Chrome extension.
The reason requests-html was used instead of beautifulsoup is that everything comes from JavaScript, and the page needs to be rendered to get the data. It could also be done with selenium. This is the easiest approach I found to get this data.
Alternatively, you can parse this data from a <script> tag, but that requires a lot more time to find the right data and a lot of trial and error. There is also an alternative way to scrape DuckDuckGo without Selenium.
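To give a rough idea of what parsing data out of a <script> tag involves, here is a minimal sketch using only the standard library. The HTML fragment, the variable name `results`, and the field names are all made up for illustration. They are not DuckDuckGo's real markup, which you would still have to find through the trial and error mentioned above. The helper `extract_embedded_json` is hypothetical as well.

```python
import json
import re

# Hypothetical page fragment: the variable name and fields are invented
# for this example and do not reflect DuckDuckGo's actual page source.
SAMPLE_HTML = """
<html><body>
<script>var results = [{"title": "Fus Ro Dah", "url": "https://example.com/fus"}];</script>
</body></html>
"""

# Captures the body of every <script> element.
SCRIPT_RE = re.compile(r"<script[^>]*>(.*?)</script>", re.DOTALL | re.IGNORECASE)


def extract_embedded_json(html, var_name):
    """Return the JSON value assigned to `var_name` inside a <script> tag, or None."""
    assignment = re.compile(r"\b" + re.escape(var_name) + r"\s*=\s*")
    decoder = json.JSONDecoder()
    for script_body in SCRIPT_RE.findall(html):
        match = assignment.search(script_body)
        if match is None:
            continue
        try:
            # raw_decode parses one JSON value starting at the given index
            # and ignores whatever follows it (here, the trailing ";").
            value, _end = decoder.raw_decode(script_body, match.end())
        except json.JSONDecodeError:
            continue
        return value
    return None


for item in extract_embedded_json(SAMPLE_HTML, "results") or []:
    print(item["title"], item["url"])
```

This only works when the embedded value is valid JSON. A JavaScript object literal with unquoted keys or single quotes would need a different approach.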
Full DIY Code
from requests_html import HTMLSession

session = HTMLSession()
response = session.get('https://duckduckgo.com/?q=fus+ro+dah&kl=us-en')
response.html.render()  # run the page's JavaScript so the results appear in the HTML

# each '.links_deep' element is the container of one organic result
for result in response.html.find('.links_deep'):
    title = result.find('.js-result-title-link', first=True).text
    link = result.find('.result__extras__url', first=True).text
    snippet = result.find('.js-result-snippet', first=True).text
    # 'https:' is prepended to the data-src value to get a full icon URL
    icon = f"https:{result.find('img.result__icon__img', first=True).attrs['data-src']}"

    print(f'{title}\n{link}\n{snippet}\n{icon}\n')
------------------
'''
Urban Dictionary: Fus ro dah
https://www.urbandictionary.com/define.php?term=Fus ro dah
Fus ro dah. Literally means Force, Balance, and Push. The first dragon shout you learn in The Elder Scrolls V: Skyrim. In their tongue he is known as Dovahkiin, Dragonborn, Fus ro dah.
https://external-content.duckduckgo.com/ip3/www.urbandictionary.com.ico
Fus Ro Dah - Instant Sound Effect Button | Myinstants
https://www.myinstants.com/instant/fus-ro-dah/
Instant sound effect button of Fus Ro Dah . Fus Ro Dah. From skyrim. 8,072 users favorited this sound button.
https://external-content.duckduckgo.com/ip3/www.myinstants.com.ico
...
'''
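The search URL in the DIY code is hard-coded. If you want to reuse the script for other queries, one option is to build the URL from the `q` and `kl` parameters with the standard library. This is only a sketch: `build_search_url` is a hypothetical helper name, and the default region simply mirrors the `us-en` value used above.

```python
from urllib.parse import urlencode

BASE_URL = "https://duckduckgo.com/"


def build_search_url(query, region="us-en"):
    """Hypothetical helper: build a search URL with URL-encoded `q` and `kl` parameters."""
    return f"{BASE_URL}?{urlencode({'q': query, 'kl': region})}"


print(build_search_url("fus ro dah"))
# https://duckduckgo.com/?q=fus+ro+dah&kl=us-en
```

The returned string can be passed to `session.get()` in place of the hard-coded URL.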