Extract all search engines ad results at once using Python


In this post, we will talk about why ad results are important and how to extract them, not from a single engine, but from all engines at once, with one simple example tool.

Intro:

Ad results matter to many companies: shopping businesses, SEO agencies, and advertisers all track them. You might wish there were a single engine that could extract the ad results of all the other engines combined. Well, you don't need such an engine to do that: you can do it easily yourself.

Requirements:

Python 3.x

Libraries:

import requests
from serpapi import GoogleSearch
import json
import os
import urllib.parse as urlparse
from urllib.parse import urlencode

All of these libraries can be swapped for alternatives of your own. Here we use requests to send HTTP requests, json and os to format and save the JSON results, urllib to rebuild pagination URLs, and, of course, the serpapi library itself.

On line 10 of the script you must enter your API key, which you can grab from your SerpApi dashboard:

api_key = "YOUR API KEY"

query_search is the search query you want to search for:

query_search = "Coffee"

The engines and the parameters: you can add any engine you like, together with its search parameters:

engines = [
  {"name": "google", "query": "q", "ad_param": "ads"},
  # {"name": "baidu", "query": "q", "ad_param": ""},
  {"name": "bing", "query": "q", "ad_param": "ads"},
  {"name": "duckduckgo", "query": "q", "ad_param": "ads"},
  {"name": "yahoo", "query": "p", "ad_param": "ads_results"},
  # {"name": "yandex", "query": "text", "ad_param": ""},
  # {"name": "ebay", "query": "_nkw", "ad_param": ""},
  {"name": "youtube", "query": "search_query", "ad_param": "ads_results"},
  {"name": "walmart", "query": "query", "ad_param": "ads_results"},
  # {"name": "home_depot", "query": "q", "ad_param": ""},
]
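Each entry in that list maps straight to one SerpApi request. A minimal sketch of how the tool uses an entry (build_params is a hypothetical helper, not part of the tool itself):

```python
# Hypothetical helper: turn one engine entry into SerpApi request parameters.
# Each engine names its query parameter differently ("q", "p", "search_query"...),
# which is why every entry stores its own "query" key.
def build_params(engine, query_search, api_key):
  return {
    "engine": engine["name"],
    engine["query"]: query_search,
    "api_key": api_key,
  }

params = build_params(
  {"name": "yahoo", "query": "p", "ad_param": "ads_results"},
  "Coffee",
  "YOUR API KEY",
)
# params == {"engine": "yahoo", "p": "Coffee", "api_key": "YOUR API KEY"}
```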

The rest is straightforward code: pagination, and calling the library.
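One pagination detail worth calling out: SerpApi's next-page URL does not carry your api_key, so the tool merges its own parameters back into the URL's query string before fetching it. A standalone sketch of that step (the URL below is illustrative, not a real response value):

```python
import urllib.parse as urlparse
from urllib.parse import urlencode

def merge_params(url, extra):
  # Parse the URL, fold `extra` into its query string, and rebuild it.
  parts = urlparse.urlparse(url)
  query = dict(urlparse.parse_qsl(parts.query))
  query.update(extra)
  return urlparse.urlunparse(parts._replace(query=urlencode(query)))

# Illustrative `next` URL; real ones come from results["serpapi_pagination"]["next"].
new_url = merge_params(
  "https://serpapi.com/search.json?engine=google&q=Coffee&start=10",
  {"api_key": "YOUR API KEY"},
)
```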

Getting started:

After downloading the tool from GitHub, all you need to do is start it from the command line:

python ad.py

After that, wait a couple of minutes, and you will get your results as a JSON file.

[Screenshot: CMD extracting ad results]
[Screenshot: the ad results JSON file]

This will be the structure of the JSON result file:

[Screenshot: structure of the JSON result file]
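In code terms, the file holds one object per engine, with that engine's combined ads under "value". A hypothetical, trimmed illustration (the ad fields themselves vary from engine to engine):

```python
import json

# Hypothetical, trimmed example of results.json: one object per engine,
# with that engine's combined ad list under "value".
example = [
  {"name": "google", "value": [
    {"position": 1, "title": "Coffee Shop", "link": "https://example.com"},
  ]},
  {"name": "yahoo", "value": [
    {"position": 1, "title": "Buy Coffee Online", "link": "https://example.org"},
  ]},
]
print(json.dumps(example, indent=2))
```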

Remember, the tool takes a long time to extract the results because it walks through every result page available in each engine, for each search query.

Full code (in case you don't want to get it from GitHub)

import requests
from serpapi import GoogleSearch
import json
import os
import urllib.parse as urlparse
from urllib.parse import urlencode

path = os.path.dirname(os.path.abspath(__file__))
ad_results = []
api_key = "YOUR API KEY"
query_search = "Coffee"
output_file = os.path.join(path, 'results.json')

engines = [
  {"name": "google", "query": "q", "ad_param": "ads"},
  # {"name": "baidu", "query": "q", "ad_param": ""},
  {"name": "bing", "query": "q", "ad_param": "ads"},
  {"name": "duckduckgo", "query": "q", "ad_param": "ads"},
  {"name": "yahoo", "query": "p", "ad_param": "ads_results"},
  # {"name": "yandex", "query": "text", "ad_param": ""},
  # {"name": "ebay", "query": "_nkw", "ad_param": ""},
  {"name": "youtube", "query": "search_query", "ad_param": "ads_results"},
  {"name": "walmart", "query": "query", "ad_param": "ads_results"},
  # {"name": "home_depot", "query": "q", "ad_param": ""},
]

def get_next(engine, results):
  out = []
  current = []

  if engine["ad_param"] in results:
    out = out + results[engine["ad_param"]]

  pagination = results.get('serpapi_pagination', {})
  current.append(pagination.get('current'))

  while 'next' in pagination:
    url = pagination.get('next')

    # SerpApi's `next` URL doesn't carry our api_key, so merge the
    # original request parameters back into its query string.
    url_parse = urlparse.urlparse(url)
    url_dict = dict(urlparse.parse_qsl(url_parse.query))
    url_dict.update(params)  # `params` is set in the loop below
    url_parse = url_parse._replace(query=urlencode(url_dict))
    new_url = urlparse.urlunparse(url_parse)

    response = requests.get(new_url, timeout=60)  # timeout is in seconds

    results = json.loads(response.text)

    if "error" in results:
      print("oops error: ", results["error"])
      break

    pagination = results.get('serpapi_pagination', {})

    # Stop if we land on a page we have already seen.
    if pagination.get('current') in current:
      break

    current.append(pagination.get('current'))

    if engine["ad_param"] in results:
      out = out + results[engine["ad_param"]]

  return out  # without this return, the caller would always get None

for engine in engines:
  params = {
    "engine": engine["name"],
    engine["query"]: query_search,
    "api_key": api_key
  }

  search = GoogleSearch(params)
  results = search.get_dict()
  if "error" in results:
    print("oops error: ", results["error"])
    continue
  
  next_results = get_next(engine, results)
  if next_results:
    ad_results.append({"name": engine["name"], "value": next_results})

with open(output_file, 'w') as fout:
  json.dump(ad_results, fout)

Tips:

1- Don't forget to remove your API key from the tool before you upload it to GitHub. Your API key should be treated like a password.

2- Try limiting the pagination, for example by capping the number of pages, to get the results faster.
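One hypothetical way to act on tip 2 is to count pages inside the pagination loop and break after a cap. The sketch below stands in for that loop, with `pages` playing the role of the successive ad lists the tool would collect:

```python
# Hypothetical sketch of tip 2: cap how many result pages we consume per engine
# instead of walking every page the engine offers.
def collect_ads(pages, max_pages=3):
  out = []
  for page_number, page_ads in enumerate(pages, start=1):
    if page_number > max_pages:
      break  # stop early instead of fetching every available page
    out.extend(page_ads)
  return out

ads = collect_ads([["ad1"], ["ad2"], ["ad3"], ["ad4"]], max_pages=2)
# ads == ["ad1", "ad2"]
```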

Ending:

You can find the documentation on how to use SerpApi at serpapi.com, and you can follow us on Twitter at @serp_api to get our latest news and articles.