What will be scraped

wwbs-google-jobs-listing

📌Note: Some queries may not display all sections. You can check your query in the playground.

Why use an API?

  • No need to create a parser from scratch and maintain it.
  • Blocks from Google are bypassed for you: CAPTCHAs and IP blocks are handled on the backend.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation.

SerpApi handles everything on the backend, with response times under ~2.5 seconds per request (~1.2 seconds with Ludicrous speed). Because no browser automation is involved, requests complete much faster. Response times and status rates are shown on the SerpApi Status page.

serpapi-status-page

Full Code

If you don't need an explanation, have a look at the full code example in the online IDE.

from serpapi import GoogleSearch
import os, json


def extract_multiple_jobs():
    params = {
        # https://docs.python.org/3/library/os.html#os.getenv
        'api_key': os.getenv('API_KEY'),        # your serpapi api
        'engine': 'google_jobs',                # SerpApi search engine	
        'gl': 'us',                             # country of the search
        'hl': 'en',                             # language of the search
        'q': 'barista new york',                # search query
    }

    search = GoogleSearch(params)               # where data extraction happens on the SerpApi backend
    results = search.get_dict()                 # JSON -> Python dict

    return [job.get('job_id') for job in results['jobs_results']]


def scrape_google_jobs_listing(job_ids):
    data = []

    for job_id in job_ids:
        params = {
            # https://docs.python.org/3/library/os.html#os.getenv
            'api_key': os.getenv('API_KEY'),    # your serpapi api
            'engine': 'google_jobs_listing',    # SerpApi search engine	
            'q': job_id,                        # search query (job_id)
        }
        
        search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
        results = search.get_dict()             # JSON -> Python dict
        
        data.append({
            'job_id': job_id,
            'apply_options': results.get('apply_options'),
            'salaries': results.get('salaries'),
            'ratings': results.get('ratings')
        })

    return data


def main():
    job_ids = extract_multiple_jobs()
    google_jobs_listing_results = scrape_google_jobs_listing(job_ids)

    print(json.dumps(google_jobs_listing_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()

Preparation

Install library:

pip install google-search-results

google-search-results is SerpApi's Python package for accessing its API.

Code Explanation

Import libraries:

from serpapi import GoogleSearch
import os, json

Library and purpose:

  • GoogleSearch: to scrape and parse Google results using the SerpApi web scraping library.
  • os: to return the value of the environment variable (the SerpApi API key).
  • json: to convert the extracted data to a JSON string.
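os.getenv returns None when the variable is not set, so a missing key would only surface later as a failed request. A minimal sketch of failing early is shown here; get_api_key is a hypothetical helper (not part of the serpapi package), and API_KEY is the variable name used throughout this post:

```python
import os


def get_api_key(env_var='API_KEY'):
    # hypothetical helper, not part of the serpapi package
    api_key = os.getenv(env_var)    # None if the variable is not set

    if not api_key:
        raise RuntimeError(f'Environment variable {env_var} is not set')

    return api_key
```

The returned value could then be used as the 'api_key' entry of the params dictionary.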

Top-level code environment

The extract_multiple_jobs() function is called to get all the job_id values. The resulting job_ids list is passed to the scrape_google_jobs_listing(job_ids) function to retrieve the required data. Both functions are explained under their own headings below.

This code follows the common convention of guarding the entry point with the __name__ == '__main__' check:

def main():
    job_ids = extract_multiple_jobs()
    google_jobs_listing_results = scrape_google_jobs_listing(job_ids)

    print(json.dumps(google_jobs_listing_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()

The condition is true only when the file is run directly. If the file is imported into another module, the condition is false and main() is not called.

You can watch the video Python Tutorial: if __name__ == '__main__' for more details.

Extract Multiple Jobs

The function returns a list of job_id values. The value of this identifier will be used in the next function to create the request.

This function only gets data from the first page of results. To extract data with pagination, see the Scrape Google Jobs organic results with Python blog post.

At the beginning of the function, the request parameters are defined. To pass other parameters, add them to the params dictionary.

params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),        # your serpapi api
    'engine': 'google_jobs',                # SerpApi search engine	
    'gl': 'us',                             # country of the search
    'hl': 'en',                             # language of the search
    'q': 'barista new york',                # search query
}

Parameters explanation:

  • api_key: defines the SerpApi private key to use.
  • engine: set to google_jobs to use the Google Jobs API engine.
  • gl: defines the country to use for the Google search. It's a two-letter country code (e.g., us for the United States, uk for United Kingdom, or fr for France). Head to the Google countries page for a full list of supported Google countries.
  • hl: defines the language to use for the Google Jobs search. It's a two-letter language code (e.g., en for English, es for Spanish, or fr for French). Head to the Google languages page for a full list of supported Google languages.
  • q: defines the query you want to search.
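As a rough illustration of passing additional parameters through the params dictionary, the sketch below wraps the dictionary in a small function. build_jobs_params is a hypothetical helper, and location is used only as an example of an extra parameter; check the Google Jobs API documentation for the parameters the engine accepts.

```python
def build_jobs_params(api_key, query, country='us', language='en', **extra):
    # hypothetical helper: returns the same dictionary shape as params above
    params = {
        'api_key': api_key,         # your SerpApi API key
        'engine': 'google_jobs',    # SerpApi search engine
        'gl': country,              # country of the search
        'hl': language,             # language of the search
        'q': query,                 # search query
    }
    params.update(extra)            # any additional request parameters
    return params


# 'location' is an illustrative extra parameter (assumption, see lead-in)
params = build_jobs_params('secret', 'barista new york', location='New York, NY')
```

Keeping the defaults in one place means a different query or country only changes the call, not the dictionary layout.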

Then, we create a search object; the data extraction itself happens on the SerpApi backend. Calling get_dict() returns the JSON response as a Python dictionary, which is stored in results:

search = GoogleSearch(params)   # where data extraction happens on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

The function returns a list of all job_id values, built with a list comprehension:

return [job.get('job_id') for job in results['jobs_results']]

The function looks like this:

def extract_multiple_jobs():
    params = {
        # https://docs.python.org/3/library/os.html#os.getenv
        'api_key': os.getenv('API_KEY'),        # your serpapi api
        'engine': 'google_jobs',                # SerpApi search engine	
        'gl': 'us',                             # country of the search
        'hl': 'en',                             # language of the search
        'q': 'barista new york',                # search query
    }

    search = GoogleSearch(params)               # where data extraction happens on the SerpApi backend
    results = search.get_dict()                 # JSON -> Python dict

    return [job.get('job_id') for job in results['jobs_results']]
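Note that results['jobs_results'] raises a KeyError when the response has no jobs_results key, and job.get('job_id') can put None into the list. A minimal defensive sketch follows; collect_job_ids is a hypothetical helper, and treating an 'error' key as the sign of a failed request is an assumption about the response format.

```python
def collect_job_ids(results):
    # assumption: a failed request is reported under an 'error' key
    if 'error' in results:
        raise RuntimeError(f"Request failed: {results['error']}")

    # fall back to an empty list when there are no job results
    jobs = results.get('jobs_results', [])

    # skip entries that have no job_id instead of returning None values
    return [job['job_id'] for job in jobs if job.get('job_id')]


sample = {'jobs_results': [{'job_id': 'abc'}, {'title': 'no id'}, {'job_id': 'def'}]}
print(collect_job_ids(sample))   # ['abc', 'def']
print(collect_job_ids({}))       # []
```

The function takes the already-fetched results dictionary, so it can be tried on sample data without making a request.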

Scrape Google Jobs Listing

This function takes the job_ids list and returns a list with the extracted data for each job.

First, declare the data list to which the extracted data will be added:

data = []

A separate request is made for each job_id value in the job_ids list, and the corresponding data is retrieved:

for job_id in job_ids:
    # data extraction will be here

Next, we define the parameters for the request:

params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),    # your serpapi api
    'engine': 'google_jobs_listing',    # SerpApi search engine	
    'q': job_id,                        # search query (job_id)
}

Parameters explanation:

  • api_key: defines the SerpApi private key to use.
  • engine: set to google_jobs_listing to use the Google Jobs Listing API engine.
  • q: defines the job_id string, which can be obtained from the Google Jobs API.

Then, we create a search object; the data extraction itself happens on the SerpApi backend. Calling get_dict() returns the JSON response as a Python dictionary, which is stored in results:

search = GoogleSearch(params)   # where data extraction happens on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

We can then create a dictionary structure from values such as job_id, apply_options, salaries, and ratings. The extracted data is written according to the corresponding keys. After that, the dictionary is appended to the data list:

data.append({
    'job_id': job_id,
    'apply_options': results.get('apply_options'),
    'salaries': results.get('salaries'),
    'ratings': results.get('ratings')
})

At the end of the function, the data list is returned with the retrieved data for each job_id:

return data

The complete function to scrape all data would look like this:

def scrape_google_jobs_listing(job_ids):
    data = []

    for job_id in job_ids:
        params = {
            # https://docs.python.org/3/library/os.html#os.getenv
            'api_key': os.getenv('API_KEY'),    # your serpapi api
            'engine': 'google_jobs_listing',    # SerpApi search engine	
            'q': job_id,                        # search query (job_id)
        }
        
        search = GoogleSearch(params)           # where data extraction happens on the SerpApi backend
        results = search.get_dict()             # JSON -> Python dict
        
        data.append({
            'job_id': job_id,
            'apply_options': results.get('apply_options'),
            'salaries': results.get('salaries'),
            'ratings': results.get('ratings')
        })

    return data
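Because each job_id needs its own independent request, the requests could also be issued concurrently. The sketch below is one possible approach, not part of the original code: scrape_listings_concurrently and its fetch argument are hypothetical, and fake_fetch stands in for the real request so the example runs offline. Whether concurrent requests are appropriate depends on the limits of your plan.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_listings_concurrently(job_ids, fetch, max_workers=4):
    # fetch is any callable that takes a job_id and returns a results dict
    def scrape_one(job_id):
        results = fetch(job_id)
        return {
            'job_id': job_id,
            'apply_options': results.get('apply_options'),
            'salaries': results.get('salaries'),
            'ratings': results.get('ratings'),
        }

    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in the same order as job_ids
        return list(executor.map(scrape_one, job_ids))


def fake_fetch(job_id):
    # stand-in for a real request; returns a canned response
    return {'apply_options': [{'title': f'Apply for {job_id}', 'link': 'https://example.com'}]}


data = scrape_listings_concurrently(['id-1', 'id-2'], fake_fetch)
```

In real use, fetch could be a function that builds the google_jobs_listing parameters for a job_id and returns GoogleSearch(params).get_dict().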

Output

[
  {
    "job_id": "eyJqb2JfdGl0bGUiOiJCYXJpc3RhIiwiaHRpZG9jaWQiOiJuc3Y1d1hyNXdFOEFBQUFBQUFBQUFBPT0iLCJnbCI6InVzIiwiaGwiOiJlbiIsImZjIjoiRXVJQkNxSUJRVUYwVm14aVFtcFdYMjl0V0ZadU9USTNWV0ZZUlZRek9XRTJPVlJtYUc1RVZtaGpaRk5WT1VFMlNYZFpaR2ROU0dzdFoyMVBkMmxmUTNKS2RUQnJjMWxFT0dZNFNHWnFXRUZNTjB4eFRWVmtMV1JRVVRWaVJGbFVSMVo1YmxsVWVuazVPRzlxVVVsTmVXcFJjRXhPVWpWbWMwdFlTMlo2V21SUU1XSkZZa2hTY2pKaGRYcEdlRzVxTVVWNGIwZ3lhVXd3UlZGVVZ6Tk5XSGRNYXpKbVYyVjNFaGQzYkhCeFdTMWZUMHhNTW01d2RGRlFNRGhwUW05QmF4b2lRVVJWZVVWSFpqSTJWMjF3TjBoU2FtNDRPSHB5WkVWTldVMVhVWGRTU1hwMVFRIiwiZmN2IjoiMyIsImZjX2lkIjoiZmNfMSIsImFwcGx5X2xpbmsiOnsidGl0bGUiOiIubkZnMmVie2ZvbnQtd2VpZ2h0OjUwMH0uQmk2RGRje2ZvbnQtd2VpZ2h0OjUwMH1BcHBseSBkaXJlY3RseSBvbiBDdWxpbmFyeSBBZ2VudHMiLCJsaW5rIjoiaHR0cHM6Ly9jdWxpbmFyeWFnZW50cy5jb20vam9icy80MTc4NjMtQmFyaXN0YT91dG1fY2FtcGFpZ249Z29vZ2xlX2pvYnNfYXBwbHlcdTAwMjZ1dG1fc291cmNlPWdvb2dsZV9qb2JzX2FwcGx5XHUwMDI2dXRtX21lZGl1bT1vcmdhbmljIn19",
    "apply_options": [
      {
        "title": "Apply on Trabajo.org",
        "link": "https://us.trabajo.org/job-1683-20221107-34e191c4eb8c8ca3ec69adfa55061df2?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
      },
      {
        "title": "Apply on Jobs",
        "link": "https://us.fidanto.com/jobs/job-opening/nov-2022/barista-1432712052?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
      },
      {
        "title": "Apply on Craigslist",
        "link": "https://newyork.craigslist.org/mnh/fbh/d/new-york-cafe-barista/7553733276.html?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
      },
      {
        "title": "Apply directly on Culinary Agents",
        "link": "https://culinaryagents.com/jobs/417863-Barista?utm_campaign=google_jobs_apply&utm_source=google_jobs_apply&utm_medium=organic"
      }
    ],
    "salaries": null,
    "ratings": null
  },
  ... other results
]
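As the output shows, a key such as salaries or ratings can be null, so any post-processing should guard against None. The sketch below flattens apply_options into one row per link; flatten_apply_options is a hypothetical helper, and the sample data uses placeholder job_id values and links rather than real results.

```python
def flatten_apply_options(listings):
    rows = []

    for index, listing in enumerate(listings):
        # 'or []' covers both a missing key and an explicit null (None)
        for option in listing.get('apply_options') or []:
            rows.append({
                'listing': index,               # position in the results list
                'title': option.get('title'),
                'link': option.get('link'),
            })

    return rows


sample = [
    {
        'job_id': 'first-job',                  # placeholder, not a real job_id
        'apply_options': [
            {'title': 'Apply on Trabajo.org', 'link': 'https://example.com/1'},
            {'title': 'Apply on Craigslist', 'link': 'https://example.com/2'},
        ],
        'salaries': None,
        'ratings': None,
    },
    {'job_id': 'second-job', 'apply_options': None, 'salaries': None, 'ratings': None},
]

rows = flatten_apply_options(sample)
```

The resulting rows all have the same keys, which makes them straightforward to write out with csv.DictWriter if a tabular export is needed.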
