What will be scraped


📌Note: in this blog post, I'll show you how to scrape Apple App Store Search and receive results exactly as they appear on an Apple iMac, because search results on a Mac are completely different from results on a PC. The screenshots below show the difference:

  • Mac results:
    [screenshot: Mac]

  • PC results:
    [screenshot: PC]

Full code

If you don't need an explanation, have a look at the full code example in the online IDE:

import dotenv from "dotenv";
dotenv.config();
import { getJson } from "serpapi";

const engine = "apple_app_store"; // search engine
const resultsLimit = 50; // hardcoded limit for demonstration purpose
const params = {
  api_key: process.env.API_KEY, // your API key from serpapi.com
  term: "image viewer", // Parameter defines the query you want to search
  country: "us", // Parameter defines the country to use for the search
  lang: "en-us", // Parameter defines the language to use for the search
  device: "desktop", // Parameter defines the device to use to get the results. It can be set to "desktop", "tablet", or "mobile" (default)
  num: "10", // Parameter defines the number of results you want to get per each page
  page: 0, // Parameter is used to get the items on a specific page
};

const getResults = async () => {
  const results = [];
  while (true) {
    const json = await getJson(engine, params);
    if (json.organic_results) {
      results.push(...json.organic_results);
      params.page += 1;
    } else break;
    if (results.length >= resultsLimit) break;
  }
  return results;
};

getResults().then((result) => console.dir(result, { depth: null }));

Why use Apple App Store Search Scraper API from SerpApi?

Using an API generally solves all or most of the problems you might encounter while creating your own parser or crawler. From a web scraping perspective, our API helps with the most painful ones:

  • Bypass blocks from supported search engines, such as CAPTCHAs or IP blocks.
  • No need to create a parser from scratch and maintain it.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation when you have to extract large amounts of data quickly.

Head to the Playground for a live and interactive demo.

Preparation

First, we need to create a Node.js* project and add the npm packages serpapi and dotenv.

To do this, in the directory with our project, open the command line and enter:

$ npm init -y

And then:

$ npm i serpapi dotenv

*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.

  • The SerpApi package is used to scrape and parse search engine results using SerpApi. It can get search results from Google, Bing, Baidu, Yandex, Yahoo, Home Depot, eBay, and more.

  • The dotenv package is a zero-dependency module that loads environment variables from a .env file into process.env.

Next, we need to add a top-level "type" field with a value of "module" to our package.json file to enable ES6 modules in Node.js:

[screenshot: package.json with the top-level "type" field set to "module"]

That completes the setup of the Node.js environment for our project, so we can move on to the step-by-step code explanation.

Code explanation

First, we need to import dotenv from the dotenv library and call its config() method, then import getJson from the serpapi library:

import dotenv from "dotenv";
dotenv.config();
import { getJson } from "serpapi";

  • config() will read your .env file, parse the contents, assign it to process.env, and return an Object with a parsed key containing the loaded content or an error key if it failed.
  • getJson() allows you to get a JSON response based on search parameters.
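
Because config() only copies values from .env into process.env, a missing or misspelled key can go unnoticed until the request itself fails. The following is a minimal sketch of an early check, assuming the key is stored under API_KEY as in this post. requireEnv is a hypothetical helper written for illustration; it is not part of dotenv or serpapi.

```javascript
// Hypothetical helper (not part of dotenv or serpapi): read a required
// environment variable and fail early with a readable message.
function requireEnv(name, env = process.env) {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing environment variable ${name}. Add it to your .env file.`);
  }
  return value;
}

// A plain object stands in for process.env so the sketch runs without a .env file.
const fakeEnv = { API_KEY: "demo-key" };
console.log(requireEnv("API_KEY", fakeEnv)); // prints "demo-key"
```

In the script from this post, such a check could sit right after dotenv.config(), for example as api_key: requireEnv("API_KEY") inside params.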

Next, we specify the search engine, set how many results we want to receive (the resultsLimit constant), and write the search parameters needed for the request:

const engine = "apple_app_store"; // search engine
const resultsLimit = 50; // hardcoded limit for demonstration purpose
const params = {
  api_key: process.env.API_KEY, // your API key from serpapi.com
  term: "image viewer", // Parameter defines the query you want to search
  country: "us", // Parameter defines the country to use for the search
  lang: "en-us", // Parameter defines the language to use for the search
  device: "desktop", // Parameter defines the device to use to get the results. It can be set to "desktop", "tablet", or "mobile" (default)
  num: "10", // Parameter defines the number of results you want to get per each page
  page: 0, // Parameter is used to get the items on a specific page
};

You can use the following search parameters:

  • api_key parameter defines the SerpApi private key to use.
  • term parameter defines the query you want to search. You can use any search term that you would use in a regular App Store search.
  • country parameter defines the country to use for the search. It's a two-letter country code (e.g., us (default) for the United States, uk for the United Kingdom, or fr for France). Head to the Apple Regions for a full list of supported Apple Regions.
  • lang parameter defines the language to use for the search. It's a four-letter language code (e.g., en-us (default) for English, fr-fr for French, or uk-ua for Ukrainian). Head to the Apple Languages for a full list of supported Apple Languages.
  • num parameter defines the number of results you want to get per page. It defaults to 10. The maximum number of results you can get per page is 200. Any number greater than the maximum will default to 200.
  • page parameter is used to get the items on a specific page. (e.g., 0 (default) is the first page of results, 1 is the 2nd page of results, 2 is the 3rd page of results, etc.).
  • disallow_explicit parameter defines the filter for disallowing explicit apps. It defaults to false.
  • property parameter allows you to search a property of an app. developer searches the developer name of an app (e.g., property: "developer" and term: "Coffee" gives apps with "Coffee" in their developer's name, such as Coffee Inc.).
  • no_cache parameter will force SerpApi to fetch the App Store Search results even if a cached version is already present. A cache is served only if the query and all parameters are exactly the same. Cache expires after 1h. Cached searches are free, and are not counted towards your searches per month. It can be set to false (default) to allow results from the cache, or true to disallow results from the cache. no_cache and async parameters should not be used together.
  • async parameter defines the way you want to submit your search to SerpApi. It can be set to false (default) to open an HTTP connection and keep it open until you get your search results, or true to just submit your search to SerpApi and retrieve the results later. In the latter case, you'll need to use our Searches Archive API to retrieve your results. async and no_cache parameters should not be used together. async should not be used on accounts with Ludicrous Speed enabled.
  • device parameter defines the device to use to get the results. It can be set to desktop to use a Mac App Store, tablet to use an iPad App Store, or mobile (default) to use an iPhone App Store.
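
The defaults and restrictions in the list above can also be checked locally before a request is sent. The following is a minimal sketch under that assumption: buildParams and MAX_NUM are hypothetical names introduced here for illustration, not part of the serpapi package, and the rules they apply are only the ones stated in the list (default values, the 200-per-page maximum, the allowed device values, and the warning about combining no_cache with async).

```javascript
// Hypothetical helper (not part of serpapi): fill in the documented defaults
// and reject combinations the parameter list warns against.
const MAX_NUM = 200; // maximum results per page, as stated in the list above
const DEVICES = ["desktop", "tablet", "mobile"];

function buildParams(apiKey, term, options = {}) {
  const params = {
    api_key: apiKey,
    term,
    country: "us", // documented default
    lang: "en-us", // documented default
    device: "mobile", // documented default
    num: 10, // documented default
    page: 0, // first page
    ...options, // caller-supplied values override the defaults
  };

  if (params.no_cache && params.async) {
    throw new Error("no_cache and async should not be used together");
  }
  if (!DEVICES.includes(params.device)) {
    throw new Error(`device must be one of: ${DEVICES.join(", ")}`);
  }
  // Larger values would be treated as 200 anyway, so clamp them locally.
  params.num = Math.min(Number(params.num), MAX_NUM);
  return params;
}

// Search by developer name on the Mac App Store, asking for too many results.
const demoParams = buildParams("your-api-key", "Coffee", {
  property: "developer",
  device: "desktop",
  num: 500,
});
console.log(demoParams.num); // prints 200
```

Keeping the checks in one function means a typo such as device: "laptop" fails immediately instead of spending a search on an invalid request.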

Next, we declare the function getResults that gets data from the page and returns it:

const getResults = async () => {
  ...
};

In this function, we declare an empty results array and, in a while loop, get the JSON with results, add organic_results from each page to the array, and set the next page index (the params.page value).

If there are no more results on the page, or if the number of received results reaches resultsLimit, we stop the loop (using break) and return the array with results:

const results = [];
while (true) {
  const json = await getJson(engine, params);
  if (json.organic_results) {
    results.push(...json.organic_results);
    params.page += 1;
  } else break;
  if (results.length >= resultsLimit) break;
}
return results;
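
To try the pagination logic without an API key or spending searches, the same loop can be written with the request function passed in as an argument. This is a sketch, not the code used in this post: collectResults, fakeItems, and fakeFetch are hypothetical names, and fakeFetch only imitates the shape of the response (an organic_results array that is absent once the pages run out).

```javascript
// Sketch of the pagination loop with the request function injected.
// fetchPage stands in for a call such as getJson(engine, params).
async function collectResults(fetchPage, baseParams, limit) {
  const results = [];
  let page = baseParams.page ?? 0;
  while (results.length < limit) {
    // Copy the params for each request instead of mutating the caller's object.
    const json = await fetchPage({ ...baseParams, page });
    if (!json.organic_results || json.organic_results.length === 0) break;
    results.push(...json.organic_results);
    page += 1;
  }
  // The last page may overshoot the limit, so trim the extra items.
  return results.slice(0, limit);
}

// Fake API for an offline trial run: 25 items served num per page.
const fakeItems = Array.from({ length: 25 }, (_, i) => ({ position: i + 1 }));
const fakeFetch = async ({ page, num }) => {
  const pageItems = fakeItems.slice(page * num, (page + 1) * num);
  return pageItems.length ? { organic_results: pageItems } : {};
};

collectResults(fakeFetch, { term: "image viewer", num: 10, page: 0 }, 15)
  .then((results) => console.log(results.length)); // prints 15
```

Two deliberate differences from the loop above: the params object is copied rather than modified, so a second call starts again from the same page, and the result is trimmed so it never contains more than limit items.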

And finally, we run the getResults function and print all the received information in the console with the console.dir method, which allows you to use an object with the necessary parameters to change default output options:

getResults().then((result) => console.dir(result, { depth: null }));

Output

[
  {
    "position": 1,
    "id": 1507782672,
    "title": "Pixea",
    "bundle_id": "imagetasks.Pixea",
    "version": "1.4",
    "vpp_license": true,
    "age_rating": "4+",
    "release_note": "- New icon - macOS Big Sur support - Universal Binary - Bug fixes and improvements",
    "seller_link": "https://www.imagetasks.com",
    "minimum_os_version": "10.12",
    "description": "Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them. Supported formats: JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives. Export formats: JPEG, JPEG-2000, PNG, TIFF, BMP. Found a bug? Have a suggestion? Please, send it to support@imagetasks.com Follow us on Twitter @imagetasks!",
    "link": "https://apps.apple.com/us/app/pixea/id1507782672?mt=12&uo=4",
    "serpapi_product_link": "https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
    "serpapi_reviews_link": "https://serpapi.com/search.json?country=us&engine=apple_reviews&page=1&product_id=1507782672",
    "release_date": "2020-04-20 07:00:00 UTC",
    "price": {
      "type": "Free"
    },
    "rating": [
      {
        "type": "All Times",
        "rating": 0,
        "count": 0
      }
    ],
    "genres": [
      {
        "name": "Photo & Video",
        "id": 6008,
        "primary": true
      },
      {
        "name": "Graphics & Design",
        "id": 6027,
        "primary": false
      }
    ],
    "developer": {
      "name": "ImageTasks Inc",
      "id": 450316587,
      "link": "https://apps.apple.com/us/developer/id450316587"
    },
    "size_in_bytes": 5838181,
    "supported_languages": ["EN"],
    "screenshots": {
      "general": [
        {
          "link": "https://is3-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/b1/8c/fb/b18cfb80-cb5c-d67d-2edc-ee1f6666e012/35b8d5a7-b493-4a80-bdbd-3e9d564601dd_Pixea-1.jpg/800x500bb.jpg",
          "size": "800x500"
        },
        {
          "link": "https://is1-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/96/08/83/9608834d-3d2b-5c0b-570c-f022407ff5cc/1836573e-1b6a-421c-b654-6ae2f915d755_Pixea-2.jpg/800x500bb.jpg",
          "size": "800x500"
        },
        {
          "link": "https://is1-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/58/fd/db/58fddb5d-9480-2536-8679-92d6b067d285/98e22b63-1575-4ee6-b08d-343b9e0474ea_Pixea-3.jpg/800x500bb.jpg",
          "size": "800x500"
        },
        {
          "link": "https://is2-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/c3/f3/f3/c3f3f3b5-deb0-4b58-4afc-79073373b7b9/28f51f38-bc59-4a61-a5a1-bff553838267_Pixea-4.jpg/800x500bb.jpg",
          "size": "800x500"
        }
      ]
    },
    "logos": [
      {
        "size": "60x60",
        "link": "https://is1-ssl.mzstatic.com/image/thumb/Purple114/v4/73/5f/29/735f2997-66b7-9795-ad4f-7ed78d0d3812/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/60x60bb.png"
      },
      {
        "size": "512x512",
        "link": "https://is1-ssl.mzstatic.com/image/thumb/Purple114/v4/73/5f/29/735f2997-66b7-9795-ad4f-7ed78d0d3812/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/512x512bb.png"
      },
      {
        "size": "100x100",
        "link": "https://is1-ssl.mzstatic.com/image/thumb/Purple114/v4/73/5f/29/735f2997-66b7-9795-ad4f-7ed78d0d3812/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/100x100bb.png"
      }
    ]
  },
  ...and other results
]
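
Each result carries many fields, and often only a few are needed. As a rough sketch of how the output above might be reduced to a short summary, the hypothetical helper summarizeApp below picks the title, developer name, primary genre, price type, and link. The field names are taken from the sample output; the assumption that some of them can be missing is mine, which is why optional chaining is used.

```javascript
// Hypothetical helper: reduce one organic result to a handful of fields.
// Optional chaining guards against fields that might be absent (an assumption).
function summarizeApp(app) {
  const primaryGenre = (app.genres ?? []).find((genre) => genre.primary);
  return {
    title: app.title,
    developer: app.developer?.name ?? null,
    genre: primaryGenre?.name ?? null,
    price: app.price?.type ?? null,
    link: app.link,
  };
}

// Trimmed copy of the first result shown above.
const sample = {
  title: "Pixea",
  link: "https://apps.apple.com/us/app/pixea/id1507782672?mt=12&uo=4",
  price: { type: "Free" },
  genres: [
    { name: "Photo & Video", id: 6008, primary: true },
    { name: "Graphics & Design", id: 6027, primary: false },
  ],
  developer: { name: "ImageTasks Inc", id: 450316587 },
};

console.log(summarizeApp(sample));
```

With the full array, the same helper can be applied to every item, for example result.map(summarizeApp) inside the then callback.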

If you want other functionality added to this blog post or if you want to see some projects made with SerpApi, write me a message.
