Web Scraping With PowerShell
PowerShell is a command-line shell and scripting language that you can use to automate tasks and manage systems.
It has been the default shell for Windows since 2016, but unless you're a system or server administrator, chances are you've rarely used it. Most people don't realize how powerful it is.
But why PowerShell? Well, it depends on your use case, but it's useful for quickly checking our APIs without having to set up anything or change your project. You can also schedule scripts to run periodically.
I'm using PowerShell 5.1, but the examples below should also run on newer versions, including PowerShell Core (6.x) and PowerShell 7. If you want to upgrade it on Windows, please refer to Microsoft's documentation.
If you’re not a Windows user, don’t worry! PowerShell is cross-platform, and you can check how to install it on Linux and macOS.
The Basics of PowerShell
Here's PowerShell in a nutshell:
- In PowerShell, named commands are called cmdlets (pronounced command-lets). Cmdlets follow a Verb-Noun convention.
- Variables in PowerShell always start with a $, like in PHP.
- By convention, variables in PowerShell use PascalCase.
- Everything is an object in PowerShell.
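The points above can be tried directly in a console. This is only a minimal sketch; the variable names ($Greeting, $Today, $InvokeCommands) are made up for illustration and are not part of the tutorial's later commands.

```powershell
# Variables start with $ and, by convention, use PascalCase.
$Greeting = "Hello, PowerShell"

# Everything is an object, so even a plain string has properties and methods.
$Greeting.Length            # 17
$Greeting.ToUpper()         # HELLO, POWERSHELL
$Greeting.GetType().Name    # String

# Cmdlets follow the Verb-Noun convention: Get-Date returns a DateTime object.
$Today = Get-Date
$Today.Year

# Get-Command can list commands by verb, here every command whose verb is Invoke.
$InvokeCommands = Get-Command -Verb Invoke
$InvokeCommands.Name
```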
For this tutorial we're going to use a single cmdlet: Invoke-RestMethod. This cmdlet sends a request to a REST API and returns an object whose shape depends on the response (a JSON response, for example, is parsed into an object).
To understand Invoke-RestMethod better, let's use two other cmdlets first:
- Invoke-WebRequest
- ConvertFrom-Json
Invoke-WebRequest is PowerShell's version of cURL. It makes a request and returns a response. ConvertFrom-Json converts a JSON string into an object (or, in PowerShell 6 and later, into a hash table when you pass the -AsHashtable switch).
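As a quick offline illustration of ConvertFrom-Json, the sketch below parses a small hand-written JSON string. The string and its values are sample data made up for this example, not real API output.

```powershell
# A small JSON string standing in for an API response (sample data only).
$JsonText = '{"search_metadata": {"id": "abc123", "status": "Success"}, "results": [1, 2, 3]}'

# ConvertFrom-Json parses the string into an object with matching properties.
$Parsed = $JsonText | ConvertFrom-Json

$Parsed.GetType().Name            # PSCustomObject
$Parsed.search_metadata.status    # Success
$Parsed.results.Count             # 3
```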
Using SerpApi
Let's use the URL in SerpApi's web page where it says "Easy integration" and pass it to PowerShell using the -Uri flag:
Invoke-WebRequest -Uri "https://serpapi.com/search.json?q=Coffee&location=Austin,+Texas,+United+States&hl=en&gl=us&google_domain=google.com&api_key=YOUR_API_KEY"
This will give us a response like this (with some of its content redacted for brevity):
StatusCode : 200
StatusDescription : OK
Content : {...}
RawContent : HTTP/1.1 200 OK
Connection: keep-alive
CF-Ray: 883ac74bedb8f655-NRT
CF-Cache-Status: EXPIRED
Vary: Accept-Encoding
referrer-policy: strict-origin-when-cross-origin
serpapi-search-id: 664350bfe93...
Forms : {}
Headers : {...}
Images : {}
InputFields : {}
Links : {}
ParsedHtml : System.__ComObject
RawContentLength : 48676
The JSON we actually want is inside the Content property. We could pipe the Invoke-WebRequest output into the Select-Object cmdlet to access Content, by using the -ExpandProperty flag with Content as the property we want to expand. Since everything is an object in PowerShell, we can also access Content by using dot notation:
# Getting Content with Select-Object
Invoke-WebRequest -Uri "https://serpapi.com/search.json?q=Coffee&location=Austin,+Texas,+United+States&hl=en&gl=us&google_domain=google.com&api_key=YOUR_API_KEY" | Select-Object -ExpandProperty Content
# Getting Content with dot notation
(Invoke-WebRequest -Uri "https://serpapi.com/search.json?q=Coffee&location=Austin,+Texas,+United+States&hl=en&gl=us&google_domain=google.com&api_key=YOUR_API_KEY").Content
Either way, we can now access the JSON we want:
{
  "search_metadata": {
    "id": "664350bfe93ff45eb2993ec0",
    "status": "Success",
    "json_endpoint": "https://serpapi.com/searches/3bc827959d2dd083/664350bfe93ff45eb2993ec0.json",
    "created_at": "2024-05-14 11:53:35 UTC",
    "processed_at": "2024-05-14 11:53:35 UTC",
    "google_url": "https://www.google.com/search?q=Coffee&oq=Coffee&uule=w+CAIQICIaQXVzdGluLFRleGFzLFVuaXRlZCBTdGF0ZXM&hl=en&gl=us&sourceid=chrome&ie=UTF-8",
    "raw_html_file": "https://serpapi.com/searches/3bc827959d2dd083/664350bfe93ff45eb2993ec0.html",
    "total_time_taken": 1.16
  },
  ...
}
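To see that Select-Object -ExpandProperty and dot notation really return the same value, here is a small sketch that uses a stand-in object instead of a live response. $Response and its values are hypothetical; a real Invoke-WebRequest result has many more properties.

```powershell
# A stand-in for the response object, with made-up sample values.
$Response = [pscustomobject]@{
    StatusCode = 200
    Content    = '{"status": "Success"}'
}

# -ExpandProperty returns the property's value itself (here, a string).
$Expanded = $Response | Select-Object -ExpandProperty Content

# Dot notation returns the same value.
$Dotted = $Response.Content

# Without -ExpandProperty, Select-Object returns a new object that still wraps Content.
$Wrapped = $Response | Select-Object -Property Content

$Expanded -eq $Dotted           # True
$Wrapped -is [string]           # False: it is an object with a Content property
$Wrapped.Content -eq $Dotted    # True: the string sits one level down
```

This is why -ExpandProperty matters when piping into ConvertFrom-Json: the next cmdlet needs the string, not an object that contains it.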
We can then pipe this into the ConvertFrom-Json cmdlet to convert the JSON string into an object we can use. To make it easier to access later, we'll assign everything to a variable. Here's what your command should look like:
$Json = Invoke-WebRequest -Uri "https://serpapi.com/search.json?q=Coffee&location=Austin,+Texas,+United+States&hl=en&gl=us&google_domain=google.com&api_key=YOUR_API_KEY" | Select-Object -ExpandProperty Content | ConvertFrom-Json
Now let's go back to Invoke-RestMethod. What it does is wrap everything we just did in a single command. Instead of running the command above, we could use:
$Json = Invoke-RestMethod -Uri "https://serpapi.com/search.json?q=Coffee&location=Austin,+Texas,+United+States&hl=en&gl=us&google_domain=google.com&api_key=YOUR_API_KEY"
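Since we used a variable, there's no output this time. You can type the variable name and press Enter to have its entire content printed out to the console. You can also save it to a file in the current working directory. Because $Json is now an object rather than a JSON string, convert it back to JSON with ConvertTo-Json before redirecting it with the > operator (the -Depth parameter keeps deeply nested values from being truncated):
$Json | ConvertTo-Json -Depth 10 > out.json
You can now see the JSON response inside the out.json file. If you're having encoding problems, consider using the Out-File cmdlet with its -Encoding parameter instead of the > operator. If you want to export it as a CSV instead, take a look at the Export-Csv cmdlet, which writes to the file given in its -Path parameter, so no > operator is needed.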
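Here is a minimal sketch of saving results to disk, using a small stand-in object rather than a live response. It serializes the object with ConvertTo-Json, writes it as UTF-8 with Out-File, and exports a flat list with Export-Csv. The $Sample data, the temp-directory paths, and the depth of 10 are assumptions chosen for the example.

```powershell
# Stand-in data with made-up values, shaped loosely like a parsed response.
$Sample = [pscustomobject]@{
    search_metadata = [pscustomobject]@{ id = "abc123"; status = "Success" }
    organic_results = @(
        [pscustomobject]@{ position = 1; title = "First";  link = "https://example.com/1" }
        [pscustomobject]@{ position = 2; title = "Second"; link = "https://example.com/2" }
    )
}

# Write to the temp directory so the example doesn't clutter the working directory.
$JsonPath = Join-Path ([System.IO.Path]::GetTempPath()) "out.json"
$CsvPath  = Join-Path ([System.IO.Path]::GetTempPath()) "out.csv"

# ConvertTo-Json serializes the object; -Depth controls how many nested levels are kept.
$Sample | ConvertTo-Json -Depth 10 | Out-File -FilePath $JsonPath -Encoding utf8

# Export-Csv writes flat objects straight to the file named by -Path.
$Sample.organic_results | Export-Csv -Path $CsvPath -NoTypeInformation

# Read both files back to confirm the round trip.
$FromJson = Get-Content -Path $JsonPath -Raw | ConvertFrom-Json
$FromCsv  = Import-Csv -Path $CsvPath
```

CSV works best on a flat list such as organic_results; exporting the whole nested object would not produce useful columns.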
We can access keys inside this $Json object by using dot notation, like we did before when accessing the response's Content property.
For example, $Json.search_metadata will return all the keys and values inside search_metadata, and $Json.search_metadata.id will return just the value 664350bfe93ff45eb2993ec0.
For keys whose values are arrays, you can use bracket notation to access specific elements inside the array.
For example, $Json.organic_results will return all 8 search results, while $Json.organic_results[0] will return the first one.
You can then use dot notation again to get a specific value from this specific organic result. For example, $Json.organic_results[0].link will return the first organic result's URL.
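Beyond picking a single element, the same object can be filtered and reshaped with ordinary cmdlets. The sketch below runs on hand-written sample JSON; the titles and links are made up, and a real response has many more fields.

```powershell
# Sample JSON with made-up values standing in for a parsed response.
$Json = @'
{
  "organic_results": [
    { "position": 1, "title": "Sample result about coffee beans", "link": "https://example.com/a" },
    { "position": 2, "title": "Sample result about coffee shops", "link": "https://example.com/b" },
    { "position": 3, "title": "Sample result about how to brew", "link": "https://example.com/c" }
  ]
}
'@ | ConvertFrom-Json

# Accessing a property on an array returns that property from every element.
$AllLinks = $Json.organic_results.link

# Select-Object keeps only the listed properties of each result.
$Summary = $Json.organic_results | Select-Object -Property position, title

# Where-Object filters results; here, those whose title mentions "brew".
$Brewing = $Json.organic_results | Where-Object { $_.title -like "*brew*" }

# A negative index counts from the end of the array.
$LastLink = $Json.organic_results[-1].link
```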
You can also use the snippet of code below instead of having everything inside a single line:
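$Uri = "https://serpapi.com/search.json"
# Write values unencoded: Invoke-RestMethod URL-encodes them when it builds the query string
$Parameters = @{
    q             = "Coffee"
    location      = "Austin, Texas, United States"
    hl            = "en"
    gl            = "us"
    google_domain = "google.com"
    api_key       = "YOUR_API_KEY"
}
# For a GET request, the -Body hashtable is sent as query parameters
$Json = Invoke-RestMethod -Uri $Uri -Body $Parameters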
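To get a feel for what URL encoding does to the hashtable values, the sketch below builds a query string by hand with [uri]::EscapeDataString. This is only an illustration and an assumption-laden stand-in: Invoke-RestMethod performs its own encoding, which may differ in detail (for example, in how spaces are written), and the names $Pairs, $QueryString, and $FullUri are made up here.

```powershell
# [ordered] keeps the keys in the order written, which makes the output predictable.
$Parameters = [ordered]@{
    q        = "Coffee"
    location = "Austin, Texas, United States"
    hl       = "en"
}

# Encode each key and value, then join the pairs with "&".
$Pairs = foreach ($Key in $Parameters.Keys) {
    "{0}={1}" -f [uri]::EscapeDataString($Key), [uri]::EscapeDataString($Parameters[$Key])
}
$QueryString = $Pairs -join "&"

$FullUri = "https://serpapi.com/search.json?$QueryString"
$FullUri
# https://serpapi.com/search.json?q=Coffee&location=Austin%2C%20Texas%2C%20United%20States&hl=en
```

Values that are already encoded (for example, with + in place of spaces) would be encoded a second time, which is why the hashtable is best written with plain text.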
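Note: If you don’t want to keep opening the terminal every time, you can also save everything in a PowerShell script file. Just open a text file, paste the snippet of code, and save it with a .ps1 extension. You can then run it by right-clicking the file and choosing "Run with PowerShell", or by typing its path at a PowerShell prompt. On Windows, double-clicking a .ps1 file opens it in an editor by default instead of running it, and your execution policy may need to allow local scripts.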
Wrapping up
I hope this beginner's tutorial was able to showcase some of PowerShell's capabilities. It's pretty much a full-fledged programming language, so this is just a small taste of its power. You can use PowerShell for many of the same tasks you'd use a language like Python for.
While this isn’t an in-depth tutorial, if you want to parse the HTML directly, you could combine Invoke-WebRequest with the PSParseHTML module or AngleSharp .NET libraries. With this, you can scrape data from web pages, not just the search results we provide.
Feel free to access our Google Search Engine Results API and modify the parameters to test our API, and don't forget to sign up for a free account to get 100 credits/month if you haven't already. That's plenty for testing and simple task automation.
If you have any questions or concerns, feel free to contact our team at contact@serpapi.com!