Scrape Google Lens with Google Sheets

Intro

SerpApi streamlines the process of scraping search results pages. With the SerpApi Google Sheets Extension, you can even get search data without writing any code. This can be an attractive option if you would like to perform automated searches to collect data in a spreadsheet and don't have the know how to write a script.

One of the API's provided by SerpApi that has grown in popularity is the Google Lens API. This API allows simulating Google Lens searches by providing the URL to an image. In this tutorial we will look at how to scrape Google Lens search results into a Google Sheets document using the extension.

The only drawback to the Google Sheets Extension is that since Google Sheets operates outside of SerpApi's control, this method can sometimes deliver unexpected results. In this tutorial, we will cover some of the common pitfalls and best practices for avoiding them.

Setup

If you haven't already, you will need to create a free account on SerpApi.com. Navigate to https://serpapi.com/users/sign_up and complete the sign up process. After verifying your email and phone, remember to select the free plan from the dropdown menu and click Subscribe.

You will also need to install the Google Sheets Extension. After installing the plugin, create a new Google Sheets spreadsheet. Then click SerpApi - Search Engine Results and Ranks -> Register your API Key:

Navigate to https://serpapi.com/manage-api-key and click the clipboard icon to copy your API key:

Then navigate back to your Google Sheet and paste it in the text field:

Image Hosting

For this project we are also going to need some image URLs. Unlike Google Reverse Image Search, Google Lens doesn't need to find an exact match - it searches for images that look similar to the one you provide in your search. So you don't need to use images you expect to find duplicates of throughout the web. Take this example, where I have searched my own Github portrait:

Depending on your use case, you can supply your own photographs or artwork, an image collection from the web, or images you have downloaded onto your hard drive.

If you are familiar with the function of Google Lens, and not so familiar with how APIs, GET requests and web hosting work, you may be expecting to be able to upload images directly from your device to the API. Unfortunately, this is not the case.

To query the Google Lens API, you need to use the URL of an image that is hosted online. There are many reasons for this, but the short explanation is that in order to send your search parameters to SerpApi, you use something called an HTTP GET request. Different programming languages (and the Google Sheets extension) have different ways they accomplish accepting your parameters and creating the request, but they all need to send a GET request behind the scenes. A GET request just takes a URL, like you would type into the address bar of your browser. All of the information you want to send in your request has to be encoded in the URL. Most images have more data than can be represented in few enough characters to include in the URL for the GET request. Since there is therefore no way to include the actual image in the request, we use a URL where the image can be found online.

Fortunately there are a significant number of free and affordable image hosting platforms, the most popular of which is Imgur. I don't recommend using an AWS S3 buckets to host images you want to search Google Lens API with, as it can be difficult for the proxies to gain access to images stored there. I also don't recommend Internxt - while a high quality image hosting service, they don't seem to support linking directly to images.

For this tutorial we will use Imgur, since it is free and easy to use. You can actually upload images without even creating an account, just click New Post and select the images you would like to upload.

However, especially if you have a lot of images you would like to search, I recommend creating a free account, as this will give you access to some convenient tools that aren't available otherwise.

After you create an account and upload some images, click where it says View Image Info and select Generate Image Links from the dropdown. You can then click and drag to select as many images as you like. Then click Done.

A box will appear that contains two dropdowns and a list of links. In the first dropdown, make sure to select Direct Link (email and IM). In the second dropdown, choose Original Image (which should be default). Then select and copy all of the links in the box.

Now go back to your spreadsheet and paste all the URLS you copied into the first column.

Building the Query String

Now to get search results we will need to use the SERPAPI_RESULT function. This function requires two values. The first is the query string, a string of text that will be included in the URL to send the GET request mentioned earlier. The second is the JSON selector we will use to extract the value we want to display from the search results.

The query string needs to include all of the parameters we want to specify for our search. For this we use the format parameter_name=value for each parameter we want to include. We separate the parameters using an & symbol. In the case of Google Lens, we have two parameters we need to include: engine and url. We also have an optional parameter hl which specifies a language.

The engine parameter is where we specify what kind of search we want to perform, in this case Google Lens, which is encoded as google_lens in the query. The URL, as you may already have guessed, is the URL of the image we want to search. The hl parameter takes a language as a two letter code- en for english, fr for french etc. So to search in english for https://i.imgur.com/HBrB8p0.png we would need a query string of engine=google_lens&hl=en&url=https://i.imgur.com/HBrB8p0.png .

Before we add the query string to our Google Sheet, lets add another row for headers. Right click the top row and select "Insert one row above":

Then add headers:

In Google Sheets, there is an easier way to build the query string for each URL than to type each one manually. In the top cell of the second column, right next to where you added your first image URL, type ="engine=google_lens&hl=en&url="&A3.

If you are not familiar with spreadsheets, = means we are doing some kind of calculation to populate the cell, and & means we are concatenating text.

Specifying a Selector

We need to tell Google Sheets which part of the search results page we want to display in the cell. To determine this, we need to take a look at the JSON returned by a Google Lens API search. Head to the SerpApi Playground and select Google Lens from the dropdown in the top left:

Look at the JSON results on the bottom right. All of the data is either part of a key: value pair, or an array. To get values, we need to use either the key, or an index if it is an array. Indexes start at 0.

Scroll down in the JSON results to visual_matches:

The square bracket[ indicates we are dealing with an array. Curly braces {} indicate Objects, which are made up of key: value pairs. You can see that the visual_matches array contains multiple Objects separated by commas. To get the title of the first visual match, we will use the selector visual_matches.0.title. The 0 indicates the first item in the array and the title indicates we want the value corresponding to the title key.

So if we want to perform a search using the query string we created above, and display the title of the first visual match in a cell, we need to use the function =SERPAPI_RESULT("engine=google_lens&hl=en&url=https://i.imgur.com/HBrB8p0.png", "visual_matches.0.title") Of course, since we already have the query string in a cell, we can simplify this and use =SERPAPI_RESULT(B2, "visual_matches.0.title"):

However we are probably going to want to use several selectors to get different information from each of the results. We will probably also want to reuse these selectors for different queries. One way to do this is to add a row between the headers and the data where we will keep our selectors:

We can then change the formula to refer to these selectors, for example =SERPAPI_RESULT(B2, "visual_matches.0.title") becomes =SERPAPI_RESULT(B3, C2).

Copying and Pasting

Google Sheets provides some useful functionality when copying and pasting cells with formulas. Cells referenced in formulas automatically change when you copy and paste them. However, sometimes you don't want the cell reference to change.

For example, try copying and pasting the cell with our SERPAPI_RESULT formula from C3 into cells D3 and E3:

You can see there is a problem. Look at the formulas for the cells we pasted into. The formula in D3 now says =SERPAPI_RESULT(C3,D2). It's helpful that the selector is now referencing the next cell over (D2). That's what we want. But we didn't want the query string to change here. The formula is trying to pass the value of cell C3 as the query string, causing the API to return an error.

Fortunately there is an easy way to deal with this. We can use the $ symbol to make our cell references static. Go back to cell C3 add a $ in the formula before B3, so that it says =SERPAPI_RESULT($B3,C2). Then try copying and pasting the cell again. You will see that the selector will still change, but the correct cell will still be referenced for the query string.

A similar thing happens when you try to copy and paste the formulas into the other rows. This time the query string will change as desired but the selector will also change to reference the wrong row. For this we need to add the $ sign before the row number - for example in cell C3 =SERPAPI_RESULT($B3,C2) becomes =SERPAPI_RESULT($B3,C$2). Now you can copy and paste this cell into every other cell where you want to use the SERPAPI_RESULT formula:

Common Problems

Before trying this at a high volume, there are some common pitfalls we need to address.

Duplicate Searches

The first issue is that Google Sheets implementations of SerpApi can often use up many more search credits than intended. This is due to the way Google Sheets works, so there is unfortunately no way for SerpApi to fix the problem permanently. However there is a workaround.

When developers write code to get search data from SerpApi, they typically run a search once and then get all of the available data from that search. However, with Google Sheets, each cell that includes the SERPAPI_RESULT function is performing it's own search. Ideally, this still wouldn't be a problem, since each unique search is saved in a cache for one hour. When you attempt to make the same search again, it is served from the cache and you are not charged another search credit. The problem is that because of how Google Sheets works, identical searches are often performed at the same time, meaning there is no opportunity for the search to be cached before several identical calls are made.

The work around for this is simple. We create a second spreadsheet, and copy into it just one instance of each search. So, in our Google Lens example, we would copy just the first three columns:

Every time we want to run our searches, we open this document first and allow the searches to run. This ensures they are in the cache. Then we can open the original spreadsheet that collects all the data we want, without fear of getting charged for duplicate searches.

Running Searches by Accident

Another common problem is that the searches run every time the Google Sheet is opened. This makes it very easy to waste searches by running them when you don't intend to. A good way to avoid this is to ensure you close the spreadsheet when you are done with it, and avoid granting too many people access to it (they might forget and leave it open). You can copy the cells and paste them as text without formulas into another document, so that you can still access the data you have collected without rerunning the searches.

To stop any unintentional searches from going through, you can also navigate to https://serpapi.com/manage-api-key and click "Regenerate API Key". When you want to run the searches again, just register your API key in the Google Sheets Extension again. If you decide to do this, make sure you save the data from your searches somewhere else first, as otherwise you may not have access to it again without having to rerun the searches.

Other Problems

Google Sheets can also just be inconsistent at times. Sometimes the searches won't work for no discernible reason. When this happens, you can try uninstalling and reinstalling the plugin, copying and pasting into a new Sheet, and closing and reopening the document. You can also reach out to Customer Support if none of these things fixes the problem.

Limitations

SerpApi is primarily intended for developers. The Google Sheets Extension is provided as a no-code option, but there are limitations to how it can be used. Because of the unpredictable nature of Google Sheets, the inability to add custom logic, and the limited support available for the Extension, it may be a better option for your use case to hire a developer to write a script. This allows for more flexibility, more consistent and reliable performance, and better support.

Conclusion

I hope you found the tutorial informative and easy to follow. If you have any questions, feel free to contact me at ryan@serpapi.com.