Extracting information from Google's protobuf-encoded JSON response
Google's protobuf-encoded JSON response has long been a mystery for many developers. When we make API calls to Google services, we often receive deeply nested arrays in JSON responses, lacking discernible keys to interpret the values. This blog post delves into the challenges I faced while working with SerpApi's Google Flights API and presents two methods for extracting meaningful information from these responses. For example, the rendered UI is:
And the corresponding JSON data (partial) for the "Basic Economy" option is:
[
[
null,
[[1701687003830831, 16330365, 336388772], null, null, null, null, [[4]]],
1,
"...",
"..."
],
[
[
[
0,
[["UA", "United", null, true]],
null,
[
["UA", "914"],
["UA", "1858"],
["UA", "400"],
["UA", "915"]
],
false,
[
"www.united.com/...",
null,
[
"https://www.google.com/travel/clk/f",
[
[
"u",
"..."
]
]
]
],
null,
[
[null, 1460],
"..."
],
[["EUR", 1344]],
null,
false,
null,
null,
[5, null, "France"],
[[[null, ["UA", "BASIC ECONOMY"], 1]]],
null,
null,
[
[1701687003830831, 16330365, 336388772],
null,
null,
null,
null,
[[6]]
],
[
[2, [[null, 142]], 1, [["EUR", 130]]],
[2, [[null, 196]], 1, [["EUR", 180]]],
[3]
],
"...",
null,
[
["UA", "BASIC ECONOMY"],
[
[9, 2],
[2, 4],
[1, 4],
[4, 2],
[5, 4]
],
true,
...
],
...
],
...
],
...
],
...
]
The strings are relatively easy to understand, and the prices ($1460, €1344) can also be found directly in the data. But what about the rest? How can I locate the data responsible for rendering the contents inside the red boxes? In this blog post, I'll introduce two ways of extracting them.
Reverse-Engineering the JavaScript Code
Let's look into the contents in the first red box. They are attributes of the flight. Strings like "No refunds" and "No ticket changes" do not exist in the JSON response, so we may assume that they exist in the form of enums, which are basically numbers.
I used the browser's global search to search through all page resources, including HTML, JavaScript, and CSS. In Chrome, it's under the three-dot button in the top-right corner of DevTools: find "Search" in the "More tools" submenu.
Searching for "No refunds" returned one result.
Clicking the highlighted line navigates to the JavaScript code, which contains the actual logic for converting the enums to the strings rendered on the UI.
Although variable and method names are obfuscated, we can still find clues about the data location in the JSON. "No refunds" is matched under case 2, then case 4. So I scanned for a 2 and a 4 close to each other, and this array showed up:
[
[9, 2],
[2, 4],
[1, 4],
[4, 2],
[5, 4]
]
where [2, 4] is the target. To verify, the second string "No ticket changes" is matched under case 1, then case 4, so there is [1, 4]. For the first array [9, 2], there is no case 9, so this one is skipped. Finally, the remaining four arrays are mapped to the four attributes on the UI. The code logic in the JavaScript snippet is quite clear.
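The lookup described above can be sketched in Ruby. This is only an illustration: the names ATTRIBUTE_LABELS and attribute_labels are hypothetical, and the table holds just the two pairs confirmed in this post, so the remaining pairs are left out rather than guessed.

```ruby
# Partial lookup table: only the pairs confirmed in this post.
# Keys are the two-number pairs exactly as they appear in the JSON.
ATTRIBUTE_LABELS = {
  [2, 4] => "No refunds",
  [1, 4] => "No ticket changes"
}.freeze

# Returns the labels for every pair present in the table.
# Pairs without an entry are skipped, which is also what happens
# to [9, 2] in the real code because there is no case 9.
def attribute_labels(pairs)
  pairs.filter_map { |pair| ATTRIBUTE_LABELS[pair] }
end

pairs = [[9, 2], [2, 4], [1, 4], [4, 2], [5, 4]]
puts attribute_labels(pairs)
```

Note that [4, 2] and [5, 4] are skipped here only because their strings are not listed in this post; in the real UI they do map to attributes.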
Now, let's deal with the next red box. It contains prices, $142 and €130, which are very distinctive, so we can easily locate the data before figuring out the logic.
[
[2, [[null, 142]], 1, [["EUR", 130]]],
[2, [[null, 196]], 1, [["EUR", 180]]],
[3]
]
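Locating a distinctive value such as 142 can also be automated instead of done by eye. The following is a minimal sketch, assuming the response has already been parsed into nested Ruby arrays; find_paths is a hypothetical helper, not something taken from Google's code.

```ruby
# Recursively collects the index path of every element equal to `target`.
def find_paths(node, target, path = [])
  return [path] if node == target
  return [] unless node.is_a?(Array)

  node.each_with_index.flat_map do |child, index|
    find_paths(child, target, path + [index])
  end
end

# The array shown above, with JSON null written as nil.
baggage = [
  [2, [[nil, 142]], 1, [["EUR", 130]]],
  [2, [[nil, 196]], 1, [["EUR", 180]]],
  [3]
]

p find_paths(baggage, 142) # index path(s) leading to 142
p find_paths(baggage, 130) # index path(s) leading to 130
```

Each returned path can be passed to Array#dig to read the value back, for example baggage.dig(0, 1, 0, 1).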
By searching "1st checked bag", I got this code:
This snippet was more complex than the last one: there were unclear variables and function calls in between, and it was hard to trace where they originated. I failed to connect the numbers after the case keyword with the numbers in the arrays. Further digging into the code didn't help, so I decided to try a different approach.
Modifying the HTTP Response
Since code inspection didn't help, I decided to treat the code as a black box and manually enumerate the outputs for all possible inputs. This can be done by modifying the data in the HTTP response and seeing how the change is reflected on the UI. Chrome ships with a feature that lets developers override HTTP responses, but it only matches the exact URL. In our case, each request carried a timestamp, so every request URL was different. I needed a tool that supports both modifying HTTP responses and more flexible URL matching. After trying and comparing a few, I picked Charles.
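To illustrate why exact URL matching breaks down here, the sketch below compares two URLs that differ only in a timestamp-like parameter. Everything in it is an assumption made for illustration: the example.com URLs, the parameter name ts, and the helper without_param are made up, and this is not how Chrome or Charles implement their matching.

```ruby
require "uri"

# Drops one query parameter so that URLs differing only in it compare equal.
def without_param(url, name)
  uri = URI.parse(url)
  kept = URI.decode_www_form(uri.query.to_s).reject { |key, _| key == name }
  uri.query = kept.empty? ? nil : URI.encode_www_form(kept)
  uri.to_s
end

# Hypothetical request URLs; only the made-up "ts" value differs.
a = "https://example.com/data?q=flights&ts=1701687003"
b = "https://example.com/data?q=flights&ts=1701687999"

puts a == b                                          # exact comparison
puts without_param(a, "ts") == without_param(b, "ts") # timestamp ignored
```

An exact-match override only ever sees the first comparison, which is why a tool with pattern-based matching is needed.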
Setting up Charles is quite simple. Because I was intercepting HTTPS traffic, Charles needed to act as a "Man in the Middle" to decrypt and re-encrypt HTTPS requests and responses between the browser and the real website server. To make this work, Charles's root certificate must be installed on the system so that the browser trusts the traffic sent from Charles. Click "Help -> SSL Proxying -> Install Charles Root Certificate" and follow the guide to install the certificate.
Then enable SSL Proxying by clicking "Proxy -> Start SSL Proxying". Once HTTPS traffic appears in the list, the setup is successful.
Then, I set a breakpoint for the URL I wanted to intercept.
Once I refreshed the Google Flights page, Charles automatically popped up and paused the request, letting me modify anything in the response before it was sent to the browser.
Click "Edit Response", modify the text, and click "Execute".
For the part [2, [[null, 142]], 1, [["EUR", 130]]], I tried changing the first 2 to 1, 3, and 4 respectively, and got:
and
and
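The replacement fragments for this kind of enumeration can be prepared ahead of time and then pasted into Charles one by one. A minimal sketch, assuming the entry is edited as plain JSON text; variants is a hypothetical helper.

```ruby
require "json"

# Returns one copy of `entry` per candidate, with the first element replaced.
# The original entry is left untouched.
def variants(entry, candidates)
  candidates.map { |value| [value, *entry.drop(1)] }
end

# The entry from the response, with JSON null written as nil.
entry = [2, [[nil, 142]], 1, [["EUR", 130]]]

variants(entry, [1, 3, 4]).each do |variant|
  # Each printed line is a JSON fragment to paste in place of the original entry.
  puts JSON.generate(variant)
end
```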
Following this approach, I managed to show all the output strings that appeared in the JavaScript snippet. Here is the final extracted UI logic (in Ruby).
For the first line:
case array[2][0]
when 1
  "No carry-on bags"
when 2
  if array[2][1]
    # dig returns nil instead of raising when an index is missing
    "1 carry-on bag: #{[array.dig(2, 1, 0, 1), array.dig(2, 4, 0, 1)].compact.join('-')}"
  else
    "1 carry-on bag available for a fee"
  end
when 3
  "1 free carry-on"
end
For the second line:
case array[0][0]
when 1
  "No checked bags"
when 2
  if array[0][1]
    # dig returns nil instead of raising when an index is missing
    "1st checked bag: #{[array.dig(0, 1, 0, 1), array.dig(0, 4, 0, 1)].compact.join('-')}"
  else
    "1st checked bag available for a fee"
  end
when 3
  if array[1][0] == 3
    "2 free checked bags"
  else
    kg = another_array[3]
    if kg && kg > 0
      "1st checked bag up to #{kg} kg free"
    else
      "1st checked bag free"
    end
  end
end
The extracted rules are quite different from the logic in the JavaScript code (e.g. there was no case 4). This can be an alternative approach when the obfuscated JavaScript code is hard to read.
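As a sanity check, the extracted rules can be wrapped in methods and run against the sample array from earlier in this post. This is a sketch under a few assumptions: the method names are hypothetical, the kg parameter stands in for another_array[3], and Array#dig is used so that a missing position yields nil rather than an error.

```ruby
# Joins the values at the two positions the rules read,
# skipping whichever of them is missing.
def price_text(entry)
  [entry.dig(1, 0, 1), entry.dig(4, 0, 1)].compact.join("-")
end

def carry_on_line(array)
  entry = array[2]
  case entry[0]
  when 1 then "No carry-on bags"
  when 2
    entry[1] ? "1 carry-on bag: #{price_text(entry)}" : "1 carry-on bag available for a fee"
  when 3 then "1 free carry-on"
  end
end

# kg is a stand-in for another_array[3] (assumption for this sketch).
def checked_bag_line(array, kg = nil)
  entry = array[0]
  case entry[0]
  when 1 then "No checked bags"
  when 2
    entry[1] ? "1st checked bag: #{price_text(entry)}" : "1st checked bag available for a fee"
  when 3
    return "2 free checked bags" if array[1][0] == 3

    kg && kg > 0 ? "1st checked bag up to #{kg} kg free" : "1st checked bag free"
  end
end

# The sample array from the response, with JSON null written as nil.
sample = [
  [2, [[nil, 142]], 1, [["EUR", 130]]],
  [2, [[nil, 196]], 1, [["EUR", 180]]],
  [3]
]

puts carry_on_line(sample)
puts checked_bag_line(sample)
```

The output contains the bare number only; currency symbols and formatting are outside what these rules cover.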
Conclusion
In this blog post, I introduced two ways to extract information from Google's protobuf-encoded JSON response. Code inspection is usually easier and more efficient, but when it doesn't work, we can feed different data through the code and observe the output, which is a black-box way of peeking at the code logic. Thanks for reading. I hope you find it helpful.