How to Extract Full Opinion Text from Google Scholar Case Law with SerpApi

SerpApi’s Google Scholar Case Law API returns structured case law data from Google Scholar, including case details and related metadata. For many workflows, that structured response is enough.

However, some use cases require the full opinion body. You may want to store the opinion text locally, make it searchable, review it in a cleaner format, or pass it into another internal workflow.

The full opinion body is available in the raw HTML response. In this tutorial, we’ll extract that opinion body from the HTML, convert it to Markdown, and save it locally using both JavaScript and Python.

If you’re new to the Google Scholar Case Law API, my colleague’s blog post on returning the structured response is a helpful starting point, but it’s not required for following this tutorial.

Why use SerpApi?

Google Scholar Case Law pages can be scraped manually, but maintaining that workflow reliably can become difficult. At scale, you need to manage proxy infrastructure, CAPTCHA handling, retries, parsing changes, and monitoring for page structure updates.

SerpApi handles the search engine scraping layer and returns results through an API. For Google Scholar Case Law, that means you can retrieve structured case law data from the JSON response, while still having access to the underlying page HTML when you need the full opinion body.

Why the opinion body is in the raw HTML

Google Scholar Case Law pages include the full legal opinion, but opinion length varies dramatically. Some opinions are relatively short, while others are very long.

For that reason, the full opinion body is not returned directly in every JSON response. Including it by default would increase response size and overhead for users who only need metadata like the case title, court, decision date, citations, or related case information.

The full opinion is still available through the raw HTML. In the HTML, the opinion body is contained in the #gs_opinion element:

<div id="gs_opinion">

The sample HTML in the companion repo shows the case opinion inside #gs_opinion, including headings, body text, links, page references, blockquotes, and footnotes.

What we’re building

In these examples, we’ll request the Google Scholar Case Law page directly as HTML. If your workflow also needs structured metadata, you can request the JSON response first and then retrieve the raw HTML file linked in that response. The extraction logic is the same once you have the HTML.
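As a rough sketch of the JSON-first route in Python: SerpApi JSON responses generally include a search_metadata object, and this sketch assumes it carries a raw_html_file URL for this engine. The helper names raw_html_url and fetch_raw_html are hypothetical and are not part of the companion repo.

```python
from typing import Optional
from urllib.request import urlopen


def raw_html_url(results: dict) -> Optional[str]:
    """Return the raw HTML file URL from a JSON response, if present.

    Assumption: the response carries search_metadata.raw_html_file.
    """
    metadata = results.get("search_metadata") or {}
    return metadata.get("raw_html_file") or None


def fetch_raw_html(results: dict, timeout: float = 30.0) -> str:
    """Download the raw HTML page linked from a JSON response."""
    url = raw_html_url(results)
    if url is None:
        raise ValueError("JSON response has no raw_html_file link.")
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Once fetch_raw_html returns, the HTML string can go through the same parsing and conversion steps described in the rest of this tutorial.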

The overall workflow is the same in both examples. We’ll walk through it in JavaScript first, then show the equivalent Python version:

Request the Google Scholar Case Law page from SerpApi as raw HTML.
Parse the HTML.
Select the content within the #gs_opinion element.
Convert the opinion HTML to Markdown.
Save the Markdown file locally.

Requirements

To follow along, you’ll need:

A SerpApi account
A SerpApi API key
Node.js for the JavaScript example
Python 3 for the Python example

If this is your first time using SerpApi, you can sign up for a free account and use the included 250 monthly searches to test the examples in this tutorial.

JavaScript example

Let’s start with the JavaScript version. This example requests the Google Scholar Case Law page as HTML, extracts the opinion body, converts it to Markdown, and saves the result locally.

The full JavaScript example is available in the /javascript directory of the companion repository.

Install dependencies

From the javascript directory, install the required packages:

npm install

The dependencies are already listed in the example project’s package.json. This example uses:

serpapi to fetch the raw HTML response from SerpApi.
cheerio to parse the HTML and select the opinion body with familiar CSS-style selectors.
turndown to convert the opinion HTML into Markdown.
dotenv to load your SerpApi API key from a local .env file.

Fetch the raw HTML

We import getHtml from the serpapi package and define the Google Scholar Case Law case_id we want to retrieve:

const { getHtml } = require("serpapi");
require("dotenv").config();

const caseId = "9174924986185145879";

Now we can request the page HTML. The parameters are fairly straightforward:

engine - Set to google_scholar_case_law, the API we are requesting data from.
api_key - Your SerpApi API key, loaded from the environment variable.
case_id - The Google Scholar Case Law case ID for the opinion we want to extract.

getHtml(
  {
    api_key: process.env.SERPAPI_KEY,
    engine: "google_scholar_case_law",
    case_id: caseId,
  },
  (html) => {
    // We'll parse the HTML in the next step.
  }
);

This returns the Google Scholar Case Law page as HTML, ready to be parsed for the opinion body.

Parse the opinion body

Once the HTML is returned, load it with Cheerio:

const cheerio = require("cheerio"); // belongs with the other requires at the top of the file
const $ = cheerio.load(html);

Cheerio lets us query the HTML with CSS-style selectors. Since the opinion body is contained in the #gs_opinion element, we can select that element and get its inner HTML:

const opinionHtml = $("#gs_opinion").html();

It’s also worth handling the case where the opinion body is not found:

if (!opinionHtml) {
  console.error("Could not find case opinion in the search results.");
  return;
}

At this point, opinionHtml contains the HTML for the opinion body, including paragraphs, headings, links, blockquotes, page references, and footnotes.

Convert the opinion to Markdown

Next, create a new Turndown service and pass the opinion HTML to turndown():

const TurndownService = require("turndown");

const turndownService = new TurndownService({
  headingStyle: "atx",
  codeBlockStyle: "fenced",
  strongDelimiter: "**",
  emDelimiter: "*",
  linkStyle: "inlined",
});

const markdown = turndownService.turndown(opinionHtml);

Turndown is a good fit here because we are not trying to manually scrape each paragraph, heading, link, or blockquote. At this point, we already have the opinion body as HTML. We just want to preserve the readable structure in a more portable text format.

Markdown works well for that because it keeps the output readable while still preserving useful formatting like headings, links, paragraphs, blockquotes, bold text, and italics.

Save the Markdown file

Finally, create an output filename and write the Markdown to the shared output directory:

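const fs = require("fs");
const path = require("path");

// Assumption: the shared output directory sits at the repo root, one level above /javascript.
const outputDir = path.join(__dirname, "..", "output");

const isoDate = new Date().toISOString().split("T")[0];
const outputPath = path.join(outputDir, `js_${caseId}_${isoDate}.md`);

fs.mkdirSync(outputDir, { recursive: true });
fs.writeFileSync(outputPath, markdown, "utf8");

console.log(`Saved case opinion to ${outputPath}`);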

You can certainly tune Turndown further based on your use case and formatting needs. Here's some example output from the JavaScript example using Turndown:

**937 S.W.2d 444 (1996)**

### CONTINENTAL COFFEE PRODUCTS CO. and Allen D. Duff, Petitioners,  
v.  
Juanita CAZAREZ, Respondent.

[No. 95-0827.](/scholar?scidkt=15876565480693424878&as_sdt=2&hl=en)

**Supreme Court of Texas.**

Argued February 14, 1996.

Decided December 13, 1996.

Rehearing Overruled February 21, 1997.

[445](#p445)[\*445](#p445) A. Martin Wickliff, Jr., Barbara L. Johnson, Paul E. Hash, Houston, for Respondent.
...

Python example

The Python version follows the same overall workflow: request the page as HTML, parse the returned HTML, select the #gs_opinion element, convert it to Markdown, and save the result locally.

The full Python example is available in the /python directory of the companion repository.

Install dependencies

From the python directory, install the required packages:

pip install -r requirements.txt

This example uses:

serpapi to fetch the raw HTML response from SerpApi.
beautifulsoup4 to parse the HTML and select the opinion body.
markdownify to convert the opinion HTML into Markdown.
python-dotenv to load your SerpApi API key from a local .env file.

Fetch the raw HTML

First, load the API key from your .env file and make sure it exists:

import os

from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("SERPAPI_KEY")

if not api_key:
    raise RuntimeError("SERPAPI_KEY is required.")

Then request the Google Scholar Case Law page as HTML:

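import serpapi

# Assumption: the same case ID used in the JavaScript example.
CASE_ID = "9174924986185145879"

html = serpapi.search(
    api_key=api_key,
    engine="google_scholar_case_law",
    case_id=CASE_ID,
    output="html",
)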

The output="html" parameter tells SerpApi to return the Google Scholar Case Law page as HTML instead of JSON.
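To make the role of output="html" concrete, here is a small sketch of the equivalent plain HTTP request URL. It assumes the standard https://serpapi.com/search endpoint, and build_html_request_url is a hypothetical helper name, not something the serpapi package provides.

```python
from urllib.parse import urlencode

# Assumption: the standard SerpApi search endpoint.
SERPAPI_SEARCH_URL = "https://serpapi.com/search"


def build_html_request_url(api_key: str, case_id: str) -> str:
    """Build a request URL that asks for the page as HTML instead of JSON."""
    params = {
        "engine": "google_scholar_case_law",
        "case_id": case_id,
        "output": "html",  # switch the response from JSON to raw HTML
        "api_key": api_key,
    }
    return f"{SERPAPI_SEARCH_URL}?{urlencode(params)}"
```

Because the resulting URL contains your API key, avoid printing or logging it.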

Parse the opinion body

Next, parse the HTML with Beautiful Soup and select the #gs_opinion element:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
opinion = soup.select_one("#gs_opinion")

As in the JavaScript example, it’s worth handling the case where the opinion body is not found:

if not opinion:
    raise ValueError("Could not find case opinion in the search results.")
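For a quick presence check without third-party parsers, the same selection can be approximated with Python's built-in html.parser. This swaps in a different technique from the Beautiful Soup approach above: it returns plain text only, so it is a sanity check rather than a replacement for the Markdown conversion. OpinionTextParser and opinion_text are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class OpinionTextParser(HTMLParser):
    """Collect text found inside <div id="gs_opinion">."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0  # greater than 0 while inside the opinion div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # nested div inside the opinion
        elif dict(attrs).get("id") == "gs_opinion":
            self.depth = 1  # entering the opinion div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def opinion_text(html: str) -> Optional[str]:
    """Return whitespace-normalized opinion text, or None if none is found."""
    parser = OpinionTextParser()
    parser.feed(html)
    parser.close()
    # Chunks are joined with spaces, so inline tags can add extra spaces.
    text = " ".join(" ".join(parser.chunks).split())
    return text or None
```

A None result means the same thing as the failed select_one call: the page has no usable #gs_opinion content.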

Convert and save as Markdown

Once we have the opinion body, we can convert it to Markdown with markdownify:

from markdownify import markdownify

markdown = markdownify(
    str(opinion),
    heading_style="ATX",
    bullets="-",
).strip()

Then create the output directory, generate a filename using the case ID and current date, and write the Markdown file:

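from datetime import date
from pathlib import Path

# Assumption: the shared output directory sits at the repo root, one level above /python.
OUTPUT_DIR = Path(__file__).resolve().parent.parent / "output"

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)

iso_date = date.today().isoformat()
output_path = OUTPUT_DIR / f"py_{CASE_ID}_{iso_date}.md"

output_path.write_text(markdown, encoding="utf-8")

print(f"Saved case opinion to {output_path}")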

The Python output follows the same structure as the JavaScript example, with Markdown headings, links, page references, and opinion text preserved.

Caveats and edge cases

This workflow is intentionally simple, but there are a few things to keep in mind.

First, the script should fail clearly if the #gs_opinion element is not found. If Google Scholar changes the page structure, or if a specific result does not include an opinion body, you do not want the script to silently save an empty file.
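One way to enforce this at the write step is a small guard that refuses to create a file when the converted Markdown is empty. This is a minimal sketch, and save_markdown is a hypothetical helper rather than part of the companion repo.

```python
from pathlib import Path


def save_markdown(markdown: str, output_path: Path) -> Path:
    """Write Markdown to disk, failing loudly instead of saving an empty file."""
    if not markdown or not markdown.strip():
        raise ValueError(f"Refusing to write empty opinion to {output_path}")
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(markdown, encoding="utf-8")
    return output_path
```

The check runs before any directory or file is created, so a failed extraction leaves nothing behind on disk.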

Second, HTML-to-Markdown libraries may handle links, anchors, spacing, and nested elements differently. The output should be reviewed before using it in production workflows, especially if you need to preserve legal citations or page references exactly.
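As a rough aid for that review, the sketch below lists link targets that appear in the opinion HTML but not in the converted Markdown. It is a simple substring check, so it can miss links the converter escaped or rewrote, and it does not replace a manual read-through. HrefCollector and missing_links are hypothetical names.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values from anchor tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def missing_links(opinion_html: str, markdown: str) -> list:
    """Return hrefs from the HTML that do not appear anywhere in the Markdown."""
    collector = HrefCollector()
    collector.feed(opinion_html)
    collector.close()
    return [href for href in collector.hrefs if href not in markdown]
```

An empty list suggests the link targets survived conversion; anything it returns is worth checking by hand.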

Finally, Google Scholar opinions may include page markers, footnotes, blockquotes, citation links, and other formatting from the original opinion page. Depending on your use case, you may want to preserve those elements, clean them from the final Markdown, or customize the conversion rules further.
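If you decide to clean page markers out of the final Markdown, a regular expression pass is one option. The pattern below is a sketch based only on the marker shape in the sample output earlier in this post (for example [445](#p445)[\*445](#p445)). Other opinions or converters may format markers differently, so treat both the pattern and the strip_page_markers name as assumptions.

```python
import re

# Matches one or more adjacent page-marker links such as
# [445](#p445)[\*445](#p445), plus any spaces or tabs that follow them.
PAGE_MARKER = re.compile(r"(?:\[\\?\*?\d+\]\(#p\d+\))+[ \t]*")


def strip_page_markers(markdown: str) -> str:
    """Remove page-marker links while leaving other links untouched."""
    return PAGE_MARKER.sub("", markdown)
```

Only links whose text is a page number and whose target starts with #p are removed, so citation and docket links stay in place.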

View the full code on GitHub

The full JavaScript and Python examples are available in the companion repository.

Each folder includes its own setup instructions and writes generated Markdown files to the repo’s root-level output directory.

Conclusion

SerpApi’s Google Scholar Case Law API gives you structured case law data, while the raw HTML provides access to the full opinion body when needed. By selecting the #gs_opinion element, you can extract the complete opinion and convert it into Markdown for storage, analysis, search indexing, or internal research workflows.

You can use the companion repository to run the JavaScript or Python example locally, then adapt the parsing and Markdown conversion steps for your own case law workflows.

Nathan Skiles
