Scraping YouTube video transcripts is a powerful way to unlock deeper content analysis like searching for specific words, summarizing video content, and analyzing sentiment across videos.
SerpApi offers a YouTube Video Transcript API which you can use to scrape the video transcript. Let's see how you can use it with Python and then do some summarization of the resulting transcript using llama3.
Why SerpApi?
SerpApi streamlines the process of web-scraping. We take care of proxies and any CAPTCHAs that might be encountered, so that you don't have to worry about your searches being blocked. If you were to implement your own scraper using tools like BeautifulSoup and Requests, you'd need to determine your own solution for this. SerpApi manages the intricacies of scraping and returns structured JSON results. This allows you to save time and effort by avoiding the need to build your own Google scraper or rely on other web scraping tools.
We also do all the work to maintain all of our parsers and adapt them to respond to changes on Google's side. This is important, as Google is constantly experimenting with new layouts, new elements, and other changes. By taking care of this for you on our side, we eliminate a lot of time and complexity from your workflow.
Getting Started: YouTube Video Transcript API
Let's take a look at the parameters our YouTube Video Transcript API supports:
v: Parameter defines the Video ID. It can be found in the URL of the video as youtu.be/video_id or youtube.com/watch?v=video_id.
language_code: Parameter defines the language to use for the YouTube video transcript. It accepts a language code, which may be a two-letter or extended code (e.g., en for English, es-ES for Spanish (Spain), or zh-Hans for Simplified Chinese). If no language is provided, the default language will be English (en). If the requested language code is not available for the video, the first available language for the transcript will be used instead. Head to the YouTube Video Transcript Languages page for a full list of supported YouTube Video Transcript languages.
title: Parameter is used to get the specific transcript using the transcript title, e.g. Twitch Chat - Simple.
type: Parameter is used to get the transcript type, e.g. asr for an automatic speech recognition (auto-generated) transcript.
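To make the parameters above concrete, here is a minimal sketch of how they could be assembled into a request dictionary. The helper name build_transcript_params and the placeholder key are my own for illustration and are not part of SerpApi's library; only the parameter names (v, language_code, title, type) come from the list above.

```python
def build_transcript_params(video_id, api_key, language_code="en",
                            title=None, transcript_type=None):
    """Assemble request parameters for the youtube_video_transcript engine.

    Hypothetical helper for illustration; only the dictionary keys mirror
    the documented API parameters.
    """
    params = {
        "engine": "youtube_video_transcript",
        "v": video_id,
        "language_code": language_code,
        "api_key": api_key,
    }
    # Only send the optional filters when the caller actually sets them.
    if title is not None:
        params["title"] = title
    if transcript_type is not None:
        params["type"] = transcript_type
    return params


# Example: request the auto-generated (asr) transcript for a video.
params = build_transcript_params("ksWFcUW_KA8", "YOUR_API_KEY", transcript_type="asr")
print(params)
```

Leaving title and type out of the dictionary when they are unset keeps the request equivalent to the simpler one used later in this post.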
To begin extracting transcript data, first, create a free account on serpapi.com. You'll receive 250 free search credits each month to explore our APIs.
- Get your SerpApi API Key from this page.
- [Optional but Recommended] Set your API key in an environment variable, instead of directly pasting it in the code. Refer here to understand more about using environment variables. For this tutorial, I have saved the API key in an environment variable named 'SERPAPI_API_KEY'.
- Next, on your local computer, you need to install the google-search-results Python library: pip install google-search-results
As a side note: You can use this library with SerpApi's other search engine APIs as well, not just YouTube.
More Information About Our Python Libraries
We have two separate Python libraries, serpapi and google-search-results, and both work perfectly fine. However, serpapi is the newer one, and all the examples you can find on our website use the older google-search-results. If you'd like to run the examples from our website as written, you should install the google-search-results module instead of serpapi.
For this blog post, I am using google-search-results because all of our documentation references this one.
You may encounter issues if you have both libraries installed at the same time. If you have the old library installed and want to proceed with using our new library, please follow these steps:
- Uninstall the google-search-results module from your environment.
- Make sure that neither serpapi nor google-search-results is installed at that stage.
- Install the serpapi module, for example with the following command if you're using pip: pip install serpapi
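Then head to a code editor, and import the necessary libraries to use the YouTube Video Transcript API. The load_dotenv() call comes from the python-dotenv package (pip install python-dotenv) and loads variables from a local .env file, if you keep your API key in one:
from serpapi import GoogleSearch
from dotenv import load_dotenv
import os, json

load_dotenv()
Following this, we can construct a search query and add the necessary parameters for making a request to the API. I've written a simple function to do that: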
def get_transcript_from_youtube(video_id):
    params = {
        "api_key": os.environ["SERPAPI_API_KEY"],
        "engine": "youtube_video_transcript",
        "v": video_id,
        "language_code": "en"
    }
    search = GoogleSearch(params)
    results = search.get_dict()
    return results.get("transcript", "")
Running this code with video ID ksWFcUW_KA8 outputs a JSON response with a transcript like this:
"transcript": [
{
"start_ms": 2086,
"end_ms": 3492,
"snippet": "Can you see them anymore?",
"start_time_text": "0:02"
},
{
"start_ms": 3517,
"end_ms": 4517,
"snippet": "No.",
"start_time_text": "0:03"
},
{
"start_ms": 7278,
"end_ms": 8279,
"snippet": "We've got to go.",
"start_time_text": "0:07"
},
{
"start_ms": 24062,
"end_ms": 25796,
"snippet": "This is it. We're going in.",
"start_time_text": "0:24"
},
{
"start_ms": 31143,
"end_ms": 32439,
"snippet": "- Mom!\n- Help!",
"start_time_text": "0:31"
},
{
"start_ms": 34446,
"end_ms": 36204,
"snippet": "- Mom!\n- Help, Mom!",
"start_time_text": "0:34"
},
{
"start_ms": 37056,
"end_ms": 38056,
"snippet": "Now, do you see?",
"start_time_text": "0:37"
},
{
"start_ms": 38391,
"end_ms": 42767,
"snippet": "You tied your family to this twisted world, and now, one can't exist without the other.",
"start_time_text": "0:38"
},
{
"start_ms": 42792,
"end_ms": 44085,
"snippet": "- Vision!\n- Wanda!",
"start_time_text": "0:42"
},
{
"start_ms": 47296,
"end_ms": 49335,
"snippet": "- Mom!\n- Mom!",
"start_time_text": "0:47"
},
{
"start_ms": 50132,
"end_ms": 51132,
"snippet": "Boys!",
"start_time_text": "0:50"
},
{
"start_ms": 51340,
"end_ms": 53371,
"snippet": "Save Westview or save your family.",
"start_time_text": "0:51"
},
{
"start_ms": 53395,
"end_ms": 55056,
"snippet": "- Help!\n- Mom!",
"start_time_text": "0:53"
},
{
"start_ms": 55081,
"end_ms": 57500,
"snippet": "- Mom! Help!\n- Help! Please!",
"start_time_text": "0:55"
},
{
"start_ms": 72467,
"end_ms": 74662,
"snippet": "- Hi!\n- Mom! Are you okay?",
"start_time_text": "1:12"
},
{
"start_ms": 76845,
"end_ms": 77888,
"snippet": "No!",
"start_time_text": "1:16"
},
{
"start_ms": 92041,
"end_ms": 94275,
"snippet": "Mom? Are you okay?",
"start_time_text": "1:32"
},
{
"start_ms": 95251,
"end_ms": 97040,
"snippet": "How sweet.",
"start_time_text": "1:35"
},
{
"start_ms": 101804,
"end_ms": 102804,
"snippet": "Dad?",
"start_time_text": "1:41"
},
{
"start_ms": 112892,
"end_ms": 113892,
"snippet": "Listen, boys.",
"start_time_text": "1:52"
},
{
"start_ms": 114714,
"end_ms": 116877,
"snippet": "Your mother and I never really prepared you for this.",
"start_time_text": "1:54"
},
{
"start_ms": 120963,
"end_ms": 123118,
"snippet": "But you were born for it.",
"start_time_text": "2:00"
},
{
"start_ms": 128699,
"end_ms": 130878,
"snippet": "Same story, different century.",
"start_time_text": "2:08"
},
{
"start_ms": 131678,
"end_ms": 135911,
"snippet": "There'll always be torches and pitchforks for ladies like us, Wanda.",
"start_time_text": "2:11"
},
{
"start_ms": 145115,
"end_ms": 147247,
"snippet": "Boys, handle the military.",
"start_time_text": "2:25"
},
{
"start_ms": 148415,
"end_ms": 150125,
"snippet": "Mommy will be right back.",
"start_time_text": "2:28"
},
{
"start_ms": 159227,
"end_ms": 161086,
"snippet": "No, no, no! Stand down!",
"start_time_text": "2:39"
},
{
"start_ms": 180384,
"end_ms": 181384,
"snippet": "Nice tricks.",
"start_time_text": "3:00"
},
{
"start_ms": 181666,
"end_ms": 182666,
"snippet": "I like yours, too.",
"start_time_text": "3:01"
},
{
"start_ms": 192104,
"end_ms": 193268,
"snippet": "Have fun in prison.",
"start_time_text": "3:12"
},
{
"start_ms": 194319,
"end_ms": 195403,
"snippet": "- Dad!\n- Dad!",
"start_time_text": "3:14"
},
{
"start_ms": 196802,
"end_ms": 199099,
"snippet": "- Hi! Hi!\n- Boys, boys, boys, boys!",
"start_time_text": "3:16"
},
{
"start_ms": 199124,
"end_ms": 200562,
"snippet": "- Dad! Mom!\n- Mom!",
"start_time_text": "3:19"
},
{
"start_ms": 202061,
"end_ms": 203303,
"snippet": "- Come here.\n- Are you okay?",
"start_time_text": "3:22"
},
{
"start_ms": 206718,
"end_ms": 208512,
"snippet": "I know you'll set everything right.",
"start_time_text": "3:26"
},
{
"start_ms": 208537,
"end_ms": 209562,
"snippet": "Just not for us.",
"start_time_text": "3:28"
},
{
"start_ms": 209738,
"end_ms": 210757,
"snippet": "Not for us.",
"start_time_text": "3:29"
},
{
"start_ms": 212083,
"end_ms": 213083,
"snippet": "It's time.",
"start_time_text": "3:32"
},
{
"start_ms": 213295,
"end_ms": 214349,
"snippet": "Should we head home?",
"start_time_text": "3:33"
}
]
The response also provides a chapter breakdown like this for videos which include chapters:
"chapters": [
{
"chapter": "Billy Has a Vision of Wanda Fighting Agatha",
"start_ms": 0,
"end_ms": 11000
},
{
"chapter": "Wanda Weakens the Hex",
"start_ms": 11000,
"end_ms": 27000
},
{
"chapter": "Vision, Billy, and Tommy Begin Disintegrating",
"start_ms": 27000,
"end_ms": 58000
},
{
"chapter": "Wanda Reasserts the Hex to Save Her Family",
"start_ms": 58000,
"end_ms": 77000
},
{
"chapter": "Agatha Draws Wanda's Magical Energy",
"start_ms": 77000,
"end_ms": 99000
},
{
"chapter": "S.W.O.R.D. Agents Arrive",
"start_ms": 99000,
"end_ms": 144000
},
{
"chapter": "Billy and Tommy vs. S.W.O.R.D. Agents",
"start_ms": 144000,
"end_ms": 159000
},
{
"chapter": "Monica Rambeau Uses Her Power to Save Billy and Tommy",
"start_ms": 159000,
"end_ms": 194000
},
{
"chapter": "Wanda Takes Her Family Home and Closes the Hex",
"start_ms": 194000,
"end_ms": 267000
}
]
Try it out in our playground to visualize the response and try other parameters:
Analysis With Transcript Data
Once you have the transcript data, you can do a number of things like time stamping keywords, sentiment analysis, content summarization or identifying key topics.
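As an illustration of time stamping keywords, here is a minimal sketch that scans transcript entries for a word and returns the matching timestamps. It assumes entries shaped like the transcript output shown earlier (with snippet and start_time_text fields); find_keyword_timestamps is a hypothetical helper name, and the sample entries are copied from that output.

```python
def find_keyword_timestamps(transcript, keyword):
    """Return (start_time_text, snippet) pairs for snippets containing keyword.

    Matching is a simple case-insensitive substring check, so "boys" also
    matches "Boys!".
    """
    needle = keyword.lower()
    return [
        (entry.get("start_time_text", ""), entry.get("snippet", ""))
        for entry in transcript
        if needle in entry.get("snippet", "").lower()
    ]


# A few entries in the same shape as the API's transcript list.
sample = [
    {"start_ms": 37056, "end_ms": 38056, "snippet": "Now, do you see?", "start_time_text": "0:37"},
    {"start_ms": 50132, "end_ms": 51132, "snippet": "Boys!", "start_time_text": "0:50"},
    {"start_ms": 145115, "end_ms": 147247, "snippet": "Boys, handle the military.", "start_time_text": "2:25"},
]

for time_text, snippet in find_keyword_timestamps(sample, "boys"):
    print(time_text, snippet)
```

A substring check is the simplest option; for whole-word matching you could swap in a regular expression with word boundaries.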
In this blog post, we are going to summarize the transcript text using the open source llama3 model, run locally through Ollama.
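For longer videos, it may help to summarize chapter by chapter rather than all at once. Below is a minimal sketch of grouping snippets by chapter, assuming you have both the transcript and chapters lists in the shape shown earlier. group_snippets_by_chapter is a hypothetical helper of my own, and treating each chapter's end_ms as exclusive is an assumption.

```python
def group_snippets_by_chapter(transcript, chapters):
    """Map each chapter title to the snippets whose start_ms falls inside it.

    Assumes chapter ranges are [start_ms, end_ms), i.e. end is exclusive.
    Snippets that fall outside every chapter are skipped.
    """
    grouped = {chapter["chapter"]: [] for chapter in chapters}
    for entry in transcript:
        for chapter in chapters:
            if chapter["start_ms"] <= entry["start_ms"] < chapter["end_ms"]:
                grouped[chapter["chapter"]].append(entry["snippet"])
                break
    return grouped


# Sample data copied from the response shown earlier.
chapters = [
    {"chapter": "Billy Has a Vision of Wanda Fighting Agatha", "start_ms": 0, "end_ms": 11000},
    {"chapter": "Wanda Weakens the Hex", "start_ms": 11000, "end_ms": 27000},
]
transcript = [
    {"start_ms": 2086, "end_ms": 3492, "snippet": "Can you see them anymore?"},
    {"start_ms": 3517, "end_ms": 4517, "snippet": "No."},
    {"start_ms": 7278, "end_ms": 8279, "snippet": "We've got to go."},
    {"start_ms": 24062, "end_ms": 25796, "snippet": "This is it. We're going in."},
]

grouped = group_snippets_by_chapter(transcript, chapters)
for title, snippets in grouped.items():
    print(title, "->", snippets)
```

Each chapter's snippets could then be joined and passed to a summarization function separately.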
Install and Use Ollama For Summarization
We're first going to install Python's Ollama library locally:
pip install ollama
Then, we can import the required parts from it:
from ollama import chat
from ollama import ChatResponse
Summarize text with the model
Following that, we can write up some code to use that model for summarization. I have written up a simple function to send the prompt to Ollama and get a response:
def summarize_text(text):
    response: ChatResponse = chat(model='llama3', messages=[
        {
            'role': 'user',
            'content': f"Summarize the following text:\n\n{text}\n\nSummary:",
        },
    ])
    return response['message']['content']
In this case, we are using the llama3 model, so I've specified that above. The model needs to be available locally first (for example, by running ollama pull llama3). You can use any other model Ollama supports, such as phi3 or gemma3.
I also wrote a simple main function to call the two functions above:
# --- Main Execution ---
if __name__ == "__main__":
    video_id = "ksWFcUW_KA8"  # Replace with your YouTube video ID
    transcript = get_transcript_from_youtube(video_id)
    if not transcript:
        print("No transcript found for the video.")
    else:
        summary = summarize_text(transcript)
        print("Transcript Summary:", summary)
Testing With An Example Video
Let's take this video as an example:
Using the video ID epMDcqKoQys, we can get results for this video using SerpApi's API. Here are the results from our YouTube Video Transcript API in the playground:
Here are the results from the get_transcript_from_youtube() function we wrote above:
[
{
"start_ms": 2400,
"end_ms": 4826,
"snippet": "All right, so I'm having the first espresso of the day",
"start_time_text": "0:02"
},
{
"start_ms": 4826,
"end_ms": 7610,
"snippet": "and I'm here at Vecerka in Brno with our friend Tomi,",
"start_time_text": "0:04"
},
{
"start_ms": 7610,
"end_ms": 10725,
"snippet": "who is a skilled barista and also co-owner of the place",
"start_time_text": "0:07"
},
{
"start_ms": 10725,
"end_ms": 12725,
"snippet": "we'll explain prepare and",
"start_time_text": "0:10"
},
{
"start_ms": 12800,
"end_ms": 15000,
"snippet": "show you all the espresso drinks on the menu, ",
"start_time_text": "0:12"
},
{
"start_ms": 15000,
"end_ms": 16882,
"snippet": "so check it out!",
"start_time_text": "0:15"
},
{
"start_ms": 16882,
"end_ms": 18720,
"snippet": "We'll break down all espresso drinks",
"start_time_text": "0:16"
},
{
"start_ms": 18720,
"end_ms": 20160,
"snippet": "into two categories. ",
"start_time_text": "0:18"
},
...
...
...
]
Now, sending this raw transcript to Ollama with the model set to llama3 for summarization gives us:
Transcript Summary: The text appears to be a video script or presentation about different types of espresso-based drinks. The presenter goes through various coffee drinks, including black coffee, milk coffee, and specialty drinks like Cappuccino, Latte, and Macchiato. They provide information on the start and end times (in milliseconds) for each section of the video.
The main points discussed in the video include:
* The most popular coffee drinks at Večerka: 32% black coffee, 68% milk coffee
* The most popular black coffee drink: Batch Brew with 16%
* The most popular milk coffee drink: Cappuccino with over 37%
The video also mentions other espresso-based drinks that were not discussed in detail, such as Espresso Tonic and Freddo Espresso.
If we change the model to phi3, the summary changes to:
Transcript Summary: In Večernka coffee shop in Slovakia this year, the most popular drink is Cappuccino with over 37% of orders. This beats black coffees combined which had only around 25%. The second and third place were taken by Flat White (16%) and Espresso Tonic/Freddo Espresso videos respectively, while all other mentioned espresso-based drinks like Macchiato or Americano have a significantly lower popularity. Milk coffees as per the order are quite popular with Cortado and Freddo Espresso getting around 1 to 2% of orders each but still outnumber black coffee consumption by almost double in Večernka's annual report for this year, totaling over half (58%) vs just under a quarter that went towards espresso-based drinks. The rest had very similar popularity with less than 10%.
You can also ask it different questions like:
What did they conclude was the most popular coffee drink?
The response from the model for that is:
According to the text, the most popular coffee drink at Večerka is Cappuccino, with over 37% of all orders.
Feel free to experiment and analyze with different models, and pick the responses from the one most suitable to your use case.
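One thing worth experimenting with: the llama3 summary above comments on start and end times in milliseconds, which suggests the timing fields in the raw transcript end up in the prompt. Here is a minimal sketch of flattening the transcript into plain text before summarizing. transcript_to_text is a hypothetical helper of my own, and whether it improves the summary will depend on the model and the video.

```python
def transcript_to_text(transcript):
    """Join transcript snippets into one plain-text string.

    Drops the timing fields and collapses newlines and repeated spaces
    inside each snippet so the model only sees the spoken text.
    """
    parts = []
    for entry in transcript:
        cleaned = " ".join(entry.get("snippet", "").split())
        if cleaned:
            parts.append(cleaned)
    return " ".join(parts)


# Entries in the same shape as the transcript output shown earlier.
sample = [
    {"start_ms": 12800, "end_ms": 15000, "snippet": "show you all the espresso drinks on the menu, ", "start_time_text": "0:12"},
    {"start_ms": 15000, "end_ms": 16882, "snippet": "so check it out!", "start_time_text": "0:15"},
]

text = transcript_to_text(sample)
print(text)
```

The resulting string can be passed to the summarization function in place of the raw list of entries.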
Conclusion
We've covered how to scrape video transcripts using SerpApi's YouTube Video Transcript API and Python, and how to analyze them with the llama3 and phi3 models through Ollama.
You can find all the code used in this blog post here:
I hope you found this tutorial helpful. If you have any questions, don't hesitate to reach out to me at sonika@serpapi.com.