Build Simple CLI-Based Voice Assistant with PyAudio, Speech Recognition, pyttsx3 and SerpApi

Intro

As the title suggests, this is a demo project: a very basic voice-assistant script that answers your questions in the terminal based on Google Search results.

You can find the full code in the GitHub repository: dimitryzub/serpapi-demo-projects/speech-recognition/cli-based/

The follow-up blog post(s) will be about:

Web-based solution using Flask, some HTML, CSS and JavaScript.
Android & Windows based solution using Flutter and Dart.

What we will build

Prerequisites

First, let's make sure we are working in an isolated virtual environment and that the libraries the project needs are installed properly. The hardest one to install will (possibly) be pyaudio.

Virtual Environment and Libraries Installation

Before we start installing libraries, we need to create and activate a new environment for this project:

# if you're on Linux based systems
$ python -m venv env && source env/bin/activate
$ (env) <path>

# if you're on Windows and using Bash terminal
$ python -m venv env && source env/Scripts/activate
$ (env) <path>

# if you're on Windows and using CMD
python -m venv env && .\env\Scripts\activate
$ (env) <path>

Code	Explanation
`python -m venv env`	Tells Python to run the module (`-m`) `venv` and create a virtual environment in a folder called `env`.
`&&`	Logical AND: the second command runs only if the first one succeeds.
`source <venv_name>/bin/activate`	Activates the environment, so the libraries you install go only into that environment.
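To confirm that the environment is really active before installing anything, a quick check from Python can help. This is a minimal sketch: the helper name `in_virtualenv` is made up for illustration, and it relies on the standard-library behaviour that `sys.prefix` differs from `sys.base_prefix` inside an environment created by `venv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment folder,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == '__main__':
    print('Virtual environment active:', in_virtualenv())
```

Run it with the same `python` you will use for the project; if it prints `False`, the activation step above did not take effect in the current terminal.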

Now install all needed libraries:

pip install rich pyttsx3 SpeechRecognition google-search-results python-dotenv

Now to pyaudio. Please keep in mind that pyaudio may throw an error while installing, and some additional research may be needed on your end.

If you're on Linux, we need to install some development dependencies to use pyaudio:

$ sudo apt-get install -y libasound-dev portaudio19-dev
$ pip install pyaudio

If you're on Windows, it's simpler (tested with CMD and Git Bash):

pip install pyaudio
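Since the pyaudio install is the step most likely to fail, it can be worth checking that every module the script imports is actually available before running it. The sketch below uses only the standard library; `missing_modules` and `REQUIRED_MODULES` are hypothetical names, and the list holds import names, which differ from some of the pip package names.

```python
import importlib.util

# Import names used by the script (not always the same as the pip package name).
REQUIRED_MODULES = ['rich', 'pyttsx3', 'speech_recognition', 'serpapi', 'dotenv', 'pyaudio']


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    missing = missing_modules(REQUIRED_MODULES)
    print('Missing modules:', ', '.join(missing) if missing else 'none')
```

`find_spec` only looks the module up and does not import it, so the check does not touch the microphone or audio drivers.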

Full Code

import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv

load_dotenv('.env')
console = Console()

def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')
    
    recognizer = speech_recognition.Recognizer()

    while True:
        with console.status(status='Listening to you...', spinner='point') as progress_bar:
            try:
                with speech_recognition.Microphone() as mic:
                    recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                    audio = recognizer.listen(mic)
                    
                    text = recognizer.recognize_google(audio_data=audio).lower()
                    console.print(f'[bold]Recognized text[/bold]: {text}')
                    
                    progress_bar.update(status='Looking for answers...', spinner='line')
                    params = {
                        'api_key': os.getenv('API_KEY'),
                        'device': 'desktop',
                        'engine': 'google',
                        'q': text,
                        'google_domain': 'google.com',
                        'gl': 'us',
                        'hl': 'en'
                    }

                    search = GoogleSearch(params)
                    results = search.get_dict()
                    
                    try:
                        if 'answer_box' in results:
                            try:
                                primary_answer = results['answer_box']['answer']
                            except KeyError:
                                # fall back to 'result'; if it is missing too, the outer handler takes over
                                primary_answer = results['answer_box']['result']
                            console.print(f'[bold]The answer is[/bold]: {primary_answer}')

                        elif 'knowledge_graph' in results:
                            secondary_answer = results['knowledge_graph']['description']
                            console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
                        else:
                            # neither block is present -> hand over to the `except KeyError` branch below
                            raise KeyError('answer_box')
                        progress_bar.stop() # answer found -> stop progress bar

                        user_prompt_to_continue_if_answer_is_success = input('Would you like to search for something again? (y/n) ')

                        if user_prompt_to_continue_if_answer_is_success == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue # run speech recognition again until the user answers 'n'
                        else:
                            console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                            break
                    except KeyError:
                        progress_bar.stop()

                        error_user_prompt = input("Sorry, didn't find the answer. Would you like to rephrase it? (y/n) ")

                        if error_user_prompt == 'y':
                            recognizer = speech_recognition.Recognizer()
                            continue # run speech recognition again until the user answers 'n'
                        else:
                            console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                            break
                            
            except speech_recognition.UnknownValueError:
                progress_bar.stop()
                user_prompt_to_continue = input("Sorry, I didn't quite understand you. Could you say it again? (y/n) ")

                if user_prompt_to_continue == 'y':
                    recognizer = speech_recognition.Recognizer()
                    continue # run speech recognition again until the user answers 'n'
                else:
                    progress_bar.stop()
                    console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                    break

                
if __name__ == '__main__':
    main()

Code Explanation

Import libraries:

import os
import speech_recognition
import pyttsx3
from serpapi import GoogleSearch
from rich.console import Console
from dotenv import load_dotenv

Library	Purpose
`rich`	Python library for beautiful formatting in the terminal.
`pyttsx3`	Python text-to-speech converter that works offline.
`SpeechRecognition`	Python library to convert speech to text.
`google-search-results`	SerpApi's Python API wrapper that parses data from 15+ search engines.
`os`	To read a secret environment variable. In this case it's the SerpApi API key.
`dotenv`	To load your environment variable(s) (SerpApi API key) from a `.env` file. The file can have any name, for example `.napoleon`, as long as that name is passed to `load_dotenv()`; the leading `.` (dot) only marks it as a hidden file on Unix-like systems.
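The script reads the key with `os.getenv('API_KEY')`, so the `.env` file is expected to contain a line of the form `API_KEY=<your SerpApi key>`. If that line is missing, `os.getenv` silently returns `None`. A small guard can make the problem obvious; this is only a sketch, and `require_env` is a hypothetical helper, not part of `dotenv` or SerpApi.

```python
import os


def require_env(name):
    """Return the value of environment variable `name` or raise a clear error."""
    value = os.getenv(name)
    if not value:
        # Fail early with a readable message instead of sending an empty key.
        raise RuntimeError(f'{name} is not set; add a line like {name}=<your key> to the .env file')
    return value
```

In the script it would be called after `load_dotenv('.env')`, for example `api_key = require_env('API_KEY')`.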

Define rich Console(). It will be used to prettify terminal output (animations, etc):

console = Console()

Define the main function, where everything will happen:

def main():
    console.rule('[bold yellow]SerpApi Voice Assistant Demo Project')

    recognizer = speech_recognition.Recognizer()

At the beginning of the function we create a speech_recognition.Recognizer() instance, and console.rule prints the following output:

───────────────────────────────────── SerpApi Voice Assistant Demo Project ─────────────────────────────────────

The next step is to create a while loop that will be constantly listening for microphone input to recognize the speech:

while True:
    with console.status(status='Listening to you...', spinner='point') as progress_bar:
        try:
            with speech_recognition.Microphone() as mic:
                recognizer.adjust_for_ambient_noise(mic, duration=0.1)
                audio = recognizer.listen(mic)
                
                text = recognizer.recognize_google(audio_data=audio).lower()
                console.print(f'[bold]Recognized text[/bold]: {text}')

Code	Explanation
`console.status`	A `rich` status line with a spinner animation; it is used only for cosmetic purposes.
`speech_recognition.Microphone()`	To start picking up input from the microphone.
`recognizer.adjust_for_ambient_noise`	Calibrates the energy threshold to the ambient noise level.
`recognizer.listen`	Listens for the user's actual speech.
`recognizer.recognize_google`	Performs speech recognition using the Google Speech Recognition API. `lower()` converts the recognized text to lowercase.
`console.print`	A `rich` `print` function that allows text styling, such as bold, italic and so on.

spinner='point' will produce the following output (use python -m rich.spinner to see list of spinners):

After that, we need to initialize SerpApi search parameters for the search:

progress_bar.update(status='Looking for answers...', spinner='line') 
params = {
    'api_key': os.getenv('API_KEY'),  # serpapi api key   
    'device': 'desktop',              # device used for the search
    'engine': 'google',               # serpapi parsing engine: https://serpapi.com/status
    'q': text,                        # search query 
    'google_domain': 'google.com',    # google domain:          https://serpapi.com/google-domains
    'gl': 'us',                       # country of the search:  https://serpapi.com/google-countries
    'hl': 'en'                        # language of the search: https://serpapi.com/google-languages
    # other parameters such as locations: https://serpapi.com/locations-api
}

search = GoogleSearch(params)         # where data extraction happens on the SerpApi backend
results = search.get_dict()           # JSON -> Python dict

progress_bar.update will, well, update progress_bar with a new status (text printed in the console), and spinner='line' will produce the following animation:

After that, the data extraction happens from Google search using SerpApi's Google Search Engine API.
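If the parameters need to vary (a different country or language, for instance), they can be built by a small function instead of a literal dict. This is a minimal sketch under that assumption; `build_params` is a hypothetical helper, and the keys and default values are the ones used in the script.

```python
def build_params(query, api_key, country='us', language='en'):
    """Build the search parameters dict the script passes to GoogleSearch."""
    return {
        'api_key': api_key,             # SerpApi API key
        'device': 'desktop',            # device used for the search
        'engine': 'google',             # SerpApi parsing engine
        'q': query,                     # search query (the recognized text)
        'google_domain': 'google.com',  # Google domain
        'gl': country,                  # country of the search
        'hl': language,                 # language of the search
    }
```

The result can be passed straight to `GoogleSearch(...)`, exactly like the `params` dict above.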

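The next part of the code looks for an answer in the response, first in answer_box and then in knowledge_graph, prints it, and asks whether to search again. If no answer is found, a KeyError is raised and the user is asked to rephrase the question: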

try:
    if 'answer_box' in results:
        try:
            primary_answer = results['answer_box']['answer']
        except KeyError:
            # fall back to 'result'; if it is missing too, the outer handler takes over
            primary_answer = results['answer_box']['result']
        console.print(f'[bold]The answer is[/bold]: {primary_answer}')

    elif 'knowledge_graph' in results:
        secondary_answer = results['knowledge_graph']['description']
        console.print(f'[bold]The answer is[/bold]: {secondary_answer}')
    else:
        # neither block is present -> hand over to the `except KeyError` branch below
        raise KeyError('answer_box')
    progress_bar.stop()  # answer found -> stop progress bar

    user_prompt_to_continue_if_answer_is_success = input('Would you like to search for something again? (y/n) ')

    if user_prompt_to_continue_if_answer_is_success == 'y':
        recognizer = speech_recognition.Recognizer()
        continue         # run speech recognition again until the user answers 'n'
    else:
        console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
        break
except KeyError:
    progress_bar.stop()  # answer not found -> stop progress bar

    error_user_prompt = input("Sorry, didn't find the answer. Would you like to rephrase it? (y/n) ")

    if error_user_prompt == 'y':
        recognizer = speech_recognition.Recognizer()
        continue         # run speech recognition again until the user answers 'n'
    else:
        console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
        break
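The lookup logic can also be written as a pure function that takes the results dict and returns an answer or `None`, which makes it easy to try out without a microphone or an API key. This is a sketch, not the script's exact behaviour: `extract_answer` is a hypothetical helper, it uses only the keys that appear in the code above (`answer`, `result`, `list`, `description`), and unlike the script it still falls back to the knowledge graph when the answer box has none of those keys.

```python
def extract_answer(results):
    """Return the first answer found in a SerpApi-style results dict, or None."""
    answer_box = results.get('answer_box', {})
    # Check the answer box fields in the order the script prefers them.
    for key in ('answer', 'result', 'list'):
        if key in answer_box:
            return answer_box[key]
    # Fall back to the knowledge graph description, if there is one.
    return results.get('knowledge_graph', {}).get('description')
```

A caller would then test for `None` instead of catching `KeyError`, for example `answer = extract_answer(results)` followed by `if answer is None: ...`.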

The final step is to handle the error raised when the recognizer could not understand the speech picked up by the microphone:

# while True:
#     with console.status(status='Listening to you...', spinner='point') as progress_bar:
#         try:
            # speech recognition code
            # data extraction code
        except speech_recognition.UnknownValueError:
            progress_bar.stop()         # speech was not understood -> stop progress bar
            user_prompt_to_continue = input("Sorry, I didn't quite understand you. Could you say it again? (y/n) ")

            if user_prompt_to_continue == 'y':
                recognizer = speech_recognition.Recognizer()
                continue               # run speech recognition again until the user answers 'n'
            else:
                progress_bar.stop()    # user wants to quit -> stop progress bar
                console.rule('[bold yellow]Thank you for checking SerpApi Voice Assistant Demo Project')
                break

console.rule() will provide the following output:

───────────────────── Thank you for checking SerpApi Voice Assistant Demo Project ─────────────────────
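The script asks a y/n question in three places and compares the reply with `'y'` each time. One way to avoid the repetition is a small helper; the sketch below is an assumption about how that could look, not part of the original code. `ask_yes_no` is a hypothetical name, and it is slightly more lenient than the script because it also accepts `yes` and ignores case and surrounding spaces.

```python
def ask_yes_no(prompt, input_fn=input):
    """Ask a y/n question and return True for 'y' or 'yes' (case-insensitive)."""
    # input_fn is injectable so the helper can be tried without a real terminal.
    reply = input_fn(prompt).strip().lower()
    return reply in ('y', 'yes')
```

Each branch could then read `if ask_yes_no('Would you like to search again? (y/n) '): continue`, with the `else` part printing the closing rule and breaking out of the loop.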

Add the if __name__ == '__main__' idiom, which makes sure the code runs only when the file is executed as a script and not when it is imported as a module, and call the main function, which will run the whole script:

if __name__ == '__main__':
    main()
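Note that the script imports pyttsx3 but never calls it, so the answers are only printed. A minimal sketch of reading an answer aloud is shown below. `speak` is a hypothetical helper; `pyttsx3.init()`, `say()` and `runAndWait()` are the library's basic calls, and the optional `engine` parameter is an assumption added here so the function can be exercised without audio hardware.

```python
def speak(text, engine=None):
    """Read `text` aloud with pyttsx3, or with any engine offering say()/runAndWait()."""
    if engine is None:
        # Imported lazily so the module still loads when pyttsx3 is not installed.
        import pyttsx3
        engine = pyttsx3.init()
    engine.say(text)       # queue the text
    engine.runAndWait()    # block until the queued text has been spoken
```

In the script, a call such as `speak(primary_answer)` right after the matching `console.print` would make the assistant answer by voice as well as in the terminal.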
