How to Build a Smart Voice Assistant in 20 mins

Introduction to LangChain

Damien Benveniste

Sep 21, 2023

Today, I felt too lazy to type on my keyboard to chat with ChatGPT, so I thought it would be a good idea to talk to it with my voice instead.

I am going to show you how to build an intelligent voice assistant. We are going to build the interface to speak to ChatGPT and listen to its responses, and then augment our assistant by giving it access to the Google Search engine:

What are we building
Setting up the project
From speech to text
From text to speech
Building a Conversational Agent
Augmenting the Agent with Tools

Below are the code and images used in the video!

What are we building

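Let’s build an intelligent voice assistant. It listens to our voice, transcribes it to text with OpenAI's Whisper API, sends the text to a ChatGPT-based LangChain agent, and reads the agent's answer out loud with ElevenLabs.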

Setting up the project

We first create a virtual environment:

python -m venv ./env

and activate it (on Windows, the activation script is env\Scripts\activate):

source ./env/bin/activate
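To confirm the activation worked before installing anything, we can run a quick check from Python. This is a small sketch of our own and is not part of the original project; `in_virtualenv` is a hypothetical helper name. It relies on standard venv behavior: inside a venv, `sys.prefix` points at the environment folder, while `sys.base_prefix` still points at the base Python installation.

```python
import sys


def in_virtualenv() -> bool:
    """Hypothetical helper: True when running inside a venv-created environment."""
    # venv changes sys.prefix to the environment folder, but leaves
    # sys.base_prefix pointing at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints `False`, the `pip install` commands below would install packages globally rather than into `./env`.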

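We now create three empty files, agents.py, interface.py, and app.py, inside a src/ folder. Additionally, we add a .env file at the project root to store our API keys (OPENAI_API_KEY and ELEVEN_API_KEY for now):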

voice assistant/
├── docs/
├── env/
├── src/
│   ├── agents.py
│   ├── interface.py
│   └── app.py
└── .env
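Missing keys only show up later as `KeyError`s deep inside the code. A small startup check can report all of them at once.

This is a minimal sketch with some stated assumptions:

- `missing_keys` and `REQUIRED_KEYS` are hypothetical names.
- `OPENAI_API_KEY` and `ELEVEN_API_KEY` are the variables this post's code reads.
- `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` are the names LangChain's Google Search wrapper looks for. They are needed for the search section later in this post.

```python
from typing import Iterable, List, Mapping

# Hypothetical list of the variables this project expects in .env.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVEN_API_KEY", "GOOGLE_API_KEY", "GOOGLE_CSE_ID")


def missing_keys(env: Mapping[str, str], required: Iterable[str] = REQUIRED_KEYS) -> List[str]:
    """Hypothetical helper: return required names that are absent or empty in env."""
    return [key for key in required if not env.get(key)]
```

In app.py, we could call `missing_keys(os.environ)` right after `load_dotenv()` and stop with a clear message if the returned list is not empty.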

From speech to text

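Let’s build the listen method of the AudioInterface class. It records from the microphone until we stop talking, then sends the audio to OpenAI's Whisper API to transcribe it: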

# interface.py

import os
import speech_recognition as sr

class AudioInterface:

    def listen(self) -> str:
        recognizer = sr.Recognizer()
        with sr.Microphone() as source:
            print("Say something!")
            audio = recognizer.listen(source)

        text = recognizer.recognize_whisper_api(
            audio, 
            api_key=os.environ['OPENAI_API_KEY'],
        )

        return text

We install the necessary packages:

pip install SpeechRecognition PyAudio openai
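How does `recognizer.listen` know when we stopped talking? The Recognizer compares the audio energy to `energy_threshold` (300 by default) and ends the phrase after `pause_threshold` seconds of quiet (0.8 by default).

The pure-Python sketch below illustrates that idea on chunks of 16-bit PCM samples. It is a simplified model of our own, not the library's code:

- `rms` and `phrase_chunks` are hypothetical names.
- `pause_chunks` stands in for the pause measured in seconds.

```python
import math
from typing import Iterable, List, Sequence


def rms(samples: Sequence[int]) -> float:
    """Root-mean-square energy of one chunk of PCM samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def phrase_chunks(
    chunks: Iterable[Sequence[int]],
    energy_threshold: float = 300.0,
    pause_chunks: int = 3,
) -> List[Sequence[int]]:
    """Hypothetical, simplified end-of-phrase detector.

    Skips leading silence, starts recording at the first loud chunk, and stops
    once `pause_chunks` quiet chunks arrive in a row.
    """
    phrase: List[Sequence[int]] = []
    quiet_run = 0
    started = False
    for chunk in chunks:
        loud = rms(chunk) > energy_threshold
        if not started:
            if loud:  # speech begins
                started = True
                phrase.append(chunk)
            continue
        phrase.append(chunk)
        quiet_run = 0 if loud else quiet_run + 1
        if quiet_run >= pause_chunks:  # long enough pause: phrase is over
            break
    return phrase
```

In practice, the assistant may cut us off or never stop listening in a noisy room. In that case, calling `recognizer.adjust_for_ambient_noise(source)` inside the `with` block, before `listen`, calibrates `energy_threshold` to the background noise.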

From text to speech

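We use ElevenLabs to synthesize speech, so we add a speak method to the AudioInterface class. This code uses the generate, play, and set_api_key functions of the pre-1.0 elevenlabs package (pip install "elevenlabs<1.0"), and it reads ELEVEN_API_KEY as soon as interface.py is imported: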

# interface.py

import os
from elevenlabs import generate, play, set_api_key

set_api_key(os.environ['ELEVEN_API_KEY'])

class AudioInterface:
    def listen(self) -> str:
        ...

    def speak(self, text):
        audio = generate(
            text=text, 
            voice='Bella', 
            model='eleven_monolingual_v1'
        )
        play(audio)
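Without streaming, `generate` returns the audio only once the whole text is synthesized, so a long answer means a long silence before `play` starts. One option is to speak the response in sentence-sized pieces. This is an assumption on our part, not something the original code does.

The sketch below assumes:

- `split_for_speech` is a hypothetical helper.
- The 250-character default is an arbitrary choice.

```python
import re
from typing import List


def split_for_speech(text: str, max_chars: int = 250) -> List[str]:
    """Hypothetical helper: group sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole as its own chunk.
    """
    # Split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)  # current chunk is full
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Usage would be `for chunk in split_for_speech(response): interface.speak(chunk)`. The first sentence may then play sooner, at the cost of short gaps between chunks.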

Building a Conversational Agent

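Let’s create a simple conversational agent with LangChain (pip install langchain). ConversationChain wraps ChatOpenAI and keeps a memory of the previous exchanges, so the model can refer back to earlier turns. This code targets the LangChain API of late 2023, and later releases moved or deprecated several of these imports, so an older version (for example pip install "langchain<0.1") may be needed: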

# agents.py

from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI

class ConversationAgent:

    def __init__(self) -> None:
        self.llm = ChatOpenAI()
        self.chain = ConversationChain(llm=self.llm)

    def run(self, text):
        return self.chain.run(text)
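ConversationChain "remembers" because, by default, it uses a ConversationBufferMemory. Every exchange is stored and pasted back into the prompt as a Human/AI transcript before each new call.

The toy class below mimics that behavior. `BufferMemorySketch` is our own illustration, and its prompt wording is abbreviated rather than LangChain's exact template.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class BufferMemorySketch:
    """Hypothetical stand-in for ConversationBufferMemory: keeps every turn verbatim."""

    turns: List[Tuple[str, str]] = field(default_factory=list)

    def history(self) -> str:
        # One "Human:" line and one "AI:" line per past exchange.
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.turns)

    def build_prompt(self, user_input: str) -> str:
        # Abbreviated preamble; the full history is injected on every call.
        return (
            "The following is a friendly conversation between a human and an AI.\n\n"
            f"Current conversation:\n{self.history()}\nHuman: {user_input}\nAI:"
        )

    def save(self, user_input: str, ai_output: str) -> None:
        self.turns.append((user_input, ai_output))
```

A consequence of this design is that the prompt grows with every turn. Long conversations therefore send more and more tokens to the model.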

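And let’s run the application. We load the .env file with python-dotenv (pip install python-dotenv), and we call load_dotenv() before importing interface, because interface.py reads ELEVEN_API_KEY at import time: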

# app.py

from dotenv import load_dotenv
load_dotenv()

from interface import AudioInterface
from agents import ConversationAgent

interface = AudioInterface()
agent = ConversationAgent()

while True:
    text = interface.listen()
    response = agent.run(text)
    interface.speak(response)
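As written, the `while True` loop never ends, so we have to stop the process with Ctrl+C. A minimal variation, assuming we want a spoken exit command, passes the three steps in as plain functions and stops on a phrase such as "goodbye".

The sketch assumes:

- `run_assistant` and the stop phrases are our own choices, not part of the original app.
- Whisper transcripts usually include punctuation, so it is stripped before comparing.

```python
from typing import Callable, Iterable


def run_assistant(
    listen: Callable[[], str],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
    stop_phrases: Iterable[str] = ("goodbye", "stop listening"),
) -> int:
    """Hypothetical listen-respond-speak loop that ends on a stop phrase.

    Returns the number of exchanges completed before stopping.
    """
    stops = {p.lower() for p in stop_phrases}
    turns = 0
    while True:
        text = listen()
        # Drop surrounding punctuation such as "Goodbye." before comparing.
        if text.strip().strip(".!?,").lower() in stops:
            speak("Goodbye!")
            return turns
        speak(respond(text))
        turns += 1
```

In app.py, this would be called as `run_assistant(interface.listen, agent.run, interface.speak)`. Passing functions also lets us test the loop without a microphone.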

Augmenting the Agent with Tools

Let’s give ChatGPT access to Google Search. For that, we need two credentials: a Google API key and a custom search engine ID. Follow these steps to get them:

Go to the Google Cloud Console.
If you don't already have an account, create one and log in.
Create a new project by clicking the Select a Project dropdown at the top of the page, then New Project.
Give it a name and click Create.
Set up the Custom Search API and add the key to your .env file as GOOGLE_API_KEY:
1. Go to the APIs & Services Dashboard
2. Click Enable APIs and Services
3. Search for Custom Search API and click on it
4. Click Enable
5. Go to the Credentials page
6. Click Create Credentials
7. Choose API Key
8. Copy the API key
Once the Custom Search API is enabled (it may take a few minutes to propagate), set up a custom search engine and add its ID to your .env file as GOOGLE_CSE_ID:
1. Go to the Custom Search Engine page
2. Click Add
3. Set up your search engine by following the prompts. You can choose to search the entire web or specific sites.
4. Once you've created your search engine, click on Control Panel
5. Click Basics
6. Copy the Search engine ID

We need to install the Google Python client

pip install google-api-python-client
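Before wiring search into the agent, it can help to check that both credentials work. The sketch below calls the Custom Search JSON API directly through the client we just installed, which is roughly what LangChain's google-search tool does for us.

The sketch assumes:

- `google_search` and `format_results` are hypothetical helper names.
- The key and engine ID are stored in .env as GOOGLE_API_KEY and GOOGLE_CSE_ID.

```python
import os
from typing import Any, Dict, List


def format_results(response: Dict[str, Any], max_results: int = 3) -> List[str]:
    """Hypothetical helper: turn a Custom Search JSON response into short lines."""
    lines = []
    for item in response.get("items", [])[:max_results]:
        lines.append(f"{item.get('title', '')} - {item.get('link', '')}: {item.get('snippet', '')}")
    return lines


def google_search(query: str, num: int = 3) -> List[str]:
    """Hypothetical helper: query the Custom Search JSON API (num can be 1 to 10)."""
    # Imported here so format_results works even without the client installed.
    from googleapiclient.discovery import build

    service = build("customsearch", "v1", developerKey=os.environ["GOOGLE_API_KEY"])
    response = service.cse().list(q=query, cx=os.environ["GOOGLE_CSE_ID"], num=num).execute()
    return format_results(response, max_results=num)
```

After `load_dotenv()`, running `print(google_search("LangChain"))` should print a few results. If it raises an HTTP error instead, the key or engine ID is likely wrong, or the API has not finished propagating.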

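Let’s create an agent with access to Google Search. load_tools(['google-search']) builds a search tool that reads GOOGLE_API_KEY and GOOGLE_CSE_ID from the environment. The CHAT_CONVERSATIONAL_REACT_DESCRIPTION agent decides at each turn whether to call that tool or answer directly. Its prompt expects the conversation history under the chat_history key, which is why the memory uses memory_key="chat_history" and return_messages=True: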

# agents.py

from langchain.chat_models import ChatOpenAI
from langchain.agents import (
    AgentType, 
    load_tools, 
    initialize_agent
)
from langchain.memory import ConversationBufferMemory
from langchain.callbacks import StdOutCallbackHandler    

class SmartChatAgent:

    def __init__(self) -> None:

        self.memory = ConversationBufferMemory(
            memory_key="chat_history", 
            return_messages=True
        )

        self.llm = ChatOpenAI()
        self.tools = load_tools(['google-search'])

        self.agent = initialize_agent(
            self.tools, 
            self.llm, 
            agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION, 
            memory=self.memory,
            verbose=True,
        )

    def run(self, text):
        handler = StdOutCallbackHandler()
        return self.agent.run(text, callbacks=[handler])
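The CHAT_CONVERSATIONAL_REACT_DESCRIPTION agent works in a loop:

1. The model replies with a JSON blob naming an action and its input.
2. LangChain runs the matching tool and appends the result as an observation.
3. LangChain asks the model again, until the model picks the special "Final Answer" action.

The toy loop below mimics that control flow, with the LLM and the tools passed in as plain functions. `react_loop` is our own simplified illustration, not LangChain's AgentExecutor, and the scratchpad format is an assumption.

```python
import json
from typing import Callable, Dict, List


def react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Hypothetical tool-using loop in the style of the conversational chat agent.

    The model must reply with {"action": ..., "action_input": ...};
    the action "Final Answer" ends the loop.
    """
    scratchpad: List[str] = [f"Question: {question}"]
    for _ in range(max_steps):
        decision = json.loads(llm("\n".join(scratchpad)))
        action, action_input = decision["action"], decision["action_input"]
        if action == "Final Answer":
            return action_input
        # Run the chosen tool and feed its output back to the model.
        observation = tools[action](action_input)
        scratchpad.append(f"Action: {action}({action_input})\nObservation: {observation}")
    return "Sorry, I could not find an answer."
```

The `max_steps` cap plays the same role as an iteration limit: it stops a model that keeps calling tools without ever answering.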

We modify the application to use the new SmartChatAgent instead of ConversationAgent:

# app.py

from dotenv import load_dotenv
load_dotenv()

from interface import AudioInterface
from agents import SmartChatAgent 

interface = AudioInterface()
agent = SmartChatAgent()

while True:
    text = interface.listen()
    response = agent.run(text)
    interface.speak(response)
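Agents fail more often than plain chains. A network error, or a model reply the agent cannot parse, raises an exception that ends the `while True` loop. A small guard keeps the session alive, assuming we would rather hear an apology than lose it. `respond_safely` and the fallback wording are our own. For the parsing case specifically, LangChain's AgentExecutor also offers a `handle_parsing_errors` option.

```python
from typing import Callable

# Hypothetical fallback sentence, spoken when the agent fails.
FALLBACK = "Sorry, something went wrong. Could you say that again?"


def respond_safely(run: Callable[[str], str], text: str, fallback: str = FALLBACK) -> str:
    """Hypothetical guard: call the agent, but return a fallback instead of crashing."""
    try:
        return run(text)
    except Exception as error:  # e.g. a network error or an unparsable model reply
        print(f"Agent error: {error!r}")
        return fallback
```

In app.py, the loop body would become `response = respond_safely(agent.run, text)`.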
