Today, I felt too lazy to type my questions to ChatGPT, so I thought it would be a good idea to talk to it instead.
I am going to show how to build an intelligent voice assistant. We are going to build the interface to speak to ChatGPT and listen to its responses, and then we are going to augment our assistant by giving it access to Google Search:
What are we building
Setting up the project
From speech to text
From text to speech
Building a Conversational Agent
Augmenting the Agent with Tools
Below are the code and images used in the video!
What are we building
Let’s build an intelligent voice assistant
Setting up the project
We create a virtual environment
python -m venv ./env
and we activate it
source ./env/bin/activate
We now create three empty files: agents.py, interface.py, and app.py. Additionally, we add the .env file:
voice assistant/
├── docs/
├── env/
├── src/
│ ├── agents.py
│ ├── interface.py
│ ├── app.py
├── .env
From speech to text
Let’s build the listen function of the AudioInterface class
# interface.py
import os
import speech_recognition as sr

class AudioInterface:
    def listen(self) -> str:
        recognizer = sr.Recognizer()
        # Record from the default microphone until the speaker pauses
        with sr.Microphone() as source:
            print("Say something!")
            audio = recognizer.listen(source)
        # Transcribe the recording with OpenAI's Whisper API
        text = recognizer.recognize_whisper_api(
            audio,
            api_key=os.environ['OPENAI_API_KEY'],
        )
        return text
We install the necessary packages:
pip install SpeechRecognition PyAudio openai
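Transcription can fail, for example when the recording is silent or the API request errors out, and the exception would end the program. Below is a minimal sketch of a retry wrapper, assuming `listen_fn` is any zero-argument callable such as `AudioInterface().listen`. The helper name `listen_with_retry` and the `max_attempts` parameter are hypothetical and not part of SpeechRecognition:

```python
from typing import Callable, Optional


def listen_with_retry(
    listen_fn: Callable[[], str],
    max_attempts: int = 3,
) -> Optional[str]:
    """Call listen_fn until it returns non-empty text or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            text = listen_fn().strip()
        except Exception as error:  # broad on purpose: this is only a sketch
            print(f"Attempt {attempt} failed: {error}")
            continue
        if text:
            return text
        print(f"Attempt {attempt} returned no text")
    # Hypothetical convention: None means nothing usable was heard
    return None
```

The caller can then skip a turn when the result is `None` instead of sending an empty string to ChatGPT.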
From text to speech
We use ElevenLabs to synthesize speech. We can create the speak function
# interface.py
import os
from elevenlabs import generate, play, set_api_key

# Authenticate once, when the module is imported
set_api_key(os.environ['ELEVEN_API_KEY'])

class AudioInterface:
    def listen(self) -> str:
        ...

    def speak(self, text):
        # Synthesize the audio, then play it through the speakers
        audio = generate(
            text=text,
            voice='Bella',
            model='eleven_monolingual_v1'
        )
        play(audio)
Building a Conversational Agent
Let’s create a simple conversational agent
# agents.py
from langchain.chains import ConversationChain
from langchain.chat_models import ChatOpenAI

class ConversationAgent:
    def __init__(self) -> None:
        self.llm = ChatOpenAI()
        self.chain = ConversationChain(llm=self.llm)

    def run(self, text):
        return self.chain.run(text)
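ConversationChain keeps the earlier turns of the dialogue and sends them along with each new input, which is what lets the assistant refer back to what was said. The sketch below illustrates that idea in plain Python. It is not LangChain's implementation, and `BufferMemory`, `add_turn`, and `build_prompt` are hypothetical names:

```python
from typing import List, Tuple


class BufferMemory:
    """Toy stand-in for a conversation buffer: stores every turn verbatim."""

    def __init__(self) -> None:
        self.turns: List[Tuple[str, str]] = []

    def add_turn(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))

    def build_prompt(self, new_input: str) -> str:
        # Replay the whole history, then append the new user input
        lines = []
        for human, ai in self.turns:
            lines.append(f"Human: {human}")
            lines.append(f"AI: {ai}")
        lines.append(f"Human: {new_input}")
        lines.append("AI:")
        return "\n".join(lines)


memory = BufferMemory()
memory.add_turn("My name is Ada.", "Nice to meet you, Ada!")
print(memory.build_prompt("What is my name?"))
```

Because a buffer like this grows with every turn, long sessions send more text with each request.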
And let’s run the application
# app.py
from dotenv import load_dotenv

# Load .env before importing interface, which reads ELEVEN_API_KEY at import time
load_dotenv()

from interface import AudioInterface
from agents import ConversationAgent

interface = AudioInterface()
agent = ConversationAgent()

while True:
    text = interface.listen()
    response = agent.run(text)
    interface.speak(response)
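The loop above runs until the process is killed. One way to end it by voice is to check each transcription for a stop phrase before calling the agent. Here is a minimal sketch, assuming the three steps are passed in as callables (for example `interface.listen`, `agent.run`, and `interface.speak`). The names `run_assistant` and `stop_phrases` are hypothetical:

```python
from typing import Callable, Iterable


def run_assistant(
    listen: Callable[[], str],
    respond: Callable[[str], str],
    speak: Callable[[str], None],
    stop_phrases: Iterable[str] = ("goodbye", "stop listening"),
) -> int:
    """Run the listen/respond/speak loop; return the number of answered turns."""
    stops = [phrase.lower() for phrase in stop_phrases]
    turns = 0
    while True:
        text = listen()
        # Leave the loop as soon as the user says one of the stop phrases
        if any(phrase in text.lower() for phrase in stops):
            speak("Goodbye!")
            return turns
        speak(respond(text))
        turns += 1
```

Passing the steps as callables also makes the loop easy to try out with fake inputs, without a microphone or API keys.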
Augmenting the Agent with Tools
Let’s give ChatGPT access to Google Search. For that, we need a Google API key and a search engine ID. Follow these steps to get them:
Go to the Google Cloud Console.
If you don't already have an account, create one and log in
Create a new project by clicking on the Select a Project dropdown at the top of the page and clicking New Project
Give it a name and click Create
Set up a Custom Search API key and add it to your .env file:
Go to the APIs & Services Dashboard
Click Enable APIs and Services
Search for Custom Search API and click on it
Click Enable
Go to the Credentials page
Click Create Credentials
Choose API Key
Copy the API key
After you enable the Custom Search API on your project, you might need to wait a few minutes for the change to propagate.
Set up a custom search engine and add its ID to your .env file:
Go to the Custom Search Engine page
Click Add
Set up your search engine by following the prompts. You can choose to search the entire web or specific sites.
Once you've created your search engine, click on Control Panel
Click Basics
Copy the Search engine ID
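LangChain's `google-search` tool reads these two values from the `GOOGLE_API_KEY` and `GOOGLE_CSE_ID` environment variables, so those are the names to use in the .env file, next to the `OPENAI_API_KEY` and `ELEVEN_API_KEY` used earlier. A small check like the following can catch a missing entry before the assistant starts. `missing_keys` is a hypothetical helper, not part of any library:

```python
import os
from typing import Iterable, List, Mapping

REQUIRED_KEYS = (
    "OPENAI_API_KEY",  # Whisper and ChatGPT
    "ELEVEN_API_KEY",  # ElevenLabs speech synthesis
    "GOOGLE_API_KEY",  # Google Custom Search API key
    "GOOGLE_CSE_ID",   # custom search engine ID
)


def missing_keys(
    env: Mapping[str, str],
    required: Iterable[str] = REQUIRED_KEYS,
) -> List[str]:
    """Return the required names that are absent or empty in env."""
    return [name for name in required if not env.get(name)]


# Run this after load_dotenv() so that values from .env are visible
missing = missing_keys(os.environ)
if missing:
    print("Missing in .env:", ", ".join(missing))
```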
We need to install the Google Python client
pip install google-api-python-client
Let’s create an agent with access to Google Search
# agents.py
from langchain.chat_models import ChatOpenAI
from langchain.agents import (
    AgentType,
    load_tools,
    initialize_agent
)
from langchain.memory import ConversationBufferMemory
from langchain.callbacks import StdOutCallbackHandler

class SmartChatAgent:
    def __init__(self) -> None:
        # Keep the chat history so the agent can refer to earlier turns
        self.memory = ConversationBufferMemory(
            memory_key="chat_history",
            return_messages=True
        )
        self.llm = ChatOpenAI()
        # The tool reads GOOGLE_API_KEY and GOOGLE_CSE_ID from the environment
        self.tools = load_tools(['google-search'])
        self.agent = initialize_agent(
            self.tools,
            self.llm,
            agent=AgentType.CHAT_CONVERSATIONAL_REACT_DESCRIPTION,
            memory=self.memory,
            verbose=True,
        )

    def run(self, text):
        # Print the agent's intermediate steps to the console
        handler = StdOutCallbackHandler()
        return self.agent.run(text, callbacks=[handler])
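Conceptually, a tool-using agent asks the model at each step to either give a final answer or name a tool and the input to pass to it. The toy sketch below shows only that routing step, with a made-up JSON format and a fake search function. It is not LangChain's implementation, and `dispatch` and the `tools` dictionary are hypothetical:

```python
import json
from typing import Callable, Dict


def dispatch(model_output: str, tools: Dict[str, Callable[[str], str]]) -> str:
    """Route one model decision: a final answer or a tool call."""
    decision = json.loads(model_output)
    action = decision["action"]
    action_input = decision["action_input"]
    if action == "Final Answer":
        return action_input
    if action not in tools:
        return f"Unknown tool: {action}"
    # A real agent would feed the tool result back to the model for another step
    return tools[action](action_input)


# Fake search tool, used here only for illustration
tools = {"google_search": lambda query: f"results for {query}"}
print(dispatch('{"action": "google_search", "action_input": "weather"}', tools))
```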
We modify the application
# app.py
from dotenv import load_dotenv
load_dotenv()

from interface import AudioInterface
from agents import SmartChatAgent

interface = AudioInterface()
agent = SmartChatAgent()

while True:
    text = interface.listen()
    response = agent.run(text)
    interface.speak(response)
How to Build a Smart Voice Assistant in 20 mins