Imagine having the ability to transcribe your voice calls. Look no further because we’ll learn how to do that in this article by combining Twilio with Deepgram.
With Twilio, we can use one of their phone numbers to receive and record incoming calls and get a transcript using the Deepgram Speech Recognition API. We’ll use the Deepgram Python SDK in this example.
Here’s a snapshot of what we’ll see in the browser after making the phone call and using Deepgram voice-to-text.
Getting Started
Before we start, we need a Deepgram API key for our project, which we can create in the Deepgram console. Copy it and keep it somewhere safe: we won’t be able to retrieve it again and would have to create a new one. In this tutorial we’ll use Python 3.10, though the Deepgram SDK also supports some earlier versions of Python.
Make sure to go to Twilio and sign up for an account. We’ll need to purchase a phone number with voice capabilities.
We’ll also need two phones: one to make the outgoing call and another to receive it.
In our project, we’ll use ngrok, which provides a temporary public URL that acts as the webhook for our application and forwards requests to the application running locally. You can download it from the ngrok website.
Next, let’s make a directory anywhere we’d like.
```bash
mkdir deepgram-twilio
```
Then change into that directory so we can start adding things to it.
```bash
cd deepgram-twilio
```
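We’ll also need to set up a virtual environment to hold our project and its dependencies. We can read more about virtual environments and how to create one in the Python documentation; for example, `python3 -m venv venv` creates one named `venv`.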
Using a virtual environment is recommended in Python so that our project’s dependencies are installed in an isolated environment rather than system-wide.
Make sure the virtual environment is activated, because we’ll install our dependencies inside it. If the virtual environment is named `venv`, activate it:
```bash
source venv/bin/activate
```
Let’s install the project’s dependencies by running the following `pip` commands from the terminal, inside our virtual environment:
```bash
# The code below uses the Deepgram Python SDK v2 interface (the Deepgram class),
# which was replaced in v3, so pin a 2.x release
pip install 'deepgram-sdk<3'
pip install twilio
pip install python-dotenv
pip install Flask
pip install 'flask[async]'
pip install pysondb
```
Now we can open our favorite editor and create a file called `deepgram-twilio-call.py`. To create it from the command line, run:
```bash
touch deepgram-twilio-call.py
```
The Code
Now to the fun part! Open the script `deepgram-twilio-call.py` and add the following code to make sure our Flask application runs without errors:
```python
from flask import Flask

app = Flask(__name__)


@app.get("/")
def hello():
    return "Hello World!"


if __name__ == "__main__":
    app.run()
```
Run the Flask application by typing `python deepgram-twilio-call.py` into the terminal.
Then open a browser window at http://127.0.0.1:5000/ and we should see the text `Hello World!`.
While our application is running, open a new terminal window and type:
```bash
ngrok http 127.0.0.1:5000
```
Copy the ngrok URL and add it to Twilio: navigate to ‘Phone Numbers -> Manage -> Active Numbers’, then click on your Twilio phone number.
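Scroll down to the ‘Voice’ section, set the webhook for incoming calls to our ngrok URL followed by the `/inbound` endpoint, and save. It should look like this, with your own ngrok subdomain: https://6d71-104-6-9-133.ngrok.io/inbound
We’ll implement the `/inbound` endpoint shortly; it answers the call and tells Twilio to send the finished recording to our `/recordings` endpoint.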
Leave both terminals running as we’ll need these to run our application and receive the phone call.
Let’s store our environment variables in a `.env` file with the following:
```
DEEPGRAM_API_KEY=YOUR_API_KEY
RECEIVER_NUMBER=PHONE_NUMBER_TO_RECEIVE_CALL
```
Replace `YOUR_API_KEY` with the API key we created in the Deepgram console, and `PHONE_NUMBER_TO_RECEIVE_CALL` with the phone number that should receive the call.
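Since both values are read with `os.getenv`, a missing or malformed entry only surfaces later, as a failed call or a rejected Deepgram request. Below is a minimal sketch of an early check, assuming we’d rather fail at startup. The helpers `require_env` and `looks_like_e164` are hypothetical, and the pattern is only a loose E.164 shape check (Twilio expects numbers like `+15551234567`), not Twilio’s own validation.

```python
import os
import re

# Loose E.164 shape: "+", a non-zero digit, then up to 14 more digits.
E164_PATTERN = re.compile(r"\+[1-9]\d{1,14}")


def require_env(name: str) -> str:
    """Return the environment variable, or raise a clear error if it is missing or empty."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; check the .env file")
    return value


def looks_like_e164(number: str) -> bool:
    """Hypothetical helper: True if the whole string matches the loose E.164 shape."""
    return E164_PATTERN.fullmatch(number) is not None
```

In the app, these could be called right after `load_dotenv()`, for example `receiver = require_env("RECEIVER_NUMBER")` followed by a `looks_like_e164(receiver)` check.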
Now let’s replace the code in `deepgram-twilio-call.py` with the following:
```python
import json
import os

from deepgram import Deepgram  # Deepgram Python SDK v2 interface
from dotenv import load_dotenv
from flask import Flask, request, render_template
from pysondb import db
from twilio.twiml.voice_response import Dial, VoiceResponse

# Load DEEPGRAM_API_KEY and RECEIVER_NUMBER from the .env file
load_dotenv()

app = Flask(__name__)

# Lightweight JSON-file database that stores the transcribed utterances
calls_db = db.getDb('calls')


@app.post("/inbound")
def inbound_call():
    """Answer Twilio's incoming-call webhook with TwiML that forwards and records the call."""
    response = VoiceResponse()
    dial = Dial(
        record='record-from-answer-dual',
        # Replace with your own ngrok URL; Twilio POSTs here when the recording is ready
        recording_status_callback='https://6d71-104-6-9-133.ngrok.io/recordings'
    )
    dial.number(os.getenv("RECEIVER_NUMBER"))
    response.append(dial)
    return str(response)
```
Here we import our libraries, create a new Flask application, and create a database named `calls` using PysonDB, a lightweight JSON database.
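The `/inbound` endpoint handles Twilio’s incoming-call webhook. It returns TwiML that uses `<Dial>` to forward the call to `RECEIVER_NUMBER`, connecting whoever called our Twilio number to the receiving phone. The parameter `record='record-from-answer-dual'` records the call from the moment it is answered, with each side of the conversation on its own channel, and `recording_status_callback` tells Twilio where to send the recording once it’s ready.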
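To see roughly what Twilio receives from `/inbound`, here is a sketch that builds the same TwiML shape by hand with Python’s standard `xml.etree.ElementTree`. It only illustrates the document structure; the app itself should keep using `VoiceResponse` and `Dial`. The phone number and base URL are placeholders, and `build_inbound_twiml` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin


def build_inbound_twiml(receiver_number: str, public_base_url: str) -> str:
    """Build TwiML with the same shape as the response from inbound_call()."""
    response = ET.Element("Response")
    # <Dial> forwards the incoming call; the attributes mirror the Dial(...) arguments.
    dial = ET.SubElement(
        response,
        "Dial",
        {
            "record": "record-from-answer-dual",
            # Twilio POSTs the recording details here once the recording is ready.
            "recordingStatusCallback": urljoin(public_base_url, "/recordings"),
        },
    )
    number = ET.SubElement(dial, "Number")
    number.text = receiver_number
    return ET.tostring(response, encoding="unicode")


print(build_inbound_twiml("+15551234567", "https://example.ngrok.io"))
# <Response><Dial record="record-from-answer-dual" recordingStatusCallback="https://example.ngrok.io/recordings"><Number>+15551234567</Number></Dial></Response>
```

Deriving the callback from a base URL, as here, is one way to avoid pasting the ngrok address in two places; in the app that would mean building the `recording_status_callback` string the same way.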
Next, in the `/recordings` route below, we tap into Deepgram’s speech-to-text feature: Twilio sends us the URL of the call recording, and we pass it to Deepgram to transcribe the audio. Setting `multichannel` to `True` transcribes each channel of the dual-channel recording separately. We check whether `results` is in the response, format the utterances with a list comprehension, store them in `utterances`, and then add `utterances` to the `calls` database.
```python
@app.post("/recordings")
async def get_recordings():
    deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))

    # Twilio POSTs the recording details as form data once the recording is ready
    recording_url = request.form['RecordingUrl']
    source = {'url': recording_url}

    # Transcribe the dual-channel recording; each call leg is its own channel
    transcript_data = await deepgram.transcription.prerecorded(source, {
        'punctuate': True,
        'utterances': True,
        'model': 'phonecall',
        'multichannel': True
    })

    if 'results' in transcript_data:
        # Keep only the channel and text of each utterance
        utterances = [
            {
                'channel': utterance['channel'],
                'transcript': utterance['transcript']
            }
            for utterance in transcript_data['results']['utterances']
        ]
        calls_db.addMany(utterances)
        return json.dumps(utterances, indent=4)

    # A Flask view must return a response, so report the failure explicitly
    return "No transcript returned", 500
```
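The list comprehension in `get_recordings` keeps only two fields from each utterance. Here is a sketch of that step on its own, run against a hand-written dictionary shaped like the relevant part of a Deepgram response. The sample values are made up, real responses contain many more fields per utterance, and `format_utterances` is a hypothetical helper.

```python
def format_utterances(transcript_data: dict) -> list[dict]:
    """Keep only the channel and transcript of each utterance, as get_recordings does."""
    results = transcript_data.get("results", {})
    return [
        {"channel": u["channel"], "transcript": u["transcript"]}
        for u in results.get("utterances", [])
    ]


# Made-up sample, trimmed to a few fields; only channel and transcript are read.
sample = {
    "results": {
        "utterances": [
            {"channel": 0, "transcript": "Hello?", "start": 0.5, "end": 1.1},
            {"channel": 1, "transcript": "Hello?", "start": 1.3, "end": 1.8},
        ]
    }
}

print(format_utterances(sample))
# [{'channel': 0, 'transcript': 'Hello?'}, {'channel': 1, 'transcript': 'Hello?'}]
```

Using `.get` with defaults means a response without `results` yields an empty list instead of a `KeyError`, which is a slightly more forgiving variant of the `if 'results' in transcript_data` check in the route.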
Here’s how the utterances look once they’re formatted and stored; PysonDB adds an `id` to each record:
```python
[
    {'channel': 0, 'transcript': 'Hello?', 'id': 288397603074461838},
    {'channel': 1, 'transcript': 'Hello?', 'id': 109089630999017748},
    {'channel': 0, 'transcript': "Hey. How's it going? It's good to hear from you.", 'id': 124620676610936565},
    {'channel': 0, 'transcript': 'Thanks. You too.', 'id': 182036969834868158},
    {'channel': 1, 'transcript': 'Thanks. You too.', 'id': 817052835121297399}
]
```
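Because the call is recorded in dual-channel mode, each side of the conversation carries its own `channel` value. A sketch that turns stored records into labeled lines follows. Which channel belongs to the caller and which to the receiver is an assumption here, so the labels are passed in rather than hard-coded; check them against a real call. `to_dialogue` is a hypothetical helper.

```python
def to_dialogue(utterances: list[dict], labels: dict[int, str]) -> list[str]:
    """Prefix each transcript with a label for its channel, keeping the stored order."""
    lines = []
    for u in utterances:
        # Fall back to the raw channel number if no label was provided.
        speaker = labels.get(u["channel"], f"Channel {u['channel']}")
        lines.append(f"{speaker}: {u['transcript']}")
    return lines


records = [
    {"channel": 0, "transcript": "Hello?", "id": 288397603074461838},
    {"channel": 1, "transcript": "Hello?", "id": 109089630999017748},
]

# Assumed mapping of channels to call legs; verify on your own recording.
for line in to_dialogue(records, {0: "Caller", 1: "Receiver"}):
    print(line)
# Caller: Hello?
# Receiver: Hello?
```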
Lastly, let’s add a `/transcribe` route and a `templates` folder with an `index.html` file that will display our phone call transcript.
In our Python file, add the following code, which gets the transcripts from the database and renders them in the HTML template.
```python
@app.route("/transcribe", methods=['GET', 'POST'])
def transcribe_call():
    # Fetch every stored utterance and pass it to the template
    context = calls_db.getAll()
    return render_template("index.html", context=context)


# Keep this block at the bottom of the file, after all routes are defined
if __name__ == "__main__":
    app.run(debug=True)
```
Create a folder called `templates` in the project directory and add an `index.html` file to it. In that file, add the following HTML and Jinja code:
```html
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Document</title>
</head>
<body>
    {% for c in context %}
        {{ c.transcript }} <br/>
    {% endfor %}
</body>
</html>
```
Here we loop through every transcript and display it on the screen.
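Flask enables autoescaping for `.html` templates, so a transcript containing characters like `<` or `&` is shown as text rather than parsed as markup. The standard-library sketch below mimics what the loop produces; `html.escape` stands in for Jinja’s autoescaping (the two differ slightly in how they encode quotes), and `render_transcripts` is a hypothetical helper, not part of the app.

```python
import html


def render_transcripts(records: list[dict]) -> str:
    """Mimic the {% for c in context %} loop: one escaped transcript per line."""
    return "\n".join(f"{html.escape(c['transcript'])} <br/>" for c in records)


print(render_transcripts([
    {"channel": 0, "transcript": "Hello?"},
    {"channel": 1, "transcript": "Q&A at 5 <today>"},
]))
# Hello? <br/>
# Q&amp;A at 5 &lt;today&gt; <br/>
```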
Finally, let’s try it out. From a non-Twilio phone other than the one in `RECEIVER_NUMBER`, call our Twilio number. Twilio forwards the call to `RECEIVER_NUMBER`, so we can answer it there and have a conversation. After we hang up, Twilio sends the recording to our `/recordings` endpoint, and once it’s transcribed, the transcript appears in the browser at http://127.0.0.1:5000/transcribe.
Congratulations on building a speech-to-text Python project with Twilio and Deepgram! If you have any questions, please feel free to reach out to us on Twitter at @DeepgramDevs.