Speech Recognition with Twilio and Python

Want to transcribe your voice calls? In this article, we’ll learn how to do that by combining Twilio with Deepgram.

With Twilio, we can use one of their phone numbers to receive and record incoming calls and get a transcript using the Deepgram Speech Recognition API. We’ll use the Deepgram Python SDK in this example.

Here’s a snapshot of what we’ll see in the browser after making the phone call and using Deepgram voice-to-text.

Getting Started

Before we start, we need to generate a Deepgram API key to use in our project. We can create one in the Deepgram console. Make sure to copy the key and keep it in a safe place, as we won’t be able to retrieve it again and would have to create a new one. In this tutorial, we’ll use Python 3.10, but Deepgram supports some earlier versions of Python.

Make sure to go to Twilio and sign up for an account. We’ll need to purchase a phone number with voice capabilities.

We’ll also need two phones: one to make the outgoing call and another to receive it.

In our project, we’ll use Ngrok, which provides a temporary URL that will act as the webhook in our application. Ngrok will forward requests to our application that is running locally. You can download it here.

Next, let’s make a directory anywhere we’d like.


mkdir deepgram-twilio

Then change into that directory so we can start adding things to it.


cd deepgram-twilio

We’ll also need to set up a virtual environment to hold our project and its dependencies. We can read more about virtual environments and how to create one here.

Note: It’s recommended in Python to use a virtual environment so our project’s dependencies are installed in an isolated environment rather than system-wide.

Ensure our virtual environment is activated because we’ll install our dependencies inside it. If our virtual environment is named venv, we can activate it like this:


source venv/bin/activate

Let’s install our dependencies for our project by running the below pip installs from our terminal inside our virtual environment.


 pip install deepgram-sdk
 pip install twilio
 pip install python-dotenv
 pip install Flask 
 pip install 'flask[async]'  
 pip install pysondb 

Now we can open up our favorite editor and create a file called deepgram-twilio-call.py. If you’d like to make it from the command line, do this:


touch deepgram-twilio-call.py

The Code

Now to the fun part! Open our script called deepgram-twilio-call.py and add the following code to make sure our Flask application runs without errors:


from flask import Flask

app = Flask(__name__)

@app.get("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()

Run our Flask application by typing python deepgram-twilio-call.py into the terminal.

Then open a browser window and go to http://127.0.0.1:5000/, where we should see the text Hello World!

While our application is running, open a new terminal window and type:


ngrok http 127.0.0.1:5000

Copy the ngrok url and add it to Twilio by navigating to ‘Phone Numbers -> Manage -> Active Numbers’, then click on your Twilio phone number.

Scroll down to the ‘Voice’ section, add our ngrok URL followed by the /inbound endpoint as the webhook, and save. It should look like this: https://6d71-104-6-9-133.ngrok.io/inbound

We’ll implement the /inbound endpoint shortly, along with the /recordings endpoint that receives the finished recording.

Leave both terminals running as we’ll need these to run our application and receive the phone call.

Let’s store our environment variables in a .env file with the following:


DEEPGRAM_API_KEY=YOUR_API_KEY
RECEIVER_NUMBER=PHONE_NUMBER_TO_RECEIVE_CALL

We replace YOUR_API_KEY with the API key we created in the Deepgram console, and PHONE_NUMBER_TO_RECEIVE_CALL with the phone number that should receive the call.
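A missing variable tends to surface later as a confusing API or dialing error, so it can help to fail fast at startup. The following is a minimal sketch, not part of the original project: require_env is a hypothetical helper name, and the sample values are made up. In the real application the values would come from os.environ after load_dotenv() has run.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable or raise a clear error.

    `require_env` is a hypothetical helper, not part of any library.
    """
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


if __name__ == "__main__":
    # Illustrative values only; the real app reads os.environ after load_dotenv().
    sample_env = {"DEEPGRAM_API_KEY": "abc123", "RECEIVER_NUMBER": "+15555550100"}
    for key in ("DEEPGRAM_API_KEY", "RECEIVER_NUMBER"):
        print(key, "is set:", bool(require_env(key, sample_env)))
```

Calling require_env("DEEPGRAM_API_KEY") near the top of the script would stop the app immediately if the .env file was not picked up.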

Let’s replace the code in our deepgram-twilio-call.py with the following:


import asyncio
import json
import os


from flask import Flask, request, render_template
from deepgram import Deepgram
from twilio.twiml.voice_response import Dial, VoiceResponse
from twilio.rest import Client
from pysondb import db
from dotenv import load_dotenv


app = Flask(__name__)

calls_db=db.getDb('calls')

load_dotenv()

@app.post("/inbound")
def inbound_call():
  response = VoiceResponse()
  dial = Dial(
      record='record-from-answer-dual',
      recording_status_callback='https://6d71-104-6-9-133.ngrok.io/recordings'
      )

  dial.number(os.getenv("RECEIVER_NUMBER"))
  response.append(dial)

  return str(response)

Here we are importing our libraries and creating a new instance of a Flask application. Then we create a new database named calls. We are using a lightweight JSON database called PysonDB.

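We create the /inbound endpoint, which Twilio requests when a call comes in to our Twilio number. It responds with instructions to dial the number stored in RECEIVER_NUMBER. The parameter record='record-from-answer-dual' tells Twilio to record the call from the moment it is answered, with each party on its own audio channel. When the recording is ready, Twilio sends its details to the URL given in recording_status_callback.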
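To make the response of /inbound more concrete, here is a rough sketch of the kind of TwiML document it returns, built with only the standard library so it can be run without the Twilio helper. It is an illustration rather than the helper library's exact output (which, for example, also includes an XML declaration). build_dial_twiml, the example.ngrok.io base URL, and the phone number are assumptions made for this sketch. Passing the base URL in as an argument also avoids hardcoding the ngrok address, which changes each time a new tunnel is started.

```python
import xml.etree.ElementTree as ET


def build_dial_twiml(receiver_number, base_url):
    """Build a TwiML string that dials a number and records both channels.

    `build_dial_twiml` is a hypothetical helper written for illustration.
    """
    response = ET.Element("Response")
    dial = ET.SubElement(
        response,
        "Dial",
        {
            # Record from the moment the call is answered, one channel per party
            "record": "record-from-answer-dual",
            # Where Twilio should report the finished recording
            "recordingStatusCallback": base_url.rstrip("/") + "/recordings",
        },
    )
    ET.SubElement(dial, "Number").text = receiver_number
    return ET.tostring(response, encoding="unicode")


if __name__ == "__main__":
    # Placeholder values, not real endpoints or numbers
    print(build_dial_twiml("+15555550100", "https://example.ngrok.io"))
```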

Next, in our /recordings route below, we tap into Deepgram’s speech-to-text feature by getting the recording of our call and using speech recognition to transcribe the audio. We check if results is in the response and format it by using a list comprehension and storing the results in utterances. We then add the utterances to the calls database.


@app.route("/recordings", methods=['GET', 'POST'])
async def get_recordings():
   # Create a Deepgram client with the API key from our .env file
   deepgram = Deepgram(os.getenv("DEEPGRAM_API_KEY"))

   # Twilio sends the URL of the recording in the form data
   recording_url = request.form['RecordingUrl']
   source = {'url': recording_url}
   transcript_data = await deepgram.transcription.prerecorded(source, {
       'punctuate': True,
       'utterances': True,
       'model': 'phonecall',
       'multichannel': True
   })

   if 'results' in transcript_data:
       # Keep only the channel and transcript of each utterance
       utterances = [
           {
               'channel': utterance['channel'],
               'transcript': utterance['transcript']
           } for utterance in transcript_data['results']['utterances']
       ]

       calls_db.addMany(utterances)

       return json.dumps(utterances, indent=4)

   # No results came back; return an empty list so Flask still has a valid response
   return json.dumps([])
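The route above reads RecordingUrl directly from the request, so any request without that field fails. One possible guard is sketched below, assuming the RecordingUrl and RecordingStatus fields that Twilio documents for recording status callbacks. should_transcribe is a hypothetical helper, and the payloads are made-up examples rather than captured requests.

```python
def should_transcribe(form):
    """Return True when a callback payload describes a finished recording.

    `form` is any dict-like object, such as Flask's request.form.
    """
    has_url = bool(form.get("RecordingUrl"))
    is_completed = form.get("RecordingStatus") == "completed"
    return has_url and is_completed


if __name__ == "__main__":
    # Made-up payloads for illustration
    finished = {
        "RecordingUrl": "https://api.twilio.com/example-recording",
        "RecordingStatus": "completed",
    }
    unfinished = {"RecordingStatus": "in-progress"}
    print(should_transcribe(finished))
    print(should_transcribe(unfinished))
```

Inside the route, a check like this could run before calling Deepgram, returning an empty response when it is False.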

We can see how the utterances will look after they’re formatted:


[{'channel': 0, 'transcript': 'Hello?', 'id': 288397603074461838}, 
{'channel': 1, 'transcript': 'Hello?', 'id': 109089630999017748}, 
{'channel': 0, 'transcript': "Hey. How's it going? It's good to hear from you.", 'id': 124620676610936565}, 
{'channel': 0, 'transcript': 'Thanks. You too.', 'id': 182036969834868158}, 
{'channel': 1, 'transcript': 'Thanks. You too.', 'id': 817052835121297399}]
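The formatting step can be tried on its own, without a phone call. The sketch below applies the same list comprehension to a small hand-written dictionary. The sample_response data is invented for illustration and contains only a few of the fields a real response would carry, and format_utterances is a hypothetical name. The id values in the output above are not copied by the comprehension, so they appear to be added when the records are stored in the database.

```python
def format_utterances(transcript_data):
    """Reduce a transcription response to channel/transcript pairs.

    Returns an empty list when the response has no results.
    """
    results = transcript_data.get("results", {})
    return [
        {"channel": u["channel"], "transcript": u["transcript"]}
        for u in results.get("utterances", [])
    ]


# Invented sample shaped like the parts of the response the route uses
sample_response = {
    "results": {
        "utterances": [
            {"channel": 0, "transcript": "Hello?", "start": 0.5, "end": 0.9},
            {"channel": 1, "transcript": "Hello?", "start": 1.1, "end": 1.5},
        ]
    }
}

if __name__ == "__main__":
    for item in format_utterances(sample_response):
        print(item)
```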

Lastly, let’s add our /transcribe route and a templates folder with an index.html file that will display our phone speech-to-text transcript.

In our Python file, add the following code, which gets the voice-to-text transcripts from the database and renders them in the HTML template.


@app.route("/transcribe", methods=['GET', 'POST'])
def transcribe_call():
   context = calls_db.getAll()
   return render_template("index.html", context=context )


if __name__ == "__main__":
   app.run(debug=True)

Create a folder in our project directory called templates and add an index.html file. In that file, add the following HTML and Jinja code:


<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="UTF-8">
   <meta http-equiv="X-UA-Compatible" content="IE=edge">
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
   <title>Document</title>
</head>
<body>
   {% for c in context %}
       {{ c.transcript }} <br/>
   {% endfor %}
</body>
</html>

Here we loop through every transcript and display it on the screen.
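Each stored record also carries a channel, so the page could show who said what. Below is a minimal sketch of that idea in plain Python, so it can be tried without Flask. label_by_channel and the "Speaker N" labels are hypothetical, the records are made up, and which caller ends up on which channel is not something this tutorial establishes.

```python
def label_by_channel(records):
    """Prefix each transcript with a speaker label derived from its channel."""
    return [
        f"Speaker {record['channel'] + 1}: {record['transcript']}"
        for record in records
    ]


if __name__ == "__main__":
    # Made-up records shaped like the ones stored in the calls database
    records = [
        {"channel": 0, "transcript": "Hello?"},
        {"channel": 1, "transcript": "Hello?"},
        {"channel": 0, "transcript": "Thanks. You too."},
    ]
    for line in label_by_channel(records):
        print(line)
```

The resulting list of strings could be passed to render_template in place of the raw records, with the template printing each string directly.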

Finally, let’s try making a phone call. Using a non-Twilio phone, call our Twilio number, which will dial the phone number we provided in the environment variable RECEIVER_NUMBER. We should be able to receive the call on that phone and have a conversation. After we hang up, the transcript will appear in our browser at http://127.0.0.1:5000/transcribe.

Congratulations on building a speech-to-text Python project with Twilio and Deepgram! If you have any questions, please feel free to reach out to us on Twitter at @DeepgramDevs.
