VOSK the Offline Speech Recognition

Published 2 months ago

Introduction

What is VOSK? VOSK is an offline speech recognition toolkit that works in real time without an internet connection. Developed by Alpha Cephei, it supports multiple languages and is lightweight enough to run on low-power devices such as the Raspberry Pi.

Using VOSK

To use VOSK, you first need to download the model from this website: https://alphacephei.com/vosk/models

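VOSK offers models for many languages, including Portuguese, English, and Japanese.

Once you have downloaded the ZIP file containing the VOSK model, unzip it. The example in this article loads the model by folder name, so place the unzipped folder in the directory you run the script from (or pass its full path instead).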
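
Unzipping can be done by hand or from Python. Here is a minimal sketch using the standard-library zipfile module. The function name extract_model and the archive name in the usage note are illustrative assumptions, and the code assumes the archive holds a single top-level folder:

```python
import zipfile
from pathlib import Path


def extract_model(zip_path, dest_dir="."):
    """Extract a downloaded model archive and return the extracted folder.

    Assumption: the archive contains a single top-level folder.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        # The first path component of the first entry names the model folder.
        top_level = archive.namelist()[0].split("/")[0]
        archive.extractall(dest)
    return dest / top_level
```

Calling extract_model("vosk-model-en-us-0.22.zip") would then return the path of the folder to pass to VOSK.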

Install Dependencies

First, you need to install the VOSK package and PyAudio using the following command:


pip install vosk pyaudio

PyAudio is a library that lets you capture and play back audio through the PortAudio library. In this code, PyAudio is used to:

Open the microphone.
Capture audio in real time.
Pass the captured audio data to VOSK for processing.

Let’s Code!!!

Code Example:


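import pyaudio
import json
from vosk import Model, KaldiRecognizer

# Path to the unzipped model folder, relative to the working directory.
model = Model("vosk-model-en-us-0.22")
# The recognizer's sample rate must match the audio stream's rate.
recognizer = KaldiRecognizer(model, 16000)

# Open the default microphone: 16-bit samples, mono, 16 kHz.
p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8192)
stream.start_stream()

print("listening...")

# Feed audio to the recognizer until the program is interrupted (Ctrl+C).
while True:
    data = stream.read(4096, exception_on_overflow=False)
    # AcceptWaveform returns True once a complete phrase has been detected.
    if recognizer.AcceptWaveform(data):
        result = json.loads(recognizer.Result())
        print("You:", result["text"])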

I created this simple example code to demonstrate how VOSK works.

1. Library Imports


import pyaudio
import json
from vosk import Model, KaldiRecognizer

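These lines import what the script needs: pyaudio for microphone capture, json for parsing the recognizer's output, and Model and KaldiRecognizer from vosk for the speech recognition itself.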

2. Speech recognition model loading


model = Model("vosk-model-en-us-0.22")

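This loads the VOSK model for English (US) from the unzipped folder vosk-model-en-us-0.22. The argument is a path, so the folder must be in the working directory, or you can pass the full path to wherever you unzipped it.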
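
If the folder name is wrong, the model cannot be loaded, so it can help to check the path first. This is a small sketch only. The helper name require_model_dir is hypothetical and not part of the VOSK API:

```python
from pathlib import Path


def require_model_dir(path):
    """Return the path as a string if it is an existing directory.

    Raises FileNotFoundError with a download hint otherwise.
    """
    model_dir = Path(path)
    if not model_dir.is_dir():
        raise FileNotFoundError(
            f"Model folder '{model_dir}' not found. Download a model from "
            "https://alphacephei.com/vosk/models and unzip it here."
        )
    return str(model_dir)
```

It could be used as Model(require_model_dir("vosk-model-en-us-0.22")), so that a missing folder produces a readable message before VOSK is asked to load anything.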

3. Speech recognizer initialization


recognizer = KaldiRecognizer(model, 16000)

KaldiRecognizer: Creates the speech recognizer for the loaded model with a sampling rate of 16000 Hz (16 kHz), a common rate for speech recognition. This rate must match the rate of the audio stream opened in the next step.
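
Because the recognizer is told to expect 16 kHz audio, whatever is fed to it should actually be 16 kHz. For a recorded file rather than a microphone, that check can be sketched with the standard-library wave module. The function name wav_matches and its defaults are illustrative assumptions:

```python
import wave


def wav_matches(path, rate=16000, channels=1, sample_width=2):
    """Return True if a WAV file has the expected format.

    Defaults describe 16 kHz, mono, 16-bit audio (2 bytes per sample).
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getnchannels() == channels
            and wav.getsampwidth() == sample_width
        )
```

A file that fails this check would need to be resampled or converted before its audio is passed to the recognizer.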

4. Audio Settings


p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=8192)
stream.start_stream()

pyaudio.PyAudio(): Initializes the PyAudio object, which is used to configure and manage audio capture.
p.open(): Opens an audio stream with the following settings:
format=pyaudio.paInt16: 16-bit signed integer samples.
channels=1: Mono audio (1 channel).
rate=16000: Sampling rate of 16000 Hz.
input=True: Indicates that the stream will be used for audio input (capture).
frames_per_buffer=8192: Size of the audio buffer, in frames.
stream.start_stream(): Starts the audio capture.
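
These settings determine how much audio each read returns. Here is a short worked calculation, assuming the 4096-frame read size used in the recognition loop of the example:

```python
RATE = 16000            # frames per second
CHANNELS = 1            # mono
SAMPLE_WIDTH = 2        # bytes per sample for paInt16
FRAMES_PER_READ = 4096  # frames requested by each stream.read call

# One frame holds one sample per channel.
bytes_per_read = FRAMES_PER_READ * CHANNELS * SAMPLE_WIDTH
seconds_per_read = FRAMES_PER_READ / RATE

print(bytes_per_read)    # 8192
print(seconds_per_read)  # 0.256
```

So each read hands the recognizer roughly a quarter of a second of audio, as 8192 bytes.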

5. Speech Recognition Loop


print("listening...")
while True:
    data = stream.read(4096, exception_on_overflow=False)
    if recognizer.AcceptWaveform(data):
        result = json.loads(recognizer.Result())
        print("You:", result["text"])

stream.read(4096, exception_on_overflow=False): Reads 4096 frames of audio from the stream. The exception_on_overflow=False parameter prevents the program from raising an exception in case of overflow.
recognizer.AcceptWaveform(data): Sends the audio data to the speech recognizer. If the recognizer detects a complete phrase, it returns True; otherwise it returns False and keeps accumulating audio.
json.loads(recognizer.Result()): Converts the speech recognition result (which is in JSON format) into a Python dictionary. The recognized text is stored under the "text" key.
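
Between complete phrases, the recognizer can also report what it has heard so far. The sketch below wraps the loop in a generator that yields both kinds of result. It relies on the KaldiRecognizer methods PartialResult() and FinalResult(). The function name recognize_chunks and the (kind, text) tuples are illustrative choices, not part of VOSK:

```python
import json


def recognize_chunks(recognizer, chunks):
    """Yield ("partial", text) or ("final", text) for non-empty results.

    `recognizer` is expected to behave like vosk.KaldiRecognizer;
    `chunks` is any iterable of raw 16-bit PCM byte strings.
    """
    for data in chunks:
        if recognizer.AcceptWaveform(data):
            # A complete phrase was detected.
            text = json.loads(recognizer.Result())["text"]
            kind = "final"
        else:
            # Still mid-phrase: report the running hypothesis.
            text = json.loads(recognizer.PartialResult())["partial"]
            kind = "partial"
        if text:
            yield kind, text
    # Flush whatever audio is still buffered once the input ends.
    text = json.loads(recognizer.FinalResult())["text"]
    if text:
        yield "final", text
```

With the microphone stream from the example, chunks could be a generator that repeatedly calls stream.read(4096, exception_on_overflow=False). Keeping the recognizer as a parameter also makes the loop easy to exercise with a stand-in object when no microphone is available.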

Conclusion

VOSK is a powerful and efficient tool for real-time speech recognition, supporting multiple languages and running seamlessly on low-performance devices like the Raspberry Pi.
Its offline capability makes it ideal for applications where internet access is limited.
With this guide, you’ve learned how to set up VOSK, configure audio capture, and implement a basic speech recognition system.
Whether for voice-controlled apps, transcription tools, or language learning, VOSK offers a simple yet robust solution for integrating speech recognition into your projects.

Thanks for reading
