Python Speech Recognition: Advantages and Installation Guide

▌ How Speech Recognition Works: An Overview

Speech recognition stems from research done at Bell Laboratories in the early 1950s. Early speech recognition systems could only recognize a single speaker and had vocabularies of about a dozen words. Modern speech recognition systems have made great progress: they can identify multiple speakers and have large vocabularies covering many languages.

The first component of speech recognition is, of course, speech. Speech is converted from physical sound to an electrical signal with a microphone and then to digital data with an analog-to-digital converter. Once digitized, several models can be applied to transcribe the audio to text.

Most modern speech recognition systems rely on Hidden Markov Models (HMMs). Their working principle rests on the assumption that, on a very short time scale (for example, 10 milliseconds), a speech signal can be approximated as a stationary process, that is, a process whose statistical properties do not change over time.
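To make the framing idea concrete, here is a minimal illustrative sketch (not taken from any speech library; the sample rate and the random stand-in signal are assumptions) that splits a digitized signal into 10-millisecond frames, the quasi-stationary units that models such as HMMs typically consume:

import numpy as np

sample_rate = 16000                    # samples per second (assumed)
frame_len = int(0.010 * sample_rate)   # 10 ms -> 160 samples per frame

signal = np.random.randn(sample_rate)  # stand-in for one second of audio
n_frames = len(signal) // frame_len
frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)

print(frames.shape)                    # (100, 160): 100 frames of 160 samples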

Many modern speech recognition systems apply neural networks before HMM recognition to simplify the speech signal using feature transformation and dimensionality reduction techniques. A voice activity detector (VAD) can also be used to reduce the audio signal to only those portions that are likely to contain speech.

Fortunately for Python users, a number of speech recognition services are available online through APIs, and most of them also offer a Python SDK.


▌ Choosing a Python Speech Recognition Package

There are several ready-made speech recognition packages on PyPI. These include:

• apiai

• google-cloud-speech

• pocketsphinx

• SpeechRecognition

• watson-developer-cloud

• wit

Some packages (such as wit and apiai) offer built-in features that go beyond basic speech recognition, such as natural language processing for identifying a speaker's intent. Others, such as google-cloud-speech, focus solely on speech-to-text conversion.

Among them, SpeechRecognition stands out for ease of use.

Recognizing speech requires audio input, and retrieving audio input with SpeechRecognition is very straightforward: instead of building scripts for accessing microphones and processing audio files from scratch, you can be up and running in just a few minutes.

The SpeechRecognition library wraps several mainstream speech APIs and is therefore extremely flexible. One of them, the Google Web Speech API, supports a default API key that is hard-coded into the SpeechRecognition library, so it can be used without registration. With its flexibility and ease of use, SpeechRecognition is an excellent choice for writing Python programs.

▌ Installing SpeechRecognition

SpeechRecognition is compatible with Python 2.6, 2.7, and 3.3+, but some additional installation steps are required for Python 2. All examples in this tutorial assume Python 3.3+.

You can install SpeechRecognition from the terminal with the pip command:

$ pip install SpeechRecognition

After the installation is complete, open the interpreter window and enter the following to verify the installation:

>>> import speech_recognition as sr
>>> sr.__version__
'3.8.1'

Note: Keep this session open; you will use it over the next few steps.

If you only need to work with existing audio files, you can call SpeechRecognition directly, keeping in mind the dependencies of your specific use case. Note that the PyAudio package is required for microphone input.

Recognizer class

The core of SpeechRecognition is the Recognizer class.

The main purpose of a Recognizer instance is, of course, to recognize speech. Each instance has seven methods for recognizing speech from an audio source, each using a different API. They are:

recognize_bing(): Microsoft Bing Speech

recognize_google(): Google Web Speech API

recognize_google_cloud(): Google Cloud Speech (requires installation of the google-cloud-speech package)

recognize_houndify(): Houndify by SoundHound

recognize_ibm(): IBM Speech to Text

recognize_sphinx(): CMU Sphinx (requires installation of PocketSphinx)

recognize_wit(): Wit.ai

Of the seven, only recognize_sphinx() works offline with the CMU Sphinx engine; the other six require an Internet connection.
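For example, a minimal offline sketch using recognize_sphinx() might look like the following, assuming PocketSphinx is installed and that the harvard.wav file used later in this tutorial is in the working directory:

>>> import speech_recognition as sr
>>> r = sr.Recognizer()
>>> with sr.AudioFile('harvard.wav') as source:
...     audio = r.record(source)
...
>>> r.recognize_sphinx(audio)  # no Internet connection required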

SpeechRecognition ships with a default API key for the Google Web Speech API, so that API can be used right away. The other six APIs require authentication with either an API key or a username/password combination, which is why this article uses the Web Speech API.

Now let's start practicing. Create a Recognizer instance and call recognize_google() in your interpreter session:

>>> r = sr.Recognizer()
>>> r.recognize_google()

The following will appear on screen:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: recognize_google() missing 1 required positional argument: 'audio_data'

You have probably guessed the result. How could something be recognized from nothing?

All seven recognize_*() methods require an audio_data argument, which must be an instance of SpeechRecognition's AudioData class.

There are two ways to create an AudioData instance: from an audio file or from audio recorded by a microphone. Audio files are a little easier to start with, so let's begin there.

▌ Working with Audio Files

First, download the audio file (https://github.com/realpython/python-speech-recognition/tree/master/audio_files) and save it to the directory in which your Python interpreter session is running.

The AudioFile class can be initialized with the path to an audio file and provides a context manager interface for reading and processing the file's contents.

Supported file types

The currently supported file types for SpeechRecognition are:

WAV: Must be in PCM/LPCM format

AIFF

AIFF-C

FLAC: Must be in native FLAC format; OGG-FLAC is not supported

If you are working on x86-based Linux, macOS, or Windows, FLAC files are supported out of the box. If you are running on another platform, you will need to install a FLAC encoder and make sure you have access to the flac command-line tool.

Use record() to get data from a file

Type the following into your interpreter session to process the contents of the "harvard.wav" file:

>>> harvard = sr.AudioFile('harvard.wav')
>>> with harvard as source:
...     audio = r.record(source)
...

The context manager opens the file and reads its contents, storing the data in the AudioFile instance harvard. Then record() records the data from the entire file into an AudioData instance. You can confirm this by checking the type of audio:

>>> type(audio)
<class 'speech_recognition.AudioData'>

You can now call recognize_google() to attempt to recognize the speech in the audio.

>>> r.recognize_google(audio)
'the stale smell of old beer lingers it takes heat to bring out the odor a cold dip restores health and zest a salt pickle taste fine with ham tacos al Pastore are my favorite a zestful food is the hot cross bun'

That completes the transcription of the first audio file.
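Putting these steps together, a minimal standalone script might look like this (a sketch, assuming harvard.wav is in the working directory):

# transcribe.py: load a WAV file and transcribe it with the
# Google Web Speech API (uses the library's default API key).
import speech_recognition as sr

r = sr.Recognizer()
harvard = sr.AudioFile('harvard.wav')

with harvard as source:
    audio = r.record(source)

print(r.recognize_google(audio))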

Obtain audio clips using offsets and durations

What if you only want to capture part of the speech in a file? The record() method accepts a duration keyword argument that stops the recording after the specified number of seconds.

For example, the following only captures speech within the first four seconds of the file:

>>> with harvard as source:
...     audio = r.record(source, duration=4)
...
>>> r.recognize_google(audio)
'the stale smell of old beer lingers'

When record() is called inside a with block, the file stream moves forward. This means that if you record four seconds and then record four seconds again, the second call returns audio for the four seconds after the first four seconds.

>>> with harvard as source:
...     audio1 = r.record(source, duration=4)
...     audio2 = r.record(source, duration=4)
...
>>> r.recognize_google(audio1)
'the stale smell of old beer lingers'
>>> r.recognize_google(audio2)
'it takes heat to bring out the odor a cold dip'

In addition to specifying a recording duration, you can give record() a starting point with the offset keyword argument, whose value is the number of seconds from the beginning of the file to skip before recording. For example, to capture only the second phrase in the file, set an offset of 4 seconds and record for a duration of 3 seconds.

>>> with harvard as source:
...     audio = r.record(source, offset=4, duration=3)
...
>>> r.recognize_google(audio)
'it takes heat to bring out the odor'

The offset and duration keyword arguments are useful for segmenting an audio file when the structure of the speech in the file is known in advance. Used carelessly, however, they lead to poor transcription.

>>> with harvard as source:
...     audio = r.record(source, offset=4.7, duration=2.8)
...
>>> r.recognize_google(audio)
'Mesquite to bring out the odor Aiko'

This recording starts at 4.7 seconds, so the "it t" at the start of the phrase "it takes heat to bring out the odor" is missed. The API therefore received only "akes heat" as input, which it matched to "Mesquite".

Similarly, the recording caught only "a co", the beginning of the final phrase "a cold dip restores health and zest", which was incorrectly matched as "Aiko".
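Used deliberately, though, offset and duration make it easy to process a long recording in fixed-length pieces. The following sketch (not part of the original tutorial; the file name and chunk count are illustrative assumptions) transcribes a file in 10-second chunks:

import speech_recognition as sr

r = sr.Recognizer()
chunk_seconds = 10

for i in range(3):  # first three chunks, as an example
    # Re-opening the file resets the stream; offset does the positioning.
    with sr.AudioFile('long_recording.wav') as source:
        audio = r.record(source, offset=i * chunk_seconds, duration=chunk_seconds)
    try:
        print(r.recognize_google(audio))
    except sr.UnknownValueError:
        print('(unintelligible chunk)')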

Noise is another major culprit behind poor transcription accuracy. The example above works well because the audio file is clean, but in practice it is nearly impossible to obtain noise-free audio unless the file is processed in advance.

The effect of noise on speech recognition

Noise is a fact of life. All recordings contain some degree of noise, and unhandled noise can ruin the accuracy of a speech recognition application.

To understand how noise affects speech recognition, download the "jackhammer.wav" file (https://github.com/realpython/python-speech-recognition/tree/master/audio_files) and make sure to save it to your interpreter session's working directory. In this file, the phrase "the stale smell of old beer lingers" is spoken against a very loud jackhammer in the background.

What happens when you try to transcribe this file?

>>> jackhammer = sr.AudioFile('jackhammer.wav')
>>> with jackhammer as source:
...     audio = r.record(source)
...
>>> r.recognize_google(audio)
'the snail smell of old gear vendors'

So how do you deal with this? You can try calling the adjust_for_ambient_noise() method of the Recognizer class.

>>> with jackhammer as source:
...     r.adjust_for_ambient_noise(source)
...     audio = r.record(source)
...
>>> r.recognize_google(audio)
'still smell of old beer vendors'

That is closer to the actual phrase, but the accuracy is still problematic, and the "the" at the beginning of the phrase is lost. Why?

By default, adjust_for_ambient_noise() reads the first second of the file stream and treats it as the noise level of the audio, so that portion of the file is consumed before record() retrieves the data.

You can adjust the time frame that adjust_for_ambient_noise() analyzes with the duration keyword argument, which is in seconds and defaults to 1. Now reduce this value to 0.5.

>>> with jackhammer as source:
...     r.adjust_for_ambient_noise(source, duration=0.5)
...     audio = r.record(source)
...
>>> r.recognize_google(audio)
'the snail smell like old Beer Mongers'

Now we get the "the" at the start of the phrase, but new problems appear. Sometimes the signal is simply too noisy for the effect of the noise to be eliminated.

If you run into these problems often, you will need to preprocess the audio. This can be done with audio editing software, or by applying filters to the file with a Python package such as SciPy, as in the sketch below.
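Here is a rough sketch of the SciPy route: it applies a Butterworth band-pass filter that keeps roughly the speech band. The cutoff frequencies and the 16-bit PCM assumption are illustrative, not a recipe guaranteed to clean up any particular file:

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, filtfilt

rate, data = wavfile.read('jackhammer.wav')   # sample rate and raw samples

# Band-pass of roughly 300-3400 Hz, a common band for speech.
nyquist = rate / 2
b, a = butter(5, [300 / nyquist, 3400 / nyquist], btype='band')
filtered = filtfilt(b, a, data.astype(np.float64), axis=0)

# Write the result back as 16-bit PCM (assumes the input was 16-bit).
wavfile.write('jackhammer_filtered.wav', rate, filtered.astype(np.int16))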

When dealing with noisy files, it also helps to look at the actual API response. Most APIs return a JSON string containing several possible transcriptions, but recognize_google() returns only the most likely transcription unless you request the full response. To get the full response, set the show_all keyword argument of recognize_google() to True.

>>> r.recognize_google(audio, show_all=True)
{'alternative': [
  {'transcript': 'the snail smell like old Beer Mongers'},
  {'transcript': 'the still smell of old beer vendors'},
  {'transcript': 'the snail smell like old beer vendors'},
  {'transcript': 'the stale smell of old beer vendors'},
  {'transcript': 'the snail smell like old beermongers'},
  {'transcript': 'destihl smell of old beer vendors'},
  {'transcript': 'the still smell like old beer vendors'},
  {'transcript': 'bastille smell of old beer vendors'},
  {'transcript': 'the still smell like old beermongers'},
  {'transcript': 'the still smell of old beer venders'},
  {'transcript': 'the still smelling old beer vendors'},
  {'transcript': 'musty smell of old beer vendors'},
  {'transcript': 'the still smell of old beer vendor'}
], 'final': True}

As you can see, recognize_google() returns a dictionary whose 'alternative' key holds a list of all possible transcriptions. The structure of this response varies from API to API, and it is mainly useful for debugging.
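For debugging, you can loop over the alternatives yourself. A minimal sketch, assuming a response shaped like the one above:

response = r.recognize_google(audio, show_all=True)
if isinstance(response, dict):  # an empty result is not a dict
    for i, alt in enumerate(response.get('alternative', [])):
        print(i, alt['transcript'])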

▌ Working with Microphones

To access the microphone with SpeechRecognition, you must install the PyAudio package. Close the current interpreter session and carry out the following steps.

Install PyAudio

The process of installing PyAudio varies depending on the operating system.

Debian Linux

If you are using Debian-based Linux (such as Ubuntu), you can use apt to install PyAudio:

$ sudo apt-get install python-pyaudio python3-pyaudio

You may still need to run pip install pyaudio after this completes, especially if you are working in a virtual environment.

macOS: macOS users first need to install PortAudio with Homebrew, then call pip to install PyAudio.

$ brew install portaudio
$ pip install pyaudio

Windows: Windows users can directly call pip to install PyAudio.

$ pip install pyaudio

Installation Test: After installing PyAudio, you can test the installation from the console.

$ python -m speech_recognition

Make sure the default microphone is on and unmuted. If it is installed properly, you should see something like this:

A moment of silence, please...
Set minimum energy threshold to 600.4452854381937
Say something!

Speak into the microphone and watch how SpeechRecognition transcribes your speech.

Microphone class

Open another interpreter session and create an instance of the Recognizer class.

>>> import speech_recognition as sr
>>> r = sr.Recognizer()

This time, the default system microphone will be used as the source instead of an audio file. You can access it by creating an instance of the Microphone class.

>>> mic = sr.Microphone()

If your system has no default microphone (as on a Raspberry Pi), or if you want to use a microphone other than the default, you need to specify which one to use by supplying a device index. You can obtain a list of microphone names by calling the list_microphone_names() static method of the Microphone class.

>>> sr.Microphone.list_microphone_names()
['HDA Intel PCH: ALC272 Analog (hw:0,0)',
 'HDA Intel PCH: HDMI 0 (hw:0,3)',
 'sysdefault',
 'front',
 'surround40',
 'surround51',
 'surround71',
 'hdmi',
 'pulse',
 'dmix',
 'default']

Note: Your output may be different from the previous example.

The device index of a microphone is its position in the list returned by list_microphone_names(). In the output above, the microphone named "front" has index 3 in the list, so you could create a Microphone instance like this:

>>> # This is just an example; do not run
>>> mic = sr.Microphone(device_index=3)

In most cases, however, you should simply use the system's default microphone.

Use listen() to get microphone input data

Once the Microphone instance is ready, you can capture some input.

Just like the AudioFile class, Microphone is a context manager. You can capture input from the microphone using the listen() method of the Recognizer class inside a with block. This method takes the audio source as its first argument and records input from the source until silence is detected.

>>> with mic as source:
...     audio = r.listen(source)
...

After executing the with block, try saying "hello" into the microphone. Wait for the interpreter prompt to reappear; once you see the ">>>" prompt, you can recognize the speech.

>>> r.recognize_google(audio)
'hello'

If the prompt never returns, your microphone is most likely picking up too much ambient noise. Press Ctrl+C to interrupt the process and get the prompt back.

To handle ambient noise, call the adjust_for_ambient_noise() method of the Recognizer class, just as you did with the noisy audio file. Since microphone input is far less predictable than an audio file, it is good practice to do this every time you listen for microphone input.

>>> with mic as source:
...     r.adjust_for_ambient_noise(source)
...     audio = r.listen(source)
...

After running the above code, wait a moment for the calibration to finish, then try saying "hello" into the microphone. Again, wait for the interpreter prompt to return before attempting to recognize the speech.

Remember that adjust_for_ambient_noise() analyzes one second of audio from the source by default. If you find that too long, you can adjust it with the duration parameter.

The SpeechRecognition documentation recommends a duration of no less than 0.5 seconds. In some cases you may find that durations longer than the default of one second produce better results. The minimum value you need depends on the microphone's ambient environment, which is usually unknown during development. In my experience, the default of one second is adequate for most applications.

Handling unrecognizable speech

Try typing the previous code example into the interpreter and making some unintelligible noises into the microphone. You should get a result like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/david/real_python/speech_recognition_primer/venv/lib/python3.5/site-packages/speech_recognition/__init__.py", line 858, in recognize_google
    if not isinstance(actual_result, dict) or len(actual_result.get("alternative", [])) == 0: raise UnknownValueError()
speech_recognition.UnknownValueError

Audio that the API cannot match to text raises an UnknownValueError exception, so you should routinely wrap API calls in try and except blocks to handle it. The API will try its best to turn any sound into text; a short buzz, for example, may be recognized as "How". Coughs, hand claps, and tongue clicks may all either produce spurious text or raise this exception.
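A minimal, self-contained sketch of that pattern follows; the RequestError handler is an addition for completeness, covering network and quota failures:

import speech_recognition as sr

r = sr.Recognizer()
mic = sr.Microphone()

with mic as source:
    r.adjust_for_ambient_noise(source)
    audio = r.listen(source)

try:
    print(r.recognize_google(audio))
except sr.UnknownValueError:
    print("The API could not understand the audio.")
except sr.RequestError as e:
    print("Could not reach the API: {0}".format(e))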

Concluding remarks

Throughout this tutorial we have been recognizing speech in English, which is the default language for each recognize_*() method in the SpeechRecognition package. However, it is absolutely possible, and easy, to recognize speech in other languages. To do so, set the language keyword argument of the recognize_*() method to a string corresponding to the desired language.
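For example, assuming audio contains French speech, a tag such as "fr-FR" (an illustrative value) selects French for the Web Speech API:

>>> r.recognize_google(audio, language="fr-FR")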
