Converting an Audio File to Text with Python

To transcribe a WAV audio file into text using Python, I rely on two modules.

`pydub` handles audio file manipulation (loading, splitting, exporting).
`speech_recognition` converts audio to text using Google’s speech recognition service.

from pydub import AudioSegment
import speech_recognition as sr

# Load the audio file
audio = AudioSegment.from_wav("/path/audio.wav")

# Define segment length (e.g., 30 seconds)
segment_length = 30 * 1000 # in milliseconds
segments = [audio[i:i+segment_length] for i in range(0, len(audio), segment_length)]

recognizer = sr.Recognizer()

# Transcribe each segment
for i, segment in enumerate(segments):
    segment.export(f"segment_{i}.wav", format="wav")
    with sr.AudioFile(f"segment_{i}.wav") as source:
        audio_data = recognizer.record(source)
        try:
            text = recognizer.recognize_google(audio_data, language='en-US')
            print(f"Segment {i}: {text}")
        except sr.RequestError as e:
            print(f"Error transcribing segment {i}: {e}")
        except sr.UnknownValueError:
            print(f"Could not transcribe segment {i}")

This code transcribes an audio file into text using Google’s Speech Recognition service.

Whatever the length of the file, the code splits it into 30-second segments and transcribes each one individually. Short requests are less likely to fail because of file size limits, request time limits, or connection timeouts.

Note: Each segment is processed and transcribed separately, and the result is printed for each one. The last segment is usually shorter than 30 seconds.
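One side effect of exporting every segment is that the `segment_N.wav` files stay in the working directory after the script finishes. A minimal sketch of one way to clean up, using only the standard library: the helper name `temporary_wav_path` is hypothetical and not part of `pydub` or `speech_recognition`.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_wav_path(prefix="segment_"):
    """Yield the path of an empty temporary .wav file and delete it on exit.

    Hypothetical helper for illustration only.
    """
    fd, path = tempfile.mkstemp(prefix=prefix, suffix=".wav")
    os.close(fd)  # only the path is needed; the caller writes the file itself
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


# Possible use inside the transcription loop (assumes `segment`, `sr` and
# `recognizer` from the script above):
#
#     with temporary_wav_path() as path:
#         segment.export(path, format="wav")
#         with sr.AudioFile(path) as source:
#             audio_data = recognizer.record(source)
```

The file is removed even if transcription raises an exception, because the cleanup sits in a `finally` block.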

Before running the program for the first time, make sure you’ve installed the necessary external modules.

The code uses f-strings, so it requires Python 3.6 or later.

To install the required modules, simply run the following command in the terminal:

pip install --upgrade SpeechRecognition pydub
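To confirm the installation worked before running the transcription, a short check along these lines may help. It is only a sketch: the function names `missing_modules` and `ffmpeg_available` are made up for this example. Note that the package installed as `SpeechRecognition` is imported as `speech_recognition`.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True if an `ffmpeg` executable is on the PATH."""
    return shutil.which("ffmpeg") is not None


if __name__ == "__main__":
    missing = missing_modules(["speech_recognition", "pydub"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are installed.")
    if not ffmpeg_available():
        # FFmpeg is only needed for non-WAV input such as MP3 or video files.
        print("ffmpeg was not found on the PATH.")
```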

The audio file must be in WAV format.

audio = AudioSegment.from_wav("/path/audio.wav")

The code loads the WAV audio file into an `AudioSegment` object, which allows for easy manipulation of the audio.

Next, the program splits the audio into segments:

segment_length = 30 * 1000 # in milliseconds
segments = [audio[i:i+segment_length] for i in range(0, len(audio), segment_length)]

The audio file is divided into 30-second segments (or a different duration if you prefer). Each segment is stored in a `segments` list.
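To see which slices the list comprehension produces, the same `range` logic can be applied to plain numbers. This is an illustrative sketch with a hypothetical helper, `segment_bounds`, and a made-up 70-second duration; it does not touch any audio.

```python
def segment_bounds(total_ms, segment_ms):
    """Return (start, end) pairs in milliseconds covering 0..total_ms."""
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    return [
        (start, min(start + segment_ms, total_ms))
        for start in range(0, total_ms, segment_ms)
    ]


# A 70-second file cut into 30-second pieces: the last piece is only 10 seconds.
print(segment_bounds(70_000, 30_000))
# [(0, 30000), (30000, 60000), (60000, 70000)]
```

The `min` call mirrors what slicing does in the script: a slice that runs past the end of the audio simply stops at the end.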

Each segment is then exported as its own WAV file (`segment_0.wav`, `segment_1.wav`, and so on) and loaded using `speech_recognition`. The script does not delete these files afterwards.

The code attempts to transcribe the audio into text using Google’s service.

Finally, the transcribed text for each segment is printed to the console.

for i, segment in enumerate(segments):
    segment.export(f"segment_{i}.wav", format="wav")
    with sr.AudioFile(f"segment_{i}.wav") as source:
        audio_data = recognizer.record(source)
        try:
            text = recognizer.recognize_google(audio_data, language='en-US')
            print(f"Segment {i}: {text}")
        except sr.RequestError as e:
            print(f"Error transcribing segment {i}: {e}")
        except sr.UnknownValueError:
            print(f"Could not transcribe segment {i}")

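The code handles two kinds of failure: `RequestError`, raised when the request to the service fails (for example, when there is no internet connection), and `UnknownValueError`, raised when the speech in a segment is unintelligible. In both cases the loop moves on to the next segment, so one bad segment does not stop the whole transcription.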
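Since a `RequestError` is often transient, one option is to retry the request a few times before giving up on a segment. The sketch below is an assumption about how that could look, not part of the original script; `call_with_retries` and its parameters are hypothetical names.

```python
import time


def call_with_retries(func, retry_on, attempts=3, delay_seconds=2.0):
    """Call func() and retry when it raises one of the `retry_on` exceptions.

    The wait grows linearly with each failed attempt. The last failure is
    re-raised so the caller can still handle it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds * attempt)


# Possible use inside the loop (assumes `sr`, `recognizer` and `audio_data`
# from the script above):
#
#     text = call_with_retries(
#         lambda: recognizer.recognize_google(audio_data, language="en-US"),
#         retry_on=sr.RequestError,
#     )
```

Only the exception types passed in `retry_on` trigger a retry, so an `UnknownValueError` would still propagate immediately; retrying unintelligible audio would not change the result.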

What About MP3 Files?

If the source audio file is in MP3 format, you can load it directly in the code and, if you want, save a WAV copy before processing. Note that `pydub` needs FFmpeg (or libav) installed to decode MP3 files.

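Replace the following line:

audio = AudioSegment.from_wav("/path/audio.wav")

with these two, which read the MP3 file and export a WAV copy:

audio = AudioSegment.from_mp3("/path/audio.mp3")
audio.export("audio.wav", format="wav")

The second line is optional: the rest of the script works on the in-memory `audio` object, and each segment is exported as WAV anyway.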

Alternatively, you can convert the MP3 to WAV yourself using FFmpeg from the terminal (here resampled to 16 kHz, mono):

ffmpeg -i audio.mp3 -ar 16000 -ac 1 audio.wav

What If the Source File Is a Video?

In this case, you can extract the audio in WAV format from the video using FFmpeg.

For example, here’s the command to extract audio from a WEBM video file as 16-bit PCM at 44.1 kHz, stereo (`-vn` drops the video stream):

ffmpeg -i video_name.webm -vn -acodec pcm_s16le -ar 44100 -ac 2 audio.wav
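If you would rather run the conversion from Python than from the terminal, the same FFmpeg flags can be passed through `subprocess`. This is a minimal sketch that assumes `ffmpeg` is on the PATH; the function names `ffmpeg_wav_command` and `extract_wav` are hypothetical.

```python
import subprocess


def ffmpeg_wav_command(source, target, sample_rate=16000, channels=1):
    """Build the FFmpeg argument list for converting `source` to a PCM WAV file."""
    return [
        "ffmpeg",
        "-i", source,
        "-vn",                    # ignore any video stream
        "-acodec", "pcm_s16le",   # 16-bit PCM audio
        "-ar", str(sample_rate),  # sample rate in Hz
        "-ac", str(channels),     # number of audio channels
        target,
    ]


def extract_wav(source, target, **options):
    """Run FFmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(ffmpeg_wav_command(source, target, **options), check=True)


# Same settings as the WEBM command above; this only prints the argument list.
print(ffmpeg_wav_command("video_name.webm", "audio.wav", sample_rate=44100, channels=2))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.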