Software Engineer's Blog

Posts

Showing posts with the label Wav2Vec2

Speech-to-Text and Speaker Diarization with Wav2Vec2 and pyannote.audio 3.1 on Ubuntu 24.04

- May 07, 2025

In this post, I walk through how to use Wav2Vec2 for speech-to-text (STT) and pyannote.audio 3.1 for speaker diarization on Ubuntu 24.04. I'll use a Python virtual environment with Python 3.10, CUDA GPU acceleration, and Hugging Face models.

Step 1: Python 3.10 Installation

Ubuntu 24.04 comes with Python 3.12 by default, but since pyannote.audio 3.1.1 was trained with torch==1.13.1 and CUDA 11.7, which support Python 3.10, I need to install Python 3.10: How To Install Python 3.10 on Ubuntu 24.04

Step 2: Create Virtual Environment

stt@GU502DU:~$ python3.10 -m venv wav2vec2_env
stt@GU502DU:~$ source wav2vec2_env/bin/activate

Step 3: Install torch 2.5.1 and CUDA 12.1

My laptop is equipped with an NVIDIA GeForce GTX 1660, so I chose to install and use CUDA. Since the GPU driver reports CUDA Version 12.8, I installed the cu121 build of PyTorch, which is compatible with it.

(wav2vec2_env) stt@GU502DU:~$ pip install torch==2.5.1+cu121 torchaudio==2.2.1+cu121 \
    --extr...
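
Once the virtual environment is active and PyTorch is installed, a quick sanity check can confirm that the interpreter and the CUDA build are the ones intended. The following is a minimal sketch, not part of the original walkthrough: the helper name describe_environment is hypothetical, and it only uses standard-library calls plus torch.__version__, torch.version.cuda, torch.cuda.is_available(), and torch.cuda.get_device_name(). If torch is not installed, the CUDA fields are simply left out.

```python
import importlib.util
import sys


def describe_environment():
    """Collect basic facts about the active Python environment.

    Hypothetical helper for illustration; the keys are arbitrary names.
    """
    info = {
        # Major.minor version of the running interpreter, e.g. "3.10".
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Inside a venv, sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        # Check for torch without importing it yet.
        "torch_installed": importlib.util.find_spec("torch") is not None,
    }
    if info["torch_installed"]:
        import torch

        info["torch_version"] = torch.__version__
        # CUDA version the wheel was built against (None for CPU-only builds).
        info["cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
        if info["cuda_available"]:
            info["gpu_name"] = torch.cuda.get_device_name(0)
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this inside wav2vec2_env should report the Python version of the venv and, if the GPU is visible to PyTorch, the device name. If cuda_available comes back False, the driver and the installed wheel are the first things worth checking.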
