Speech-to-Text and Speaker Diarization with Whisper and pyannote.audio 3.1 on Ubuntu 24.04

[Image: OpenAI Whisper]

If you want to generate subtitles with speaker labels from an audio file using Whisper and pyannote.audio on Ubuntu 24.04, this blog post walks you through the full setup process.

Step 1: Install Python 3.10 (Manually)

Ubuntu 24.04 comes with Python 3.12 by default, but pyannote.audio 3.1.1 was trained with torch==1.13.1 and CUDA 11.7, and that torch release supports Python only up to 3.10. So I need to install Python 3.10 manually:
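One common way to get Python 3.10 onto Ubuntu 24.04 is the deadsnakes PPA; a sketch of that route (the PPA is an assumption on my part, any Python 3.10 build works):

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.10 python3.10-venv python3.10-dev
python3.10 --version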

Step 2: Create and Activate a Python Virtual Environment

stt@GU502DU:~$ python3.10 -m venv whisper_env
stt@GU502DU:~$ source whisper_env/bin/activate

Step 3: Install PyTorch 2.5.1 with CUDA 12.1 (cu121)

My laptop is equipped with an NVIDIA GeForce GTX 1660, so I chose to install and use CUDA. Since the GPU driver reports CUDA Version 12.8, I installed the cu121 build of PyTorch, which is compatible with it.

(whisper_env) stt@GU502DU:~$ pip install torch==2.5.1+cu121 torchaudio==2.5.1+cu121 \
  --extra-index-url https://download.pytorch.org/whl/cu121

Note on CUDA Compatibility

To ensure compatibility between your installed CUDA version and the PyTorch build, run the following command:

(whisper_env) stt@GU502DU:~$ nvidia-smi
+------------------------------------------------------------------------+
| NVIDIA-SMI 570.133.07  Driver Version: 570.133.07  CUDA Version: 12.8  |
+------------------------------------------------------------------------+

Make sure that the CUDA version reported here (e.g., CUDA Version: 12.8) is greater than or equal to the CUDA version used by the installed PyTorch build (in this case, cu121 = CUDA 12.1).
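To confirm that this PyTorch build can actually see the GPU, a quick check run inside the virtual environment; it should print the installed version and True if CUDA is usable:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"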

Step 4: Install pyannote.audio and Dependencies

(whisper_env) stt@GU502DU:~$ pip install pyannote.audio==3.1.1 numpy scipy librosa huggingface_hub
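An optional sanity check that the package imports cleanly alongside the torch build installed above:

python -c "from pyannote.audio import Pipeline; print('pyannote.audio OK')"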

Step 5: Install Whisper (with GPU support)

(whisper_env) stt@GU502DU:~$ pip install git+https://github.com/openai/whisper.git --upgrade --no-deps

I use the --no-deps option to prevent pip from overriding already-installed packages like torch.
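Because --no-deps also skips Whisper's other declared dependencies, the remaining runtime packages, plus the ffmpeg binary Whisper shells out to for audio decoding, still need to be present; a sketch of the usual extras, with versions left unpinned:

sudo apt install ffmpeg
pip install tiktoken numba more-itertools tqdm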

Step 6: Hugging Face Access Token

I create a Hugging Face access token from hf.co/settings/tokens and use it in my code:

from huggingface_hub import login
login("hf_your_token_here")

Step 7: Accept User Conditions for the Following Models

I visit the following model pages and click “Agree” to accept the terms:

https://hf.co/pyannote/speaker-diarization-3.1
https://hf.co/pyannote/segmentation-3.0

(The speaker-diarization-3.1 pipeline used below depends on segmentation-3.0, so both gates need to be accepted.)

Step 8: Run the following code (Whisper + pyannote.audio)

(whisper_env) stt@GU502DU:~$ python3 speaker_test4.py
from pyannote.audio import Pipeline
import whisper
import json

# Diarization
pipeline = Pipeline.from_pretrained(
    "pyannote/speaker-diarization-3.1",
    use_auth_token="hf_your_token"
)
diarization = pipeline("test.wav")

# STT
model = whisper.load_model("base")
whisper_result = model.transcribe("test.wav")

# Merge
# Return the diarization speaker whose turn overlaps more than half of the Whisper segment
def get_speaker(start_time, end_time, diarization):
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        overlap = max(0, min(end_time, turn.end) - max(start_time, turn.start))
        if overlap > (end_time - start_time) * 0.5:
            return speaker
    return "Unknown"

merged = []
for seg in whisper_result["segments"]:
    speaker = get_speaker(seg["start"], seg["end"], diarization)
    merged.append({
        "speaker": speaker,
        "start": seg["start"],
        "end": seg["end"],
        "text": seg["text"]
    })

with open("test_stt_merged.json", "w") as f:
    json.dump(merged, f, indent=2)

Output

This generates test_stt_merged.json, which combines the speaker labels with the transcribed text. An excerpt:

{
  "speaker": "SPEAKER_00",
  "start": 11.200000000000001,
  "end": 19.36,
  "text": " you this morning? Um, I just had some, um, diary for the last three days. Um, and it's"
},
{
  "speaker": "SPEAKER_00",
  "start": 19.36,
  "end": 24.16,
  "text": " been affecting me. I need to stay close to the toilet and, um, yeah, it's been affecting"
},
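Since the goal stated at the top is subtitles with speaker labels, the merged JSON can also be turned into an .srt file. A minimal sketch (the output name test.srt and the helper function are my own, not part of the original script):

import json

def srt_timestamp(t):
    # Convert seconds to the SRT timestamp format HH:MM:SS,mmm
    hours, rem = divmod(int(t), 3600)
    minutes, seconds = divmod(rem, 60)
    millis = int(round((t - int(t)) * 1000))
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"

with open("test_stt_merged.json") as f:
    merged = json.load(f)

with open("test.srt", "w") as f:
    for i, seg in enumerate(merged, start=1):
        # One subtitle cue per merged segment, prefixed with the speaker label
        f.write(f"{i}\n")
        f.write(f"{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n")
        f.write(f"[{seg['speaker']}]{seg['text'].strip()}\n\n")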
