This Python script automates speech-to-text transcription for multiple audio files using OpenAI Whisper.
It scans an `audio/` folder for supported files, transcribes them, and saves all results into a single CSV file.
- Automatically detects all audio files in the `audio/` folder
- Uses OpenAI's Whisper model for accurate transcription
- Supports multiple audio formats (`.mp3`, `.wav`, `.m4a`, `.flac`, `.ogg`, `.webm`)
- Saves transcripts neatly in a `transcripts.csv` file
- Flushes results to disk after every file to prevent data loss (see the sketch below)
- Displays progress and elapsed time
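To make the feature list concrete, here is a minimal sketch of such a batch loop with per-file flushing. The structure and names (`AUDIO_DIR`, the CSV columns) are illustrative assumptions, not the actual contents of `transcribe.py`:

```python
import csv
import time
import whisper
from pathlib import Path

AUDIO_DIR = Path("audio")
SUPPORTED = {".mp3", ".wav", ".m4a", ".flac", ".ogg", ".webm"}

model = whisper.load_model("medium")  # the script's default model

with open("transcripts.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["file", "transcript"])
    files = sorted(p for p in AUDIO_DIR.iterdir() if p.suffix.lower() in SUPPORTED)
    start = time.time()
    for i, path in enumerate(files, 1):
        result = model.transcribe(str(path), language="en")
        writer.writerow([path.name, result["text"].strip()])
        f.flush()  # flush after every file so a crash loses at most one transcript
        print(f"[{i}/{len(files)}] {path.name} done ({time.time() - start:.0f}s elapsed)")
```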
- Python 3.8+
- Dependencies:

```bash
pip install openai-whisper
pip install torch  # required backend for Whisper
```

(If using a GPU, install the CUDA-enabled build of PyTorch.)
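To confirm that the CUDA build is active, you can check from Python:

```python
import torch

# True means Whisper can run on the GPU; otherwise it falls back to CPU
print(torch.cuda.is_available())
```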
If you want to install Whisper from source manually, follow these steps:

1. Download Whisper from GitHub: 🔗 https://github.com/openai/whisper
2. Extract the downloaded ZIP and navigate to the folder `whisper-main`.
3. Install locally using PowerShell or the VS Code terminal:

```powershell
cd C:\Users\User\Downloads\whisper-main
pip install -e .
```

Alternatively, install dependencies manually:

```bash
pip install -r requirements.txt
```

OR

```bash
pip install openai-whisper
```

✅ This automatically installs all required dependencies, including:
- `numba`
- `numpy`
- `torch`
- `tqdm`
- `more-itertools`
- `tiktoken`
You do not need to install these manually unless building from source.
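Once installed, you can sanity-check the package from Python by listing the bundled model names with `whisper.available_models()`:

```python
import whisper

# Prints names such as "tiny", "base", "small", "medium", "large"
print(whisper.available_models())
```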
Download FFmpeg from: 🔗 https://ffmpeg.org/download.html#build-windows

1. Extract the files and move the folder to `C:\ffmpeg`.
2. Add FFmpeg to your system environment variables:
   - Go to: Control Panel → System → Advanced System Settings → Environment Variables
   - Under User variables, find `Path`
   - Add: `C:\ffmpeg\bin`
3. Verify the installation:

```bash
ffmpeg -version
```
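You can also confirm from Python that FFmpeg is discoverable on `PATH` (Whisper shells out to it when loading audio):

```python
import shutil

# Prints the resolved ffmpeg path, or a warning if it is not on PATH
print(shutil.which("ffmpeg") or "ffmpeg not found on PATH")
```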
The project layout looks like this:

```
project/
│
├── audio/              # Folder containing your audio files
│   ├── example1.mp3
│   ├── example2.wav
│
├── transcripts.csv     # Generated output (after running the script)
│
└── transcribe.py       # The main script
```
1. Clone the repository:

```bash
git clone https://github.com/monojitbgit/whisper-batch-transcriber.git
cd whisper-batch-transcriber
```

2. Place your audio files inside the `audio/` folder.
3. Run the script:

```bash
python transcribe.py
```

By default, the script loads the medium Whisper model:
```python
model = whisper.load_model("medium")
```

To use a different model:

```python
model = whisper.load_model("small")
```

Change the transcription language by adjusting:

```python
transcribe_file(file_path, language="en")
```

(Replace `"en"` with your desired ISO language code.)
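For context, `transcribe_file` is the script's own helper; assuming it simply forwards to Whisper's `model.transcribe` using a module-level `model`, a minimal equivalent would be:

```python
def transcribe_file(file_path, language="en"):
    """Transcribe one audio file and return its plain-text transcript."""
    result = model.transcribe(file_path, language=language)  # language=None auto-detects
    return result["text"].strip()
```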
- Whisper runs locally; no API key required
- For faster performance, use a CUDA-compatible GPU
- Larger models need more memory; if you hit a CUDA out-of-memory error, switch to `small` or `base` (see the sketch below)
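As referenced above, a minimal device-aware load looks like this (`device` is a real parameter of `whisper.load_model`; the model choice is illustrative):

```python
import torch
import whisper

# Use the GPU when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"

# Smaller checkpoints ("small", "base") fit comfortably on low-VRAM cards
model = whisper.load_model("small", device=device)
```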
| Option | Speed | Accuracy | Model | ⏱️ CPU (Intel i7 / Ryzen 7) | ⚙️ GPU (RTX 3060/3090/4090) | Size | Speed (Relative) | Accuracy (Relative) | 🌐 Multilingual Support |
|---|---|---|---|---|---|---|---|---|---|
| Use `"tiny"` | Fast | Low accuracy | tiny | ~30 sec | ~5–10 sec | 75 MB | ⭐⭐⭐⭐ (Fastest) | ⭐ (Basic) | ✅ Yes |
| Use `"base"` | Fast | Low accuracy | base | ~1 min | ~10–20 sec | 142 MB | ⭐⭐⭐ (Fast) | ⭐⭐ (Good) | ✅ Yes |
| Use `"small"` with CPU optimizations | Faster | Good | small | ~2–3 min | ~20–30 sec | 466 MB | ⭐⭐ (Medium) | ⭐⭐⭐ (Better) | ✅ Yes |
| Use `"medium"` after reducing file size | Slow | Better | medium | ~5–6 min | ~40–60 sec | 1.5 GB | ⭐ (Slower) | ⭐⭐⭐⭐ (Very Good) | ✅ Yes |
| Use Google Colab GPU (medium / large) | Super Fast | Best | large | ~10–15 min | ~1.5–2 min | 3.0 GB | ⭐ (Slowest) | ⭐⭐⭐⭐⭐ (Best) | ✅ Yes |
- 🧠 For quick tests: use `tiny` or `base`
- ⚖️ For a balance of speed and accuracy: use `small`
- 🗣️ For high-quality transcription: use `medium`
- 🚀 For best performance: use `large` on a GPU or Google Colab
This project is licensed under the MIT License.
You’re free to use, modify, and distribute it with attribution.