Clinical Speech-to-Structured-Report AI - A FastAPI web application that converts a doctor's dictation into structured medical documentation using speech-to-text and Large Language Models.
- Audio Recording: Record audio directly in the browser using the microphone
- Audio File Upload: Upload existing audio files in common formats (MP3, MP4, WAV, M4A, OGG, WebM, etc.)
- Speech-to-Text: Uses Faster Whisper (local, open-source Whisper model) for accurate transcription
- Report Generation: Uses free-tier LLMs via OpenRouter to generate structured medical reports
- Python 3.8+
- Microphone access (for recording feature)
- OpenRouter API key (for free LLM access)
1. Clone this repository

2. Create a virtual environment:

   ```
   python3 -m venv venv
   ```

3. Activate the virtual environment:

   ```
   source venv/bin/activate   # macOS/Linux
   venv\Scripts\activate      # Windows
   ```

4. Install dependencies:

   ```
   pip install -r requirements.txt
   ```

5. Set up environment variables: create a `.env` file or set the variable directly:

   ```
   export OPENROUTER_API_KEY=your_openrouter_api_key_here
   ```

6. Run the application:

   ```
   uvicorn app:app --reload
   ```

7. Open your browser and go to http://127.0.0.1:8000
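If the API key is missing, report generation fails at request time, so it can help to validate it at startup. A minimal sketch of reading the key from the environment (the function name and error message are illustrative, not taken from `app.py`):

```python
import os


def get_openrouter_key() -> str:
    """Return the OpenRouter API key, failing fast with a clear message."""
    key = os.environ.get("OPENROUTER_API_KEY")
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; export it or add it to a .env file"
        )
    return key
```

If you prefer a `.env` file, the `python-dotenv` package can load it into the environment before this check runs.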
1. Choose one of two options:
   - Upload Audio File: Select an existing audio file and click "Upload and Process"
   - Record Audio: Click "Start Recording", speak into your microphone, then click "Stop Recording"

2. View the transcription and generated medical report
The app uses Faster Whisper with the "medium" model by default. You can change this in `app.py` by modifying:

```python
whisper_model = WhisperModel("medium", device="cpu", compute_type="int8")
```

Available models: `tiny`, `base`, `small`, `medium`, `large-v1`, `large-v2`
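For reference, Faster Whisper's `transcribe()` returns a lazy generator of segments plus metadata, so the segment texts must be joined to get a full transcript. A sketch of the kind of call `app.py` makes (the helper name and file path are illustrative):

```python
from faster_whisper import WhisperModel

# Load once at startup; the first call downloads the model weights.
whisper_model = WhisperModel("medium", device="cpu", compute_type="int8")


def transcribe_file(path: str) -> str:
    """Run local transcription and return the joined text."""
    # transcribe() yields segments lazily; iterating runs the actual decoding
    segments, info = whisper_model.transcribe(path)
    return " ".join(segment.text.strip() for segment in segments)
```

Smaller models load and decode faster at some cost in accuracy; on CPU, `int8` compute keeps memory use down.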
Uses OpenRouter with Google's Gemini 2.0 Flash (free tier) by default. Change the model in `app.py`:

```python
default_model = "google/gemini-2.0-flash-exp:free"
```

Other free models are available, e.g. `meta-llama/llama-3.2-3b-instruct:free`
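The report step is a standard chat-completion POST to OpenRouter's API. A sketch of building that request (the system-prompt wording and function name are assumptions, not copied from `app.py`):

```python
import json

# OpenRouter's OpenAI-compatible chat completions endpoint
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_report_request(transcription: str, api_key: str,
                         model: str = "google/gemini-2.0-flash-exp:free"):
    """Build the headers and JSON body for an OpenRouter chat-completion call."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Convert the clinician's dictation into a structured medical report."},
            {"role": "user", "content": transcription},
        ],
    }
    return headers, json.dumps(body)
```

Only the transcription text goes into the request body, which is what keeps the original audio local (see the privacy notes below).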
- `GET /`: Main web interface
- `POST /process`: Process an audio file and return the transcription and report
  - Accepts: `multipart/form-data` with a `file` field
  - Returns: JSON with `transcription` and `report` fields
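For scripting against the API, the endpoint can be exercised with a short client. A sketch using the third-party `requests` library (assumes the server is running locally; the helper name is illustrative):

```python
import requests


def process_audio(path: str, base_url: str = "http://127.0.0.1:8000") -> dict:
    """Send an audio file to POST /process and return the parsed JSON."""
    with open(path, "rb") as f:
        # FastAPI reads the upload from the multipart "file" field
        response = requests.post(
            f"{base_url}/process",
            files={"file": (path, f, "audio/wav")},
        )
    response.raise_for_status()
    return response.json()  # expected keys: "transcription", "report"
```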
- The Whisper model runs locally, so audio files are not sent to external services
- Only the transcription text is sent to OpenRouter (no original audio)
- Consider HIPAA compliance and local data privacy regulations
- "Error accessing microphone": Browser permission was denied. Allow microphone access, and note that browsers only expose the microphone over HTTPS (or on localhost), so production deployments need TLS
- "Report generation failed": Check OPENROUTER_API_KEY is set correctly
- Whisper errors: Ensure sufficient RAM/storage for the model (medium model requires ~2GB RAM)
- First transcription may be slow due to model loading
- Large audio files may require more memory
- GPU acceleration is available if CUDA is installed (change to device="cuda" in the WhisperModel call)