Add ElevenLabs Low Latency Voice Assistant Integration #261
- Interactive voice assistant with enter-to-start/stop recording
- ElevenLabs speech-to-text using Scribe model
- Claude Haiku 4.5 for intelligent responses
- WebSocket streaming TTS for minimal latency
- Comprehensive notebook demonstrating optimization techniques
- API key validation and placeholder setup
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Security improvements:
- Replace hardcoded API keys with environment variables
- Add .env.example template with setup instructions
- Add python-dotenv dependency for environment management
Code quality improvements:
- Add missing docstring to on_close function
- Extract magic numbers to named constants in AudioQueue class
- Make voice ID dynamically fetched from available voices
- Make TTS model and output format configurable constants
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
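As a rough illustration of the environment-variable approach described in this commit, the sketch below reads a key from the environment and rejects missing or placeholder values. The function name `require_api_key`, the placeholder strings, and the variable name `ELEVENLABS_API_KEY` are assumptions made for illustration; the PR's actual code is not shown on this page.

```python
import os

# Hypothetical placeholder strings; the real .env.example values are not shown in this PR.
PLACEHOLDER_VALUES = {"", "your-api-key-here"}


def require_api_key(name, env=None):
    """Return the key stored under `name`, or raise if it is missing or a placeholder."""
    # With python-dotenv installed, load_dotenv() would normally be called once
    # at startup so that values from a local .env file appear in os.environ.
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if value in PLACEHOLDER_VALUES:
        raise RuntimeError(
            f"{name} is not set; copy .env.example to .env and fill it in."
        )
    return value
```

A caller might write `key = require_api_key("ELEVENLABS_API_KEY")` at startup, so that a missing key fails immediately with a readable message rather than later during an API request.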
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]>
Use dynamically selected VOICE_ID variable instead of hardcoded voice ID in the "Generate Input Audio" section for consistency.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
Remove redundant resource links and streamline documentation references to focus on main website and API overview.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
- Change sounddevice requirement from >=0.5.2 to >=0.5.1 to fix installation issues (version 0.5.2 doesn't exist)
- Update sentence-by-sentence streaming cell to use mp3_44100_128 format instead of pcm_44100 (free tier compatible)
- Add pip upgrade cell to notebook for better package management
- Clean up notebook cell execution outputs
Co-Authored-By: ashprabaker <[email protected]>
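The sentence-by-sentence streaming mentioned in this commit can be pictured roughly as follows: text deltas from the language model are buffered, and each complete sentence is released as soon as it ends so that speech synthesis can start before the full response is available. This is a minimal sketch, not the notebook's code; `iter_sentences` is a hypothetical helper, and the rule that a sentence ends at `.`, `!` or `?` followed by whitespace is a simplifying assumption.

```python
import re

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(deltas):
    """Yield complete sentences from an iterable of streamed text fragments."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last part
        # may still be growing, so it stays in the buffer.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could then be sent to a text-to-speech request on its own. A splitter this simple will break on abbreviations such as "Dr. Smith", which a fuller implementation would need to handle.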
Added a detailed "How to Use This Cookbook" section that guides users through:
- Step 1: Environment setup with API keys and dependencies
- Step 2: Working through the notebook to learn concepts
- Step 3: Running the production script for hands-on experience
Also expanded the "More About ElevenLabs" section with additional resources including Voice Library, API Playground, and SDK links.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
The script was using pcm_44100 format which requires ElevenLabs Pro tier, causing WebSocket connections to close with error 1008. Fixed by:
- Changed TTS_OUTPUT_FORMAT from pcm_44100 to mp3_44100_128 (free tier)
- Added pydub dependency for MP3 decoding
- Updated AudioQueue.add() to decode MP3 chunks before playback
- Enhanced WebSocket close handler to log error details
- Updated docstring to reflect MP3 format usage
The script now works with free tier ElevenLabs accounts.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude <[email protected]>
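The decode-before-playback change described in this commit might look roughly like the sketch below. It is a hypothetical stand-in, not the PR's `AudioQueue`: the class name `DecodingAudioQueue`, the `next_pcm` method, and the injected `decode` callable are all assumptions, chosen so the example runs without audio hardware or pydub.

```python
import queue


class DecodingAudioQueue:
    """Hypothetical stand-in for the script's AudioQueue (not the PR's class)."""

    def __init__(self, decode):
        # `decode` turns one encoded chunk (e.g. MP3 bytes) into raw PCM bytes.
        self._decode = decode
        self._pcm = queue.Queue()

    def add(self, encoded_chunk):
        """Decode an incoming chunk and queue the PCM for the playback thread."""
        if not encoded_chunk:
            return  # skip empty keep-alive or end-of-stream chunks
        self._pcm.put(self._decode(encoded_chunk))

    def next_pcm(self, timeout=0.1):
        """Return the next PCM block, or None if nothing arrives in time."""
        try:
            return self._pcm.get(timeout=timeout)
        except queue.Empty:
            return None
```

With pydub, the `decode` callable could plausibly wrap `AudioSegment.from_file(io.BytesIO(chunk), format="mp3").raw_data`. Whether individual WebSocket chunks decode cleanly on their own depends on where MP3 frame boundaries fall, so the real script may buffer differently.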
Summary
Errors per input
Errors in temp_md/low_latency_stt_claude_tts.md
Model Check Results ✅
I've reviewed the Claude model usage in the changed files for this PR.
Files Reviewed
Model References Found
All files use:
Analysis
Status: ✅ All model references are valid and follow best practices
The code uses
References
Validated against the current model list at: https://docs.claude.com/en/docs/about-claude/models/overview.md
No changes needed! 🎉
@Adriaan-ANT thanks for an awesome cookbook!
Thank you @zealoushacker!
Pull Request
Description
This PR adds a comprehensive cookbook demonstrating how to build a low-latency voice assistant by integrating ElevenLabs' speech processing capabilities with Claude's conversational AI. The cookbook progressively optimizes for real-time performance, teaching developers how to minimize latency through various streaming techniques.
What this cookbook demonstrates:
Type of Change
Cookbook Checklist (if applicable)
Testing
Additional Context
This cookbook includes two main components:
- Interactive Notebook (low_latency_stt_claude_tts.ipynb) - A tutorial-style notebook that walks through building a voice assistant step-by-step, demonstrating various optimization techniques with performance metrics at each stage.
- Production WebSocket Script (stream_voice_assistant_websocket.py) - A fully functional voice assistant using WebSocket streaming for minimal latency, featuring continuous microphone input and gapless audio playback.
The cookbook is particularly valuable for developers building real-time voice applications who need to understand the tradeoffs between different streaming approaches and how to optimize for latency.
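To compare streaming approaches, one useful number is the time until the first chunk arrives, as opposed to the total time for the whole response. The sketch below shows one way to record both. `measure_stream` and its `start_stream` argument are hypothetical names, and this is not the metric code used in the notebook.

```python
import time


def measure_stream(start_stream, clock=time.perf_counter):
    """Consume a stream and report time to first chunk, total time, and chunk count.

    `start_stream` is a zero-argument callable that starts the request and
    returns an iterable of chunks, so that request setup is included in the timing.
    """
    start = clock()
    first = None
    count = 0
    for _ in start_stream():
        if first is None:
            first = clock() - start  # latency the listener actually perceives
        count += 1
    total = clock() - start
    return {"time_to_first_chunk": first, "total_time": total, "chunks": count}
```

Passing the same helper a non-streaming call (wrapped to return a one-element list) and a streaming call makes the difference visible: total time may be similar, while time to first chunk is typically much lower when streaming.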
Key features: