@zealoushacker
Collaborator

Pull Request

Description

This PR adds a comprehensive cookbook demonstrating how to build a low-latency voice assistant by integrating ElevenLabs' speech processing capabilities with Claude's conversational AI. The cookbook progressively optimizes for real-time performance, teaching developers how to minimize latency through various streaming techniques.

What this cookbook demonstrates:

  • Integration of ElevenLabs speech-to-text and text-to-speech APIs with Claude
  • Step-by-step optimization techniques to reduce latency in voice applications
  • Comparison of different streaming approaches (HTTP streaming vs WebSocket streaming)
  • Production-ready implementation of a continuous conversational voice assistant
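One way to compare streaming approaches is to record how long each takes to deliver its first chunk. The following is a minimal sketch of that measurement, not code from the cookbook: `timed_chunks` and `fake_tts_stream` are hypothetical names, and the fake stream only stands in for a real streaming response.

```python
import time
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def timed_chunks(stream: Iterable[T]) -> Iterator[Tuple[float, T]]:
    """Yield (seconds_since_start, chunk) for every chunk in a stream."""
    start = time.perf_counter()
    for chunk in stream:
        yield time.perf_counter() - start, chunk


def fake_tts_stream(n_chunks: int = 3, delay: float = 0.01) -> Iterator[bytes]:
    """Hypothetical stand-in for a streaming TTS response."""
    for i in range(n_chunks):
        time.sleep(delay)  # simulate network and synthesis delay
        yield bytes([i]) * 4


timings = list(timed_chunks(fake_tts_stream()))
time_to_first_chunk = timings[0][0]  # the latency a listener perceives
total_time = timings[-1][0]          # when the last chunk arrived
print(f"first chunk after {time_to_first_chunk:.3f}s, last after {total_time:.3f}s")
```

For a voice assistant, time to first chunk usually matters more than total time, because playback can begin as soon as the first audio arrives.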

Type of Change

  • New cookbook
  • Bug fix (fixes an issue in existing cookbook)
  • Documentation update
  • Code quality improvement (refactoring, optimization)
  • Dependency update
  • Other (please describe):

Cookbook Checklist (if applicable)

  • Cookbook has a clear, descriptive title
  • Includes a problem statement or use case description
  • Code is well-commented and easy to follow
  • Includes expected outputs or results

Testing

  • I have tested this cookbook/change locally
  • All cells execute without errors

Additional Context

This cookbook includes two main components:

  1. Interactive Notebook (low_latency_stt_claude_tts.ipynb) - A tutorial-style notebook that walks through building a voice assistant step-by-step, demonstrating various optimization techniques with performance metrics at each stage.

  2. Production WebSocket Script (stream_voice_assistant_websocket.py) - A fully functional voice assistant using WebSocket streaming for minimal latency, featuring continuous microphone input and gapless audio playback.
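Gapless playback generally means the audio callback must always receive a full block of samples, even when the network has not delivered enough data yet. The sketch below illustrates that idea with a byte buffer that pads underruns with silence. It is an assumption-laden illustration, not the script's actual `AudioQueue`: the class name `GaplessBuffer` and its `read` method are hypothetical.

```python
import threading


class GaplessBuffer:
    """Thread-safe byte buffer that always returns exactly the requested size."""

    def __init__(self) -> None:
        self._buf = bytearray()
        self._lock = threading.Lock()

    def add(self, chunk: bytes) -> None:
        """Append decoded audio bytes (called from the network thread)."""
        with self._lock:
            self._buf.extend(chunk)

    def read(self, n_bytes: int) -> bytes:
        """Return n_bytes of audio (called from the playback callback)."""
        with self._lock:
            out = bytes(self._buf[:n_bytes])
            del self._buf[:n_bytes]
        # On underrun, pad with zero bytes, which are silence for signed PCM.
        return out + b"\x00" * (n_bytes - len(out))


buf = GaplessBuffer()
buf.add(b"\x01\x02\x03")
buf.add(b"\x04\x05")
first = buf.read(4)   # spans both chunks
second = buf.read(4)  # one byte of audio, then padding
```

Padding with silence keeps the output device fed, so a late chunk produces a brief quiet stretch rather than a stalled or clicking stream.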

The cookbook is particularly valuable for developers building real-time voice applications who need to understand the tradeoffs between different streaming approaches and how to optimize for latency.

Key features:

  • Compatible with ElevenLabs free tier
  • Includes comprehensive setup instructions and environment configuration
  • Well-documented code with security best practices (environment variables, input validation)
  • Performance comparisons between different implementation approaches

Adriaan-ANT and others added 9 commits October 21, 2025 06:11

- Interactive voice assistant with enter-to-start/stop recording
- ElevenLabs speech-to-text using Scribe model
- Claude Haiku 4.5 for intelligent responses
- WebSocket streaming TTS for minimal latency
- Comprehensive notebook demonstrating optimization techniques
- API key validation and placeholder setup

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Security improvements:
- Replace hardcoded API keys with environment variables
- Add .env.example template with setup instructions
- Add python-dotenv dependency for environment management
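A minimal sketch of the environment-variable approach described above, with fail-fast validation. The variable names `ELEVENLABS_API_KEY` and `ANTHROPIC_API_KEY` and the helper `load_api_keys` are assumptions for illustration; the cookbook's own names may differ.

```python
import os
from typing import Dict, Mapping

# Assumed variable names; adjust to match the project's .env.example.
REQUIRED_KEYS = ("ELEVENLABS_API_KEY", "ANTHROPIC_API_KEY")


def load_api_keys(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required API keys, raising early if any is missing or blank."""
    missing = [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name].strip() for name in REQUIRED_KEYS}


# With python-dotenv installed, calling dotenv.load_dotenv() first would copy
# values from a local .env file into os.environ before this check runs.
demo_env = {"ELEVENLABS_API_KEY": "el-key", "ANTHROPIC_API_KEY": " ant-key "}
keys = load_api_keys(demo_env)
print(sorted(keys))
```

Passing the environment in as a mapping keeps the check easy to exercise without touching real credentials.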

Code quality improvements:
- Add missing docstring to on_close function
- Extract magic numbers to named constants in AudioQueue class
- Make voice ID dynamically fetched from available voices
- Make TTS model and output format configurable constants

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Use dynamically selected VOICE_ID variable instead of hardcoded
voice ID in the "Generate Input Audio" section for consistency.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

Remove redundant resource links and streamline documentation references to focus on main website and API overview.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

- Change sounddevice requirement from >=0.5.2 to >=0.5.1 to fix installation issues (version 0.5.2 doesn't exist)
- Update sentence-by-sentence streaming cell to use mp3_44100_128 format instead of pcm_44100 (free tier compatible)
- Add pip upgrade cell to notebook for better package management
- Clean up notebook cell execution outputs
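Sentence-by-sentence streaming generally works by buffering the model's text deltas and forwarding each sentence to TTS as soon as it is complete. The sketch below shows one possible way to do that splitting. The function name `sentences_from_stream` and the simple punctuation rule are assumptions, not the notebook's actual code.

```python
import re
from typing import Iterable, Iterator

# Treat ., ! or ? followed by whitespace as a sentence boundary (a simple
# heuristic that will mis-split abbreviations such as "e.g. this").
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(deltas: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text deltas."""
    buffer = ""
    for delta in deltas:
        buffer += delta
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]  # keep the unfinished tail for the next delta
    if buffer.strip():
        yield buffer.strip()  # flush whatever remains when the stream ends


deltas = ["Hello the", "re. How are", " you? I am", " fine"]
sentences = list(sentences_from_stream(deltas))
print(sentences)
```

Each yielded sentence can be sent to speech synthesis immediately, so audio for the first sentence can start while later sentences are still being generated.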

Co-Authored-By: ashprabaker <[email protected]>

Added a detailed "How to Use This Cookbook" section that guides users through:
- Step 1: Environment setup with API keys and dependencies
- Step 2: Working through the notebook to learn concepts
- Step 3: Running the production script for hands-on experience

Also expanded the "More About ElevenLabs" section with additional resources including Voice Library, API Playground, and SDK links.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

The script was using pcm_44100 format which requires ElevenLabs Pro tier,
causing WebSocket connections to close with error 1008. Fixed by:

- Changed TTS_OUTPUT_FORMAT from pcm_44100 to mp3_44100_128 (free tier)
- Added pydub dependency for MP3 decoding
- Updated AudioQueue.add() to decode MP3 chunks before playback
- Enhanced WebSocket close handler to log error details
- Updated docstring to reflect MP3 format usage

The script now works with free tier ElevenLabs accounts.
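A close handler that logs the status code and reason makes failures like the 1008 closure above much easier to diagnose. The sketch below assumes the `on_close(ws, close_status_code, close_msg)` callback signature used by the `websocket-client` package. The helper `describe_close` and the logger name are hypothetical, and the code meanings come from RFC 6455.

```python
import logging

logger = logging.getLogger("voice_assistant")  # hypothetical logger name

# Selected close codes defined by RFC 6455.
CLOSE_CODE_MEANINGS = {
    1000: "normal closure",
    1008: "policy violation",
    1011: "internal server error",
}


def describe_close(close_status_code, close_msg) -> str:
    """Build a readable description of why a WebSocket closed."""
    if close_status_code is None:
        return "WebSocket closed without a close frame"
    meaning = CLOSE_CODE_MEANINGS.get(close_status_code, "unrecognized code")
    if isinstance(close_msg, bytes):
        close_msg = close_msg.decode("utf-8", errors="replace")
    reason = close_msg or "no reason given"
    return f"WebSocket closed: code={close_status_code} ({meaning}), reason={reason}"


def on_close(ws, close_status_code, close_msg) -> None:
    """Log close details; anything other than a normal closure is an error."""
    level = logging.INFO if close_status_code == 1000 else logging.ERROR
    logger.log(level, describe_close(close_status_code, close_msg))


# Example with a made-up reason string, not a real server message.
summary = describe_close(1008, "example reason")
print(summary)
```

Code 1008 signals that the server rejected the request on policy grounds, which is consistent with requesting an output format the account tier does not permit.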

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

@github-actions

github-actions bot commented Nov 1, 2025

Summary

| Status | Count |
|---|---|
| 🔍 Total | 2 |
| ✅ Successful | 0 |
| ⏳ Timeouts | 0 |
| 🔀 Redirected | 0 |
| 👻 Excluded | 1 |
| ❓ Unknown | 0 |
| 🚫 Errors | 1 |
| ⛔ Unsupported | 0 |

Errors per input

Errors in temp_md/low_latency_stt_claude_tts.md

@github-actions

github-actions bot commented Nov 1, 2025

Model Check Results ✅

I've reviewed the Claude model usage in the changed files for this PR.

Files Reviewed

  • third_party/ElevenLabs/README.md
  • third_party/ElevenLabs/low_latency_stt_claude_tts.ipynb
  • third_party/ElevenLabs/stream_voice_assistant_websocket.py

Model References Found

All files use: claude-haiku-4-5

Analysis

Status: ✅ All model references are valid and follow best practices

The code uses claude-haiku-4-5, which is the official API alias for Claude Haiku 4.5 (full model ID: claude-haiku-4-5-20251001). This is the recommended approach as:

  • ✅ It's a current public model from the latest generation
  • ✅ It's not deprecated
  • ✅ It uses the generation alias format, which automatically points to the latest version
  • ✅ This improves maintainability, since date-stamped model IDs do not need to be updated

References

Validated against the current model list at: https://docs.claude.com/en/docs/about-claude/models/overview.md

No changes needed! 🎉

@zealoushacker
Collaborator Author

@Adriaan-ANT thanks for an awesome cookbook!

Contributor

@Adriaan-ANT Adriaan-ANT left a comment

Thank you @zealoushacker!

@Adriaan-ANT Adriaan-ANT merged commit 279a36e into main Nov 1, 2025
6 checks passed