v0.0.50
Added
-
Added
GeminiMultimodalLiveLLMService. This is an integration for Google's Gemini Multimodal Live API, supporting:- Real-time audio and video input processing
- Streaming text responses with TTS
- Audio transcription for both user and bot speech
- Function calling
- System instructions and context management
- Dynamic parameter updates (temperature, top_p, etc.)
-
Added
AudioTranscriberutility class for handling audio transcription with Gemini models. -
Added new context classes for Gemini:
GeminiMultimodalLiveContextGeminiMultimodalLiveUserContextAggregatorGeminiMultimodalLiveAssistantContextAggregatorGeminiMultimodalLiveContextAggregatorPair
-
Added new foundational examples for
GeminiMultimodalLiveLLMService:26-gemini-multimodal-live.py26a-gemini-multimodal-live-transcription.py26b-gemini-multimodal-live-video.py26c-gemini-multimodal-live-video.py
-
Added
SimliVideoService. This is an integration for Simli AI avatars.
(see https://www.simli.com) -
Added NVIDIA Riva's
FastPitchTTSServiceandParakeetSTTService.
(see https://www.nvidia.com/en-us/ai-data-science/products/riva/) -
Added
IdentityFilter. This is the simplest frame filter that lets through all incoming frames. -
New
STTMuteStrategycalledFUNCTION_CALLwhich mutes the STT service during LLM function calls. -
DeepgramSTTServicenow exposes two event handlerson_speech_startedandon_utterance_endthat could be used to implement interruptions. See new exampleexamples/foundational/07c-interruptible-deepgram-vad.py. -
Added
GroqLLMService,GrokLLMService, andNimLLMServicefor Groq, Grok, and NVIDIA NIM API integration, with an OpenAI-compatible interface. -
New examples demonstrating function calling with Groq, Grok, Azure OpenAI, Fireworks, and NVIDIA NIM:
14f-function-calling-groq.py,14g-function-calling-grok.py,14h-function-calling-azure.py,14i-function-calling-fireworks.py, and14j-function-calling-nvidia.py. -
In order to obtain the audio stored by the
AudioBufferProcessoryou can now also register anon_audio_dataevent handler. Theon_audio_datahandler will be called every timebuffer_size(a new constructor argument) is reached. Ifbuffer_sizeis 0 (default) you need to manually get the audio as before usingAudioBufferProcessor.merge_audio_buffers().
@audiobuffer.event_handler("on_audio_data")
async def on_audio_data(processor, audio, sample_rate, num_channels):
await save_audio(audio, sample_rate, num_channels)
- Added a new RTVI message called
disconnect-bot, which when handled pushes anEndFrameto trigger the pipeline to stop.
Changed
-
STTMuteFilternow supports multiple simultaneous muting strategies. -
XTTSServicelanguage now defaults toLanguage.EN. -
SoundfileMixerdoesn't resample input files anymore to avoid startup delays. The sample rate of the provided sound files now need to match the sample rate of the output transport. -
Input frames (audio, image and transport messages) are now system frames. This means they are processed immediately by all processors instead of being queued internally.
-
Expanded the transcriptions.language module to support a superset of languages.
-
Updated STT and TTS services with language options that match the supported languages for each service.
-
Updated the
AzureLLMServiceto use theOpenAILLMService. Updated theapi_versionto2024-09-01-preview. -
Updated the
FireworksLLMServiceto use theOpenAILLMService. Updated the default model toaccounts/fireworks/models/firefunction-v2. -
Updated the
simple-chatbotexample to include a Javascript and React client example, using RTVI JS and React.
Removed
- Removed
AppFrame. This was used as a special user custom frame, but there's actually no use case for that.
Fixed
-
Fixed a
ParallelPipelineissue that would cause system frames to be queued. -
Fixed
FastAPIWebsocketTransportso it can work with binary data (e.g. using the protobuf serializer). -
Fixed an issue in
CartesiaTTSServicethat could cause previous audio to be received after an interruption. -
Fixed Cartesia, ElevenLabs, LMNT and PlayHT TTS websocket reconnection. Before, if an error occurred no reconnection was happening.
-
Fixed a
BaseOutputTransportissue that was causing audio to be discarded after anEndFramewas received. -
Fixed an issue in
WebsocketServerTransportandFastAPIWebsocketTransportthat would cause a busy loop when using audio mixer. -
Fixed a
DailyTransportandLiveKitTransportissue where connections were being closed in the input transport prematurely. This was causing frames queued inside the pipeline being discarded. -
Fixed an issue in
DailyTransportthat would cause some internal callbacks to not be executed. -
Fixed an issue where other frames were being processed while a
CancelFramewas being pushed down the pipeline. -
AudioBufferProcessornow handles interruptions properly. -
Fixed a
WebsocketServerTransportissue that would prevent interruptions withTwilioSerializerfrom working. -
DailyTransport.capture_participant_videonow allows capturing user's screen share by simply passingvideo_source="screenVideo". -
Fixed Google Gemini message handling to properly convert appended messages to Gemini's required format.
-
Fixed an issue with
FireworksLLMServicewhere chat completions were failing by removing thestream_optionsfrom the chat completion options.