Releases: pipecat-ai/pipecat
v0.0.93
Added
- Added support for Sarvam Speech-to-Text service (`SarvamSTTService`) with streaming WebSocket support for `saarika` (STT) and `saaras` (STT-translate) models.
- Added support for passing in a `ToolsSchema` in lieu of a list of provider-specific dicts when initializing `OpenAIRealtimeLLMService` or when updating it using `LLMUpdateSettingsFrame`.
- Added `TransportParams.audio_out_silence_secs`, which specifies how many seconds of silence to output when an `EndFrame` reaches the output transport. This can help ensure that all audio data is fully delivered to clients.
- Added new `FrameProcessor.broadcast_frame()` method. This will push two instances of a given frame class, one upstream and the other downstream.

  ```python
  await self.broadcast_frame(UserSpeakingFrame)
  ```

- Added `MetricsLogObserver` for logging performance metrics from `MetricsFrame` instances. Supports filtering via the `include_metrics` parameter to control which metric types are logged (TTFB, processing time, LLM token usage, TTS usage, smart turn metrics). See the sketch below.
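For orientation, a minimal usage sketch; the `MetricsLogObserver` import path is an assumption based on this entry, while passing `observers` to `PipelineTask` follows Pipecat's existing observer API:

```python
# Sketch only: the MetricsLogObserver import path is an assumption -- check
# the pipecat source for the actual module.
from pipecat.observers.metrics_log_observer import MetricsLogObserver
from pipecat.pipeline.task import PipelineTask

# Logs metrics from MetricsFrame instances; pass include_metrics to narrow
# which metric types (TTFB, processing time, usage, ...) get logged.
task = PipelineTask(pipeline, observers=[MetricsLogObserver()])
```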
- Added `pronunciation_dictionary_locators` to `ElevenLabsTTSService` and `ElevenLabsHttpTTSService`.
- Added support for loading external observers. You can now register custom pipeline observers by setting the `PIPECAT_OBSERVER_FILES` environment variable. This variable should contain a colon-separated list of Python files (e.g. `export PIPECAT_OBSERVER_FILES="observer1.py:observer2.py:..."`). Each file must define a function with the following signature:

  ```python
  async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]: ...
  ```
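For illustration, a minimal observer file might look like the sketch below. Only the `create_observers()` signature comes from this entry; the import paths and the observer subclass are assumptions to verify against the Pipecat source:

```python
# observer1.py -- sketch of an external observer file loaded via
# PIPECAT_OBSERVER_FILES. Import paths are assumptions.
from typing import Iterable

from pipecat.observers.base_observer import BaseObserver
from pipecat.pipeline.task import PipelineTask


class MyObserver(BaseObserver):
    """Hypothetical observer; implement whichever BaseObserver hooks you need."""


async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]:
    return [MyObserver()]
```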
- Added support for new sonic-3 languages in `CartesiaTTSService` and `CartesiaHttpTTSService`.
- `EndFrame` and `EndTaskFrame` have an optional `reason` field to indicate why the pipeline is being ended.
- `CancelFrame` and `CancelTaskFrame` have an optional `reason` field to indicate why the pipeline is being canceled. This can also be specified when you cancel a task with `PipelineTask.cancel(reason="cancellation reason")`.
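As a sketch, a processor could also end the pipeline with a reason by pushing an `EndTaskFrame` upstream; the `reason` value here is illustrative, and the upstream push follows Pipecat's usual task-frame pattern:

```python
# Sketch: inside a FrameProcessor, end the pipeline with a reason
# (the reason field is new in this release).
from pipecat.frames.frames import EndTaskFrame
from pipecat.processors.frame_processor import FrameDirection

await self.push_frame(EndTaskFrame(reason="user said goodbye"), FrameDirection.UPSTREAM)
```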
- Added `include_prob_metrics` parameter to Whisper STT services to enable access to probability metrics from transcription results.
- Added utility functions `extract_whisper_probability()`, `extract_openai_gpt4o_probability()`, and `extract_deepgram_probability()` to extract probability metrics from `TranscriptionFrame` objects for Whisper-based, OpenAI GPT-4o-transcribe, and Deepgram STT services respectively. See the sketch below.
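A hedged sketch of using one of these helpers inside a custom frame processor; the helper's module path is an assumption based on this entry:

```python
# Sketch: read a probability metric off a TranscriptionFrame. The helper's
# module path below is an assumption -- locate it in the pipecat source.
from loguru import logger

from pipecat.frames.frames import TranscriptionFrame
from pipecat.utils.string import extract_whisper_probability  # assumed path


# Inside a custom FrameProcessor subclass:
async def process_frame(self, frame, direction):
    await super().process_frame(frame, direction)
    if isinstance(frame, TranscriptionFrame):
        probability = extract_whisper_probability(frame)
        logger.debug(f"Transcription probability: {probability}")
    await self.push_frame(frame, direction)
```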
- Added `LLMSwitcher.register_direct_function()`. It works much like `LLMSwitcher.register_function()` in that it's a shorthand for registering a function on all LLMs in the switcher, except this new method takes a direct function (a `FunctionSchema`-less function).
- Added `MCPClient.get_tools_schema()` and `MCPClient.register_tools_schema()` as a two-step alternative to `MCPClient.register_tools()`, to allow users to pass MCP tools to, say, `GeminiLiveLLMService` (as well as other speech-to-speech services) in the constructor.
- Added support for passing in an `LLMSwitcher` to `MCPClient.register_tools()` (as well as the new `MCPClient.register_tools_schema()`).
- Added `cpu_count` parameter to `LocalSmartTurnAnalyzerV3`. This is set to `1` by default for more predictable performance on low-CPU systems.
Changed
- Improved `concatenate_aggregated_text()` to handle one-word outputs from OpenAI Realtime and Gemini Live. Text fragments are now correctly concatenated without spaces when these patterns are detected.
- `STTMuteFilter` no longer sends `STTMuteFrame` to the STT service. The filter now blocks frames locally without instructing the STT service to stop processing audio. This prevents inactivity-related errors (such as 409 errors from Google STT) while maintaining the same muting behavior at the application level. Important: the `STTMuteFilter` should be placed after the STT service itself.
- Improved `GoogleSTTService` error handling to properly catch gRPC `Aborted` exceptions (corresponding to 409 errors) caused by stream inactivity. These exceptions are now logged at DEBUG level instead of ERROR level, since they indicate expected behavior when no audio is sent for 10+ seconds (e.g., during long silences or when audio input is blocked). The service automatically reconnects when this occurs.
- Bumped the `fastapi` dependency's upper bound to `<0.122.0`.
- Updated the default model for `GoogleVertexLLMService` to `gemini-2.5-flash`.
- Updated the `GoogleVertexLLMService` to use the `GoogleLLMService` as a base class instead of the `OpenAILLMService`.
- Updated STT and TTS services to pass through unverified language codes with a warning instead of returning None. This allows developers to use newly supported languages before Pipecat's service classes are updated, while still providing guidance on verified languages.
Removed
- Removed `needs_mcp_alternate_schema()` from `LLMService`. The mechanism that relied on it has been removed.
Fixed
- Restored backwards compatibility for vision/image features (broken in 0.0.92) when using non-universal context and assistant aggregators.
- Fixed `DeepgramSTTService._disconnect()` to properly await the `is_connected()` method call, which is an async coroutine in the Deepgram SDK.
- Fixed an issue where the `SmallWebRTCRequest` dataclass in the runner would scrub arbitrary request data from the client due to camelCase typing. This fixes data passthrough for JS clients where `APIRequest` is used.
- Fixed a bug in `GeminiLiveLLMService` where in some circumstances it wouldn't respond after a tool call.
- Fixed `GeminiLiveLLMService` session resumption after a connection timeout.
- `GeminiLiveLLMService` now properly supports context-provided system instruction and tools.
- Fixed `GoogleLLMService` token counting to avoid double-counting tokens when Gemini sends usage metadata across multiple streaming chunks.
v0.0.92
🎃 The Haunted Edition 👻
Added
- Added a new `DeepgramHttpTTSService`, which delivers a meaningful reduction in latency when compared to the `DeepgramTTSService`.
- Added support for the `speaking_rate` input parameter in `GoogleHttpTTSService`.
- Added `enable_speaker_diarization` and `enable_language_identification` to `SonioxSTTService`.
- Added `SpeechmaticsTTSService`, which uses Speechmatics' TTS API. Updated examples 07a* to use the new TTS service.
- Added support for including images or audio in LLM context messages using `LLMContext.create_image_message()` or `LLMContext.create_image_url_message()` (not all LLMs support URLs) and `LLMContext.create_audio_message()`. For example, when creating an `LLMMessagesAppendFrame`:

  ```python
  message = LLMContext.create_image_message(image=..., size=...)
  await self.push_frame(LLMMessagesAppendFrame(messages=[message], run_llm=True))
  ```
- New event handlers for the `DeepgramFluxSTTService`: `on_start_of_turn`, `on_turn_resumed`, `on_end_of_turn`, `on_eager_end_of_turn`, `on_update`.
- Added `generation_config` parameter support to `CartesiaTTSService` and `CartesiaHttpTTSService` for Cartesia Sonic-3 models. Includes a new `GenerationConfig` class with `volume` (0.5-2.0), `speed` (0.6-1.5), and `emotion` (60+ options) parameters for fine-grained speech generation control. See the sketch below.
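A hedged construction sketch; the import path and the `emotion` value are assumptions, while the parameter names and ranges come from this entry:

```python
# Sketch: Sonic-3 generation control. Import path and emotion value are
# assumptions; volume (0.5-2.0) and speed (0.6-1.5) ranges are from this entry.
from pipecat.services.cartesia.tts import CartesiaTTSService, GenerationConfig

tts = CartesiaTTSService(
    api_key="...",
    voice_id="...",
    model="sonic-3",
    generation_config=GenerationConfig(volume=1.2, speed=0.9, emotion="calm"),
)
```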
- Expanded support for universal `LLMContext` to `OpenAIRealtimeLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `OpenAIRealtimeLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Note: `TranscriptionFrame`s and `InterimTranscriptionFrame`s now go upstream from `OpenAIRealtimeLLMService`, so if you're using `TranscriptProcessor`, say, you'll want to adjust accordingly:

  ```python
  pipeline = Pipeline(
      [
          transport.input(),
          context_aggregator.user(),
          # BEFORE
          # llm,
          # transcript.user(),
          # AFTER
          transcript.user(),
          llm,
          transport.output(),
          transcript.assistant(),
          context_aggregator.assistant(),
      ]
  )
  ```

  Also worth noting: whether or not you use the new context-setup pattern with `OpenAIRealtimeLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:
  # Context aggregator type
  context_aggregator: OpenAIContextAggregatorPair
  # Context frame type
  frame: OpenAILLMContextFrame
  # Context type
  context: OpenAIRealtimeLLMContext  # or
  context: OpenAILLMContext

  ## AFTER:
  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair
  # Context frame type
  frame: LLMContextFrame
  # Context type
  context: LLMContext
  ```

  Also note that `RealtimeMessagesUpdateFrame` and `RealtimeFunctionCallResultFrame` have been deprecated, since they're no longer used by `OpenAIRealtimeLLMService`. OpenAI Realtime now works more like other LLM services in Pipecat, relying on updates to its context, pushed by context aggregators, to update its internal state. Listen for `LLMContextFrame`s for context updates.

  Finally, `LLMTextFrame`s are no longer pushed from `OpenAIRealtimeLLMService` when it's configured with `output_modalities=['audio']`. If you need to process its output, listen for `TTSTextFrame`s instead.
- Expanded support for universal `LLMContext` to `GeminiLiveLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `GeminiLiveLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime with `LLMSwitcher`.)

  Worth noting: whether or not you use the new context-setup pattern with `GeminiLiveLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:
  # Context aggregator type
  context_aggregator: GeminiLiveContextAggregatorPair
  # Context frame type
  frame: OpenAILLMContextFrame
  # Context type
  context: GeminiLiveLLMContext  # or
  context: OpenAILLMContext

  ## AFTER:
  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair
  # Context frame type
  frame: LLMContextFrame
  # Context type
  context: LLMContext
  ```

  Also note that `LLMTextFrame`s are no longer pushed from `GeminiLiveLLMService` when it's configured with `modalities=GeminiModalities.AUDIO`. If you need to process its output, listen for `TTSTextFrame`s instead.
Changed
- The development runner's `/start` endpoint now supports passing `dailyRoomProperties` and `dailyMeetingTokenProperties` in the request body when `createDailyRoom` is true. Properties are validated against the `DailyRoomProperties` and `DailyMeetingTokenProperties` types respectively and passed to Daily's room and token creation APIs. See the sketch below.
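For illustration, a request might look like the following sketch; the URL, port, and property values are illustrative, while the field names come from this entry:

```python
# Sketch: start a bot via the development runner, creating a Daily room with
# custom room/token properties. URL and property values are illustrative.
import requests

requests.post(
    "http://localhost:7860/start",
    json={
        "createDailyRoom": True,
        "dailyRoomProperties": {"enable_chat": True},
        "dailyMeetingTokenProperties": {"is_owner": False},
    },
)
```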
- `UserImageRawFrame` has new fields `append_to_context` and `text`. The `append_to_context` field indicates whether this image and text should be added to the LLM context (by the LLM assistant aggregator). The `text` field, if set, can also guide the LLM or the vision service on how to analyze the image.
- `UserImageRequestFrame` has new fields `append_to_context` and `text`. Both fields will be used to set the same fields on the captured `UserImageRawFrame`.
- `UserImageRequestFrame` no longer requires a function call name and ID.
- Updated `MoondreamService` to process `UserImageRawFrame`.
- `VisionService` expects `UserImageRawFrame` in order to analyze images.
- `DailyTransport` triggers the `on_error` event if transcription can't be started or stopped.
- `DailyTransport` updates: `start_dialout()` now returns two values, `session_id` and `error`; `start_recording()` now returns two values, `stream_id` and `error`. See the sketch below.
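A hedged sketch of the new return shapes, assuming a `DailyTransport` instance named `transport` and illustrative dial-out settings:

```python
# Sketch: both calls now return a (value, error) pair.
session_id, error = await transport.start_dialout({"phoneNumber": "+15551234567"})
if error:
    print(f"dial-out failed: {error}")

stream_id, error = await transport.start_recording()
if error:
    print(f"recording failed: {error}")
```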
- Updated `daily-python` to 0.21.0.
- `SimliVideoService` now accepts `api_key` and `face_id` parameters directly, with an optional `params` for `max_session_length` and `max_idle_time` configuration, aligning with other Pipecat service patterns.
- Updated the default model to `sonic-3` for `CartesiaTTSService` and `CartesiaHttpTTSService`.
- `FunctionFilter` now has a `filter_system_frames` arg, which controls whether or not `SystemFrame`s are filtered.
- Upgraded `aws_sdk_bedrock_runtime` to v0.1.1 to resolve potential CPU issues when running `AWSNovaSonicLLMService`.
Deprecated
- The `expect_stripped_words` parameter of `LLMAssistantAggregatorParams` is ignored when used with the newer `LLMAssistantAggregator`, which now handles word spacing automatically.
- `LLMService.request_image_frame()` is deprecated; push a `UserImageRequestFrame` instead.
- `UserResponseAggregator` is deprecated and will be removed in a future version.
- The `send_transcription_frames` argument to `OpenAIRealtimeLLMService` is deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See the "Added" section for details.
- Types in `pipecat.services.openai.realtime.context` and `pipecat.services.openai.realtime.frames` are deprecated, as they're no longer used by `OpenAIRealtimeLLMService`. See the "Added" section for details.
- The `SimliVideoService` `simli_config` parameter is deprecated. Use the `api_key` and `face_id` parameters instead.
Removed
- Removed `enable_non_final_tokens` and `max_non_final_tokens_duration_ms` from `SonioxSTTService`.
- Removed the `aiohttp_session` arg from `SarvamTTSService` as it's no longer used.
Fixed
- Fixed a `PipelineTask` issue that was causing an idle timeout for frames that were being generated but not reaching the end of the pipeline. Since the exact point where frames are discarded is unknown, we now monitor pipeline frames using an observer. If the observer detects frames are being generated, it will prevent the pipeline from being considered idle.
- Fixed an issue in `HumeTTSService` that was only using Octave 2, which does not support the `description` field. Now, if a description is provided, the service switches to Octave 1.
- Fixed an issue where `DailyTransport` would time out prematurely on join and on leave.
- Fixed an issue in the runner where starting a `DailyTransport` room via `/start` didn't support using the `DAILY_SAMPLE_ROOM_URL` env var.
- Fixed an issue in `ServiceSwitcher` where switching `STTService`s would result in all STT services producing `TranscriptionFrame`s.
Other
- Updated all vision 12-series foundational examples to load images from a file.
- Added 14-series video examples for different services. These new examples request an image from the user camera through a function call.
v0.0.91
Added
- It is now possible to start a bot from the `/start` endpoint when using the runner with Daily's transport. This follows the Pipecat Cloud format with `createDailyRoom` and `body` fields in the POST request body.
- Added an ellipsis character (`…`) to the end-of-sentence detection in the string utils.
- Expanded support for universal `LLMContext` to `AWSNovaSonicLLMService`. As a reminder, the context-setup pattern when using `LLMContext` is:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (Note that even though `AWSNovaSonicLLMService` now supports the universal `LLMContext`, it is not meant to be swapped out for another LLM service at runtime.)

  Worth noting: whether or not you use the new context-setup pattern with `AWSNovaSonicLLMService`, some types have changed under the hood:

  ```python
  ## BEFORE:
  # Context aggregator type
  context_aggregator: AWSNovaSonicContextAggregatorPair
  # Context frame type
  frame: OpenAILLMContextFrame
  # Context type
  context: AWSNovaSonicLLMContext  # or
  context: OpenAILLMContext

  ## AFTER:
  # Context aggregator type
  context_aggregator: LLMContextAggregatorPair
  # Context frame type
  frame: LLMContextFrame
  # Context type
  context: LLMContext
  ```
- Added support for the `bulbul:v3` model in `SarvamTTSService` and `SarvamHttpTTSService`.
- Added `keyterms_prompt` parameter to `AssemblyAIConnectionParams`.
- Added `speech_model` parameter to `AssemblyAIConnectionParams` to access the multilingual model.
- Added support for trickle ICE to the `SmallWebRTCTransport`.
- Added support for updating `OpenAITTSService` settings (`instructions` and `speed`) at runtime via `TTSUpdateSettingsFrame`. See the sketch below.
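A minimal sketch, pushed from a processor in the pipeline; the settings keys come from this entry, and the values are illustrative:

```python
# Sketch: update OpenAITTSService settings at runtime from inside a processor.
from pipecat.frames.frames import TTSUpdateSettingsFrame

await self.push_frame(
    TTSUpdateSettingsFrame(settings={"speed": 1.2, "instructions": "Speak calmly."})
)
```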
- Added `--whatsapp` flag to the runner to better surface WhatsApp transport logs.
- Added `on_connected` and `on_disconnected` events to TTS and STT websocket-based services.
- Added an `aggregate_sentences` arg to `ElevenLabsHttpTTSService`, where the default value is `True`.
- Added a `room_properties` arg to the Daily runner's `configure()` method, allowing `DailyRoomProperties` to be provided.
- The runner's `--folder` argument now supports downloading files from subdirectories.
Changed
- `RunnerArguments` now includes the `body` field, so there's no need to add it to subclasses. Also, all `RunnerArguments` fields are now keyword-only.
- `CartesiaSTTService` now inherits from `WebsocketSTTService`.
- Package upgrades:
  - `daily-python` upgraded to 0.20.0.
  - `openai` upgraded to support up to 2.x.x.
  - `openpipe` upgraded to support up to 5.x.x.
- `SpeechmaticsSTTService` updated dependencies to `speechmatics-rt>=0.5.0`.
Deprecated
- The `send_transcription_frames` argument to `AWSNovaSonicLLMService` is deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See the "Added" section for details.
- Types in `pipecat.services.aws.nova_sonic.context` have been deprecated due to changes to support `LLMContext`. See the "Changed" section for details.
Fixed
- Fixed an issue where the `RTVIProcessor` was sending duplicate `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` messages.
- Fixed an issue in `AWSBedrockLLMService` where both `temperature` and `top_p` were always sent together, causing conflicts with models like Claude Sonnet 4.5 that don't allow both parameters simultaneously. The service now only includes inference parameters that are explicitly set, and `InputParams` defaults have been changed to `None` to rely on AWS Bedrock's built-in model defaults.
- Fixed an issue in `RivaSegmentedSTTService` where a runtime error occurred due to a mismatch in the `_handle_transcription` method's signature.
- Fixed multiple pipeline task cancellation issues. `asyncio.CancelledError` is now handled properly in `PipelineTask`, making it possible to cleanly cancel an asyncio task that is executing a `PipelineRunner`. Also, `PipelineTask.cancel()` no longer blocks waiting for the `CancelFrame` to reach the end of the pipeline (going back to the behavior in < 0.0.83).
- Fixed an issue in `ElevenLabsTTSService` and `ElevenLabsHttpTTSService` where the Flash models would split words, resulting in a space being inserted between words.
- Fixed an issue where audio filters' `stop()` would not be called when using `CancelFrame`.
- Fixed an issue in `ElevenLabsHttpTTSService` where `apply_text_normalization` was incorrectly set as a query parameter. It's now being added as a request parameter.
- Fixed an issue where `RimeHttpTTSService` and `PiperTTSService` could generate incorrectly aligned 16-bit audio frames, potentially leading to internal errors or static audio.
- Fixed an issue in `SpeechmaticsSTTService` where `AdditionalVocabEntry` items needed to have `sounds_like` set for the session to start.
Other
- Added foundational example `47-sentry-metrics.py`, demonstrating how to use the `SentryMetrics` processor.
- Added foundational example `14x-function-calling-openpipe.py`.
v0.0.90
Added
- Added audio filter `KrispVivaFilter` using the Krisp VIVA SDK.
- Added `--folder` argument to the runner, allowing files saved in that folder to be downloaded from `http://HOST:PORT/file/FILE`.
- Added `GeminiLiveVertexLLMService`, for accessing Gemini Live via Google Vertex AI.
- Added some new configuration options to `GeminiLiveLLMService`:
  - `thinking`
  - `enable_affective_dialog`
  - `proactivity`

  Note that these new configuration options require using a newer model than the default, like "gemini-2.5-flash-native-audio-preview-09-2025". The last two require specifying `http_options=HttpOptions(api_version="v1alpha")`.
- Added `on_pipeline_error` event to `PipelineTask`. This event will get fired when an `ErrorFrame` is pushed (use `FrameProcessor.push_error()`).

  ```python
  @task.event_handler("on_pipeline_error")
  async def on_pipeline_error(task: PipelineTask, frame: ErrorFrame): ...
  ```
- Added a `service_tier` `InputParam` to the `BaseOpenAILLMService`. This parameter can influence the latency of the response. For example, `"priority"` will result in faster completions, in exchange for a higher price. See the sketch below.
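A hedged sketch, assuming the usual `InputParams` construction pattern for Pipecat's OpenAI services:

```python
# Sketch: request priority processing. The import path follows pipecat's
# current layout; the "priority" value mirrors OpenAI's service_tier options.
from pipecat.services.openai.llm import OpenAILLMService

llm = OpenAILLMService(
    api_key="...",
    model="gpt-4o",
    params=OpenAILLMService.InputParams(service_tier="priority"),
)
```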
Changed
- Updated `GeminiLiveLLMService` to use the `google-genai` library rather than using WebSockets directly.
Deprecated
- `LivekitFrameSerializer` is now deprecated. Use `LiveKitTransport` instead.
- `pipecat.services.openai_realtime` is now deprecated; use `pipecat.services.openai.realtime` instead, or `pipecat.services.azure.realtime` for Azure Realtime.
- `pipecat.services.aws_nova_sonic` is now deprecated; use `pipecat.services.aws.nova_sonic` instead.
- `GeminiMultimodalLiveLLMService` is now deprecated; use `GeminiLiveLLMService`.
Fixed
- Fixed a `GoogleVertexLLMService` issue that would generate an error if no token information was returned.
- `GeminiLiveLLMService` will now end gracefully (i.e. after the bot has finished) upon receiving an `EndFrame`.
- `GeminiLiveLLMService` will try to seamlessly reconnect when it loses its connection.
v0.0.89
Fixed
- Reverted a change introduced in 0.0.88 that was causing pipelines to freeze when using interruption strategies and processors that block interruption frames (e.g. `STTMuteFilter`).
v0.0.88
Added
- Added support for Nano Banana models to `GoogleLLMService`. For example, you can now use the `gemini-2.5-flash-image` model to generate images.
- Added `HumeTTSService` for text-to-speech synthesis using Hume AI's expressive voice models. Provides high-quality, emotionally expressive speech synthesis with support for various voice models. Includes an example in `examples/foundational/07ad-interruptible-hume.py`. Use with `uv pip install pipecat-ai[hume]`.
Changed
- Updated default `GoogleLLMService` model to `gemini-2.5-flash`.
Deprecated
- PlayHT is shutting down their API on December 31st, 2025. As a result, `PlayHTTTSService` and `PlayHTHttpTTSService` are deprecated and will be removed in a future version.
Fixed
- Fixed an issue with `AWSNovaSonicLLMService` where the client wouldn't connect due to a breaking change in the AWS dependency chain.
- `PermissionError` is now caught if NLTK's `punkt_tab` can't be downloaded.
- Fixed an issue that would cause wrong user/assistant context ordering when using interruption strategies.
- Fixed RTVI incoming message handling, broken in 0.0.87.
v0.0.87
Added
- Added `WebsocketSTTService` base class for websocket-based STT services. Combines STT functionality with websocket connectivity, providing automatic error handling and reconnection capabilities with exponential backoff.
- Added `DeepgramFluxSTTService` for real-time speech recognition using Deepgram's Flux WebSocket API. Flux understands conversational flow and automatically handles turn-taking.
- Added RTVI messages for user/bot audio levels and system logs.
- Included OpenAI-based LLM services' cached tokens in `MetricsFrame`.
Changed
- Updated the default model for `AnthropicLLMService` to `claude-sonnet-4-5-20250929`.
Deprecated
- `DailyTransportMessageFrame` and `DailyTransportMessageUrgentFrame` are deprecated; use `DailyOutputTransportMessageFrame` and `DailyOutputTransportMessageUrgentFrame` respectively instead.
- `LiveKitTransportMessageFrame` and `LiveKitTransportMessageUrgentFrame` are deprecated; use `LiveKitOutputTransportMessageFrame` and `LiveKitOutputTransportMessageUrgentFrame` respectively instead.
- `TransportMessageFrame` and `TransportMessageUrgentFrame` are deprecated; use `OutputTransportMessageFrame` and `OutputTransportMessageUrgentFrame` respectively instead.
- `InputTransportMessageUrgentFrame` is deprecated; use `InputTransportMessageFrame` instead.
- `DailyUpdateRemoteParticipantsFrame` is deprecated and will be removed in a future version. Instead, create your own custom frame and handle it in the `@transport.output().event_handler("on_after_push_frame")` event handler or a custom processor.
Fixed
- Fixed an issue in `AWSBedrockLLMService` where timeout exceptions weren't being detected.
- Fixed a `PipelineTask` issue that could prevent the application from exiting if `task.cancel()` was called when the task was already finished.
- Fixed an issue where local Smart Turn was not being run in a separate thread.
v0.0.86
Added
- Added `HeyGenTransport`, an integration for HeyGen Interactive Avatars (see https://www.heygen.com/): a video service that handles audio streaming and requests HeyGen to generate avatar video responses. When used, the Pipecat bot joins the same virtual room as the HeyGen avatar and the user.
- Added support to `TwilioFrameSerializer` for `region` and `edge` settings.
- Added support for using the universal `LLMContext` with:
  - `LLMLogObserver`
  - `GatedLLMContextAggregator` (formerly `GatedOpenAILLMContextAggregator`)
  - `LangchainProcessor`
  - `Mem0MemoryService`
- Added `StrandsAgentProcessor`, which allows you to use the Strands Agents framework (see https://strandsagents.com) to build your voice agents.
- Added `ElevenLabsSTTService` for speech-to-text transcription.
- Added a peer connection monitor to the `SmallWebRTCConnection` that automatically disconnects if the connection fails to establish within the timeout (1 minute by default).
- Added memory cleanup improvements to reduce memory peaks.
- Added `on_before_process_frame`, `on_after_process_frame`, `on_before_push_frame` and `on_after_push_frame`. These are synchronous events that get called before and after a frame is processed or pushed. Note that since these events are synchronous, they should ideally perform lightweight tasks in order to not block the pipeline. See `examples/foundational/45-before-and-after-events.py`.
- Added `on_before_leave` synchronous event to `DailyTransport`.
- Added `on_before_disconnect` synchronous event to `LiveKitTransport`.
- It is now possible to register synchronous event handlers. By default, all event handlers are executed in a separate task. However, in some cases we want to guarantee order of execution, for example, executing something before disconnecting a transport.

  ```python
  self._register_event_handler("on_event_name", sync=True)
  ```
- Added support for a global location in `GoogleVertexLLMService`. The service now supports both regional locations (e.g., "us-east4") and the "global" location for Vertex AI endpoints. When using the "global" location, the service will use `aiplatform.googleapis.com` as the API host instead of the regional format.
- Added `on_pipeline_finished` event to `PipelineTask`. This event will get fired when the pipeline is done running. This can be the result of a `StopFrame`, `CancelFrame` or `EndFrame`.

  ```python
  @task.event_handler("on_pipeline_finished")
  async def on_pipeline_finished(task: PipelineTask, frame: Frame): ...
  ```

- Added support for the new RTVI `send-text` event, along with the ability to toggle the audio response off (skip TTS) while handling the new context.
Changed
- Updated `aiortc` to 1.13.0.
- Updated `sentry` to 2.38.0.
- `BaseOutputTransport` methods `write_audio_frame` and `write_video_frame` now return a boolean to indicate whether the transport implementation was able to write the given frame.
- Updated Silero VAD model to v6.
- Updated `livekit` to 1.0.13.
- `torch` and `torchaudio` are no longer required for running Smart Turn locally. This avoids gigabytes of dependencies being installed.
- Updated `websockets` dependency to support version 15.0. Removed deprecated usage of `ConnectionClosed.code` and `ConnectionClosed.reason` attributes in `AWSTranscribeSTTService` for compatibility.
- Refactored `pyproject.toml` to reduce websockets dependency repetition using self-referencing extras. All websockets-dependent services now reference a shared `websockets-base` extra.
Deprecated
- `GladiaSTTService`'s `confidence` arg is deprecated. `confidence` is no longer needed to determine which transcription or translation frames to emit.
- `PipelineTask` events `on_pipeline_stopped`, `on_pipeline_ended` and `on_pipeline_cancelled` are now deprecated. Use `on_pipeline_finished` instead.
- Support for the RTVI `append-to-context` event is deprecated, in lieu of the new `send-text` event and making way for future events like `send-image`.
Fixed
- Fixed an issue where the pipeline could freeze if a task cancellation never completed because a third-party library swallowed `asyncio.CancelledError`. We now apply a timeout to task cancellations to prevent these freezes. If the timeout is reached, the system logs warnings and leaves dangling tasks behind, which can help diagnose where cancellation is being blocked.
- Fixed an `AudioBufferProcessor` issue that was causing user audio to be missing in stereo recordings, causing bot and user overlaps.
- Fixed a `BaseOutputTransport` issue that could produce large saved `AudioBufferProcessor` files when using an audio mixer.
- Fixed a `PipelineRunner` issue on Windows where setting up SIGINT and SIGTERM was raising an exception.
- Fixed an issue where multiple handlers for an event would not run in parallel.
- Fixed `DailyTransport.sip_call_transfer()` to automatically use the session ID from the `on_dialin_connected` event when not explicitly provided. Now supports cold transfers (from incoming dial-in calls) by automatically tracking session IDs from connection events.
- Fixed a memory leak in `SmallWebRTCTransport`. In `aiortc`, when you receive a `MediaStreamTrack` (audio or video), frames are produced asynchronously. If the code never consumes these frames, they are queued in memory, causing a memory leak.
- Fixed an issue in `AsyncAITTSService` where `TTSTextFrame`s were not being pushed.
- Fixed an issue that would cause `push_interruption_task_frame_and_wait()` to not wait if a previous interruption had already happened.
- Fixed a couple of bugs in `ServiceSwitcher`:
  - Using multiple `ServiceSwitcher`s in a pipeline would result in an error.
  - `ServiceSwitcherFrame`s (such as `ManuallySwitchServiceFrame`s) were having an effect too early, essentially "jumping the queue" in terms of pipeline frame ordering.
- Fixed a self-cancellation deadlock in `UserIdleProcessor` when returning `False` from an idle callback. The task now terminates naturally instead of attempting to cancel itself.
- Fixed an issue in `AudioBufferProcessor` where a recording was not created when a bot speaks and user input is blocked.
- Fixed a `FastAPIWebsocketTransport` and `SmallWebRTCTransport` issue where `on_client_disconnected` would be triggered when the bot ends the conversation. That is, `on_client_disconnected` should only be triggered when the remote client actually disconnects.
- Fixed an issue in `HeyGenVideoService` where the `BotStartedSpeakingFrame` was blocked from moving through the pipeline.
v0.0.85
Added
- `AzureSTTService` now pushes interim transcriptions.
- Added `voice_cloning_key` to `GoogleTTSService` to support custom cloned voices.
- Added `speaking_rate` to `GoogleTTSService.InputParams` to control the speaking rate.
- Added a `speed` arg to `OpenAITTSService` to control the speed of the voice response.
- Added `FrameProcessor.push_interruption_task_frame_and_wait()`. Use this method to programmatically interrupt the bot from any part of the pipeline. This guarantees that all the processors in the pipeline are interrupted in order (from upstream to downstream). Internally, this works by first pushing an `InterruptionTaskFrame` upstream until it reaches the pipeline task. The pipeline task then generates an `InterruptionFrame`, which flows downstream through all processors. Once the `InterruptionFrame` reaches the processor waiting for the interruption, the function returns and execution continues after the call. Think of it as sending an upstream request for interruption and waiting until the acknowledgment flows back downstream. See the sketch below.
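A sketch of triggering the interruption from a custom processor; the trigger condition is hypothetical:

```python
# Sketch: programmatically interrupt the bot from inside a FrameProcessor.
from pipecat.processors.frame_processor import FrameProcessor


class BargeInProcessor(FrameProcessor):
    async def process_frame(self, frame, direction):
        await super().process_frame(frame, direction)
        if self._should_interrupt(frame):  # hypothetical condition
            # Returns once the resulting InterruptionFrame flows back down
            # to this processor.
            await self.push_interruption_task_frame_and_wait()
        await self.push_frame(frame, direction)
```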
- Added new base `TaskFrame` (which is a system frame). This is the base class for all task frames (`EndTaskFrame`, `CancelTaskFrame`, etc.) that are meant to be pushed upstream to reach the pipeline task.
- Expanded support for the universal `LLMContext` to the AWS Bedrock LLM service. Using the universal `LLMContext` and associated `LLMContextAggregatorPair` is a prerequisite for using `LLMSwitcher` to switch between LLMs at runtime.
- Added new fields to the development runner's `parse_telephony_websocket` method in support of providing dynamic data to a bot (see the sketch after this list).
  - Twilio: Added a new `body` parameter, which parses the websocket message for `customParameters`. Provide data via the `Parameter` nouns in your TwiML to use this feature.
  - Telnyx & Exotel: Both providers make the `to` and `from` phone numbers available in the websocket messages. You can now access these numbers as `call_data["to"]` and `call_data["from"]`.

  Note: Each telephony provider offers different features. Refer to the corresponding example in `pipecat-examples` to see how to pass custom data to your bot.
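A hedged sketch of consuming the new fields in a bot; the exact return shape of `parse_telephony_websocket` is an assumption to verify against the runner utilities:

```python
# Sketch: read dynamic call data in a telephony bot. The (transport_type,
# call_data) return shape and the "body" key are assumptions.
transport_type, call_data = await parse_telephony_websocket(websocket)

to_number = call_data.get("to")      # Telnyx & Exotel
from_number = call_data.get("from")  # Telnyx & Exotel
custom = call_data.get("body")       # Twilio customParameters (assumed key)
```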
- Added `body` to the `WebsocketRunnerArguments` as an optional parameter. Custom `body` information can be passed from the server into the bot file via the `bot()` method using this new parameter.
- Added video streaming support to `LiveKitTransport`.
- Added `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService`, which provide access to OpenAI Realtime.
Changed
- `pipeline.tests.utils.run_test()` now allows passing `PipelineParams` instead of individual parameters.
Removed
- Removed `VisionImageRawFrame` in favor of context frames (`LLMContextFrame` or `OpenAILLMContextFrame`).
Deprecated
- `BotInterruptionFrame` is now deprecated; use `InterruptionTaskFrame` instead.
- `StartInterruptionFrame` is now deprecated; use `InterruptionFrame` instead.
- Deprecated `VisionImageFrameAggregator` because `VisionImageRawFrame` has been removed. See the `12*` examples for the new recommended replacement pattern.
- `NoisereduceFilter` is now deprecated and will be removed in a future version. Use other audio filters like `KrispFilter` or `AICFilter`.
- Deprecated `OpenAIRealtimeBetaLLMService` and `AzureRealtimeBetaLLMService`. Use `OpenAIRealtimeLLMService` and `AzureRealtimeLLMService`, respectively. Each service will be removed in an upcoming version, 1.0.0.
Fixed
- Fixed a `BaseOutputTransport` issue that caused incorrect detection of when the bot stopped talking while using an audio mixer.
- Fixed a `LiveKitTransport` issue where RTVI messages were not properly encoded.
- Added additional fixups to Mistral context messages to ensure they meet Mistral-specific requirements, avoiding Mistral "invalid request" errors.
- Fixed `DailyTransport` transcription handling to gracefully handle a missing `rawResponse` field in transcription messages, preventing `KeyError` crashes.
v0.0.84
Added
- Added the ability to send DTMF with `LiveKitTransport`.
- Expanded support for the universal `LLMContext` to the Anthropic LLM service. Using the universal `LLMContext` and associated `LLMContextAggregatorPair` is a prerequisite for using `LLMSwitcher` to switch between LLMs at runtime.
Changed
- Updated `daily-python` to 0.19.9.
- Restored `DailyTransport`'s native DTMF support using Daily's `send_dtmf()` method instead of generated audio tones.
Fixed
- Fixed an `AWSBedrockLLMService` crash caused by an extra `await`.
- Fixed an `OpenAIImageGenService` issue where it was not creating `URLImageRawFrame` correctly.