Releases: pipecat-ai/pipecat

v0.0.93

07 Nov 21:26
16e2d5b

Added

  • Added support for Sarvam Speech-to-Text service (SarvamSTTService) with streaming WebSocket support for saarika (STT) and saaras (STT-translate) models.

  • Added support for passing in a ToolsSchema in lieu of a list of provider-specific dicts when initializing OpenAIRealtimeLLMService or when updating it using LLMUpdateSettingsFrame.

  • Added TransportParams.audio_out_silence_secs, which specifies how many seconds of silence to output when an EndFrame reaches the output transport. This can help ensure that all audio data is fully delivered to clients.
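
    A minimal sketch of setting the new field (everything besides the new field is ordinary transport configuration; adjust to your transport):

    from pipecat.transports.base_transport import TransportParams

    params = TransportParams(
        audio_out_enabled=True,
        # New in this release: emit one second of silence when an EndFrame
        # reaches the output transport, so clients can drain buffered audio.
        audio_out_silence_secs=1.0,
    )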

  • Added new FrameProcessor.broadcast_frame() method. This will push two instances of a given frame class, one upstream and the other downstream.

    await self.broadcast_frame(UserSpeakingFrame)
  • Added MetricsLogObserver for logging performance metrics from MetricsFrame instances. Supports filtering via include_metrics parameter to control which metrics types are logged (TTFB, processing time, LLM token usage, TTS usage, smart turn metrics).
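
    A hedged sketch of attaching it to a task (the import path is assumed to mirror Pipecat's other log observers):

    from pipecat.observers.loggers.metrics_log_observer import MetricsLogObserver  # assumed path
    from pipecat.pipeline.task import PipelineTask

    task = PipelineTask(
        pipeline,
        # Pass include_metrics=[...] to log only selected metric types
        observers=[MetricsLogObserver()],
    )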

  • Added pronunciation_dictionary_locators to ElevenLabsTTSService and ElevenLabsHttpTTSService.

  • Added support for loading external observers. You can now register custom pipeline observers by setting the PIPECAT_OBSERVER_FILES environment variable. This variable should contain a colon-separated list of Python files (e.g. export PIPECAT_OBSERVER_FILES="observer1.py:observer2.py:..."). Each file must define a function with the following signature:

    async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]:
        ...
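
    For example, a minimal observer file might look like the following sketch, where MyObserver stands in for a hypothetical BaseObserver subclass of your own:

    # observer1.py
    from typing import Iterable

    from my_observers import MyObserver  # hypothetical module containing your observer
    from pipecat.observers.base_observer import BaseObserver
    from pipecat.pipeline.task import PipelineTask


    async def create_observers(task: PipelineTask) -> Iterable[BaseObserver]:
        return [MyObserver()]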
  • Added support for new sonic-3 languages in CartesiaTTSService and CartesiaHttpTTSService.

  • EndFrame and EndTaskFrame have an optional reason field to indicate why the pipeline is being ended.

  • CancelFrame and CancelTaskFrame have an optional reason field to indicate why the pipeline is being canceled. This can also be specified when you cancel a task with PipelineTask.cancel(reason="cancellation reason").
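
    A hedged sketch of both paths (the FrameDirection usage follows the usual Pipecat pattern; the reasons are illustrative):

    from pipecat.frames.frames import EndTaskFrame
    from pipecat.processors.frame_processor import FrameDirection

    # From inside a FrameProcessor: request a graceful end, with a reason
    await self.push_frame(EndTaskFrame(reason="conversation complete"), FrameDirection.UPSTREAM)

    # Or cancel the task directly, with a reason
    await task.cancel(reason="user hung up")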

  • Added include_prob_metrics parameter to Whisper STT services to enable access to probability metrics from transcription results.

  • Added utility functions extract_whisper_probability(), extract_openai_gpt4o_probability(), and extract_deepgram_probability() to extract probability metrics from TranscriptionFrame objects for Whisper-based, OpenAI GPT-4o-transcribe, and Deepgram STT services respectively.

  • Added LLMSwitcher.register_direct_function(). It works much like LLMSwitcher.register_function() in that it's a shorthand for registering a function on all LLMs in the switcher, except this new method takes a direct function (a FunctionSchema-less function).
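
    A hedged sketch of registering a direct function on a switcher (the function body and names are illustrative; the schema is derived from the signature and docstring rather than passed explicitly):

    from pipecat.services.llm_service import FunctionCallParams


    async def get_weather(params: FunctionCallParams, location: str):
        """Get the current weather for a location.

        Args:
            location: The city to look up.
        """
        await params.result_callback({"location": location, "conditions": "sunny"})


    # Registers get_weather on every LLM managed by the switcher
    llm_switcher.register_direct_function(get_weather)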

  • Added MCPClient.get_tools_schema() and MCPClient.register_tools_schema() as a two-step alternative to MCPClient.register_tools(), to allow users to pass MCP tools to, say, GeminiLiveLLMService (as well as other speech-to-speech services) in the constructor.
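
    A hedged sketch of the two-step flow; the exact argument shapes are assumptions based on the method names:

    # 1) Fetch the MCP tools as a ToolsSchema so they can be passed in a constructor
    tools = await mcp_client.get_tools_schema()

    llm = GeminiLiveLLMService(api_key="...", tools=tools)

    # 2) Register the corresponding tool-call handlers (argument shape assumed)
    await mcp_client.register_tools_schema(llm, tools)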

  • Added support for passing in an LLMSwitcher to MCPClient.register_tools() (as well as the new MCPClient.register_tools_schema()).

  • Added cpu_count parameter to LocalSmartTurnAnalyzerV3. This is set to 1 by default for more predictable performance on low-CPU systems.
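
    A hedged sketch (the import path is an assumption):

    from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3

    # Defaults to cpu_count=1 for predictable performance on low-CPU systems;
    # raise it if you have spare cores to devote to turn analysis.
    turn_analyzer = LocalSmartTurnAnalyzerV3(cpu_count=2)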

Changed

  • Improved concatenate_aggregated_text() to handle one-word outputs from OpenAI Realtime and Gemini Live. Text fragments are now correctly concatenated without spaces when these patterns are detected.

  • STTMuteFilter no longer sends STTMuteFrame to the STT service. The filter now blocks frames locally without instructing the STT service to stop processing audio. This prevents inactivity-related errors (such as 409 errors from Google STT) while maintaining the same muting behavior at the application level. Important: The STTMuteFilter should be placed after the STT service itself.

  • Improved GoogleSTTService error handling to properly catch gRPC Aborted exceptions (corresponding to 409 errors) caused by stream inactivity. These exceptions are now logged at DEBUG level instead of ERROR level, since they indicate expected behavior when no audio is sent for 10+ seconds (e.g., during long silences or when audio input is blocked). The service automatically reconnects when this occurs.

  • Bumped the fastapi dependency's upper bound to <0.122.0.

  • Updated the default model for GoogleVertexLLMService to gemini-2.5-flash.

  • Updated the GoogleVertexLLMService to use the GoogleLLMService as a base
    class instead of the OpenAILLMService.

  • Updated STT and TTS services to pass through unverified language codes with a warning instead of returning None. This allows developers to use newly supported languages before Pipecat's service classes are updated, while still providing guidance on verified languages.

Removed

  • Removed needs_mcp_alternate_schema() from LLMService. The mechanism that relied on it went away.

Fixed

  • Restored backwards compatibility for vision/image features (broken in 0.0.92) when using non-universal context and assistant aggregators.

  • Fixed DeepgramSTTService._disconnect() to properly await is_connected() method call, which is an async coroutine in the Deepgram SDK.

  • Fixed an issue where the SmallWebRTCRequest dataclass in the runner would scrub arbitrary request data from the client due to camelCase typing. This fixes data passthrough for JS clients where APIRequest is used.

  • Fixed a bug in GeminiLiveLLMService where in some circumstances it wouldn't respond after a tool call.

  • Fixed GeminiLiveLLMService session resumption after a connection timeout.

  • GeminiLiveLLMService now properly supports context-provided system instruction and tools.

  • Fixed GoogleLLMService token counting to avoid double-counting tokens when Gemini sends usage metadata across multiple streaming chunks.

v0.0.92

31 Oct 16:49
334167e

🎃 The Haunted Edition 👻

Added

  • Added a new DeepgramHttpTTSService, which delivers a meaningful reduction in latency when compared to the DeepgramTTSService.

  • Added support for the speaking_rate input parameter in GoogleHttpTTSService.

  • Added enable_speaker_diarization and enable_language_identification to SonioxSTTService.

  • Added SpeechmaticsTTSService, which uses Speechmatics' TTS API. Updated examples 07a* to use the new TTS service.

  • Added support for including images or audio to LLM context messages using LLMContext.create_image_message() or LLMContext.create_image_url_message() (not all LLMs support URLs) and LLMContext.create_audio_message(). For example, when creating LLMMessagesAppendFrame:

    message = LLMContext.create_image_message(image=..., size=...)
    await self.push_frame(LLMMessagesAppendFrame(messages=[message], run_llm=True))
  • New event handlers for the DeepgramFluxSTTService: on_start_of_turn, on_turn_resumed, on_end_of_turn, on_eager_end_of_turn, on_update.

  • Added generation_config parameter support to CartesiaTTSService and CartesiaHttpTTSService for Cartesia Sonic-3 models. Includes a new GenerationConfig class with volume (0.5-2.0), speed (0.6-1.5), and emotion (60+ options) parameters for fine-grained speech generation control.
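
    A hedged sketch; the import location and how the config is wired into the service are assumptions, and the emotion value is illustrative:

    from pipecat.services.cartesia.tts import CartesiaTTSService, GenerationConfig  # assumed import

    tts = CartesiaTTSService(
        api_key="...",
        voice_id="...",
        model="sonic-3",
        # volume 0.5-2.0, speed 0.6-1.5, emotion from the documented options
        generation_config=GenerationConfig(volume=1.2, speed=0.9, emotion="cheerful"),
    )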

  • Expanded support for universal LLMContext to OpenAIRealtimeLLMService. As a reminder, the context-setup pattern when using LLMContext is:

    context = LLMContext(messages, tools)
    context_aggregator = LLMContextAggregatorPair(context)

    (Note that even though OpenAIRealtimeLLMService now supports the universal LLMContext, it is not meant to be swapped out for another LLM service at runtime with LLMSwitcher.)

    Note: TranscriptionFrames and InterimTranscriptionFrames now go upstream from OpenAIRealtimeLLMService, so if you're using TranscriptProcessor, say, you'll want to adjust accordingly:

    pipeline = Pipeline(
      [
        transport.input(),
        context_aggregator.user(),
    
        # BEFORE
        llm,
        transcript.user(),
    
        # AFTER
        transcript.user(),
        llm,
    
        transport.output(),
        transcript.assistant(),
        context_aggregator.assistant(),
      ]
    )

    Also worth noting: whether or not you use the new context-setup pattern with OpenAIRealtimeLLMService, some types have changed under the hood:

    ## BEFORE:
    
    # Context aggregator type
    context_aggregator: OpenAIContextAggregatorPair
    
    # Context frame type
    frame: OpenAILLMContextFrame
    
    # Context type
    context: OpenAIRealtimeLLMContext
    # or
    context: OpenAILLMContext
    
    ## AFTER:
    
    # Context aggregator type
    context_aggregator: LLMContextAggregatorPair
    
    # Context frame type
    frame: LLMContextFrame
    
    # Context type
    context: LLMContext

    Also note that RealtimeMessagesUpdateFrame and RealtimeFunctionCallResultFrame have been deprecated, since they're no longer used by OpenAIRealtimeLLMService. OpenAI Realtime now works more like other LLM services in Pipecat, relying on updates to its context, pushed by context aggregators, to update its internal state. Listen for LLMContextFrames for context updates.

    Finally, LLMTextFrames are no longer pushed from OpenAIRealtimeLLMService when it's configured with output_modalities=['audio']. If you need to process its output, listen for TTSTextFrames instead.

  • Expanded support for universal LLMContext to GeminiLiveLLMService. As a reminder, the context-setup pattern when using LLMContext is:

    context = LLMContext(messages, tools)
    context_aggregator = LLMContextAggregatorPair(context)

    (Note that even though GeminiLiveLLMService now supports the universal LLMContext, it is not meant to be swapped out for another LLM service at runtime with LLMSwitcher.)

    Worth noting: whether or not you use the new context-setup pattern with GeminiLiveLLMService, some types have changed under the hood:

    ## BEFORE:
    
    # Context aggregator type
    context_aggregator: GeminiLiveContextAggregatorPair
    
    # Context frame type
    frame: OpenAILLMContextFrame
    
    # Context type
    context: GeminiLiveLLMContext
    # or
    context: OpenAILLMContext
    
    ## AFTER:
    
    # Context aggregator type
    context_aggregator: LLMContextAggregatorPair
    
    # Context frame type
    frame: LLMContextFrame
    
    # Context type
    context: LLMContext

    Also note that LLMTextFrames are no longer pushed from GeminiLiveLLMService when it's configured with modalities=GeminiModalities.AUDIO. If you need to process its output, listen for TTSTextFrames instead.

Changed

  • The development runner's /start endpoint now supports passing dailyRoomProperties and dailyMeetingTokenProperties in the request body when createDailyRoom is true. Properties are validated against the DailyRoomProperties and DailyMeetingTokenProperties types respectively and passed to Daily's room and token creation APIs.
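
    A hedged sketch of such a request (the port and property values are illustrative):

    import requests

    requests.post(
        "http://localhost:7860/start",
        json={
            "createDailyRoom": True,
            "dailyRoomProperties": {"enable_recording": "cloud"},
            "dailyMeetingTokenProperties": {"is_owner": False},
            "body": {"user_id": "abc123"},
        },
    )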

  • UserImageRawFrame has new fields append_to_context and text. The append_to_context field indicates whether this image and text should be added to the LLM context (by the LLM assistant aggregator). The text field, if set, might also guide the LLM or the vision service on how to analyze the image.

  • UserImageRequestFrame has new fields append_to_context and text. Both fields are used to set the same fields on the captured UserImageRawFrame.

  • UserImageRequestFrame no longer requires a function call name and ID.

  • Updated MoondreamService to process UserImageRawFrame.

  • VisionService expects UserImageRawFrame in order to analyze images.

  • DailyTransport now triggers the on_error event if transcription can't be started or stopped.

  • DailyTransport updates: start_dialout() now returns two values: session_id and error. start_recording() now returns two values: stream_id and error.

  • Updated daily-python to 0.21.0.

  • SimliVideoService now accepts api_key and face_id parameters directly, with optional params for max_session_length and max_idle_time configuration, aligning with other Pipecat service patterns.

  • Updated the default model to sonic-3 for CartesiaTTSService and CartesiaHttpTTSService.

  • FunctionFilter now has a filter_system_frames arg, which controls whether or not SystemFrames are filtered.
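
    A hedged sketch (the predicate is illustrative):

    from pipecat.frames.frames import Frame, TranscriptionFrame
    from pipecat.processors.filters.function_filter import FunctionFilter


    async def only_transcriptions(frame: Frame) -> bool:
        return isinstance(frame, TranscriptionFrame)


    # With filter_system_frames=False, SystemFrames are not run through the predicate
    frame_filter = FunctionFilter(filter=only_transcriptions, filter_system_frames=False)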

  • Upgraded aws_sdk_bedrock_runtime to v0.1.1 to resolve potential CPU issues when running AWSNovaSonicLLMService.

Deprecated

  • The expect_stripped_words parameter of LLMAssistantAggregatorParams is ignored when used with the newer LLMAssistantAggregator, which now handles word spacing automatically.

  • LLMService.request_image_frame() is deprecated, push a UserImageRequestFrame instead.

  • UserResponseAggregator is deprecated and will be removed in a future version.

  • The send_transcription_frames argument to OpenAIRealtimeLLMService is deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See "Added" section for details.

  • Types in pipecat.services.openai.realtime.context and pipecat.services.openai.realtime.frames are deprecated, as they're no longer used by OpenAIRealtimeLLMService. See "Added" section for details.

  • SimliVideoService simli_config parameter is deprecated. Use api_key and face_id parameters instead.

Removed

  • Removed enable_non_final_tokens and max_non_final_tokens_duration_ms from SonioxSTTService.

  • Removed the aiohttp_session arg from SarvamTTSService as it's no longer used.

Fixed

  • Fixed a PipelineTask issue that was causing an idle timeout for frames that were being generated but not reaching the end of the pipeline. Since the exact point when frames are discarded is unknown, we now monitor pipeline frames using an observer. If the observer detects frames are being generated, it will prevent the pipeline from being considered idle.

  • Fixed an issue in HumeTTSService that was only using Octave 2, which does not support the description field. Now, if a description is provided, it switches to Octave 1.

  • Fixed an issue where DailyTransport would timeout prematurely on join and on leave.

  • Fixed an issue in the runner where starting a DailyTransport room via /start didn't support using the DAILY_SAMPLE_ROOM_URL env var.

  • Fixed an issue in ServiceSwitcher where all STT services, not just the active one, would produce TranscriptionFrames.

Other

  • Updated all vision 12-series foundational examples to load images from a file.

  • Added 14-series video examples for different services. These new examples request an image from the user camera through a function call.

v0.0.91

22 Oct 02:14
e0c43de

Added

  • It is now possible to start a bot from the /start endpoint when using the runner with Daily's transport. This follows the Pipecat Cloud format with createDailyRoom and body fields in the POST request body.

  • Added the ellipsis character (…) to end-of-sentence detection in the string utils.

  • Expanded support for universal LLMContext to AWSNovaSonicLLMService. As a reminder, the context-setup pattern when using LLMContext is:

    context = LLMContext(messages, tools)
    context_aggregator = LLMContextAggregatorPair(context)

    (Note that even though AWSNovaSonicLLMService now supports the universal LLMContext, it is not meant to be swapped out for another LLM service at runtime.)

    Worth noting: whether or not you use the new context-setup pattern with AWSNovaSonicLLMService, some types have changed under the hood:

    ## BEFORE:
    
    # Context aggregator type
    context_aggregator: AWSNovaSonicContextAggregatorPair
    
    # Context frame type
    frame: OpenAILLMContextFrame
    
    # Context type
    context: AWSNovaSonicLLMContext
    # or
    context: OpenAILLMContext
    
    ## AFTER:
    
    # Context aggregator type
    context_aggregator: LLMContextAggregatorPair
    
    # Context frame type
    frame: LLMContextFrame
    
    # Context type
    context: LLMContext
  • Added support for bulbul:v3 model in SarvamTTSService and SarvamHttpTTSService.

  • Added keyterms_prompt parameter to AssemblyAIConnectionParams.

  • Added speech_model parameter to AssemblyAIConnectionParams to access the multilingual model.

  • Added support for trickle ICE to the SmallWebRTCTransport.

  • Added support for updating OpenAITTSService settings (instructions and speed) at runtime via TTSUpdateSettingsFrame.
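
    A hedged sketch of pushing such an update at runtime (the settings keys follow this note):

    from pipecat.frames.frames import TTSUpdateSettingsFrame

    await task.queue_frame(
        TTSUpdateSettingsFrame(settings={"instructions": "Speak in a calm, measured tone.", "speed": 1.1})
    )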

  • Added --whatsapp flag to runner to better surface WhatsApp transport logs.

  • Added on_connected and on_disconnected events to TTS and STT websocket-based services.

  • Added an aggregate_sentences arg in ElevenLabsHttpTTSService, where the default value is True.

  • Added a room_properties arg to the Daily runner's configure() method, allowing DailyRoomProperties to be provided.

  • The runner --folder argument now supports downloading files from subdirectories.

Changed

  • RunnerArguments now include the body field, so there's no need to add it to subclasses. Also, all RunnerArguments fields are now keyword-only.

  • CartesiaSTTService now inherits from WebsocketSTTService.

  • Package upgrades:

    • daily-python upgraded to 0.20.0.
    • openai upgraded to support up to 2.x.x.
    • openpipe upgraded to support up to 5.x.x.
  • Updated SpeechmaticsSTTService dependencies to speechmatics-rt>=0.5.0.

Deprecated

  • The send_transcription_frames argument to AWSNovaSonicLLMService is deprecated. Transcription frames are now always sent. They go upstream, to be handled by the user context aggregator. See "Added" section for details.

  • Types in pipecat.services.aws.nova_sonic.context have been deprecated due to changes to support LLMContext. See "Changed" section for details.

Fixed

  • Fixed an issue where the RTVIProcessor was sending duplicate UserStartedSpeakingFrame and UserStoppedSpeakingFrame messages.

  • Fixed an issue in AWSBedrockLLMService where both temperature and top_p were always sent together, causing conflicts with models like Claude Sonnet 4.5 that don't allow both parameters simultaneously. The service now only includes inference parameters that are explicitly set, and InputParams defaults have been changed to None to rely on AWS Bedrock's built-in model defaults.

  • Fixed an issue in RivaSegmentedSTTService where a runtime error occurred due to a mismatch in the _handle_transcription method's signature.

  • Fixed multiple pipeline task cancellation issues. asyncio.CancelledError is now handled properly in PipelineTask, making it possible to cleanly cancel an asyncio task that is executing a PipelineRunner. Also, PipelineTask.cancel() no longer blocks waiting for the CancelFrame to reach the end of the pipeline (going back to the behavior in < 0.0.83).

  • Fixed an issue in ElevenLabsTTSService and ElevenLabsHttpTTSService where the Flash models would split words, resulting in a space being inserted between words.

  • Fixed an issue where audio filters' stop() would not be called when using CancelFrame.

  • Fixed an issue in ElevenLabsHttpTTSService, where apply_text_normalization was incorrectly set as a query parameter. It's now being added as a request parameter.

  • Fixed an issue where RimeHttpTTSService and PiperTTSService could generate audio frames that were not correctly 16-bit aligned, potentially leading to internal errors or static audio.

  • Fixed an issue in SpeechmaticsSTTService where AdditionalVocabEntry items needed to have sounds_like for the session to start.

Other

  • Added foundational example 47-sentry-metrics.py, demonstrating how to use the SentryMetrics processor.

  • Added foundational example 14x-function-calling-openpipe.py.

v0.0.90

10 Oct 17:24
fdaa4e4

Added

  • Added audio filter KrispVivaFilter using the Krisp VIVA SDK.

  • Added --folder argument to the runner, allowing files saved in that folder to be downloaded from http://HOST:PORT/file/FILE.

  • Added GeminiLiveVertexLLMService, for accessing Gemini Live via Google Vertex AI.

  • Added some new configuration options to GeminiLiveLLMService:

    • thinking
    • enable_affective_dialog
    • proactivity

    Note that these new configuration options require using a newer model than the default, like "gemini-2.5-flash-native-audio-preview-09-2025". The last two require specifying http_options=HttpOptions(api_version="v1alpha").

  • Added on_pipeline_error event to PipelineTask. This event will get fired when an ErrorFrame is pushed (use FrameProcessor.push_error()).

    @task.event_handler("on_pipeline_error")
    async def on_pipeline_error(task: PipelineTask, frame: ErrorFrame):
        ...
  • Added a service_tier InputParam to the BaseOpenAILLMService. This parameter can influence the latency of the response. For example "priority" will result in faster completions, but in exchange for a higher price.
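
    A hedged sketch (the model and key handling are illustrative):

    import os

    from pipecat.services.openai.llm import OpenAILLMService

    llm = OpenAILLMService(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o",
        # "priority" favors faster completions in exchange for a higher price
        params=OpenAILLMService.InputParams(service_tier="priority"),
    )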

Changed

  • Updated GeminiLiveLLMService to use the google-genai library rather than use WebSockets directly.

Deprecated

  • LivekitFrameSerializer is now deprecated. Use LiveKitTransport instead.

  • pipecat.service.openai_realtime is now deprecated, use pipecat.services.openai.realtime instead or pipecat.services.azure.realtime for Azure Realtime.

  • pipecat.service.aws_nova_sonic is now deprecated, use pipecat.services.aws.nova_sonic instead.

  • GeminiMultimodalLiveLLMService is now deprecated, use GeminiLiveLLMService.

Fixed

  • Fixed a GoogleVertexLLMService issue that would generate an error if no token information was returned.

  • GeminiLiveLLMService will now end gracefully (i.e. after the bot has finished) upon receiving an EndFrame.

  • GeminiLiveLLMService will try to seamlessly reconnect when it loses its connection.

v0.0.89

08 Oct 01:57
4b2379c

Fixed

  • Reverted a change introduced in 0.0.88 that was causing pipelines to be frozen when using interruption strategies and processors that block interruption frames (e.g. STTMuteFilter).

v0.0.88

07 Oct 21:27
91ae3f8

Added

  • Added support for Nano Banana models to GoogleLLMService. For example, you can now use the gemini-2.5-flash-image model to generate images.

  • Added HumeTTSService for text-to-speech synthesis using Hume AI's expressive voice models. Provides high-quality, emotionally expressive speech synthesis with support for various voice models. Includes example in examples/foundational/07ad-interruptible-hume.py. Use with uv pip install pipecat-ai[hume].

Changed

  • Updated default GoogleLLMService model to gemini-2.5-flash.

Deprecated

  • PlayHT is shutting down their API on December 31st, 2025. As a result, PlayHTTTSService and PlayHTHttpTTSService are deprecated and will be removed in a future version.

Fixed

  • Fixed an issue with AWSNovaSonicLLMService where the client wouldn't connect due to a breaking change in the AWS dependency chain.

  • PermissionError is now caught if NLTK's punkt_tab can't be downloaded.

  • Fixed an issue that would cause wrong user/assistant context ordering when using interruption strategies.

  • Fixed RTVI incoming message handling, broken in 0.0.87.

v0.0.87

02 Oct 16:40
63bc825

Added

  • Added WebsocketSTTService base class for websocket-based STT services. Combines STT functionality with websocket connectivity, providing automatic error handling and reconnection capabilities with exponential backoff.

  • Added DeepgramFluxSTTService for real-time speech recognition using Deepgram's Flux WebSocket API. Flux understands conversational flow and automatically handles turn-taking.

  • Added RTVI messages for user/bot audio levels and system logs.

  • Included cached token counts from OpenAI-based LLM services in MetricsFrame.

Changed

  • Updated the default model for AnthropicLLMService to claude-sonnet-4-5-20250929.

Deprecated

  • DailyTransportMessageFrame and DailyTransportMessageUrgentFrame are deprecated, use DailyOutputTransportMessageFrame and DailyOutputTransportMessageUrgentFrame respectively instead.

  • LiveKitTransportMessageFrame and LiveKitTransportMessageUrgentFrame are deprecated, use LiveKitOutputTransportMessageFrame and LiveKitOutputTransportMessageUrgentFrame respectively instead.

  • TransportMessageFrame and TransportMessageUrgentFrame are deprecated, use OutputTransportMessageFrame and OutputTransportMessageUrgentFrame respectively instead.

  • InputTransportMessageUrgentFrame is deprecated, use InputTransportMessageFrame instead.

  • DailyUpdateRemoteParticipantsFrame is deprecated and will be removed in a future version. Instead, create your own custom frame and handle it in the @transport.output().event_handler("on_after_push_frame") event handler or a custom processor.

Fixed

  • Fixed an issue in AWSBedrockLLMService where timeout exceptions weren't being detected.

  • Fixed a PipelineTask issue that could prevent the application from exiting if task.cancel() was called when the task was already finished.

  • Fixed an issue where local SmartTurn was not being run in a separate thread.

v0.0.86

24 Sep 19:21
3797f41

Added

  • Added HeyGenTransport, an integration for HeyGen Interactive Avatar (see https://www.heygen.com/): a video service that handles audio streaming and requests HeyGen to generate avatar video responses. When used, the Pipecat bot joins the same virtual room as the HeyGen Avatar and the user.

  • Added support to TwilioFrameSerializer for region and edge settings.

  • Added support for using universal LLMContext with:

    • LLMLogObserver
    • GatedLLMContextAggregator (formerly GatedOpenAILLMContextAggregator)
    • LangchainProcessor
    • Mem0MemoryService
  • Added StrandsAgentProcessor which allows you to use the Strands Agents framework to build your voice agents.
    See https://strandsagents.com

  • Added ElevenLabsSTTService for speech-to-text transcription.

  • Added a peer connection monitor to the SmallWebRTCConnection that automatically disconnects if the connection fails to establish within the timeout (1 minute by default).

  • Added memory cleanup improvements to reduce memory peaks.

  • Added on_before_process_frame, on_after_process_frame, on_before_push_frame and on_after_push_frame. These are synchronous events that get called before and after a frame is processed or pushed. Note that these events are synchronous, so they should ideally perform lightweight tasks in order to not block the pipeline. See examples/foundational/45-before-and-after-events.py.

  • Added on_before_leave synchronous event to DailyTransport.

  • Added on_before_disconnect synchronous event to LiveKitTransport.

  • It is now possible to register synchronous event handlers. By default, all event handlers are executed in a separate task. However, in some cases we want to guarantee order of execution, for example, executing something before disconnecting a transport.

    self._register_event_handler("on_event_name", sync=True)
  • Added support for global location in GoogleVertexLLMService. The service now supports both regional locations (e.g., "us-east4") and the "global" location for Vertex AI endpoints. When using "global" location, the service will use aiplatform.googleapis.com as the API host instead of the regional format.
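
    A hedged sketch (the import location and parameter placement are assumptions based on the service's existing configuration pattern):

    from pipecat.services.google.llm_vertex import GoogleVertexLLMService  # assumed import

    llm = GoogleVertexLLMService(
        credentials_path="service-account.json",
        params=GoogleVertexLLMService.InputParams(
            project_id="my-project",
            location="global",  # uses aiplatform.googleapis.com instead of a regional host
        ),
    )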

  • Added on_pipeline_finished event to PipelineTask. This event will get fired when the pipeline is done running. This can be the result of a StopFrame, CancelFrame or EndFrame.

    @task.event_handler("on_pipeline_finished")
    async def on_pipeline_finished(task: PipelineTask, frame: Frame):
        ...
  • Added support for the new RTVI send-text event, along with the ability to toggle the audio response off (skip TTS) while handling the new context.

Changed

  • Updated aiortc to 1.13.0.

  • Updated sentry to 2.38.0.

  • BaseOutputTransport methods write_audio_frame and write_video_frame now return a boolean to indicate if the transport implementation was able to write the given frame or not.

  • Updated Silero VAD model to v6.

  • Updated livekit to 1.0.13.

  • torch and torchaudio are no longer required for running Smart Turn locally. This avoids gigabytes of dependencies being installed.

  • Updated websockets dependency to support version 15.0. Removed deprecated usage of ConnectionClosed.code and ConnectionClosed.reason attributes in AWSTranscribeSTTService for compatibility.

  • Refactored pyproject.toml to reduce websockets dependency repetition using self-referencing extras. All websockets-dependent services now reference a shared websockets-base extra.

Deprecated

  • GladiaSTTService's confidence arg is deprecated. confidence is no longer needed to determine which transcription or translation frames to emit.

  • PipelineTask events on_pipeline_stopped, on_pipeline_ended and on_pipeline_cancelled are now deprecated. Use on_pipeline_finished instead.

  • Support for the RTVI append-to-context event is deprecated in favor of the new send-text event, making way for future events like send-image.

Fixed

  • Fixed an issue where the pipeline could freeze if a task cancellation never completed because a third-party library swallowed asyncio.CancelledError. We now apply a timeout to task cancellations to prevent these freezes. If the timeout is reached, the system logs warnings and leaves dangling tasks behind, which can help diagnose where cancellation is being blocked.

  • Fixed an AudioBufferProcessor issue that was causing user audio to be missing in stereo recordings, resulting in bot and user audio overlapping.

  • Fixed a BaseOutputTransport issue that could produce large saved AudioBufferProcessor files when using an audio mixer.

  • Fixed a PipelineRunner issue on Windows where setting up SIGINT and SIGTERM was raising an exception.

  • Fixed an issue where multiple handlers for an event would not run in parallel.

  • Fixed DailyTransport.sip_call_transfer() to automatically use the session ID from the on_dialin_connected event, when not explicitly provided. Now supports cold transfers (from incoming dial-in calls) by automatically tracking session IDs from connection events.

  • Fixed a memory leak in SmallWebRTCTransport. In aiortc, when you receive a MediaStreamTrack (audio or video), frames are produced asynchronously. If the code never consumes these frames, they are queued in memory, causing a memory leak.

  • Fixed an issue in AsyncAITTSService, where TTSTextFrames were not being pushed.

  • Fixed an issue that would cause push_interruption_task_frame_and_wait() to not wait if a previous interruption had already happened.

  • Fixed a couple of bugs in ServiceSwitcher:

    • Using multiple ServiceSwitchers in a pipeline would result in an error.
    • ServiceSwitcherFrames (such as ManuallySwitchServiceFrames) were having an effect too early, essentially "jumping the queue" in terms of pipeline frame ordering.
  • Fixed a self-cancellation deadlock in UserIdleProcessor when returning False from an idle callback. The task now terminates naturally instead of attempting to cancel itself.

  • Fixed an issue in AudioBufferProcessor where a recording is not created when a bot speaks and user input is blocked.

  • Fixed a FastAPIWebsocketTransport and SmallWebRTCTransport issue where on_client_disconnected would be triggered when the bot ends the conversation. That is, on_client_disconnected should only be triggered when the remote client actually disconnects.

  • Fixed an issue in HeyGenVideoService where the BotStartedSpeakingFrame was blocked from moving through the Pipeline.

v0.0.85

12 Sep 18:10
d8cd28b

Added

  • AzureSTTService now pushes interim transcriptions.

  • Added voice_cloning_key to GoogleTTSService to support custom cloned voices.

  • Added speaking_rate to GoogleTTSService.InputParams to control the speaking rate.

  • Added a speed arg to OpenAITTSService to control the speed of the voice response.

  • Added FrameProcessor.push_interruption_task_frame_and_wait(). Use this method to programmatically interrupt the bot from any part of the pipeline. This guarantees that all the processors in the pipeline are interrupted in order (from upstream to downstream). Internally, this works by first pushing an InterruptionTaskFrame upstream until it reaches the pipeline task. The pipeline task then generates an InterruptionFrame, which flows downstream through all processors. Once the InterruptionFrame reaches the processor waiting for the interruption, the function returns and execution continues after the call. Think of it as sending an upstream request for interruption and waiting until the acknowledgment flows back downstream.
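
    A minimal sketch from inside a custom processor:

    from pipecat.processors.frame_processor import FrameProcessor


    class BargeInProcessor(FrameProcessor):
        async def interrupt_bot(self):
            # Sends an interruption request upstream and returns only after the
            # resulting InterruptionFrame has flowed back down to this processor.
            await self.push_interruption_task_frame_and_wait()
            # Every processor upstream of this one has now been interrupted, in order.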

  • Added new base TaskFrame (which is a system frame). This is the base class for all task frames (EndTaskFrame, CancelTaskFrame, etc.) that are meant to be pushed upstream to reach the pipeline task.

  • Expanded support for universal LLMContext to the AWS Bedrock LLM service. Using the universal LLMContext and associated LLMContextAggregatorPair is a pre-requisite for using LLMSwitcher to switch between LLMs at runtime.

  • Added new fields to the development runner's parse_telephony_websocket method in support of providing dynamic data to a bot.

    • Twilio: Added a new body parameter, which parses the websocket message for customParameters. Provide data via the Parameter nouns in your TwiML to use this feature.
    • Telnyx & Exotel: Both providers make the to and from phone numbers available in the websocket messages. You can now access these numbers as call_data["to"] and call_data["from"].

    Note: Each telephony provider offers different features. Refer to the corresponding example in pipecat-examples to see how to pass custom data to your bot.

  • Added body to the WebsocketRunnerArguments as an optional parameter. Custom body information can be passed from the server into the bot file via the bot() method using this new parameter.

  • Added video streaming support to LiveKitTransport.

  • Added OpenAIRealtimeLLMService and AzureRealtimeLLMService which provide access to OpenAI Realtime.

Changed

  • pipeline.tests.utils.run_test() now allows passing PipelineParams instead of individual parameters.

Removed

  • Removed VisionImageRawFrame in favor of context frames (LLMContextFrame or OpenAILLMContextFrame).

Deprecated

  • BotInterruptionFrame is now deprecated, use InterruptionTaskFrame instead.

  • StartInterruptionFrame is now deprecated, use InterruptionFrame instead.

  • Deprecated VisionImageFrameAggregator because VisionImageRawFrame has been removed. See the 12* examples for the new recommended replacement pattern.

  • NoisereduceFilter is now deprecated and will be removed in a future version. Use other audio filters like KrispFilter or AICFilter.

  • Deprecated OpenAIRealtimeBetaLLMService and AzureRealtimeBetaLLMService. Use OpenAIRealtimeLLMService and AzureRealtimeLLMService, respectively. Each service will be removed in an upcoming version, 1.0.0.

Fixed

  • Fixed a BaseOutputTransport issue that caused incorrect detection of when the bot stopped talking while using an audio mixer.

  • Fixed a LiveKitTransport issue where RTVI messages were not properly encoded.

  • Added additional fixups to Mistral context messages to ensure they meet Mistral-specific requirements, avoiding Mistral "invalid request" errors.

  • Fixed DailyTransport transcription handling to gracefully handle missing rawResponse field in transcription messages, preventing KeyError crashes.

v0.0.84

06 Sep 02:38
85d8282

Added

  • Added the ability to send DTMF to LiveKitTransport.

  • Expanded support for universal LLMContext to the Anthropic LLM service. Using the universal LLMContext and associated LLMContextAggregatorPair is a pre-requisite for using LLMSwitcher to switch between LLMs at runtime.

Changed

  • Updated daily-python to 0.19.9.

  • Restored DailyTransport's native DTMF support using Daily's send_dtmf() method instead of generated audio tones.

Fixed

  • Fixed an AWSBedrockLLMService crash caused by an extra await.

  • Fixed an OpenAIImageGenService issue where it was not creating URLImageRawFrame correctly.