
Conversation

@vipyne (Member) commented Nov 5, 2025

Please describe the changes in your PR. If it is addressing an issue, please reference that as well.

minimax and others added 10 commits November 4, 2025 15:39
- Add support for speech-2.6-hd and speech-2.6-turbo models
- Add 16 new languages (total 40): Afrikaans, Bulgarian, Catalan, Danish, Persian, Filipino, Hebrew, Croatian, Hungarian, Malay, Norwegian, Nynorsk, Slovak, Slovenian, Swedish, Tamil
- Add new emotions: calm and fluent
- Add new parameters: text_normalization (renamed from english_normalization), latex_read, force_cbr, exclude_aggregated_audio, subtitle_enable, subtitle_type
- Extract trace_id from response headers for all requests
- Improve error handling for non-streaming error responses
- Add detailed extra_info logging (audio_length, audio_size, usage_characters, word_count)
- Add validation warnings for language/model compatibility
- Fix silent error issue where HTTP 200 responses with errors were ignored

BREAKING CHANGE: Renamed parameter english_normalization to text_normalization
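
For reference, here is roughly how the new options could be exercised. This is a minimal sketch only: it assumes the service class is MiniMaxHttpTTSService with a nested InputParams and that the field names match the parameters listed above; check src/pipecat/services/minimax/tts.py for the exact API.

```python
# Minimal sketch, not a definitive example: the class name, constructor
# arguments, and InputParams fields are assumed from this PR's description.
import aiohttp

from pipecat.services.minimax.tts import MiniMaxHttpTTSService


async def build_tts(session: aiohttp.ClientSession) -> MiniMaxHttpTTSService:
    params = MiniMaxHttpTTSService.InputParams(
        emotion="calm",              # one of the new emotions
        text_normalization=True,     # renamed from english_normalization
    )
    return MiniMaxHttpTTSService(
        api_key="...",               # MINIMAX_API_KEY
        group_id="...",              # MINIMAX_GROUP_ID
        model="speech-2.6-turbo",    # or "speech-2.6-hd"
        aiohttp_session=session,
        params=params,
    )
```
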
codecov bot commented Nov 5, 2025

Codecov Report

❌ Patch coverage is 0% with 76 lines in your changes missing coverage. Please review.

Files with missing lines               Patch %   Lines
src/pipecat/services/minimax/tts.py    0.00%     76 Missing ⚠️

Files with missing lines               Coverage Δ
src/pipecat/services/minimax/tts.py    0.00% <0.00%> (ø)

... and 1 file with indirect coverage changes


latex_read: Enable LaTeX formula reading.
force_cbr: Enable Constant Bitrate (CBR) for audio encoding (MP3 only).
exclude_aggregated_audio: Whether to exclude aggregated audio in final chunk.
subtitle_enable: Enable subtitle generation (non-streaming only).
Contributor

We should consider removing this parameter and subtitle_type, as they're for non-streaming only. This MiniMax implementation is streaming only.

The alternative would be to add a streaming arg to allow for a non-streaming mode, but I don't think that's a good idea.


There is a mismatch here. Our model can support streaming subtitles.

Member Author

Perhaps "(non-streaming only)" should be omitted from the comment then?

Member Author

@zhenyujia23-crypto is there a way subtitles can be added to the example 07y-interruptible-minimax.py to exercise this parameter?

"disgusted", "surprised", "calm", "fluent").
english_normalization: Deprecated; use `text_normalization` instead
text_normalization: Enable text normalization (Chinese/English).
latex_read: Enable LaTeX formula reading.
Contributor

Is this actually applicable? I guess LLMs can output LaTeX format, but I don't know if it makes sense for a TTS to do the same 🤷


Yes, we do support it, but the text first needs to be transformed into the expected LaTeX format (see screenshot: Clipboard_Screenshot_1762355106).

Contributor

I guess there's no harm in including it, but generally voice agents are best if they skip pronouncing complex math (as well as code and tables).

english_normalization: Deprecated; use `text_normalization` instead
text_normalization: Enable text normalization (Chinese/English).
latex_read: Enable LaTeX formula reading.
force_cbr: Enable Constant Bitrate (CBR) for audio encoding (MP3 only).
Contributor

Is MP3 an option? Again, this probably doesn't make sense for a streaming use case.


This parameter defaults to false and doesn't affect streaming. For the streaming use case, users can simply ignore it.

(screenshot: Clipboard_Screenshot_1762355257)

Member Author

will remove.

Contributor

Yes, let's remove it if it doesn't apply to streaming, then.

if service_lang:
    self._settings["language_boost"] = service_lang

# Validate language-model compatibility
Contributor

I'm not sure if we should include this type of checking. It's likely to become stale and need to be maintained. It would be better to rely on the service returning an error about unsupported languages for a given model.
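
For what it's worth, dropping the local check doesn't mean losing the signal entirely; the service's own error can be surfaced instead. A rough sketch, assuming the MiniMax response carries a base_resp block with status_code / status_msg (field names taken from the MiniMax T2A docs, worth double-checking):

```python
# Sketch only: relies on the service reporting unsupported language/model
# combinations itself. The base_resp field names are an assumption.
def raise_on_minimax_error(data: dict) -> None:
    base_resp = data.get("base_resp") or {}
    if base_resp.get("status_code", 0) != 0:
        # e.g. an unsupported language for the selected model
        raise RuntimeError(
            f"MiniMax TTS error {base_resp['status_code']}: {base_resp.get('status_msg')}"
        )
```
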


Will remove it

if params.text_normalization is not None:
    self._settings["voice_setting"]["text_normalization"] = params.text_normalization

# Add latex_read if provided
Contributor

Consider removing this if the parameter isn't included.

if params.latex_read is not None:
    self._settings["voice_setting"]["latex_read"] = params.latex_read

# Add force_cbr if provided (for MP3 format only)
Contributor

Same. Consider removing this if the parameter isn't included.

if params.force_cbr is not None:
    self._settings["audio_setting"]["force_cbr"] = params.force_cbr

# Add subtitle settings if provided
Contributor

I think we should remove the parameter and this logic since non-streaming isn't supported.

yield TTSStartedFrame()

# Process the streaming response
logger.trace(f"Starting to read streaming response, status={response.status}")
Contributor

Remove


async for chunk in response.content.iter_chunked(CHUNK_SIZE):
    chunk_count += 1
    logger.trace(f"Received chunk #{chunk_count}, size={len(chunk)} bytes")
Contributor

Remove.


# Log raw buffer content for debugging
if chunk_count == 1:
    logger.trace(f"Raw buffer content: {buffer[:200]}")  # First 200 bytes
Contributor

Remove.
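
For context, the loop these excerpts come from parses server-sent `data:` blocks out of the buffer. A simplified sketch of that step follows (the hex-encoded audio field and the extra_info skip are assumptions based on the excerpts and the MiniMax streaming API):

```python
# Simplified sketch of handling one `data:` block from the stream.
# Assumes audio arrives hex-encoded under data["data"]["audio"]; verify
# against the actual implementation in src/pipecat/services/minimax/tts.py.
import json


def parse_data_block(data_block: bytes) -> bytes | None:
    """Return raw audio bytes from one `data:` block, or None if there is nothing to play."""
    data = json.loads(data_block[5:].decode("utf-8"))  # strip the "data:" prefix
    if "extra_info" in data:
        return None  # final summary block, skip
    audio_hex = data.get("data", {}).get("audio")
    return bytes.fromhex(audio_hex) if audio_hex else None
```
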

data = json.loads(data_block[5:].decode("utf-8"))
# Skip data blocks containing extra_info
data_str = data_block[5:].decode("utf-8")
logger.trace(
Contributor

Remove this and other trace logging.


But I think we still need a place to log the trace_id, because it will help us diagnose problems coming from our service.

Member Author

Is the tracing code necessary for other folks who use this class?
Perhaps a better approach is to create a git patch with all the tracing code that can be applied locally to a user's pipecat code.
I see that it is useful, but it feels like it should be a more "opt-in" feature.
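
One possible "opt-in" shape, sketched with a hypothetical helper: keep the trace_id around, but only log it when something goes wrong (or when a verbose flag is set). The "Trace-Id" header name is an assumption to confirm against the MiniMax docs.

```python
# Hypothetical sketch: the helper name, the verbose flag, and the "Trace-Id"
# header name are illustrative, not the actual implementation.
import aiohttp
from loguru import logger


def log_trace_id(response: aiohttp.ClientResponse, *, verbose: bool = False) -> None:
    trace_id = response.headers.get("Trace-Id")
    if response.status != 200:
        logger.error(f"MiniMax TTS request failed ({response.status}), trace_id={trace_id}")
    elif verbose:
        logger.debug(f"MiniMax TTS request ok, trace_id={trace_id}")
```
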

if not chunk_data:
    continue

# Check for subtitle file (if subtitle generation is enabled)
Contributor

Remove

@markbackman (Contributor) left a comment

Ok, I think the biggest issue to sort out is what the scope of this service is. Pipecat is for real-time use cases, so non-streaming is not really applicable. I think we should remove the non-streaming items from this class to keep it focused and simple.

Also, remove the trace logging, as it's only applicable during development and isn't practical for real-world debugging.

zhenyujia23-crypto pushed a commit to zhenyujia23-crypto/pipecat that referenced this pull request Nov 10, 2025
- Remove non-essential parameters: latex_read, force_cbr, exclude_aggregated_audio
- Simplify trace_id logging (keep for errors only, remove from success logs)
- Make subtitle generation support streaming with word-level timestamps
- Remove non-streaming logic warnings and validation
- Clean up code structure and remove duplicate parameter handling
- Reduce code complexity while maintaining core functionality

Responds to PR pipecat-ai#2981 reviewer comments about keeping TTS service focused on real-time use cases.
zhenyujia23-crypto added a commit to zhenyujia23-crypto/pipecat that referenced this pull request Nov 10, 2025
- Remove force_cbr parameter and complex audio encoding options
- Remove exclude_aggregated_audio streaming control
- Remove all logger.debug statements for cleaner logging
- Simplify InputParams class (11 → 7 parameters)
- Streamline subtitle support to word-level only
- Remove sentence-level subtitle functionality
- Fix geographic comment: "west of unite state" → "western United States"
- Update documentation for streaming-only word-level subtitles
- Maintain all core TTS functionality for real-time use cases
- Preserve language support (40 languages), emotions, and trace ID tracking

This addresses PR pipecat-ai#2981 feedback by simplifying the service to focus purely on
streaming TTS with word-level subtitles, removing unnecessary complexity and
debugging code while maintaining essential functionality.