Description
Detailed description of the requested feature
Support for quantization and deployment of Qwen3-TTS-style models within the NVIDIA optimization stack, ideally including compatibility with TensorRT-LLM or a clearly defined alternative pipeline.
Specifically, the request is for:
- Ability to quantize non-Transformer / non-text-generation models (e.g., TTS pipelines) using a unified workflow similar to the one available for LLMs
- Support for multi-component models, including:
  - a text encoder (Transformer-based)
  - an acoustic model (autoregressive / diffusion / codec-based)
  - a vocoder (CNN-based waveform generator)
- An end-to-end export pipeline: PyTorch → Quantization → ONNX → TensorRT engine(s)
- Guidance or tooling for:
  - handling models not implemented in Hugging Face Transformers
  - exporting models with custom forward passes or generation loops
- Optional: partial support for prefill/decode-style optimization where applicable (e.g., Transformer submodules)
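To make the "multi-component" shape of the request concrete, here is a toy sketch of the three-stage structure described above. All module names, layer choices, and sizes are hypothetical placeholders, not taken from Qwen3-TTS; the point is only that quantization/export tooling would need to handle each component's distinct architecture:

```python
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Hypothetical Transformer-based text encoder: token ids -> hidden states."""
    def __init__(self, vocab=256, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
    def forward(self, tokens):
        return self.encoder(self.embed(tokens))

class AcousticModel(nn.Module):
    """Hypothetical acoustic model: hidden states -> mel-spectrogram frames."""
    def __init__(self, dim=64, n_mels=80):
        super().__init__()
        self.proj = nn.Linear(dim, n_mels)
    def forward(self, hidden):
        return self.proj(hidden)

class Vocoder(nn.Module):
    """Hypothetical CNN vocoder: mel frames -> waveform (256x upsampling)."""
    def __init__(self, n_mels=80, upsample=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose1d(n_mels, 32, kernel_size=upsample, stride=upsample),
            nn.Conv1d(32, 1, kernel_size=1),
        )
    def forward(self, mel):                              # mel: (B, T, n_mels)
        return self.net(mel.transpose(1, 2)).squeeze(1)  # -> (B, T * upsample)

class TTSPipeline(nn.Module):
    """Three heterogeneous components chained end to end."""
    def __init__(self):
        super().__init__()
        self.text_encoder = TextEncoder()
        self.acoustic = AcousticModel()
        self.vocoder = Vocoder()
    def forward(self, tokens):
        return self.vocoder(self.acoustic(self.text_encoder(tokens)))

tts = TTSPipeline().eval()
tokens = torch.randint(0, 256, (1, 16))
with torch.no_grad():
    wav = tts(tokens)
print(wav.shape)  # torch.Size([1, 4096])
```

A per-component workflow (quantize and export each submodule separately, then chain the resulting engines) may be the practical route here, since the three parts have very different operator profiles.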
This would enable efficient deployment of modern TTS systems on NVIDIA GPUs with reduced latency and memory usage.
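One concrete pain point behind the "custom forward passes or generation loops" item: a data-dependent Python generation loop (e.g., decode until a stop token) cannot be traced into a static graph. A common workaround is to wrap a single decode step in a module and unroll a fixed number of steps for export. The sketch below assumes a hypothetical GRU-based autoregressive step, purely for illustration:

```python
import torch
import torch.nn as nn

class ARDecoderStep(nn.Module):
    """Hypothetical single autoregressive step: previous mel frame -> next frame."""
    def __init__(self, n_mels=80, hidden=64):
        super().__init__()
        self.cell = nn.GRUCell(n_mels, hidden)
        self.out = nn.Linear(hidden, n_mels)
    def forward(self, frame, state):
        state = self.cell(frame, state)
        return self.out(state), state

class FixedLengthDecoder(nn.Module):
    """Export-friendly wrapper: the data-dependent Python generation loop is
    replaced by a fixed number of steps, so the module can be traced as a
    static graph and passed to an ONNX exporter."""
    def __init__(self, step: ARDecoderStep, num_steps=32, hidden=64):
        super().__init__()
        self.step = step
        self.num_steps = num_steps
        self.hidden = hidden
    def forward(self, start_frame):
        state = start_frame.new_zeros(start_frame.size(0), self.hidden)
        frame, frames = start_frame, []
        for _ in range(self.num_steps):  # unrolled at trace time
            frame, state = self.step(frame, state)
            frames.append(frame)
        return torch.stack(frames, dim=1)  # (B, num_steps, n_mels)

decoder = FixedLengthDecoder(ARDecoderStep()).eval()
with torch.no_grad():
    mel = decoder(torch.zeros(1, 80))
print(mel.shape)  # torch.Size([1, 32, 80])
# The wrapper can then go through e.g.
# torch.onnx.export(decoder, (torch.zeros(1, 80),), "decoder.onnx")
```

The fixed-length trick trades flexibility for exportability (output length is pinned at export time); documented guidance on this trade-off, or first-class support for dynamic loops, is part of what this issue asks for.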
Describe alternatives you've considered
- torch AO (torch.ao / torchao) quantization library
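For context on this alternative: torch AO's dynamic quantization handles the Linear-heavy parts of a model with one call, but it stops at a quantized PyTorch module and does not provide a unified path to ONNX/TensorRT engines. A minimal sketch on a toy stand-in model (not an actual TTS component):

```python
import torch
import torch.nn as nn
from torch.ao.quantization import quantize_dynamic

# Hypothetical stand-in for one Linear-heavy TTS component.
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80)).eval()

# Dynamic INT8 quantization: int8 weights, activations quantized on the fly.
qmodel = quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 80)
with torch.no_grad():
    y = qmodel(x)
print(y.shape)  # torch.Size([1, 80])
```

This covers inference inside PyTorch only; the convolutional vocoder and the custom generation loop still need a separate treatment, which is why a unified NVIDIA-stack workflow is requested.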
Target hardware/use case
- NVIDIA GPUs (e.g., A5000)