@dagil-nvidia
Contributor

fix: Cherry-pick multimodal audio support and vLLM v0.12.0 fixes to release/0.7.1

Overview

This PR cherry-picks two related changes to release/0.7.1:

  1. PR #2760 - feat: Add vLLM multimodal audio support
  2. PR #4849 - fix: Fix multimodal EPD examples for vllm version bump

Why Both PRs?

PR #4849 fixes vLLM v0.12.0 compatibility issues in multimodal examples, including audio multimodal files. However, the audio feature (PR #2760) was never cherry-picked to the 0.7.x release branch. To properly apply the fixes from #4849, the audio feature must be present first.

Changes Included

From PR #2760 (Audio Multimodal Support)

  • New AudioEncodeWorker for processing audio inputs
  • Audio aggregated serving example (audio_agg.sh)
  • Audio disaggregated serving example (audio_disagg.sh)
  • Audio loader utilities (audio_loader.py)
  • Documentation updates for audio multimodal serving
  • Rust async-openai types for audio messages

From PR #4849 (vLLM v0.12.0 Fixes)

  • Updated AnyTokenizer import from vllm.tokenizers
  • Updated FlexibleArgumentParser import from vllm.utils.argparse_utils
  • Updated RequestMetrics to RequestStateStats
  • Fixed GPU device assignments in launch scripts
  • Added StubEngineClient for preprocessing-only operations
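
The import-path updates listed above can be made tolerant of more than one vLLM version. The following is a minimal sketch of that idea, not the code from PR #4849: the helper name import_first_available is hypothetical, and the older module path in the usage comment (vllm.utils) is an assumption about where the class lived before the move.

```python
import importlib
from typing import Any, Sequence


def import_first_available(name: str, module_paths: Sequence[str]) -> Any:
    """Return attribute `name` from the first module in `module_paths` that provides it.

    Hypothetical helper for illustration; it is not part of vLLM or of this PR.
    """
    errors = []
    for path in module_paths:
        try:
            module = importlib.import_module(path)
            return getattr(module, name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed, then try the next one.
            errors.append(f"{path}: {exc}")
    raise ImportError(f"could not import {name!r}; tried: " + "; ".join(errors))


# Possible usage (commented out so the sketch runs without vLLM installed).
# The new path comes from the change list above; the fallback path is an assumption.
# FlexibleArgumentParser = import_first_available(
#     "FlexibleArgumentParser",
#     ["vllm.utils.argparse_utils", "vllm.utils"],
# )
```

Listing the newest path first means a release branch pinned to one vLLM version resolves on the first attempt, and the fallback is only exercised on older installs.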

Conflict Resolutions

  1. docs/backends/vllm/multimodal.md - Fixed relative path references (kept README.md instead of ../../docs/backends/vllm/README.md)
  2. examples/multimodal/launch/audio_agg.sh - Applied GPU device assignment fix from PR #4849
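
The description does not spell out what the GPU device assignment fix changed in the launch scripts. As a rough illustration of the general pattern only, the sketch below pins each worker to its own GPU through the CUDA_VISIBLE_DEVICES environment variable. The helper name gpu_env_overrides and the worker names are hypothetical and are not taken from audio_agg.sh.

```python
from typing import Dict, Sequence


def gpu_env_overrides(
    workers: Sequence[str], gpu_ids: Sequence[int]
) -> Dict[str, Dict[str, str]]:
    """Map each worker name to an env override that pins it to one GPU.

    Hypothetical helper for illustration; worker names are placeholders.
    """
    if len(workers) > len(gpu_ids):
        raise ValueError(
            f"need {len(workers)} GPUs for {len(workers)} workers, got {len(gpu_ids)}"
        )
    # Pair workers with GPUs in order so no two workers share a device.
    return {
        worker: {"CUDA_VISIBLE_DEVICES": str(gpu)}
        for worker, gpu in zip(workers, gpu_ids)
    }


overrides = gpu_env_overrides(["encode_worker", "decode_worker"], [0, 1])
```

Each override can be merged into a copy of os.environ and passed as the env argument of subprocess.Popen when launching the corresponding worker.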

Testing

  • CI passes
  • Manual verification of multimodal examples (if applicable)

Related Issues

Original PRs

Note on GPG Signing

These commits are not GPG-signed. To trigger automated backend tests, please comment:

/ok to test <commit-sha>

Cherry-pick of PR #2760 to release/0.7.1.

Adds audio multimodal support for vLLM including:
- AudioEncodeWorker for processing audio inputs
- Audio aggregated and disaggregated serving examples
- Audio loader utilities
- Documentation for audio multimodal serving

Original PR: #2760
Original author: Yuekai Zhang <[email protected]>
Co-authored-by: Kris Hung <[email protected]>

Signed-off-by: Dan Gil <[email protected]>
…4849)

Cherry-pick of PR #4849 to release/0.7.1 for vLLM v0.12.0 compatibility.

Updates import paths and fixes for vLLM v0.12.0:
- Updated AnyTokenizer import from vllm.tokenizers
- Updated FlexibleArgumentParser import from vllm.utils.argparse_utils
- Updated RequestMetrics to RequestStateStats
- Fixed GPU device assignments in launch scripts
- Added StubEngineClient for preprocessing-only operations

Closes DIS-1160

Original PR: #4849
Original author: Kris Hung <[email protected]>

Signed-off-by: Dan Gil <[email protected]>
@dagil-nvidia dagil-nvidia requested review from a team as code owners December 10, 2025 20:33
@copy-pr-bot

copy-pr-bot bot commented Dec 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@dagil-nvidia
Contributor Author

/ok to test 84c405c

@dagil-nvidia
Contributor Author

Closing PR - will revisit the cherry-pick approach
