fix: Cherry-pick multimodal audio support and vLLM v0.12.0 fixes to release/0.7.1 #4871

dagil-nvidia · 2025-12-10T20:33:23Z

Overview

This PR cherry-picks two related changes to release/0.7.1:

PR #2760 - feat: Add vLLM multimodal audio support
PR #4849 - fix: Fix multimodal EPD examples for vllm version bump

Why Both PRs?

PR #4849 fixes vLLM v0.12.0 compatibility issues in the multimodal examples, including the audio multimodal files. However, the audio feature (PR #2760) was never cherry-picked to the 0.7.x release branch, so those audio files do not exist there yet. For the fixes from #4849 to apply cleanly, the audio feature must be cherry-picked first.
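The ordering constraint can be sketched as a small plan builder. This is only an illustration of the sequence, not the exact commands used for this PR: the working branch name and the commit SHAs are hypothetical placeholders, and the `-x` and `-s` flags (record the source commit, add a Signed-off-by trailer) are an assumption about how the picks were made.

```python
from typing import List, Sequence


def cherry_pick_plan(
    release_branch: str, work_branch: str, commits: Sequence[str]
) -> List[List[str]]:
    """Build the git commands for cherry-picking commits, in order, onto a release branch."""
    plan = [
        ["git", "fetch", "origin", release_branch],
        # Start the working branch from the tip of the release branch.
        ["git", "checkout", "-b", work_branch, f"origin/{release_branch}"],
    ]
    for sha in commits:
        # Order matters: the feature commit must land before the fix that edits its files.
        plan.append(["git", "cherry-pick", "-x", "-s", sha])
    return plan


# Hypothetical placeholders: substitute the real commit SHAs of #2760 and #4849.
AUDIO_FEATURE_SHA = "<sha-of-2760>"
VLLM_FIX_SHA = "<sha-of-4849>"

for command in cherry_pick_plan(
    "release/0.7.1", "cherry-pick-audio", [AUDIO_FEATURE_SHA, VLLM_FIX_SHA]
):
    print(" ".join(command))
```

Listing the audio feature first means the later pick of #4849 finds the audio files it modifies already in the tree.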

Changes Included

From PR #2760 (Audio Multimodal Support)

New AudioEncodeWorker for processing audio inputs
Audio aggregated serving example (audio_agg.sh)
Audio disaggregated serving example (audio_disagg.sh)
Audio loader utilities (audio_loader.py)
Documentation updates for audio multimodal serving
Rust async-openai types for audio messages

From PR #4849 (vLLM v0.12.0 Fixes)

Updated AnyTokenizer import from vllm.tokenizers
Updated FlexibleArgumentParser import from vllm.utils.argparse_utils
Updated RequestMetrics to RequestStateStats
Fixed GPU device assignments in launch scripts
Added StubEngineClient for preprocessing-only operations
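For code that has to run against more than one vLLM version, an import-path change like the ones listed above is often handled with a fallback import. The helper below is a minimal sketch of that pattern and is not part of this PR, which switches the imports directly. The helper name is hypothetical, and the second candidate (`vllm.utils`) is an assumption about where `FlexibleArgumentParser` lived before the move.

```python
import importlib
from typing import Any, Iterable, Tuple


def import_first_available(candidates: Iterable[Tuple[str, str]]) -> Any:
    """Return the first attribute importable from a list of (module, attribute) pairs."""
    errors = []
    for module_name, attr_name in candidates:
        try:
            module = importlib.import_module(module_name)
            return getattr(module, attr_name)
        except (ImportError, AttributeError) as exc:
            # Remember why this candidate failed and try the next one.
            errors.append(f"{module_name}.{attr_name}: {exc}")
    raise ImportError(
        "none of the candidates could be imported:\n" + "\n".join(errors)
    )


# New location named in this PR first, then an assumed older location.
FLEXIBLE_ARGUMENT_PARSER_CANDIDATES = [
    ("vllm.utils.argparse_utils", "FlexibleArgumentParser"),
    ("vllm.utils", "FlexibleArgumentParser"),  # assumption: pre-0.12 location
]
```

A caller would resolve the class once at startup with `import_first_available(FLEXIBLE_ARGUMENT_PARSER_CANDIDATES)`. On a release branch pinned to a single vLLM version, the direct import used in #4849 is simpler.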

Conflict Resolutions

docs/backends/vllm/multimodal.md - Fixed relative path references (kept README.md instead of ../../docs/backends/vllm/README.md)
examples/multimodal/launch/audio_agg.sh - Applied GPU device assignment fix from PR #4849
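The relative-path resolution in the first item can be checked with a short sketch. It assumes links are resolved against the directory of the file that contains them, which is how most Markdown renderers behave. The helper name is hypothetical.

```python
import posixpath


def resolve_link(source_file: str, link: str) -> str:
    """Resolve a relative link against the directory of the file that contains it."""
    base_dir = posixpath.dirname(source_file)
    return posixpath.normpath(posixpath.join(base_dir, link))


SOURCE = "docs/backends/vllm/multimodal.md"

# The link kept in the conflict resolution points at the sibling README.
kept = resolve_link(SOURCE, "README.md")

# The rejected link climbs two levels and then re-enters docs/, one level too deep.
rejected = resolve_link(SOURCE, "../../docs/backends/vllm/README.md")

print(kept)      # docs/backends/vllm/README.md
print(rejected)  # docs/docs/backends/vllm/README.md
```

Under that assumption, only the kept form lands on a path inside the existing `docs/backends/vllm/` directory.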

Testing

CI passes
Manual verification of multimodal examples (if applicable)

Related Issues

Closes DIS-1160

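Original PRs

#2760 - feat: Add vLLM multimodal audio support
#4849 - fix: Fix multimodal EPD examples for vllm version bump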

Note on GPG Signing

These commits are not GPG-signed. To trigger automated backend tests, please comment:

/ok to test <commit-sha>

Cherry-pick of PR #2760 to release/0.7.1.

Adds audio multimodal support for vLLM including:
- AudioEncodeWorker for processing audio inputs
- Audio aggregated and disaggregated serving examples
- Audio loader utilities
- Documentation for audio multimodal serving

Original PR: #2760
Original author: Yuekai Zhang <[email protected]>
Co-authored-by: Kris Hung <[email protected]>
Signed-off-by: Dan Gil <[email protected]>

…4849)

Cherry-pick of PR #4849 to release/0.7.1 for vLLM v0.12.0 compatibility.

Updates import paths and fixes for vLLM v0.12.0:
- Updated AnyTokenizer import from vllm.tokenizers
- Updated FlexibleArgumentParser import from vllm.utils.argparse_utils
- Updated RequestMetrics to RequestStateStats
- Fixed GPU device assignments in launch scripts
- Added StubEngineClient for preprocessing-only operations

Closes DIS-1160

Original PR: #4849
Original author: Kris Hung <[email protected]>
Signed-off-by: Dan Gil <[email protected]>

copy-pr-bot · 2025-12-10T20:33:27Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

dagil-nvidia · 2025-12-10T20:33:41Z

/ok to test 84c405c

dagil-nvidia · 2025-12-10T20:37:47Z

Closing this PR. Will revisit the cherry-pick approach.

dagil-nvidia added 2 commits December 10, 2025 14:27

dagil-nvidia requested a review from krishung5 December 10, 2025 20:33

dagil-nvidia requested review from a team as code owners December 10, 2025 20:33

pull-request-size bot added the size/XL label Dec 10, 2025

github-actions bot added the fix label Dec 10, 2025

dagil-nvidia closed this Dec 10, 2025
