
Conversation


@krishung5 krishung5 commented Dec 10, 2025

Overview:

Fix multimodal EPD examples for the vLLM version bump to v0.12.0

Closes DIS-1160

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added multimodal embeddings support with a new feature flag for enhanced audio and video processing.
  • Updates

    • Updated GPU device assignments in example deployment scripts for optimized resource allocation.
    • Updated internal metric type handling for improved request statistics tracking.
  • Chores

    • Refactored import paths for better modularity and compatibility.


@krishung5 krishung5 requested review from a team as code owners December 10, 2025 09:38
@github-actions github-actions bot added the fix label Dec 10, 2025
@krishung5 (Contributor, Author)

/ok to test bf2bccf


coderabbitai bot commented Dec 10, 2025

Walkthrough

The PR updates vLLM module import paths to newer locations, introduces a StubEngineClient for preprocessing-only operations, adjusts GPU device assignments in launch scripts, modifies method signatures to accept variadic arguments, replaces RequestMetrics type hints with RequestStateStats, and removes certain initialization calls.

Changes

Changes by cohort:

  • Import Path Updates (Tokenizer & Argument Parser)
    Files: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, examples/multimodal/components/processor.py, examples/multimodal/components/audio_encode_worker.py, examples/multimodal/components/encode_worker.py, examples/multimodal/components/video_encode_worker.py, examples/multimodal/components/worker.py
    Summary: AnyTokenizer is now imported from vllm.tokenizers as TokenizerLike, and FlexibleArgumentParser moved from vllm.utils to vllm.utils.argparse_utils.

  • StubEngineClient Introduction
    Files: components/src/dynamo/vllm/multimodal_utils/chat_processor.py, examples/multimodal/utils/chat_processor.py
    Summary: Introduced a new StubEngineClient class with model_config, input_processor, and io_processor attributes. Instantiated the stub engine and OpenAIServingModels in ChatProcessor and CompletionsProcessor initialization; removed the enable_force_include_usage flag from stream generation calls.

  • Protocol Type Hint Updates
    Files: components/src/dynamo/vllm/multimodal_utils/protocol.py, examples/multimodal/utils/protocol.py
    Summary: Replaced the RequestMetrics import with RequestStateStats from vllm.v1.metrics.stats; updated the MyRequestOutput.metrics field type from Optional[RequestMetrics] to Optional[RequestStateStats].

  • Launch Script GPU Device Adjustments
    Files: examples/backends/vllm/launch/agg_multimodal_epd.sh, examples/backends/vllm/launch/disagg_multimodal_epd.sh, examples/multimodal/launch/audio_agg.sh
    Summary: Adjusted CUDA_VISIBLE_DEVICES assignments, shifting GPU indices lower (e.g., GPU 1→0, 2→1, 3→2), and added the --enable-mm-embeds flag to worker invocations.

  • Method Signature Updates
    Files: examples/multimodal/components/publisher.py
    Summary: Added *args and **kwargs to the NullStatLogger.record and DynamoStatLoggerPublisher.record method signatures to accept variadic arguments.

  • Feature Toggle Defaults
    Files: examples/multimodal/utils/args.py
    Summary: Added a new default key, enable_mm_embeds: True, in the overwrite_args function to toggle the multimodal embeddings feature.
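Most of these changes are mechanical import-path moves. One defensive pattern for bridging an old and a new module layout (not used by this PR, which simply switches to the new paths) is a small resolver helper; the vLLM paths in the comment below are the ones named above, while the stdlib names used for the demonstration are arbitrary:

```python
import importlib


def import_with_fallback(new_module: str, old_module: str, name: str):
    """Resolve `name` from the new module path first, then the legacy one."""
    for mod_path in (new_module, old_module):
        try:
            return getattr(importlib.import_module(mod_path), name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"{name} not found in {new_module} or {old_module}")


# With vLLM installed, this could cover both layouts, e.g.:
#   FlexibleArgumentParser = import_with_fallback(
#       "vllm.utils.argparse_utils", "vllm.utils", "FlexibleArgumentParser")
# Demonstrated here against the stdlib instead (collections.abc has no
# OrderedDict, so resolution falls back to collections):
OrderedDict = import_with_fallback("collections.abc", "collections", "OrderedDict")
```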

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

  • StubEngineClient initialization: Verify stub engine attributes (model_config, input_processor, io_processor) are correctly instantiated and compatible with OpenAIServingChat/OpenAIServingCompletion expectations
  • Removed enable_force_include_usage flag: Confirm that removing this flag from stream generation calls does not alter delta/content processing behavior unintentionally
  • RequestStateStats compatibility: Ensure RequestStateStats is a proper replacement for RequestMetrics in terms of serialization and usage patterns
  • GPU device assignments: Verify shifted indices across all scripts are consistent and intentional, particularly in disagg_multimodal_epd.sh where multiple workers are coordinated

Poem

🐰 A rabbit hops through vllm's halls today,
Moving imports to their modern way,
With StubEngineClient, we preprocess with glee,
GPU lanes shift smoothly—one, two, three!
RequestStateStats now measure the flow,
As embeddings bloom where features grow. ✨

Pre-merge checks

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 7.69%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Description check ⚠️ Warning — The pull request description is incomplete. It includes an Overview and a Related Issues section, but the Details and "Where should the reviewer start?" sections lack substantive content. Resolution: fill in the Details section with the specific changes made (e.g., updated import paths, dependency updates) and point reviewers at the files to examine first (e.g., the chat_processor.py and protocol.py files).

✅ Passed checks (1 passed)

  • Title check ✅ Passed — The title 'fix: Fix multimodal EPD examples for vllm version bump' directly summarizes the main change: updating the multimodal EPD examples for compatibility with a vLLM version bump.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
examples/multimodal/components/video_encode_worker.py (1)

19-19: Import path update matches changes across other components.

This import path update for FlexibleArgumentParser is consistent with the broader refactoring for vLLM v0.12.0 compatibility.

examples/multimodal/components/publisher.py (1)

74-81: Consistent forward-compatible signature applied to production logger.

The same *args and **kwargs pattern applied to DynamoStatLoggerPublisher.record maintains consistency with NullStatLogger and ensures compatibility with vLLM v0.12.0's stat logger interface.

examples/multimodal/components/worker.py (1)

18-18: Import path update consistent with vLLM v0.12.0 refactoring.

The import path change for FlexibleArgumentParser aligns with similar updates across all multimodal components.

🧹 Nitpick comments (3)
examples/backends/vllm/launch/disagg_multimodal_epd.sh (1)

73-104: Consider adding runtime validation for GPU availability.

The script assumes at least 3 GPUs are available starting from device 0. If the system has fewer GPUs or non-contiguous indexing, the script will fail at runtime with unclear error messages.

Consider adding a validation check near the start of the worker launch section:

# Validate GPU availability
if ! command -v nvidia-smi &> /dev/null; then
  echo "Warning: nvidia-smi not found. Cannot validate GPU availability."
else
  GPU_COUNT=$(nvidia-smi --list-gpus | wc -l)
  if [[ $GPU_COUNT -lt 3 ]]; then
    echo "Error: This script requires at least 3 GPUs. Found: $GPU_COUNT"
    exit 1
  fi
fi

Alternatively, you could make GPU device assignments configurable via environment variables or command-line options (similar to MODEL_NAME and PROMPT_TEMPLATE).
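Following that suggestion, a minimal sketch of environment-overridable device assignments (the variable names here are illustrative, not taken from the script):

```shell
# Defaults match the script's current layout; callers can override, e.g.
#   ENCODE_GPU=4 ./disagg_multimodal_epd.sh
ENCODE_GPU="${ENCODE_GPU:-0}"
PREFILL_GPU="${PREFILL_GPU:-1}"
DECODE_GPU="${DECODE_GPU:-2}"
echo "encode=${ENCODE_GPU} prefill=${PREFILL_GPU} decode=${DECODE_GPU}"
```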

components/src/dynamo/vllm/multimodal_utils/protocol.py (1)

121-141: Note: Feature parity differences with examples/multimodal/utils/protocol.py.

This file lacks AudioContent in the MessageContent union (Line 121) and audio_url in MultiModalInput (Lines 138-141), which exist in the examples version. If audio support is intended for the components module, consider synchronizing these definitions.

components/src/dynamo/vllm/multimodal_utils/chat_processor.py (1)

37-46: StubEngineClient definition is duplicated from examples/multimodal/utils/chat_processor.py.

Both files define identical StubEngineClient classes. Consider extracting this to a shared utility module to avoid code duplication and ensure consistent behavior across examples and components.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe10dbf and bf2bccf.

📒 Files selected for processing (15)
  • components/src/dynamo/vllm/multimodal_handlers/processor_handler.py (1 hunks)
  • components/src/dynamo/vllm/multimodal_utils/chat_processor.py (3 hunks)
  • components/src/dynamo/vllm/multimodal_utils/protocol.py (3 hunks)
  • examples/backends/vllm/launch/agg_multimodal_epd.sh (1 hunks)
  • examples/backends/vllm/launch/disagg_multimodal_epd.sh (1 hunks)
  • examples/multimodal/components/audio_encode_worker.py (1 hunks)
  • examples/multimodal/components/encode_worker.py (1 hunks)
  • examples/multimodal/components/processor.py (1 hunks)
  • examples/multimodal/components/publisher.py (2 hunks)
  • examples/multimodal/components/video_encode_worker.py (1 hunks)
  • examples/multimodal/components/worker.py (1 hunks)
  • examples/multimodal/launch/audio_agg.sh (1 hunks)
  • examples/multimodal/utils/args.py (1 hunks)
  • examples/multimodal/utils/chat_processor.py (3 hunks)
  • examples/multimodal/utils/protocol.py (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.

Applied to files:

  • examples/multimodal/components/processor.py
  • components/src/dynamo/vllm/multimodal_handlers/processor_handler.py
📚 Learning: 2025-08-25T23:24:42.076Z
Learnt from: tzulingk
Repo: ai-dynamo/dynamo PR: 2666
File: components/backends/trtllm/src/dynamo/trtllm/publisher.py:0-0
Timestamp: 2025-08-25T23:24:42.076Z
Learning: WorkerMetricsPublisher.create_endpoint method signature has been updated in _core.pyi to include metrics_labels parameter: `def create_endpoint(self, component: str, metrics_labels: Optional[List[Tuple[str, str]]] = None)`, making the metrics_labels parameter optional with default value of None.

Applied to files:

  • examples/multimodal/components/publisher.py
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

Applied to files:

  • examples/multimodal/components/video_encode_worker.py
🧬 Code graph analysis (2)
examples/multimodal/utils/chat_processor.py (3)
lib/llm/src/grpc/service/kserve.rs (1)
  • model_config (577-677)
lib/llm/src/local_model.rs (1)
  • model_path (92-95)
lib/async-openai/src/client.rs (1)
  • models (86-88)
components/src/dynamo/vllm/multimodal_utils/chat_processor.py (3)
examples/multimodal/utils/chat_processor.py (1)
  • StubEngineClient (37-46)
lib/llm/src/local_model.rs (1)
  • model_path (92-95)
lib/async-openai/src/client.rs (1)
  • models (86-88)
🪛 Ruff (0.14.8)
examples/multimodal/components/publisher.py

79-79: Unused method argument: args

(ARG002)


80-80: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (19)
examples/backends/vllm/launch/disagg_multimodal_epd.sh (1)

86-97: GPU device assignments and multimodal embeddings flag correctly updated for vLLM v0.12.0.

The changes align with the PR objective:

  • GPU devices shifted from (1, 2, 3) to (0, 1, 2) for all workers (encode, prefill, decode)
  • Echo messages updated to reflect new device assignments
  • --enable-mm-embeds flag added to prefill worker (line 92), which is appropriate since the prefill stage handles multimodal embeddings
  • Encode and decode workers correctly omit this flag as they handle encoding and token generation respectively

Please verify the following for vLLM v0.12.0 compatibility:

  1. All command-line flags (--multimodal-encode-worker, --multimodal-worker, --enable-mm-embeds, etc.) are valid in vLLM v0.12.0
  2. The GPU device numbering (0, 1, 2) aligns with the system's expected GPU layout
  3. The ZMQ topic endpoints (20080, 20081, 20082) remain compatible with v0.12.0
examples/multimodal/launch/audio_agg.sh (1)

94-94: No verification needed for GPU allocation consistency across scripts in the specified locations.

The file examples/multimodal/launch/audio_agg.sh line 94 correctly assigns CUDA_VISIBLE_DEVICES=0. However, the original comment requests verification of consistency across scripts like agg_multimodal_epd.sh and disagg_multimodal_epd.sh in the same directory—these scripts do not exist in the codebase. The verification check is based on a mistaken assumption about the repository structure and cannot be completed as written. If GPU allocation changes are part of this PR, verify them against the scripts that actually exist in examples/multimodal/launch/.

Likely an incorrect or invalid review comment.

examples/multimodal/utils/args.py (1)

162-163: New default enables multimodal embeddings for EPD workflow.

The enable_mm_embeds: True default activates multimodal embeddings input handling, which aligns with the EPD (Encode-Prefill-Decode) architecture where embeddings are transferred between workers. This is consistent with the --enable-mm-embeds flag added to launch scripts.
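As a rough sketch of how such a default can be applied without clobbering explicit user settings (the real overwrite_args in examples/multimodal/utils/args.py may differ in structure; the helper name below is hypothetical):

```python
def apply_epd_defaults(engine_args: dict) -> dict:
    """Illustrative helper: fill in EPD-friendly defaults while keeping any
    values the user already set. `apply_epd_defaults` is a hypothetical name,
    not the actual function in args.py."""
    defaults = {
        "enable_mm_embeds": True,  # accept precomputed multimodal embeddings
    }
    for key, value in defaults.items():
        engine_args.setdefault(key, value)
    return engine_args


args = apply_epd_defaults({"model": "llava-hf/llava-1.5-7b-hf"})
```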

examples/multimodal/components/processor.py (1)

20-21: Import paths updated for vLLM v0.12.0 module reorganization.

Both import path changes maintain backward compatibility through aliasing and align with the broader refactoring across multimodal components.

examples/multimodal/components/publisher.py (1)

36-43: Forward-compatible method signature for vLLM stat logger interface.

Adding *args and **kwargs to the record method provides forward compatibility with vLLM's StatLoggerBase interface, allowing additional parameters to be passed without breaking the implementation.

Note: The unused parameter warnings from static analysis are expected and safe to ignore—these parameters ensure the signature remains compatible with the vLLM interface, even if additional arguments are added in future versions.
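The shape of that change can be sketched as follows (a simplified stand-in; the actual StatLoggerBase interface in vLLM carries different parameters):

```python
class NullStatLogger:
    """No-op stat logger whose `record` tolerates arguments added by newer
    vLLM versions; *args/**kwargs absorb interface growth."""

    def record(self, scheduler_stats=None, iteration_stats=None,
               *args, **kwargs):
        pass  # intentionally ignores all inputs


logger = NullStatLogger()
logger.record()                              # older call shape
logger.record(None, None, 3, engine_idx=0)   # newer call with extra arguments
```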

examples/multimodal/components/worker.py (1)

227-257: Connector initializes automatically via __init__; explicit initialize() call was unnecessary.

The Connector class completes all initialization within its __init__ method, which is synchronous and requires no additional setup. There is no initialize() method on the Connector class, so removing the awaited self._connector.initialize() call was correct. The current code self._connector = connect.Connector() is sufficient.
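The pattern the comment describes, construction that leaves the object fully usable with no separate initialization step, can be sketched minimally (an illustration, not the actual dynamo connect.Connector):

```python
class Connector:
    """Illustrative: all setup completes synchronously in __init__, so there
    is no separate `initialize()` coroutine to await."""

    def __init__(self):
        self._connections = {}  # hypothetical internal state
        self._ready = True      # usable as soon as __init__ returns


conn = Connector()  # no `await conn.initialize()` needed afterwards
```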

components/src/dynamo/vllm/multimodal_handlers/processor_handler.py (1)

14-14: Import path verified for vLLM v0.12.0.

The import from vllm.tokenizers import TokenizerLike as AnyTokenizer is correct. TokenizerLike is a Protocol type defined in vllm.tokenizers.protocol and is used in the file as a return type annotation for the _create_tokenizer method.

examples/multimodal/components/encode_worker.py (1)

15-15: Import path is valid for vLLM v0.12.0.

The import from vllm.utils.argparse_utils import FlexibleArgumentParser is correct and functional. vLLM v0.12.0 supports both vllm.utils and vllm.utils.argparse_utils as valid import paths for FlexibleArgumentParser. The more specific submodule import is appropriately explicit.

examples/multimodal/utils/protocol.py (2)

29-29: Import path update for vLLM v0.12.0 metrics type.

The RequestMetrics type has been replaced with RequestStateStats from the new vllm.v1.metrics.stats module, aligning with vLLM v0.12.0's restructured metrics API.


169-180: Type and documentation updates are consistent.

The docstring and field type for metrics have been correctly updated to reference RequestStateStats. This change is synchronized with the parallel file at components/src/dynamo/vllm/multimodal_utils/protocol.py.
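The resulting field shape can be illustrated with a self-contained stand-in (the real RequestStateStats in vllm.v1.metrics.stats carries more fields than shown here):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestStateStats:
    """Stand-in for vllm.v1.metrics.stats.RequestStateStats; the single
    field shown is illustrative."""
    num_generation_tokens: int = 0


@dataclass
class MyRequestOutput:
    request_id: str
    metrics: Optional[RequestStateStats] = None  # was Optional[RequestMetrics]


out = MyRequestOutput(request_id="req-1")         # metrics defaults to None
out.metrics = RequestStateStats(num_generation_tokens=42)
```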

components/src/dynamo/vllm/multimodal_utils/protocol.py (2)

29-29: Import path update for vLLM v0.12.0 metrics type is correct.

Consistent with the parallel change in examples/multimodal/utils/protocol.py.


159-170: Type and documentation updates are consistent with the examples version.

examples/multimodal/utils/chat_processor.py (3)

136-151: ChatProcessor initialization correctly wires up stub engine and serving models.

The initialization properly creates a StubEngineClient and OpenAIServingModels instance with a BaseModelPath derived from the model config. This aligns with vLLM v0.12.0's updated OpenAIServingChat constructor requirements.


288-300: CompletionsProcessor initialization mirrors ChatProcessor pattern correctly.

The same stub engine and serving models pattern is applied consistently for completions, ensuring both processors are compatible with vLLM v0.12.0.


37-46: StubEngineClient provides a safe minimal interface for preprocessing-only use.

The stub correctly provides the required model_config, input_processor, and io_processor attributes. The None values for the processor attributes are safe—they are not accessed during preprocessing operations (_preprocess_chat, _preprocess_completion), which only use the tokenizer and request metadata. This design mirrors the pattern used in router_standalone/api.py, which successfully passes engine_client=None to OpenAIServingModels and OpenAIServingChat, confirming these classes handle missing engine attributes gracefully.
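A minimal sketch of that stub shape (the attribute names follow the review; the constructor signature here is assumed, not copied from the PR):

```python
from types import SimpleNamespace


class StubEngineClient:
    """Preprocessing-only engine stand-in: exposes the attributes the
    OpenAIServing* classes read, without running a real engine."""

    def __init__(self, model_config):
        self.model_config = model_config
        self.input_processor = None  # never touched on _preprocess_* paths
        self.io_processor = None


stub = StubEngineClient(SimpleNamespace(model="llava", max_model_len=4096))
```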

components/src/dynamo/vllm/multimodal_utils/chat_processor.py (2)

136-151: ChatProcessor initialization changes are consistent with the examples version.


288-300: CompletionsProcessor initialization changes are consistent with the examples version.

examples/multimodal/components/audio_encode_worker.py (2)

199-205: Connector.initialize() removal is safe and correct.

The Connector.initialize() method is marked as deprecated and is a no-op. Actual initialization of the underlying Connection objects is handled internally by _create_connection(), which is called by create_readable() and create_writable(). No explicit initialization is required on the Connector instance itself.


28-28: Import path is correct for vLLM v0.12.0.

The import from vllm.utils.argparse_utils import FlexibleArgumentParser is valid in vLLM v0.12.0. Both the direct import and from vllm.utils import FlexibleArgumentParser work correctly.

@rmccorm4 rmccorm4 left a comment


Thanks Kris!

Follow-up sounds good for #4849 (comment)

@krishung5 krishung5 enabled auto-merge (squash) December 10, 2025 18:46
@krishung5 krishung5 force-pushed the krish/fix-multimodal-vllm-bump branch from 46209a3 to bf2bccf Compare December 10, 2025 19:43
@krishung5 krishung5 merged commit 0173d5e into main Dec 10, 2025
57 of 58 checks passed
@krishung5 krishung5 deleted the krish/fix-multimodal-vllm-bump branch December 10, 2025 20:24
dagil-nvidia added a commit that referenced this pull request Dec 10, 2025
…4849)

Cherry-pick of PR #4849 to release/0.7.1 for vLLM v0.12.0 compatibility.

Updates import paths and fixes for vLLM v0.12.0:
- Updated AnyTokenizer import from vllm.tokenizers
- Updated FlexibleArgumentParser import from vllm.utils.argparse_utils
- Updated RequestMetrics to RequestStateStats
- Fixed GPU device assignments in launch scripts
- Added StubEngineClient for preprocessing-only operations

Closes DIS-1160

Original PR: #4849
Original author: Kris Hung <[email protected]>

Signed-off-by: Dan Gil <[email protected]>
nv-anants added a commit that referenced this pull request Dec 11, 2025
zxue2 pushed a commit to zxue2/dynamo that referenced this pull request Dec 11, 2025

3 participants