
Conversation


@krishung5 krishung5 commented Dec 10, 2025

Overview:

Fix multimodal EPD examples for the vLLM version bump to v0.12.0

Closes DIS-1160

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added multimodal embeddings support with a new feature flag for enhanced audio and video processing.
  • Updates

    • Updated GPU device assignments in example deployment scripts for optimized resource allocation.
    • Updated internal metric type handling for improved request statistics tracking.
  • Chores

    • Refactored import paths for better modularity and compatibility.


@krishung5 krishung5 requested review from a team as code owners December 10, 2025 09:38
@github-actions github-actions bot added the fix label Dec 10, 2025
@krishung5 (Contributor, Author)

/ok to test bf2bccf


coderabbitai bot commented Dec 10, 2025

Walkthrough

The PR updates vLLM module import paths to newer locations, introduces a StubEngineClient for preprocessing-only operations, adjusts GPU device assignments in launch scripts, modifies method signatures to accept variadic arguments, replaces RequestMetrics type hints with RequestStateStats, and removes certain initialization calls.

Changes

Changes by cohort:

  • Import Path Updates (Tokenizer & Argument Parser)
    Files: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, examples/multimodal/components/processor.py, examples/multimodal/components/audio_encode_worker.py, examples/multimodal/components/encode_worker.py, examples/multimodal/components/video_encode_worker.py, examples/multimodal/components/worker.py
    Summary: AnyTokenizer is now imported from vllm.tokenizers as TokenizerLike, and FlexibleArgumentParser moved from vllm.utils to vllm.utils.argparse_utils.

  • StubEngineClient Introduction
    Files: components/src/dynamo/vllm/multimodal_utils/chat_processor.py, examples/multimodal/utils/chat_processor.py
    Summary: Introduced a new StubEngineClient class with model_config, input_processor, and io_processor attributes. Instantiated the stub engine and OpenAIServingModels in ChatProcessor and CompletionsProcessor initialization; removed the enable_force_include_usage flag from stream generation calls.

  • Protocol Type Hint Updates
    Files: components/src/dynamo/vllm/multimodal_utils/protocol.py, examples/multimodal/utils/protocol.py
    Summary: Replaced the RequestMetrics import with RequestStateStats from vllm.v1.metrics.stats; updated the MyRequestOutput.metrics field type from Optional[RequestMetrics] to Optional[RequestStateStats].

  • Launch Script GPU Device Adjustments
    Files: examples/backends/vllm/launch/agg_multimodal_epd.sh, examples/backends/vllm/launch/disagg_multimodal_epd.sh, examples/multimodal/launch/audio_agg.sh
    Summary: Adjusted CUDA_VISIBLE_DEVICES assignments, shifting GPU indices lower (e.g., GPU 1→0, 2→1, 3→2), and added the --enable-mm-embeds flag to worker invocations.

  • Method Signature Updates
    Files: examples/multimodal/components/publisher.py
    Summary: Added *args and **kwargs to the NullStatLogger.record and DynamoStatLoggerPublisher.record method signatures to accept variadic arguments.

  • Feature Toggle Defaults
    Files: examples/multimodal/utils/args.py
    Summary: Added a new default key, enable_mm_embeds: True, in the overwrite_args function to toggle the multimodal embeddings feature.
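Most of these changes are mechanical import-path moves. One defensive pattern for bridging an old and a new module layout (not used by this PR, which simply switches to the new paths) is a small resolver helper; the vLLM paths in the comment below are the ones named above, while the stdlib names used for the demonstration are arbitrary:

```python
import importlib


def import_with_fallback(new_module: str, old_module: str, name: str):
    """Resolve `name` from the new module path first, then the legacy one."""
    for mod_path in (new_module, old_module):
        try:
            return getattr(importlib.import_module(mod_path), name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"{name} not found in {new_module} or {old_module}")


# With vLLM installed, this could cover both layouts, e.g.:
#   FlexibleArgumentParser = import_with_fallback(
#       "vllm.utils.argparse_utils", "vllm.utils", "FlexibleArgumentParser")
# Demonstrated here against the stdlib instead (collections.abc has no
# OrderedDict, so resolution falls back to collections):
OrderedDict = import_with_fallback("collections.abc", "collections", "OrderedDict")
```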

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

  • StubEngineClient initialization: Verify stub engine attributes (model_config, input_processor, io_processor) are correctly instantiated and compatible with OpenAIServingChat/OpenAIServingCompletion expectations
  • Removed enable_force_include_usage flag: Confirm that removing this flag from stream generation calls does not alter delta/content processing behavior unintentionally
  • RequestStateStats compatibility: Ensure RequestStateStats is a proper replacement for RequestMetrics in terms of serialization and usage patterns
  • GPU device assignments: Verify shifted indices across all scripts are consistent and intentional, particularly in disagg_multimodal_epd.sh where multiple workers are coordinated

Poem

🐰 A rabbit hops through vllm's halls today,
Moving imports to their modern way,
With StubEngineClient, we preprocess with glee,
GPU lanes shift smoothly—one, two, three!
RequestStateStats now measure the flow,
As embeddings bloom where features grow. ✨

Pre-merge checks

❌ Failed checks (2 warnings)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 7.69%, below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
  • Description check ⚠️ Warning — The pull request description is incomplete. It includes an Overview and a Related Issues section, but the Details and "Where should the reviewer start?" sections lack substantive content. Resolution: fill in the Details section with the specific changes made (e.g., updated import paths, dependency updates) and point reviewers at the files to examine first (e.g., the chat_processor.py and protocol.py files).

✅ Passed checks (1 passed)

  • Title check ✅ Passed — The title 'fix: Fix multimodal EPD examples for vllm version bump' directly summarizes the main change: updating the multimodal EPD examples for compatibility with a vLLM version bump.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
examples/multimodal/components/video_encode_worker.py (1)

19-19: Import path update matches changes across other components.

This import path update for FlexibleArgumentParser is consistent with the broader refactoring for vLLM v0.12.0 compatibility.

examples/multimodal/components/publisher.py (1)

74-81: Consistent forward-compatible signature applied to production logger.

The same *args and **kwargs pattern applied to DynamoStatLoggerPublisher.record maintains consistency with NullStatLogger and ensures compatibility with vLLM v0.12.0's stat logger interface.

examples/multimodal/components/worker.py (1)

18-18: Import path update consistent with vLLM v0.12.0 refactoring.

The import path change for FlexibleArgumentParser aligns with similar updates across all multimodal components.

🧹 Nitpick comments (3)
examples/backends/vllm/launch/disagg_multimodal_epd.sh (1)

73-104: Consider adding runtime validation for GPU availability.

The script assumes at least 3 GPUs are available starting from device 0. If the system has fewer GPUs or non-contiguous indexing, the script will fail at runtime with unclear error messages.

Consider adding a validation check near the start of the worker launch section:

# Validate GPU availability
if ! command -v nvidia-smi &> /dev/null; then
  echo "Warning: nvidia-smi not found. Cannot validate GPU availability."
else
  GPU_COUNT=$(nvidia-smi --list-gpus | wc -l)
  if [[ $GPU_COUNT -lt 3 ]]; then
    echo "Error: This script requires at least 3 GPUs. Found: $GPU_COUNT"
    exit 1
  fi
fi

Alternatively, you could make GPU device assignments configurable via environment variables or command-line options (similar to MODEL_NAME and PROMPT_TEMPLATE).
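Following that suggestion, a minimal sketch of environment-overridable device assignments (the variable names here are illustrative, not taken from the script):

```shell
# Defaults match the script's current layout; callers can override, e.g.
#   ENCODE_GPU=4 ./disagg_multimodal_epd.sh
ENCODE_GPU="${ENCODE_GPU:-0}"
PREFILL_GPU="${PREFILL_GPU:-1}"
DECODE_GPU="${DECODE_GPU:-2}"
echo "encode=${ENCODE_GPU} prefill=${PREFILL_GPU} decode=${DECODE_GPU}"
```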

components/src/dynamo/vllm/multimodal_utils/protocol.py (1)

121-141: Note: Feature parity differences with examples/multimodal/utils/protocol.py.

This file lacks AudioContent in the MessageContent union (Line 121) and audio_url in MultiModalInput (Lines 138-141), which exist in the examples version. If audio support is intended for the components module, consider synchronizing these definitions.

components/src/dynamo/vllm/multimodal_utils/chat_processor.py (1)

37-46: StubEngineClient definition is duplicated from examples/multimodal/utils/chat_processor.py.

Both files define identical StubEngineClient classes. Consider extracting this to a shared utility module to avoid code duplication and ensure consistent behavior across examples and components.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe10dbf and bf2bccf.

📒 Files selected for processing (15)
  • components/src/dynamo/vllm/multimodal_handlers/processor_handler.py (1 hunks)
  • components/src/dynamo/vllm/multimodal_utils/chat_processor.py (3 hunks)
  • components/src/dynamo/vllm/multimodal_utils/protocol.py (3 hunks)
  • examples/backends/vllm/launch/agg_multimodal_epd.sh (1 hunks)
  • examples/backends/vllm/launch/disagg_multimodal_epd.sh (1 hunks)
  • examples/multimodal/components/audio_encode_worker.py (1 hunks)
  • examples/multimodal/components/encode_worker.py (1 hunks)
  • examples/multimodal/components/processor.py (1 hunks)
  • examples/multimodal/components/publisher.py (2 hunks)
  • examples/multimodal/components/video_encode_worker.py (1 hunks)
  • examples/multimodal/components/worker.py (1 hunks)
  • examples/multimodal/launch/audio_agg.sh (1 hunks)
  • examples/multimodal/utils/args.py (1 hunks)
  • examples/multimodal/utils/chat_processor.py (3 hunks)
  • examples/multimodal/utils/protocol.py (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.

Applied to files:

  • examples/multimodal/components/processor.py
  • components/src/dynamo/vllm/multimodal_handlers/processor_handler.py
📚 Learning: 2025-08-25T23:24:42.076Z
Learnt from: tzulingk
Repo: ai-dynamo/dynamo PR: 2666
File: components/backends/trtllm/src/dynamo/trtllm/publisher.py:0-0
Timestamp: 2025-08-25T23:24:42.076Z
Learning: WorkerMetricsPublisher.create_endpoint method signature has been updated in _core.pyi to include metrics_labels parameter: `def create_endpoint(self, component: str, metrics_labels: Optional[List[Tuple[str, str]]] = None)`, making the metrics_labels parameter optional with default value of None.

Applied to files:

  • examples/multimodal/components/publisher.py
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

Applied to files:

  • examples/multimodal/components/video_encode_worker.py
🧬 Code graph analysis (2)
examples/multimodal/utils/chat_processor.py (3)
lib/llm/src/grpc/service/kserve.rs (1)
  • model_config (577-677)
lib/llm/src/local_model.rs (1)
  • model_path (92-95)
lib/async-openai/src/client.rs (1)
  • models (86-88)
components/src/dynamo/vllm/multimodal_utils/chat_processor.py (3)
examples/multimodal/utils/chat_processor.py (1)
  • StubEngineClient (37-46)
lib/llm/src/local_model.rs (1)
  • model_path (92-95)
lib/async-openai/src/client.rs (1)
  • models (86-88)
🪛 Ruff (0.14.8)
examples/multimodal/components/publisher.py

79-79: Unused method argument: args

(ARG002)


80-80: Unused method argument: kwargs

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (19)
examples/backends/vllm/launch/disagg_multimodal_epd.sh (1)

86-97: GPU device assignments and multimodal embeddings flag correctly updated for vLLM v0.12.0.

The changes align with the PR objective:

  • GPU devices shifted from (1, 2, 3) to (0, 1, 2) for all workers (encode, prefill, decode)
  • Echo messages updated to reflect new device assignments
  • --enable-mm-embeds flag added to prefill worker (line 92), which is appropriate since the prefill stage handles multimodal embeddings
  • Encode and decode workers correctly omit this flag as they handle encoding and token generation respectively

Please verify the following for vLLM v0.12.0 compatibility:

  1. All command-line flags (--multimodal-encode-worker, --multimodal-worker, --enable-mm-embeds, etc.) are valid in vLLM v0.12.0
  2. The GPU device numbering (0, 1, 2) aligns with the system's expected GPU layout
  3. The ZMQ topic endpoints (20080, 20081, 20082) remain compatible with v0.12.0
examples/multimodal/launch/audio_agg.sh (1)

94-94: No verification needed for GPU allocation consistency across scripts in the specified locations.

The file examples/multimodal/launch/audio_agg.sh line 94 correctly assigns CUDA_VISIBLE_DEVICES=0. However, the original comment requests verification of consistency across scripts like agg_multimodal_epd.sh and disagg_multimodal_epd.sh in the same directory—these scripts do not exist in the codebase. The verification check is based on a mistaken assumption about the repository structure and cannot be completed as written. If GPU allocation changes are part of this PR, verify them against the scripts that actually exist in examples/multimodal/launch/.

Likely an incorrect or invalid review comment.

examples/multimodal/utils/args.py (1)

162-163: New default enables multimodal embeddings for EPD workflow.

The enable_mm_embeds: True default activates multimodal embeddings input handling, which aligns with the EPD (Encode-Prefill-Decode) architecture where embeddings are transferred between workers. This is consistent with the --enable-mm-embeds flag added to launch scripts.
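As a rough sketch of how such a default can be applied without clobbering explicit user settings (the real overwrite_args in examples/multimodal/utils/args.py may differ in structure; the helper name below is hypothetical):

```python
def apply_epd_defaults(engine_args: dict) -> dict:
    """Illustrative helper: fill in EPD-friendly defaults while keeping any
    values the user already set. `apply_epd_defaults` is a hypothetical name,
    not the actual function in args.py."""
    defaults = {
        "enable_mm_embeds": True,  # accept precomputed multimodal embeddings
    }
    for key, value in defaults.items():
        engine_args.setdefault(key, value)
    return engine_args


args = apply_epd_defaults({"model": "llava-hf/llava-1.5-7b-hf"})
```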

examples/multimodal/components/processor.py (1)

20-21: Import paths updated for vLLM v0.12.0 module reorganization.

Both import path changes maintain backward compatibility through aliasing and align with the broader refactoring across multimodal components.

examples/multimodal/components/publisher.py (1)

36-43: Forward-compatible method signature for vLLM stat logger interface.

Adding *args and **kwargs to the record method provides forward compatibility with vLLM's StatLoggerBase interface, allowing additional parameters to be passed without breaking the implementation.

Note: The unused parameter warnings from static analysis are expected and safe to ignore—these parameters ensure the signature remains compatible with the vLLM interface, even if additional arguments are added in future versions.
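The shape of that change can be sketched as follows (a simplified stand-in; the actual StatLoggerBase interface in vLLM carries different parameters):

```python
class NullStatLogger:
    """No-op stat logger whose `record` tolerates arguments added by newer
    vLLM versions; *args/**kwargs absorb interface growth."""

    def record(self, scheduler_stats=None, iteration_stats=None,
               *args, **kwargs):
        pass  # intentionally ignores all inputs


logger = NullStatLogger()
logger.record()                              # older call shape
logger.record(None, None, 3, engine_idx=0)   # newer call with extra arguments
```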

examples/multimodal/components/worker.py (1)

227-257: Connector initializes automatically via __init__; explicit initialize() call was unnecessary.

The Connector class completes all initialization within its __init__ method, which is synchronous and requires no additional setup. There is no initialize() method on the Connector class, so removing the awaited self._connector.initialize() call was correct. The current code self._connector = connect.Connector() is sufficient.
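The pattern the comment describes, construction that leaves the object fully usable with no separate initialization step, can be sketched minimally (an illustration, not the actual dynamo connect.Connector):

```python
class Connector:
    """Illustrative: all setup completes synchronously in __init__, so there
    is no separate `initialize()` coroutine to await."""

    def __init__(self):
        self._connections = {}  # hypothetical internal state
        self._ready = True      # usable as soon as __init__ returns


conn = Connector()  # no `await conn.initialize()` needed afterwards
```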

components/src/dynamo/vllm/multimodal_handlers/processor_handler.py (1)

14-14: Import path verified for vLLM v0.12.0.

The import from vllm.tokenizers import TokenizerLike as AnyTokenizer is correct. TokenizerLike is a Protocol type defined in vllm.tokenizers.protocol and is used in the file as a return type annotation for the _create_tokenizer method.

examples/multimodal/components/encode_worker.py (1)

15-15: Import path is valid for vLLM v0.12.0.

The import from vllm.utils.argparse_utils import FlexibleArgumentParser is correct and functional. vLLM v0.12.0 supports both vllm.utils and vllm.utils.argparse_utils as valid import paths for FlexibleArgumentParser. The more specific submodule import is appropriately explicit.

examples/multimodal/utils/protocol.py (2)

29-29: Import path update for vLLM v0.12.0 metrics type.

The RequestMetrics type has been replaced with RequestStateStats from the new vllm.v1.metrics.stats module, aligning with vLLM v0.12.0's restructured metrics API.


169-180: Type and documentation updates are consistent.

The docstring and field type for metrics have been correctly updated to reference RequestStateStats. This change is synchronized with the parallel file at components/src/dynamo/vllm/multimodal_utils/protocol.py.
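The resulting field shape can be illustrated with a self-contained stand-in (the real RequestStateStats in vllm.v1.metrics.stats carries more fields than shown here):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RequestStateStats:
    """Stand-in for vllm.v1.metrics.stats.RequestStateStats; the single
    field shown is illustrative."""
    num_generation_tokens: int = 0


@dataclass
class MyRequestOutput:
    request_id: str
    metrics: Optional[RequestStateStats] = None  # was Optional[RequestMetrics]


out = MyRequestOutput(request_id="req-1")         # metrics defaults to None
out.metrics = RequestStateStats(num_generation_tokens=42)
```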

components/src/dynamo/vllm/multimodal_utils/protocol.py (2)

29-29: Import path update for vLLM v0.12.0 metrics type is correct.

Consistent with the parallel change in examples/multimodal/utils/protocol.py.


159-170: Type and documentation updates are consistent with the examples version.

examples/multimodal/utils/chat_processor.py (3)

136-151: ChatProcessor initialization correctly wires up stub engine and serving models.

The initialization properly creates a StubEngineClient and OpenAIServingModels instance with a BaseModelPath derived from the model config. This aligns with vLLM v0.12.0's updated OpenAIServingChat constructor requirements.


288-300: CompletionsProcessor initialization mirrors ChatProcessor pattern correctly.

The same stub engine and serving models pattern is applied consistently for completions, ensuring both processors are compatible with vLLM v0.12.0.


37-46: StubEngineClient provides a safe minimal interface for preprocessing-only use.

The stub correctly provides the required model_config, input_processor, and io_processor attributes. The None values for the processor attributes are safe—they are not accessed during preprocessing operations (_preprocess_chat, _preprocess_completion), which only use the tokenizer and request metadata. This design mirrors the pattern used in router_standalone/api.py, which successfully passes engine_client=None to OpenAIServingModels and OpenAIServingChat, confirming these classes handle missing engine attributes gracefully.
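A minimal sketch of that stub shape (the attribute names follow the review; the constructor signature here is assumed, not copied from the PR):

```python
from types import SimpleNamespace


class StubEngineClient:
    """Preprocessing-only engine stand-in: exposes the attributes the
    OpenAIServing* classes read, without running a real engine."""

    def __init__(self, model_config):
        self.model_config = model_config
        self.input_processor = None  # never touched on _preprocess_* paths
        self.io_processor = None


stub = StubEngineClient(SimpleNamespace(model="llava", max_model_len=4096))
```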

components/src/dynamo/vllm/multimodal_utils/chat_processor.py (2)

136-151: ChatProcessor initialization changes are consistent with the examples version.


288-300: CompletionsProcessor initialization changes are consistent with the examples version.

examples/multimodal/components/audio_encode_worker.py (2)

199-205: Connector.initialize() removal is safe and correct.

The Connector.initialize() method is marked as deprecated and is a no-op. Actual initialization of the underlying Connection objects is handled internally by _create_connection(), which is called by create_readable() and create_writable(). No explicit initialization is required on the Connector instance itself.


28-28: Import path is correct for vLLM v0.12.0.

The import from vllm.utils.argparse_utils import FlexibleArgumentParser is valid in vLLM v0.12.0. Both the direct import and from vllm.utils import FlexibleArgumentParser work correctly.

@rmccorm4 rmccorm4 left a comment


Thanks Kris!

Follow-up sounds good for #4849 (comment)

@krishung5 krishung5 enabled auto-merge (squash) December 10, 2025 18:46
@krishung5 krishung5 force-pushed the krish/fix-multimodal-vllm-bump branch from 46209a3 to bf2bccf Compare December 10, 2025 19:43
@krishung5 krishung5 merged commit 0173d5e into main Dec 10, 2025
57 of 58 checks passed
@krishung5 krishung5 deleted the krish/fix-multimodal-vllm-bump branch December 10, 2025 20:24
dagil-nvidia added a commit that referenced this pull request Dec 10, 2025
…4849)

Cherry-pick of PR #4849 to release/0.7.1 for vLLM v0.12.0 compatibility.

Updates import paths and fixes for vLLM v0.12.0:
- Updated AnyTokenizer import from vllm.tokenizers
- Updated FlexibleArgumentParser import from vllm.utils.argparse_utils
- Updated RequestMetrics to RequestStateStats
- Fixed GPU device assignments in launch scripts
- Added StubEngineClient for preprocessing-only operations

Closes DIS-1160

Original PR: #4849
Original author: Kris Hung <[email protected]>

Signed-off-by: Dan Gil <[email protected]>
nv-anants added a commit that referenced this pull request Dec 11, 2025
zxue2 pushed a commit to zxue2/dynamo that referenced this pull request Dec 11, 2025

3 participants