fix: Fix multimodal EPD examples for vllm version bump #4849
Conversation
/ok to test bf2bccf
Walkthrough

The PR updates vLLM module import paths to newer locations, introduces a StubEngineClient for preprocessing-only operations, adjusts GPU device assignments in launch scripts, modifies method signatures to accept variadic arguments, replaces RequestMetrics type hints with RequestStateStats, and removes certain initialization calls.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
Pre-merge checks
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
Actionable comments posted: 0
♻️ Duplicate comments (3)
examples/multimodal/components/video_encode_worker.py (1)
19-19: Import path update matches changes across other components.

This import path update for `FlexibleArgumentParser` is consistent with the broader refactoring for vLLM v0.12.0 compatibility.

examples/multimodal/components/publisher.py (1)
74-81: Consistent forward-compatible signature applied to production logger.

The same `*args` and `**kwargs` pattern applied to `DynamoStatLoggerPublisher.record` maintains consistency with `NullStatLogger` and ensures compatibility with vLLM v0.12.0's stat logger interface.

examples/multimodal/components/worker.py (1)
18-18: Import path update consistent with vLLM v0.12.0 refactoring.

The import path change for `FlexibleArgumentParser` aligns with similar updates across all multimodal components.
🧹 Nitpick comments (3)
examples/backends/vllm/launch/disagg_multimodal_epd.sh (1)
73-104: Consider adding runtime validation for GPU availability.

The script assumes at least 3 GPUs are available starting from device 0. If the system has fewer GPUs or non-contiguous indexing, the script will fail at runtime with unclear error messages.
Consider adding a validation check near the start of the worker launch section:
```bash
# Validate GPU availability
if ! command -v nvidia-smi &> /dev/null; then
    echo "Warning: nvidia-smi not found. Cannot validate GPU availability."
else
    GPU_COUNT=$(nvidia-smi --list-gpus | wc -l)
    if [[ $GPU_COUNT -lt 3 ]]; then
        echo "Error: This script requires at least 3 GPUs. Found: $GPU_COUNT"
        exit 1
    fi
fi
```

Alternatively, you could make GPU device assignments configurable via environment variables or command-line options (similar to `MODEL_NAME` and `PROMPT_TEMPLATE`).

components/src/dynamo/vllm/multimodal_utils/protocol.py (1)
121-141: Note: Feature parity differences with `examples/multimodal/utils/protocol.py`.

This file lacks `AudioContent` in the `MessageContent` union (Line 121) and `audio_url` in `MultiModalInput` (Lines 138-141), which exist in the examples version. If audio support is intended for the components module, consider synchronizing these definitions.
components/src/dynamo/vllm/multimodal_utils/chat_processor.py (1)

37-46: `StubEngineClient` definition is duplicated from `examples/multimodal/utils/chat_processor.py`.

Both files define identical `StubEngineClient` classes. Consider extracting this to a shared utility module to avoid code duplication and ensure consistent behavior across examples and components.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
- components/src/dynamo/vllm/multimodal_handlers/processor_handler.py (1 hunks)
- components/src/dynamo/vllm/multimodal_utils/chat_processor.py (3 hunks)
- components/src/dynamo/vllm/multimodal_utils/protocol.py (3 hunks)
- examples/backends/vllm/launch/agg_multimodal_epd.sh (1 hunks)
- examples/backends/vllm/launch/disagg_multimodal_epd.sh (1 hunks)
- examples/multimodal/components/audio_encode_worker.py (1 hunks)
- examples/multimodal/components/encode_worker.py (1 hunks)
- examples/multimodal/components/processor.py (1 hunks)
- examples/multimodal/components/publisher.py (2 hunks)
- examples/multimodal/components/video_encode_worker.py (1 hunks)
- examples/multimodal/components/worker.py (1 hunks)
- examples/multimodal/launch/audio_agg.sh (1 hunks)
- examples/multimodal/utils/args.py (1 hunks)
- examples/multimodal/utils/chat_processor.py (3 hunks)
- examples/multimodal/utils/protocol.py (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.
Applied to files:
- examples/multimodal/components/processor.py
- components/src/dynamo/vllm/multimodal_handlers/processor_handler.py
📚 Learning: 2025-08-25T23:24:42.076Z
Learnt from: tzulingk
Repo: ai-dynamo/dynamo PR: 2666
File: components/backends/trtllm/src/dynamo/trtllm/publisher.py:0-0
Timestamp: 2025-08-25T23:24:42.076Z
Learning: WorkerMetricsPublisher.create_endpoint method signature has been updated in _core.pyi to include metrics_labels parameter: `def create_endpoint(self, component: str, metrics_labels: Optional[List[Tuple[str, str]]] = None)`, making the metrics_labels parameter optional with default value of None.
Applied to files:
- examples/multimodal/components/publisher.py
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
Repo: ai-dynamo/dynamo PR: 2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
Applied to files:
- examples/multimodal/components/video_encode_worker.py
🧬 Code graph analysis (2)
examples/multimodal/utils/chat_processor.py (3)
- lib/llm/src/grpc/service/kserve.rs (1): `model_config` (577-677)
- lib/llm/src/local_model.rs (1): `model_path` (92-95)
- lib/async-openai/src/client.rs (1): `models` (86-88)
components/src/dynamo/vllm/multimodal_utils/chat_processor.py (3)
- examples/multimodal/utils/chat_processor.py (1): `StubEngineClient` (37-46)
- lib/llm/src/local_model.rs (1): `model_path` (92-95)
- lib/async-openai/src/client.rs (1): `models` (86-88)
🪛 Ruff (0.14.8)
examples/multimodal/components/publisher.py
- 79-79: Unused method argument: `args` (ARG002)
- 80-80: Unused method argument: `kwargs` (ARG002)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (19)
examples/backends/vllm/launch/disagg_multimodal_epd.sh (1)
86-97: GPU device assignments and multimodal embeddings flag correctly updated for vLLM v0.12.0.

The changes align with the PR objective:
- GPU devices shifted from (1, 2, 3) to (0, 1, 2) for all workers (encode, prefill, decode)
- Echo messages updated to reflect the new device assignments
- `--enable-mm-embeds` flag added to the prefill worker (line 92), which is appropriate since the prefill stage handles multimodal embeddings
- Encode and decode workers correctly omit this flag, as they handle encoding and token generation respectively
Please verify the following for vLLM v0.12.0 compatibility:
- All command-line flags (`--multimodal-encode-worker`, `--multimodal-worker`, `--enable-mm-embeds`, etc.) are valid in vLLM v0.12.0
- The GPU device numbering (0, 1, 2) aligns with the system's expected GPU layout
- The ZMQ topic endpoints (20080, 20081, 20082) remain compatible with v0.12.0
examples/multimodal/launch/audio_agg.sh (1)
94-94: No verification needed for GPU allocation consistency across scripts in the specified locations.

The file `examples/multimodal/launch/audio_agg.sh` line 94 correctly assigns `CUDA_VISIBLE_DEVICES=0`. However, the original comment requests verification of consistency across scripts like `agg_multimodal_epd.sh` and `disagg_multimodal_epd.sh` in the same directory; these scripts do not exist in that directory. The verification check is based on a mistaken assumption about the repository structure and cannot be completed as written. If GPU allocation changes are part of this PR, verify them against the scripts that actually exist in `examples/multimodal/launch/`.

Likely an incorrect or invalid review comment.
examples/multimodal/utils/args.py (1)
162-163: New default enables multimodal embeddings for EPD workflow.

The `enable_mm_embeds: True` default activates multimodal embeddings input handling, which aligns with the EPD (Encode-Prefill-Decode) architecture where embeddings are transferred between workers. This is consistent with the `--enable-mm-embeds` flag added to launch scripts.

examples/multimodal/components/processor.py (1)
20-21: Import paths updated for vLLM v0.12.0 module reorganization.

Both import path changes maintain backward compatibility through aliasing and align with the broader refactoring across multimodal components.
examples/multimodal/components/publisher.py (1)
36-43: Forward-compatible method signature for vLLM stat logger interface.

Adding `*args` and `**kwargs` to the `record` method provides forward compatibility with vLLM's `StatLoggerBase` interface, allowing additional parameters to be passed without breaking the implementation.

Note: The unused parameter warnings from static analysis are expected and safe to ignore; these parameters ensure the signature remains compatible with the vLLM interface, even if additional arguments are added in future versions.
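A minimal sketch of the pattern, assuming the `DynamoStatLoggerPublisher.record` name quoted in this review; the actual vLLM `StatLoggerBase` parameters may differ:

```python
# Sketch of the forward-compatible record() signature described above.
# The class name and positional parameters are assumptions from the review
# text, not the repository's actual code.
class DynamoStatLoggerPublisher:
    def record(self, scheduler_stats, iteration_stats, *args, **kwargs):
        # *args/**kwargs absorb any extra arguments a future vLLM release
        # passes to stat loggers, so this override keeps working unchanged.
        self._publish(scheduler_stats, iteration_stats)

    def _publish(self, scheduler_stats, iteration_stats):
        # Placeholder for the actual metrics publishing logic.
        print(scheduler_stats, iteration_stats)
```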
examples/multimodal/components/worker.py (1)
227-257: Connector initializes automatically via `__init__`; explicit `initialize()` call was unnecessary.

The `Connector` class completes all initialization within its `__init__` method, which is synchronous and requires no additional setup. There is no `initialize()` method on the `Connector` class, so removing the awaited `self._connector.initialize()` call was correct. The current code `self._connector = connect.Connector()` is sufficient.
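Illustrating the construction pattern (the body below is hypothetical; only the `Connector()` usage comes from the review):

```python
# Hypothetical sketch of a connector that is fully initialized in __init__,
# mirroring the pattern described above; not the repository's actual code.
class Connector:
    def __init__(self):
        # All connection state is set up synchronously here, so callers
        # never need a separate awaited initialize() step.
        self._connections: dict[str, object] = {}


connector = Connector()  # ready to use immediately after construction
```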
components/src/dynamo/vllm/multimodal_handlers/processor_handler.py (1)

14-14: Import path verified for vLLM v0.12.0.

The import `from vllm.tokenizers import TokenizerLike as AnyTokenizer` is correct. `TokenizerLike` is a Protocol type defined in `vllm.tokenizers.protocol` and is used in the file as a return type annotation for the `_create_tokenizer` method.

examples/multimodal/components/encode_worker.py (1)
15-15: Import path is valid for vLLM v0.12.0.

The import `from vllm.utils.argparse_utils import FlexibleArgumentParser` is correct and functional. vLLM v0.12.0 supports both `vllm.utils` and `vllm.utils.argparse_utils` as valid import paths for `FlexibleArgumentParser`. The more specific submodule import is appropriately explicit.

examples/multimodal/utils/protocol.py (2)
29-29: Import path update for vLLM v0.12.0 metrics type.

The `RequestMetrics` type has been replaced with `RequestStateStats` from the new `vllm.v1.metrics.stats` module, aligning with vLLM v0.12.0's restructured metrics API.
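A sketch of what the type-hint migration looks like, assuming a Pydantic-style model with a `metrics` field as described; only the import path and the field name come from this review:

```python
# Hedged sketch of the RequestMetrics -> RequestStateStats migration.
# The surrounding model is hypothetical; only the import path and the
# `metrics` field name are taken from the review.
from typing import Optional

from pydantic import BaseModel, ConfigDict
from vllm.v1.metrics.stats import RequestStateStats


class RequestOutputLike(BaseModel):
    # RequestStateStats is a plain dataclass, so Pydantic must be told
    # to accept it as an arbitrary type.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    # Previously annotated as Optional[RequestMetrics] (pre-v0.12.0 vLLM).
    metrics: Optional[RequestStateStats] = None
```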
169-180: Type and documentation updates are consistent.

The docstring and field type for `metrics` have been correctly updated to reference `RequestStateStats`. This change is synchronized with the parallel file at `components/src/dynamo/vllm/multimodal_utils/protocol.py`.
components/src/dynamo/vllm/multimodal_utils/protocol.py (2)

29-29: Import path update for vLLM v0.12.0 metrics type is correct.

Consistent with the parallel change in `examples/multimodal/utils/protocol.py`.
159-170: Type and documentation updates are consistent with the examples version.

examples/multimodal/utils/chat_processor.py (3)
136-151: `ChatProcessor` initialization correctly wires up stub engine and serving models.

The initialization properly creates a `StubEngineClient` and `OpenAIServingModels` instance with a `BaseModelPath` derived from the model config. This aligns with vLLM v0.12.0's updated `OpenAIServingChat` constructor requirements.
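A hedged sketch of this wiring; the keyword argument names below are assumptions and should be checked against the actual vLLM v0.12.0 signatures:

```python
# Hypothetical sketch of the ChatProcessor wiring described above; keyword
# names are assumptions, not verified against the vLLM v0.12.0 API.
from vllm.entrypoints.openai.serving_models import BaseModelPath, OpenAIServingModels


def build_serving_models(stub_engine_client, model_config, served_name, model_path):
    # stub_engine_client is a StubEngineClient (sketched further below);
    # it stands in for a real engine during preprocessing-only operation.
    return OpenAIServingModels(
        engine_client=stub_engine_client,
        model_config=model_config,
        base_model_paths=[BaseModelPath(name=served_name, model_path=model_path)],
    )
```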
288-300: `CompletionsProcessor` initialization mirrors the `ChatProcessor` pattern correctly.

The same stub engine and serving models pattern is applied consistently for completions, ensuring both processors are compatible with vLLM v0.12.0.
37-46: `StubEngineClient` provides a safe minimal interface for preprocessing-only use.

The stub correctly provides the required `model_config`, `input_processor`, and `io_processor` attributes. The `None` values for the processor attributes are safe: they are not accessed during preprocessing operations (`_preprocess_chat`, `_preprocess_completion`), which only use the tokenizer and request metadata. This design mirrors the pattern used in `router_standalone/api.py`, which successfully passes `engine_client=None` to `OpenAIServingModels` and `OpenAIServingChat`, confirming these classes handle missing engine attributes gracefully.
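Based on the attributes named above, a minimal sketch of such a stub (the actual class in the PR may differ):

```python
# Minimal sketch of a preprocessing-only engine stub, using the attribute
# names quoted in the review; the real StubEngineClient may differ.
class StubEngineClient:
    def __init__(self, model_config):
        self.model_config = model_config
        # Safe as None: the preprocessing paths (_preprocess_chat,
        # _preprocess_completion) never touch these processors.
        self.input_processor = None
        self.io_processor = None
```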
components/src/dynamo/vllm/multimodal_utils/chat_processor.py (2)

136-151: `ChatProcessor` initialization changes are consistent with the examples version.

288-300: `CompletionsProcessor` initialization changes are consistent with the examples version.

examples/multimodal/components/audio_encode_worker.py (2)
199-205: `Connector.initialize()` removal is safe and correct.

The `Connector.initialize()` method is marked as deprecated and is a no-op. Actual initialization of the underlying `Connection` objects is handled internally by `_create_connection()`, which is called by `create_readable()` and `create_writable()`. No explicit initialization is required on the `Connector` instance itself.
28-28: Import path is correct for vLLM v0.12.0.

The import `from vllm.utils.argparse_utils import FlexibleArgumentParser` is valid in vLLM v0.12.0. Both the direct import and `from vllm.utils import FlexibleArgumentParser` work correctly.
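To sanity-check these import paths against an installed vLLM, a small smoke test (run in the target environment; module and attribute names are taken from this review):

```python
# Smoke-check the vLLM v0.12.0 import paths discussed in this review.
import importlib

for module, name in [
    ("vllm.utils.argparse_utils", "FlexibleArgumentParser"),
    ("vllm.tokenizers", "TokenizerLike"),
    ("vllm.v1.metrics.stats", "RequestStateStats"),
]:
    mod = importlib.import_module(module)
    assert hasattr(mod, name), f"{module}.{name} not found"
    print(f"OK: {module}.{name}")
```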
Thanks Kris!
Follow-up sounds good for #4849 (comment)
Force-pushed 46209a3 to bf2bccf
…4849)

Cherry-pick of PR #4849 to release/0.7.1 for vLLM v0.12.0 compatibility. Updates import paths and fixes for vLLM v0.12.0:
- Updated AnyTokenizer import from vllm.tokenizers
- Updated FlexibleArgumentParser import from vllm.utils.argparse_utils
- Updated RequestMetrics to RequestStateStats
- Fixed GPU device assignments in launch scripts
- Added StubEngineClient for preprocessing-only operations

Closes DIS-1160
Original PR: #4849
Original author: Kris Hung <[email protected]>
Signed-off-by: Dan Gil <[email protected]>
Co-authored-by: Anant Sharma <[email protected]>
Overview:
Fix multimodal EPD examples for vllm version bump to v0.12.0
Closes DIS-1160
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)