Skip to content

Conversation

@nealvaidya
Copy link
Contributor

@nealvaidya nealvaidya commented Dec 10, 2025

Overview:

Pull together all the various multimodal docs into a single folder. Also standardize on EPD terminology using EPD, E/PD, E/P/D, and EP/D to describe the different disagg strategies.

Where should the reviewer start?

Start with multimodal/index.md. Would appreciate if framework PICs could look at the respective framework pages to ensure no inaccuracies were added during the migration.

Summary by CodeRabbit

  • Documentation
    • Reorganized multimodal inference documentation into a centralized guide with backend-specific sections for vLLM, TensorRT-LLM, and SGLang.
    • Consolidated multimodal deployment patterns, architecture guidance, and workflow examples under a single location.
    • Added automatic redirects to maintain compatibility with previously linked documentation pages.

✏️ Tip: You can customize this high-level summary in your review settings.

@nealvaidya nealvaidya requested review from a team as code owners December 10, 2025 04:36
@github-actions github-actions bot added the docs label Dec 10, 2025
Signed-off-by: Neal Vaidya <[email protected]>
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 10, 2025

Walkthrough

This pull request reorganizes multimodal documentation by consolidating backend-specific guides scattered across individual backend folders into a centralized docs/multimodal/ directory, with new unified backend documentation files for vLLM, TensorRT-LLM, and SGLang. Old documentation files are removed, links are updated, and HTTP redirects are configured to preserve backward compatibility.

Changes

Cohort / File(s) Summary
Centralized Multimodal Documentation
docs/multimodal/index.md, docs/multimodal/sglang.md, docs/multimodal/trtllm.md, docs/multimodal/vllm.md
Four new documentation files created in a dedicated multimodal directory providing unified guides for multimodal inference, architecture patterns, deployment strategies, component roles, and supported models across vLLM, TensorRT-LLM, and SGLang backends.
Deleted Backend-Specific Multimodal Documentation
docs/backends/sglang/multimodal_epd.md, docs/backends/sglang/multimodal_sglang_guide.md, docs/backends/trtllm/multimodal_support.md, docs/backends/trtllm/multimodal_trtllm_guide.md, docs/backends/trtllm/multinode/multinode-multimodal-example.md, docs/backends/vllm/multimodal.md, docs/backends/vllm/multimodal_vllm_guide.md, docs/multimodal/multimodal_intro.md
Eight documentation files removed from dispersed locations; content consolidated into centralized multimodal guides.
Backend README Link Updates
docs/backends/sglang/README.md, docs/backends/trtllm/README.md
Hyperlinks updated to reference new centralized multimodal documentation paths instead of local backend-specific files.
Navigation and Redirect Configuration
docs/conf.py
Eight new redirect mappings added to preserve backward compatibility for removed and relocated multimodal documentation files.
Documentation Structure Updates
docs/hidden_toctree.rst, docs/index.rst
Removed dispersed multimodal entries from toctree; updated primary multimodal link from multimodal/multimodal_intro.md to multimodal/index.md.
Project Metadata
docs/project.json
No functional changes detected.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Verify redirect mappings in docs/conf.py are correct and cover all deleted/moved files
  • Check link consistency in updated backend README files and toctree entries
  • Validate content consolidation by confirming key multimodal concepts are preserved in centralized files
  • Confirm navigation structure maintains logical organization for users navigating multimodal documentation

Poem

🐰 Documentation paths did intertwine,
So we gathered them in one place divine,
Old links now redirect with care,
Multimodal guides consolidated there! ✨
No code was changed, just files rearranged,
For cleaner docs—beautifully arranged!

Pre-merge checks

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description check ⚠️ Warning The description covers the Overview and Where should the reviewer start sections, but is missing the Details and Related Issues sections required by the template. Add a Details section explaining the specific changes (file reorganization, terminology standardization) and include a Related Issues section with relevant GitHub issue references.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the main objective: consolidating multimodal documentation into a centralized folder structure.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f57fd72 and 36a4b9d.

📒 Files selected for processing (18)
  • docs/backends/sglang/README.md (2 hunks)
  • docs/backends/sglang/multimodal_epd.md (0 hunks)
  • docs/backends/sglang/multimodal_sglang_guide.md (0 hunks)
  • docs/backends/trtllm/README.md (1 hunks)
  • docs/backends/trtllm/multimodal_support.md (0 hunks)
  • docs/backends/trtllm/multimodal_trtllm_guide.md (0 hunks)
  • docs/backends/trtllm/multinode/multinode-multimodal-example.md (0 hunks)
  • docs/backends/vllm/multimodal.md (0 hunks)
  • docs/backends/vllm/multimodal_vllm_guide.md (0 hunks)
  • docs/conf.py (1 hunks)
  • docs/hidden_toctree.rst (0 hunks)
  • docs/index.rst (1 hunks)
  • docs/multimodal/index.md (1 hunks)
  • docs/multimodal/multimodal_intro.md (0 hunks)
  • docs/multimodal/sglang.md (1 hunks)
  • docs/multimodal/trtllm.md (1 hunks)
  • docs/multimodal/vllm.md (1 hunks)
  • docs/project.json (1 hunks)
💤 Files with no reviewable changes (9)
  • docs/backends/vllm/multimodal_vllm_guide.md
  • docs/backends/sglang/multimodal_epd.md
  • docs/backends/trtllm/multinode/multinode-multimodal-example.md
  • docs/backends/sglang/multimodal_sglang_guide.md
  • docs/backends/trtllm/multimodal_support.md
  • docs/hidden_toctree.rst
  • docs/multimodal/multimodal_intro.md
  • docs/backends/vllm/multimodal.md
  • docs/backends/trtllm/multimodal_trtllm_guide.md
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-09-16T19:47:30.312Z
Learnt from: KrishnanPrash
Repo: ai-dynamo/dynamo PR: 3067
File: lib/llm/src/preprocessor/prompt/template/oai.rs:87-134
Timestamp: 2025-09-16T19:47:30.312Z
Learning: In Dynamo, multimodal requests (containing image_url or other non-text content) are processed through a completely different workflow than text-only requests, so the may_be_fix_msg_content function in lib/llm/src/preprocessor/prompt/template/oai.rs will only encounter text-only content arrays.

Applied to files:

  • docs/multimodal/trtllm.md
  • docs/multimodal/sglang.md
  • docs/backends/trtllm/README.md
  • docs/multimodal/vllm.md
  • docs/multimodal/index.md
📚 Learning: 2025-10-28T05:48:37.621Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_utils/model.py:39-42
Timestamp: 2025-10-28T05:48:37.621Z
Learning: In components/src/dynamo/vllm/multimodal_utils/model.py, the AutoModel.from_pretrained call with trust_remote_code=True in the load_vision_model function is intentional and expected for the vLLM multimodal implementation.

Applied to files:

  • docs/multimodal/trtllm.md
  • docs/backends/trtllm/README.md
📚 Learning: 2025-10-28T04:09:48.264Z
Learnt from: ayushag-nv
Repo: ai-dynamo/dynamo PR: 3634
File: components/src/dynamo/vllm/multimodal_handlers/processor_handler.py:66-72
Timestamp: 2025-10-28T04:09:48.264Z
Learning: In components/src/dynamo/vllm/multimodal_handlers/processor_handler.py, the AutoTokenizer.from_pretrained call with trust_remote_code=True is intentional and expected for the vLLM multimodal handler implementation.

Applied to files:

  • docs/multimodal/trtllm.md
📚 Learning: 2025-09-04T19:03:06.643Z
Learnt from: biswapanda
Repo: ai-dynamo/dynamo PR: 2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.643Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

Applied to files:

  • docs/project.json
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • docs/project.json
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • docs/project.json
📚 Learning: 2025-12-02T18:13:40.065Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 4698
File: .github/workflows/container-validation-dynamo.yml:68-68
Timestamp: 2025-12-02T18:13:40.065Z
Learning: In the ai-dynamo/dynamo repository, backend-specific tests (vllm, sglang, trtllm) are intentionally excluded from the container-validation-dynamo.yml workflow using "not (vllm or sglang or trtllm)" because they run in a separate container-validation-backends.yml workflow that has dedicated jobs for each backend. This separation keeps framework-agnostic tests separate from backend-specific tests.

Applied to files:

  • docs/multimodal/index.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (12)
docs/project.json (1)

1-1: No changes to review.

This file contains only metadata and shows no observable changes in the diff.

docs/multimodal/sglang.md (1)

1-50: Approve new SGLang multimodal documentation with verification note.

The new comprehensive guide is well-structured, clearly explains E/PD and E/P/D patterns (correctly excluding EPD and EP/D per support matrix), and provides good technical detail on bootstrap coordination, vision encoding, and NIXL usage. The content aligns with the centralized multimodal index and appropriate cross-references are in place.

Verify that the launch scripts referenced in the guide exist:

  • examples/backends/sglang/launch/multimodal_agg.sh (line 113)
  • examples/backends/sglang/launch/multimodal_disagg.sh (line 176)

Also confirm the Qwen model references (lines 123, 186) are current and match available versions.

docs/index.rst (1)

61-61: Approve documentation index update.

The toctree entry has been correctly updated to reference the new centralized multimodal documentation index at multimodal/index.md, reflecting the consolidation goal of the PR.

docs/backends/trtllm/README.md (1)

243-243: Approve TRT-LLM documentation link update.

The reference has been correctly updated to point to the new centralized TRT-LLM multimodal guide at ../../multimodal/trtllm.md. The generic description avoids pattern-specific claims and correctly reflects the consolidation.

docs/multimodal/vllm.md (1)

1-100: Approve comprehensive vLLM multimodal documentation.

The new guide is thorough, well-organized, and covers all four deployment patterns with support for images, video, and audio. The security requirement for --enable-multimodal is clearly called out, and experimental features are appropriately marked. Content aligns with the main multimodal index support matrix and terminology standards.

Verify that the launch scripts exist in the referenced locations:

  • examples/backends/vllm/launch/agg_multimodal_epd.sh (line 107)
  • examples/backends/vllm/launch/disagg_multimodal_epd.sh (line 170)
  • examples/backends/vllm/launch/agg_multimodal_llama.sh (line 197)
  • examples/backends/vllm/launch/disagg_multimodal_llama.sh (line 248)
  • Video and audio launch scripts (lines 282-283, 366-367)

Also confirm that the model references (Qwen, LLaVA, Llama versions) are current and widely available.

docs/multimodal/index.md (2)

1-66: Approve centralized multimodal documentation index.

The new index provides an excellent consolidation point with clear definitions of all four EPD terminology variants (EPD, E/PD, E/P/D, EP/D), accurate support matrices for each backend, and proper input format compatibility information. The toctree structure correctly references the three backend-specific guides.

Verify support matrix accuracy by having framework PICs review their respective sections:

  1. vLLM PIC: Confirm line 44 (supports all four patterns, images, video, audio experimental)
  2. TRT-LLM PIC: Confirm line 45 (E/PD not supported, E/P/D WIP, EP/D and EPD supported)
  3. SGLang PIC: Confirm line 46 (E/PD and E/P/D only, images only)

This aligns with the PR guidance that "Framework PICs are asked to check their respective framework pages for any inaccuracies introduced during the migration."


81-179: Approve architecture pattern definitions.

The four deployment patterns are clearly explained with helpful text descriptions, ASCII diagrams, component tables, and guidance on when to use each pattern. Terminology is consistent throughout and aligns with the standardization goal.

docs/multimodal/trtllm.md (3)

1-100: Approve comprehensive TRT-LLM multimodal documentation.

The guide provides thorough coverage of TRT-LLM multimodal capabilities, correctly documents supported patterns (EPD and EP/D for production, E/P/D with WIP caveats), and clearly distinguishes between HTTP/HTTPS URLs and pre-computed embeddings. The experimental nature of E/P/D with embeddings is properly marked, and multi-node SLURM deployment guidance is valuable.

Verify the following engine configuration and script paths exist:

  1. Engine config files referenced for aggregated serving (line 71):

    • examples/backends/trtllm/engine_configs/llama4/multimodal/agg.yaml
  2. Engine config files referenced for disaggregated serving (lines 112-113):

    • examples/backends/trtllm/engine_configs/qwen2-vl-7b-instruct/prefill.yaml
    • examples/backends/trtllm/engine_configs/qwen2-vl-7b-instruct/decode.yaml
  3. Launch scripts:

    • examples/backends/trtllm/launch/disagg.sh (referenced with MODALITY="multimodal", line 116)
    • examples/backends/trtllm/launch/epd_disagg.sh (line 200)
  4. SLURM example scripts (line 270):

    • examples/basics/multinode/trtllm/srun_disaggregated.sh and related scripts

146-175: Approve pre-computed embeddings E/P/D documentation.

The section clearly explains the experimental status, required TensorRT-LLM commit, supported file types, and two embedding formats (simple tensor and dictionary). The zero-copy NIXL architecture is well-explained with sequence diagrams and configuration examples.


266-323: Approve multi-node SLURM deployment guidance.

The environment setup, disaggregated launch procedure, and expected output walkthrough provide clear steps for multi-node deployment. The cleanup section is helpful.

docs/backends/sglang/README.md (1)

41-41: The relative path resolves correctly.

The hyperlink at line 41 (and 279) uses the relative path ../../multimodal/sglang.md from docs/backends/sglang/README.md, which correctly resolves to docs/multimodal/sglang.md.

docs/conf.py (1)

89-97: Multimodal redirect mappings look consistent and depth‑correct

The new redirects correctly resolve from the old HTML locations to the consolidated multimodal pages (including the deeper backends/trtllm/multinode/... case), and multimodal/multimodal_introindex.html keeps users on the new multimodal index. This block is coherent with the PR’s goal of centralizing multimodal docs.

@krishung5
Copy link
Contributor

Thanks for pulling this together! Overall lgtm, left a few comments.

Signed-off-by: Neal Vaidya <[email protected]>
Signed-off-by: Neal Vaidya <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants