fix: Add model registration to SGLang multimodal workers fixing bug #4486 #4512

nancya-nv · 2025-11-20T19:49:27Z

Overview:

Fixes multimodal workers failing to register with the frontend discovery system, causing 503 errors when attempting to use multimodal models. The issue was that three multimodal worker initialization functions were missing the register_llm_with_readiness_gate() call that all other worker types have.

Details:

Added register_llm_with_readiness_gate() calls to three multimodal worker initialization functions to ensure they register with the etcd/NATS discovery service:

init_multimodal_worker() (line ~366-390)
- Added health_check_payload and ready_event
- Wrapped serve_endpoint() in asyncio.gather() with registration call
init_multimodal_prefill_worker() (line ~410-434)
- Added ready_event
- Added registration call to existing asyncio.gather()
init_multimodal_encode_worker() (line ~314-337)
- Removed tasks list pattern
- Added ready_event
- Added registration call with engine=None (encode worker doesn't have engine)

All three now follow the same pattern as working workers (init_worker, init_prefill, init_embedding, init_multimodal_processor), where serve_endpoint() and register_llm_with_readiness_gate() run concurrently via asyncio.gather().

Before: Workers started but never announced themselves to the discovery service
After: Workers properly register, frontend discovers them, requests are routed successfully

Where should the reviewer start?

components/src/dynamo/sglang/main.py - Review the three modified functions:
- init_multimodal_encode_worker() (lines ~314-337)
- init_multimodal_worker() (lines ~366-390)
- init_multimodal_prefill_worker() (lines ~410-434)

Compare these changes against the existing pattern in init_worker(), init_prefill(), and init_embedding() to verify consistency.

Related Issues:

Closes [BUG]: Multimodal workers fail to register with frontend discovery system on SGLang Backend #4486

Summary by CodeRabbit

Chores
- Enhanced multimodal endpoint startup sequencing with improved health status reporting mechanisms.
- Improved system readiness management to support concurrent workflow initialization and startup coordination.

_{✏️ Tip: You can customize this high-level summary in your review settings.}

Multimodal workers were missing register_llm_with_readiness_gate() calls, preventing model discovery and causing 503 errors. Fixed in three init functions: - init_multimodal_worker() - init_multimodal_encode_worker() - init_multimodal_prefill_worker() Now matches the pattern used by all other worker types (decode, prefill, embedding, processor). This enables: - Models appear in /v1/models endpoint - Requests route correctly to multimodal workers - Proper health check integration Tested on single H100 with Qwen/Qwen2.5-VL-7B-Instruct Signed-off-by: Nancy Agarwal <[email protected]>

copy-pr-bot · 2025-11-20T19:49:31Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2025-11-20T19:49:35Z

👋 Hi nancya-nv! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

coderabbitai · 2025-11-20T19:56:55Z

Walkthrough

A single-file change to components/src/dynamo/sglang/main.py introduces readiness gating for multimodal workflows. An asyncio.Event coordinates model registration with endpoint startup, health check payloads are propagated across worker initialization paths, and function signatures are updated to accept readiness gates and health status parameters.

Changes

Cohort / File(s)	Summary
SGLang endpoint startup readiness gating `components/src/dynamo/sglang/main.py`	Added asyncio.Event (ready_event) for readiness coordination; updated `serve_endpoint` to accept `health_check_payload` parameter; wired `register_llm_with_readiness_gate` with `readiness_gate` parameter across encode, worker, prefill, and multimodal initialization paths; extended endpoint startup to run concurrently with model registration sequencing

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

Verify concurrent execution ordering between serve_endpoint and register_llm_with_readiness_gate does not introduce race conditions
Validate health check payload propagation across all worker initialization paths (encode, worker, prefill, multimodal)
Confirm readiness_gate signaling occurs at the correct lifecycle point before request handling
Check that asyncio.Event initialization and await semantics are correct across all code paths

Poem

🐰✨ A readiness gate stands tall and proud,
Multimodal workflows gather in a crowd,
With health checks passing, handlers aligned,
Concurrent startup leaves no lag behind!
Bounces happily 🎉

Pre-merge checks

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and specifically identifies the main change: adding model registration to SGLang multimodal workers and references the bug being fixed.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check	✅ Passed	The PR description comprehensively covers all required sections with clear details about the bug fix, specific changes to each function, and guidance for reviewers.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

Provide your own instructions using the high_level_summary_instructions setting.
Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

📝 Description — Summarize the main change in 50–60 words, explaining what was done.

📓 References — List relevant issues, discussions, documentation, or related PRs.

📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.

📊 Contributor Summary — Include a Markdown table showing contributions:
| Contributor | Lines Added | Lines Removed | Files Changed |

✔️ Additional Notes — Add any extra reviewer context.
Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (1)

components/src/dynamo/sglang/main.py (1)

379-396: Multimodal encode worker registration pattern looks correct and consistent

This brings init_multimodal_encode_worker in line with init_multimodal_processor: the endpoint is served and the worker is registered concurrently, with a readiness gate to coordinate discovery. Passing engine=None is consistent with the existing processor path and matches the behavior of register_llm_with_readiness_gate, which already treats engine as optional at runtime.

One follow-up you might consider (in components/src/dynamo/sglang/register.py, not blocking this PR) is updating the type hint for engine to Optional[sgl.Engine] to reflect these None call sites and keep static type-checkers happy.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 796520b and 232bf97.

📒 Files selected for processing (1)

components/src/dynamo/sglang/main.py (4 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

components/src/dynamo/sglang/main.py (4)

lib/bindings/python/rust/lib.rs (2)

serve_endpoint (596-649)

generate (719-727)

lib/bindings/python/src/dynamo/_core.pyi (3)

serve_endpoint (127-139)

generate (1298-1337)

ModelInput (958-960)

components/src/dynamo/sglang/register.py (1)

register_llm_with_readiness_gate (136-176)

components/src/dynamo/sglang/health_check.py (1)

SglangHealthCheckPayload (50-78)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)

GitHub Check: Build and Test - dynamo

🔇 Additional comments (2)

components/src/dynamo/sglang/main.py (2)

430-447: Multimodal worker now registers with discovery and exposes a health payload

The multimodal worker path now matches the standard decode-worker pattern: it builds a SglangHealthCheckPayload from the engine, starts serve_endpoint with health_check_payload and metrics_labels, and registers via register_llm_with_readiness_gate using an asyncio.Event as the readiness gate. This should fix the missing-registration behavior for multimodal decode/aggregated mode without changing the lifetime or cleanup semantics of the handler.

472-488: Multimodal prefill worker registration aligns with other prefill/decode paths

Adding a readiness gate and wiring init_multimodal_prefill_worker through register_llm_with_readiness_gate plus a SglangPrefillHealthCheckPayload brings this path in line with the existing prefill/decode workers. The concurrent asyncio.gather preserves the existing error handling and ensures the worker is discoverable before it starts serving real traffic.

krishung5 · 2025-11-20T20:30:15Z

/ok to test 232bf97

krishung5

Lgtm, thanks for the fix! I've triggered a CI to check the sglang-2gpu test for multimodal.

indrajit96

LGTM!

pull-request-size bot added the size/M label Nov 20, 2025

github-actions bot added external-contribution Pull request is from an external contributor fix labels Nov 20, 2025

Merge branch 'main' into fix/sglang-multimodal-worker-registration

232bf97

nancya-nv marked this pull request as ready for review November 20, 2025 19:52

nancya-nv requested review from a team as code owners November 20, 2025 19:52

coderabbitai bot reviewed Nov 20, 2025

View reviewed changes

copy-pr-bot bot temporarily deployed to GITLAB November 20, 2025 20:30 Inactive

krishung5 approved these changes Nov 20, 2025

View reviewed changes

copy-pr-bot bot temporarily deployed to GITLAB November 20, 2025 20:31 Inactive

indrajit96 approved these changes Nov 21, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: Add model registration to SGLang multimodal workers fixing bug #4486 #4512

fix: Add model registration to SGLang multimodal workers fixing bug #4486 #4512

nancya-nv commented Nov 20, 2025 •

edited by rmccorm4

Loading

Uh oh!

copy-pr-bot bot commented Nov 20, 2025

Uh oh!

github-actions bot commented Nov 20, 2025

Uh oh!

coderabbitai bot commented Nov 20, 2025 •

edited

Loading

Uh oh!

coderabbitai bot left a comment

Uh oh!

krishung5 commented Nov 20, 2025

Uh oh!

krishung5 left a comment

Uh oh!

indrajit96 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

fix: Add model registration to SGLang multimodal workers fixing bug #4486 #4512

Are you sure you want to change the base?

fix: Add model registration to SGLang multimodal workers fixing bug #4486 #4512

Conversation

nancya-nv commented Nov 20, 2025 • edited by rmccorm4 Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Overview:

Details:

Where should the reviewer start?

Related Issues:

Summary by CodeRabbit

Uh oh!

copy-pr-bot bot commented Nov 20, 2025

Uh oh!

github-actions bot commented Nov 20, 2025

Uh oh!

coderabbitai bot commented Nov 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Poem

Pre-merge checks

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

krishung5 commented Nov 20, 2025

Uh oh!

krishung5 left a comment

Choose a reason for hiding this comment

Uh oh!

indrajit96 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

nancya-nv commented Nov 20, 2025 •

edited by rmccorm4

Loading

coderabbitai bot commented Nov 20, 2025 •

edited

Loading