Skip to content

Conversation

@nancya-nv
Copy link

@nancya-nv nancya-nv commented Nov 20, 2025

Overview:

Fixes multimodal workers failing to register with the frontend discovery system, causing 503 errors when attempting to use multimodal models. The issue was that three multimodal worker initialization functions were missing the register_llm_with_readiness_gate() call that all other worker types have.

Details:

Added register_llm_with_readiness_gate() calls to three multimodal worker initialization functions to ensure they register with the etcd/NATS discovery service:

  1. init_multimodal_worker() (line ~366-390)

    • Added health_check_payload and ready_event
    • Wrapped serve_endpoint() in asyncio.gather() with registration call
  2. init_multimodal_prefill_worker() (line ~410-434)

    • Added ready_event
    • Added registration call to existing asyncio.gather()
  3. init_multimodal_encode_worker() (line ~314-337)

    • Removed tasks list pattern
    • Added ready_event
    • Added registration call with engine=None (encode worker doesn't have engine)

All three now follow the same pattern as working workers (init_worker, init_prefill, init_embedding, init_multimodal_processor), where serve_endpoint() and register_llm_with_readiness_gate() run concurrently via asyncio.gather().

Before: Workers started but never announced themselves to the discovery service
After: Workers properly register, frontend discovers them, requests are routed successfully

Where should the reviewer start?

  • components/src/dynamo/sglang/main.py - Review the three modified functions:
    • init_multimodal_encode_worker() (lines ~314-337)
    • init_multimodal_worker() (lines ~366-390)
    • init_multimodal_prefill_worker() (lines ~410-434)

Compare these changes against the existing pattern in init_worker(), init_prefill(), and init_embedding() to verify consistency.

Related Issues:

Summary by CodeRabbit

  • Chores
    • Enhanced multimodal endpoint startup sequencing with improved health status reporting mechanisms.
    • Improved system readiness management to support concurrent workflow initialization and startup coordination.

✏️ Tip: You can customize this high-level summary in your review settings.

Multimodal workers were missing register_llm_with_readiness_gate() calls,
preventing model discovery and causing 503 errors.

Fixed in three init functions:
- init_multimodal_worker()
- init_multimodal_encode_worker()
- init_multimodal_prefill_worker()

Now matches the pattern used by all other worker types (decode, prefill,
embedding, processor).

This enables:
- Models appear in /v1/models endpoint
- Requests route correctly to multimodal workers
- Proper health check integration

Tested on single H100 with Qwen/Qwen2.5-VL-7B-Instruct

Signed-off-by: Nancy Agarwal <[email protected]>
@copy-pr-bot
Copy link

copy-pr-bot bot commented Nov 20, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions
Copy link

👋 Hi nancya-nv! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors.Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added external-contribution Pull request is from an external contributor fix labels Nov 20, 2025
@nancya-nv nancya-nv marked this pull request as ready for review November 20, 2025 19:52
@nancya-nv nancya-nv requested review from a team as code owners November 20, 2025 19:52
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 20, 2025

Walkthrough

A single-file change to components/src/dynamo/sglang/main.py introduces readiness gating for multimodal workflows. An asyncio.Event coordinates model registration with endpoint startup, health check payloads are propagated across worker initialization paths, and function signatures are updated to accept readiness gates and health status parameters.

Changes

Cohort / File(s) Summary
SGLang endpoint startup readiness gating
components/src/dynamo/sglang/main.py
Added asyncio.Event (ready_event) for readiness coordination; updated serve_endpoint to accept health_check_payload parameter; wired register_llm_with_readiness_gate with readiness_gate parameter across encode, worker, prefill, and multimodal initialization paths; extended endpoint startup to run concurrently with model registration sequencing

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20–25 minutes

  • Verify concurrent execution ordering between serve_endpoint and register_llm_with_readiness_gate does not introduce race conditions
  • Validate health check payload propagation across all worker initialization paths (encode, worker, prefill, multimodal)
  • Confirm readiness_gate signaling occurs at the correct lifecycle point before request handling
  • Check that asyncio.Event initialization and await semantics are correct across all code paths

Poem

🐰✨ A readiness gate stands tall and proud,
Multimodal workflows gather in a crowd,
With health checks passing, handlers aligned,
Concurrent startup leaves no lag behind!
Bounces happily 🎉

Pre-merge checks

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically identifies the main change: adding model registration to SGLang multimodal workers and references the bug being fixed.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Description check ✅ Passed The PR description comprehensively covers all required sections with clear details about the bug fix, specific changes to each function, and guidance for reviewers.

Tip

📝 Customizable high-level summaries are now available in beta!

You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.

  • Provide your own instructions using the high_level_summary_instructions setting.
  • Format the summary however you like (bullet lists, tables, multi-section layouts, contributor stats, etc.).
  • Use high_level_summary_in_walkthrough to move the summary from the description to the walkthrough section.

Example instruction:

"Divide the high-level summary into five sections:

  1. 📝 Description — Summarize the main change in 50–60 words, explaining what was done.
  2. 📓 References — List relevant issues, discussions, documentation, or related PRs.
  3. 📦 Dependencies & Requirements — Mention any new/updated dependencies, environment variable changes, or configuration updates.
  4. 📊 Contributor Summary — Include a Markdown table showing contributions:
    | Contributor | Lines Added | Lines Removed | Files Changed |
  5. ✔️ Additional Notes — Add any extra reviewer context.
    Keep each section concise (under 200 words) and use bullet or numbered lists for clarity."

Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
components/src/dynamo/sglang/main.py (1)

379-396: Multimodal encode worker registration pattern looks correct and consistent

This brings init_multimodal_encode_worker in line with init_multimodal_processor: the endpoint is served and the worker is registered concurrently, with a readiness gate to coordinate discovery. Passing engine=None is consistent with the existing processor path and matches the behavior of register_llm_with_readiness_gate, which already treats engine as optional at runtime.

One follow-up you might consider (in components/src/dynamo/sglang/register.py, not blocking this PR) is updating the type hint for engine to Optional[sgl.Engine] to reflect these None call sites and keep static type-checkers happy.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 796520b and 232bf97.

📒 Files selected for processing (1)
  • components/src/dynamo/sglang/main.py (4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
components/src/dynamo/sglang/main.py (4)
lib/bindings/python/rust/lib.rs (2)
  • serve_endpoint (596-649)
  • generate (719-727)
lib/bindings/python/src/dynamo/_core.pyi (3)
  • serve_endpoint (127-139)
  • generate (1298-1337)
  • ModelInput (958-960)
components/src/dynamo/sglang/register.py (1)
  • register_llm_with_readiness_gate (136-176)
components/src/dynamo/sglang/health_check.py (1)
  • SglangHealthCheckPayload (50-78)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
components/src/dynamo/sglang/main.py (2)

430-447: Multimodal worker now registers with discovery and exposes a health payload

The multimodal worker path now matches the standard decode-worker pattern: it builds a SglangHealthCheckPayload from the engine, starts serve_endpoint with health_check_payload and metrics_labels, and registers via register_llm_with_readiness_gate using an asyncio.Event as the readiness gate. This should fix the missing-registration behavior for multimodal decode/aggregated mode without changing the lifetime or cleanup semantics of the handler.


472-488: Multimodal prefill worker registration aligns with other prefill/decode paths

Adding a readiness gate and wiring init_multimodal_prefill_worker through register_llm_with_readiness_gate plus a SglangPrefillHealthCheckPayload brings this path in line with the existing prefill/decode workers. The concurrent asyncio.gather preserves the existing error handling and ensures the worker is discoverable before it starts serving real traffic.

@krishung5
Copy link
Contributor

/ok to test 232bf97

Copy link
Contributor

@krishung5 krishung5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lgtm, thanks for the fix! I've triggered a CI to check the sglang-2gpu test for multimodal.

Copy link
Contributor

@indrajit96 indrajit96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

external-contribution Pull request is from an external contributor fix size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG]: Multimodal workers fail to register with frontend discovery system on SGLang Backend

3 participants