-
Notifications
You must be signed in to change notification settings - Fork 703
fix: Add model registration to SGLang multimodal workers fixing bug #4486 #4512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix: Add model registration to SGLang multimodal workers fixing bug #4486 #4512
Conversation
Multimodal workers were missing register_llm_with_readiness_gate() calls, preventing model discovery and causing 503 errors. Fixed in three init functions: - init_multimodal_worker() - init_multimodal_encode_worker() - init_multimodal_prefill_worker() Now matches the pattern used by all other worker types (decode, prefill, embedding, processor). This enables: - Models appear in /v1/models endpoint - Requests route correctly to multimodal workers - Proper health check integration Tested on single H100 with Qwen/Qwen2.5-VL-7B-Instruct Signed-off-by: Nancy Agarwal <[email protected]>
|
👋 Hi nancya-nv! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
WalkthroughA single-file change to Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20–25 minutes
Poem
Pre-merge checks✅ Passed checks (3 passed)
Tip 📝 Customizable high-level summaries are now available in beta!You can now customize how CodeRabbit generates the high-level summary in your pull requests — including its content, structure, tone, and formatting.
Example instruction:
Note: This feature is currently in beta for Pro-tier users, and pricing will be announced later. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (1)
components/src/dynamo/sglang/main.py (1)
379-396: Multimodal encode worker registration pattern looks correct and consistentThis brings
init_multimodal_encode_workerin line withinit_multimodal_processor: the endpoint is served and the worker is registered concurrently, with a readiness gate to coordinate discovery. Passingengine=Noneis consistent with the existing processor path and matches the behavior ofregister_llm_with_readiness_gate, which already treatsengineas optional at runtime.One follow-up you might consider (in
components/src/dynamo/sglang/register.py, not blocking this PR) is updating the type hint forenginetoOptional[sgl.Engine]to reflect theseNonecall sites and keep static type-checkers happy.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
components/src/dynamo/sglang/main.py(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
components/src/dynamo/sglang/main.py (4)
lib/bindings/python/rust/lib.rs (2)
serve_endpoint(596-649)generate(719-727)lib/bindings/python/src/dynamo/_core.pyi (3)
serve_endpoint(127-139)generate(1298-1337)ModelInput(958-960)components/src/dynamo/sglang/register.py (1)
register_llm_with_readiness_gate(136-176)components/src/dynamo/sglang/health_check.py (1)
SglangHealthCheckPayload(50-78)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
components/src/dynamo/sglang/main.py (2)
430-447: Multimodal worker now registers with discovery and exposes a health payloadThe multimodal worker path now matches the standard decode-worker pattern: it builds a
SglangHealthCheckPayloadfrom the engine, startsserve_endpointwithhealth_check_payloadandmetrics_labels, and registers viaregister_llm_with_readiness_gateusing anasyncio.Eventas the readiness gate. This should fix the missing-registration behavior for multimodal decode/aggregated mode without changing the lifetime or cleanup semantics of the handler.
472-488: Multimodal prefill worker registration aligns with other prefill/decode pathsAdding a readiness gate and wiring
init_multimodal_prefill_workerthroughregister_llm_with_readiness_gateplus aSglangPrefillHealthCheckPayloadbrings this path in line with the existing prefill/decode workers. The concurrentasyncio.gatherpreserves the existing error handling and ensures the worker is discoverable before it starts serving real traffic.
|
/ok to test 232bf97 |
krishung5
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm, thanks for the fix! I've triggered a CI to check the sglang-2gpu test for multimodal.
indrajit96
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Overview:
Fixes multimodal workers failing to register with the frontend discovery system, causing 503 errors when attempting to use multimodal models. The issue was that three multimodal worker initialization functions were missing the
register_llm_with_readiness_gate()call that all other worker types have.Details:
Added
register_llm_with_readiness_gate()calls to three multimodal worker initialization functions to ensure they register with the etcd/NATS discovery service:init_multimodal_worker()(line ~366-390)health_check_payloadandready_eventserve_endpoint()inasyncio.gather()with registration callinit_multimodal_prefill_worker()(line ~410-434)ready_eventasyncio.gather()init_multimodal_encode_worker()(line ~314-337)taskslist patternready_eventengine=None(encode worker doesn't have engine)All three now follow the same pattern as working workers (
init_worker,init_prefill,init_embedding,init_multimodal_processor), whereserve_endpoint()andregister_llm_with_readiness_gate()run concurrently viaasyncio.gather().Before: Workers started but never announced themselves to the discovery service
After: Workers properly register, frontend discovers them, requests are routed successfully
Where should the reviewer start?
components/src/dynamo/sglang/main.py- Review the three modified functions:init_multimodal_encode_worker()(lines ~314-337)init_multimodal_worker()(lines ~366-390)init_multimodal_prefill_worker()(lines ~410-434)Compare these changes against the existing pattern in
init_worker(),init_prefill(), andinit_embedding()to verify consistency.Related Issues:
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.