feat: Add logprobs support to SGLang backend #4912
base: main
Conversation
Signed-off-by: Aryan Bagade <[email protected]>
👋 Hi AryanBagade! Thank you for contributing to ai-dynamo/dynamo.
Walkthrough

This change adds logprobs support to the SGLang backend by introducing logprobs extraction from framework responses, parsing logprobs parameters from requests, and integrating extracted probabilities into response outputs for both chat and completion request paths.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)
373-428: Add logprobs extraction to OpenAI-format text stream output.

The `_process_text_stream` method does not extract or emit logprobs in the response, even though `_build_sampling_params` requests them from the SGLang engine (lines 117-123) and the data is available in the response. According to the OpenAI chat completion streaming specification, logprobs should be included in each choice object when requested. Currently, `_process_token_stream` correctly extracts and includes logprobs using the `_extract_logprobs` method, but `_process_text_stream` ignores this data entirely, causing OpenAI-format requests to violate API spec compliance by omitting logprobs even when requested.

Add logprobs extraction to the `choice_data` dict in `_process_text_stream`, similar to how `_process_token_stream` handles it, to ensure logprobs are included in the OpenAI-formatted chunks when requested.
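As a minimal sketch of the shape that fix could take — the helper and field names here are illustrative assumptions based on the review text, not the actual handler code:

```python
from typing import Any, Optional

def build_choice_data(
    text_delta: str,
    finish_reason: Optional[str],
    logprobs: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Build one OpenAI-style streaming choice, attaching logprobs only when present."""
    choice: dict[str, Any] = {
        "index": 0,
        "delta": {"role": "assistant", "content": text_delta},
        "finish_reason": finish_reason,
    }
    if logprobs is not None:
        # Per the OpenAI streaming spec, logprobs belong on the choice object,
        # and only when the request asked for them.
        choice["logprobs"] = logprobs
    return choice
```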
🧹 Nitpick comments (1)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)
58-61: Inconsistent logger usage.

These lines use `logging.info()` (the root logger) while the logprobs validation uses the module-level `logger` defined at line 17. For consistency and proper log namespacing, consider using `logger.info()` throughout.

```diff
 if self.serving_mode == DisaggregationMode.DECODE:
     if self.prefill_client is None:
         raise ValueError(
             "prefill_client must be provided when serving_mode is decode"
         )
     self.prefill_client = prefill_client
-    logging.info("Decode worker handler initialized")
+    logger.info("Decode worker handler initialized")
 self.prefill_router_client = prefill_router_client
-logging.info("Worker handler initialized")
+logger.info("Worker handler initialized")
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (6 hunks)
- tests/serve/test_sglang.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/serve/test_sglang.py (1)
tests/utils/payload_builder.py (2)
- chat_payload_with_logprobs (316-356)
- completion_payload_with_logprobs (359-393)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (2)
components/src/dynamo/vllm/handlers.py (1)
- _extract_logprobs (683-740)

components/src/dynamo/trtllm/request_handlers/handler_base.py (1)

- _extract_logprobs (110-177)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (4)
17-18: LGTM! Adding a module-level logger is appropriate for the new warning messages in logprobs validation.
92-107: LGTM! The logprobs validation is thorough: it handles empty strings, non-integer values, and negative integers with appropriate warnings.
117-124: LGTM! The OpenAI-format handling correctly differentiates between the boolean `logprobs` flag and the optional `top_logprobs` count, matching the OpenAI API specification.
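To illustrate the behavior the two comments above describe, here is a hedged sketch of how such parsing could look; the function and key names are assumptions, not the actual `_build_sampling_params` code:

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)

def parse_logprobs_params(request: dict[str, Any]) -> Optional[int]:
    """Normalize logprobs inputs to a top-logprobs count, or None if unset/invalid.

    Completions-style requests carry an integer `logprobs`; chat-style requests
    carry a boolean `logprobs` plus an optional `top_logprobs` count.
    """
    value = request.get("logprobs")
    if value is None or value == "":
        if value == "":
            logger.warning("Ignoring empty-string logprobs value")
        return None
    if isinstance(value, bool):
        # OpenAI chat format: boolean flag plus optional top_logprobs count.
        # (Checked before int, since bool is a subclass of int in Python.)
        if not value:
            return None
        return request.get("top_logprobs") or 1
    if not isinstance(value, int):
        logger.warning("Ignoring non-integer logprobs value: %r", value)
        return None
    if value < 0:
        logger.warning("Ignoring negative logprobs value: %d", value)
        return None
    return value
```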
346-355: LGTM! The logprobs extraction is correctly integrated after determining new tokens, and fields are only added when non-None.
tests/serve/test_sglang.py (1)
20-22: LGTM! The new payload builder imports are correctly added alongside existing imports.
Signed-off-by: Aryan Bagade <[email protected]>
Force-pushed from e6b1c9d to 246ab54
ishandhanani left a comment:
Some thoughts
```python
async for out in self._process_text_stream(agg, context):
    yield out

@staticmethod
```
wdyt about moving this into handler base?
I did some digging here; basically the extraction logic is backend-specific, since SGLang returns logprobs as tuples of (logprob, token_id, decoded_text), which differs from TRTLLM's structure.
vLLM also keeps its `_extract_logprobs` in handlers.py rather than a shared base for the same reason. Happy to refactor if you see a clean abstraction though!
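For context, a sketch of what tuple-based extraction can look like; the `meta_info` and `output_token_logprobs` names are assumptions about SGLang's response shape, inferred from the tuple format described above:

```python
from typing import Any, Optional

def extract_sglang_logprobs(meta_info: dict[str, Any]) -> Optional[list[dict[str, Any]]]:
    """Convert SGLang's tuple-based logprobs into a list of plain dicts."""
    token_logprobs = meta_info.get("output_token_logprobs")
    if not token_logprobs:
        return None
    results = []
    for logprob, token_id, decoded_text in token_logprobs:
        # Each SGLang entry is packed as (logprob, token_id, decoded_text),
        # unlike TRTLLM's structured objects.
        results.append(
            {"token": decoded_text, "token_id": token_id, "logprob": logprob}
        )
    return results
```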
/ok to test 45fb04e
…'s SamplingParams doesn't accept return_logprob or top_logprobs_num. These must be passed as separate kwargs to engine.async_generate().

Signed-off-by: Aryan Bagade <[email protected]>
Fixed logprobs params.
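A minimal sketch of the corrected call shape the commit message describes; `engine`, `prompt`, and the surrounding function are illustrative, with only the two kwarg names taken from the commit message:

```python
async def generate_with_logprobs(engine, prompt, sampling_params, top_logprobs_num):
    # Logprobs options go to async_generate directly, since SGLang's
    # SamplingParams rejects return_logprob / top_logprobs_num.
    stream = await engine.async_generate(
        prompt=prompt,
        sampling_params=sampling_params,  # no logprobs fields in here
        stream=True,
        return_logprob=True,
        top_logprobs_num=top_logprobs_num,
    )
    async for out in stream:
        yield out
```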
Overview:
Add logprobs support to SGLang backend, enabling log probability data to be returned in streaming responses. This is the SGLang counterpart to the vLLM logprobs support added in #4697.
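To make the feature concrete, a hypothetical client request against an OpenAI-compatible frontend (the URL and model name are placeholders):

```python
import requests

payload = {
    "model": "Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
    "logprobs": True,   # OpenAI chat format: boolean flag...
    "top_logprobs": 3,  # ...plus how many alternatives per token
}
resp = requests.post(
    "http://localhost:8000/v1/chat/completions", json=payload, stream=True
)
for line in resp.iter_lines():
    if line:
        print(line.decode())
```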
Details:
Related Issues: