
Conversation


@AryanBagade AryanBagade commented Dec 11, 2025

Overview:
Add logprobs support to SGLang backend, enabling log probability data to be returned in streaming responses. This is the SGLang counterpart to the vLLM logprobs support added in #4697.

Details:

  • Add output_options.logprobs handling in _build_sampling_params for token-based format (sets return_logprob and top_logprobs_num)
  • Add OpenAI format logprobs/top_logprobs parameter handling
  • Add _extract_logprobs static method to parse SGLang's output_token_logprobs and output_top_logprobs fields
  • Update _process_token_stream to include log_probs and top_logprobs in output
  • Add aggregated_logprobs e2e test configuration
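The extraction step described above can be sketched roughly as follows. This is an illustrative sketch, not the actual `_extract_logprobs` implementation: the `(logprob, token_id, decoded_text)` tuple shape for `output_token_logprobs` entries is taken from the discussion below, while the dict/meta_info layout and the helper's signature are assumptions.

```python
def extract_logprobs(meta_info: dict, num_new_tokens: int):
    """Illustrative sketch: pull per-token logprobs and top-k alternatives
    from an SGLang response's meta_info. Entries follow SGLang's
    (logprob, token_id, decoded_text) tuple convention; slicing keeps only
    the newly generated tokens."""
    token_entries = meta_info.get("output_token_logprobs") or []
    top_entries = meta_info.get("output_top_logprobs") or []

    # Keep only the entries for tokens produced in this streaming step.
    new_tokens = token_entries[-num_new_tokens:] if num_new_tokens else []
    log_probs = [entry[0] for entry in new_tokens]

    top_logprobs = []
    for alternatives in top_entries[-num_new_tokens:] if num_new_tokens else []:
        # Each alternative is also a (logprob, token_id, decoded_text) tuple.
        top_logprobs.append({token_id: lp for lp, token_id, _text in alternatives})

    return log_probs, top_logprobs
```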

Related Issues:

Summary by CodeRabbit

  • New Features
    • Log probability support: Users can now request per-token log probabilities and top-k alternative tokens from both chat and completion API endpoints, enabling detailed analysis of model outputs and probability-based applications.


@AryanBagade AryanBagade requested review from a team as code owners December 11, 2025 23:25

copy-pr-bot bot commented Dec 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@github-actions

👋 Hi AryanBagade! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test GitHub Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving them.

🚀

@github-actions github-actions bot added external-contribution Pull request is from an external contributor feat labels Dec 11, 2025

coderabbitai bot commented Dec 11, 2025

Walkthrough

This change adds logprobs support to the SGLang backend by introducing logprobs extraction from framework responses, parsing logprobs parameters from requests, and integrating extracted probabilities into response outputs for both chat and completion request paths.

Changes

Cohort / File(s) | Summary

  • Core logprobs extraction and integration (components/src/dynamo/sglang/request_handlers/llm/decode_handler.py): Adds a module-level logger and static method _extract_logprobs to extract per-token logprobs and top-k logprobs from SGLang responses. Parses logprobs parameters from both token-based and OpenAI-format request paths, configuring return_logprob and top_logprobs_num accordingly. Integrates extraction into streaming paths to attach log_probs and top_logprobs to outputs. Adds initialization logging in the DecodeWorkerHandler constructor.
  • Logprobs test configuration (tests/serve/test_sglang.py): Imports the new payload builders chat_payload_with_logprobs and completion_payload_with_logprobs. Adds an aggregated_logprobs test configuration requesting logprobs (chat with top_logprobs=3, completion with logprobs=5). Extends the completions_only configuration to include aggregated_logprobs testing.
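For reference, request payloads matching the test configuration described above would look roughly like this. The exact fields emitted by the builders in tests/utils/payload_builder.py are assumptions; only the logprobs parameters (chat top_logprobs=3, completion logprobs=5) come from the summary, and they follow the standard OpenAI parameter shapes.

```python
# Illustrative chat payload: OpenAI chat endpoints take a boolean `logprobs`
# flag plus an integer `top_logprobs` count.
chat_payload = {
    "model": "test-model",          # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
    "logprobs": True,
    "top_logprobs": 3,
}

# Illustrative completion payload: legacy completions endpoints take a single
# integer `logprobs` (number of top alternatives per token).
completion_payload = {
    "model": "test-model",
    "prompt": "Hello",
    "logprobs": 5,
}
```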

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

  • decode_handler.py: The new _extract_logprobs static method requires careful review to verify correct slicing logic for new tokens only and proper extraction of both log probabilities and top-k candidates from SGLang responses.
  • Request parsing logic: Review token-based and OpenAI-format logprobs parameter parsing paths to ensure both request types are handled correctly and parameter validation is appropriate.
  • test_sglang.py: Verify that new test payloads correctly request logprobs with appropriate parameters and that the aggregated configuration properly validates the feature across both chat and completion paths.

Poem

🐰 From SGLang's depths, the logprobs now flow,
With tokens and probabilities all aglow,
Top-k candidates dance in the light,
Extracted with care, extracted just right,
The rabbit hops forth with probabilities bright! 🎲

Pre-merge checks

✅ Passed checks (5 passed)
  • Title check: Passed. The title 'feat: Add logprobs support to SGLang backend' clearly summarizes the main change (adding logprobs support to SGLang), which is the primary focus of the changeset.
  • Description check: Passed. The PR description follows the template structure with Overview, Details, and Related Issues sections, providing clear implementation details and linking to issue #4685.
  • Linked Issues check: Passed. The PR addresses all coding requirements from issue #4685: handling logprobs parameters, parsing SGLang logprobs fields, extracting and returning logprobs in output, and adding e2e test coverage.
  • Out of Scope Changes check: Passed. All changes are directly related to implementing logprobs support for the SGLang backend as specified in the linked issue; no unrelated modifications are present.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)

373-428: Add logprobs extraction to OpenAI-format text stream output.

The _process_text_stream method neither extracts nor emits logprobs in the response, even though _build_sampling_params requests them from the SGLang engine (lines 117-123) and the data is present in the response. Per the OpenAI chat completion streaming specification, logprobs should be included in each choice object when requested. _process_token_stream correctly extracts and includes logprobs via _extract_logprobs, but _process_text_stream ignores this data entirely, so OpenAI-format requests omit logprobs even when requested, violating API spec compliance.

Add logprobs extraction to the choice_data dict in _process_text_stream, similar to how _process_token_stream handles it, to ensure logprobs are included in the OpenAI-formatted chunks when requested.

🧹 Nitpick comments (1)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (1)

58-61: Inconsistent logger usage.

These lines use logging.info() (root logger) while the logprobs validation uses the module-level logger defined at line 17. For consistency and proper log namespacing, consider using logger.info() throughout.

         if self.serving_mode == DisaggregationMode.DECODE:
             if self.prefill_client is None:
                 raise ValueError(
                     "prefill_client must be provided when serving_mode is decode"
                 )
             self.prefill_client = prefill_client
-            logging.info("Decode worker handler initialized")
+            logger.info("Decode worker handler initialized")

         self.prefill_router_client = prefill_router_client
-        logging.info("Worker handler initialized")
+        logger.info("Worker handler initialized")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3324994 and 32486d5.

📒 Files selected for processing (2)
  • components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (6 hunks)
  • tests/serve/test_sglang.py (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
tests/serve/test_sglang.py (1)
tests/utils/payload_builder.py (2)
  • chat_payload_with_logprobs (316-356)
  • completion_payload_with_logprobs (359-393)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (2)
components/src/dynamo/vllm/handlers.py (1)
  • _extract_logprobs (683-740)
components/src/dynamo/trtllm/request_handlers/handler_base.py (1)
  • _extract_logprobs (110-177)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (5)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (4)

17-18: LGTM!

Adding a module-level logger is appropriate for the new warning messages in logprobs validation.


92-107: LGTM!

The logprobs validation is thorough—handles empty strings, non-integer values, and negative integers with appropriate warnings.


117-124: LGTM!

The OpenAI-format handling correctly differentiates between the boolean logprobs flag and the optional top_logprobs count, matching the OpenAI API specification.


346-355: LGTM!

The logprobs extraction is correctly integrated after determining new tokens, and fields are only added when non-None.

tests/serve/test_sglang.py (1)

20-22: LGTM!

The new payload builder imports are correctly added alongside existing imports.

@AryanBagade AryanBagade force-pushed the feat/add-logprobs-support-sglang-backend branch from e6b1c9d to 246ab54 Compare December 12, 2025 00:22

@ishandhanani ishandhanani left a comment


Some thoughts

async for out in self._process_text_stream(agg, context):
yield out

@staticmethod

wdyt about moving this into handler base?

Contributor Author

I did some digging here: the extraction logic is backend-specific, since SGLang returns logprobs as (logprob, token_id, decoded_text) tuples, which differs from TRT-LLM's structure.
vLLM also keeps its _extract_logprobs in handlers.py rather than in a shared base for the same reason. Happy to refactor if you see a clean abstraction, though!
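For concreteness, a toy illustration of the format difference mentioned above. The SGLang tuple shape comes from the comment; the vLLM-style structure is sketched as a plain dict for illustration and is not the actual vLLM type:

```python
# SGLang: one (logprob, token_id, decoded_text) tuple per position.
sglang_entry = (-0.25, 1042, " hello")

# vLLM-style: a mapping of token_id -> per-token info (sketched as a dict
# here; the real vLLM objects differ, which is the point of the comment).
vllm_entry = {1042: {"logprob": -0.25, "decoded_token": " hello"}}

# Normalizing both to {token_id: logprob} needs backend-specific code:
def normalize_sglang(entry):
    lp, tok_id, _text = entry
    return {tok_id: lp}

def normalize_vllm(entry):
    return {tok_id: info["logprob"] for tok_id, info in entry.items()}

assert normalize_sglang(sglang_entry) == normalize_vllm(vllm_entry)
```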

@karen-sy

/ok to test 45fb04e

AryanBagade and others added 2 commits December 12, 2025 19:13
…'s SamplingParams doesn't accept return_logprob or top_logprobs_num. These must be passed as separate kwargs to engine.async_generate().

Signed-off-by: Aryan Bagade <[email protected]>
@AryanBagade
Contributor Author

Fixed the logprobs params: return_logprob and top_logprobs_num are now passed to async_generate() as separate kwargs, not inside SamplingParams.
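The fix described above can be sketched like this. The fact that return_logprob and top_logprobs_num are async_generate() kwargs rather than SamplingParams fields comes from the commit message; the surrounding function, variable names, and other arguments are illustrative assumptions.

```python
# Illustrative sampling params; logprob options deliberately excluded.
sampling_params = {"temperature": 0.7, "max_new_tokens": 64}

async def generate(engine, prompt, want_logprobs: bool, top_k: int):
    """Sketch: route logprob options to async_generate() directly."""
    kwargs = {}
    if want_logprobs:
        # Per the fix: these are async_generate() kwargs, not
        # SamplingParams fields.
        kwargs["return_logprob"] = True
        kwargs["top_logprobs_num"] = top_k
    return await engine.async_generate(
        prompt=prompt,
        sampling_params=sampling_params,
        stream=True,
        **kwargs,
    )
```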


Labels

external-contribution Pull request is from an external contributor feat size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Add logprobs support to SGLang Backend

3 participants