router select dp group with the minimum number of tokens #13208

jiashaokun-1 · 2025-11-13T14:30:23Z

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

gemini-code-assist · 2025-11-13T14:30:54Z

Summary of Changes

Hello @jiashaokun-1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the router's ability to manage and distribute requests across data parallel workers. By introducing a new scheduling strategy that considers the minimum number of tokens on individual data parallel ranks, it aims to optimize resource utilization and improve the efficiency of prefill and decode operations. The changes involve extending API parameters, updating internal data structures to track per-rank loads, and modifying the core routing logic to leverage this new load information for more intelligent request distribution.

Highlights

Data Parallel Rank Routing: Introduced a new mechanism to route requests to specific data parallel (DP) ranks within workers based on their current token load. This allows for more granular load balancing in data-parallel setups.
Configuration Options: Added a new router configuration option, dp_minimum_tokens_scheduler, to enable this new minimum token-based scheduling for data parallel groups. Also introduced worker_load_check_interval to control how frequently worker loads are checked.
API and Internal Data Structures Update: Extended various request and protocol structures (e.g., GenerateReqInput, ChatCompletionRequest, GenerateRequest) with a decode_dp_rank field, alongside the existing data_parallel_rank, to differentiate routing for prefill and decode stages.
Worker Load Management: The WorkerManager and LoadBalancingPolicy implementations (CacheAware, PowerOfTwo, Random, RoundRobin) were updated to fetch, store, and utilize per-DP-rank load information, rather than just aggregate worker load. A new DPLoadManager was introduced to handle this DP-specific load caching and incrementing.
Prefill-Decode Router Integration: The PDRouter now incorporates the logic to select the appropriate data_parallel_rank and decode_dp_rank for requests by querying the load balancing policies for the ranks with the lowest token count, when the dp_minimum_tokens_scheduler is enabled.
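For reviewers who want a concrete picture of the highlights above, here is a minimal Python sketch of minimum-token DP rank selection. It is not the PR's code: `DPLoadCache`, `select_min_rank`, `increment`, and `route` are hypothetical names that loosely mirror what the summary says about DPLoadManager and PDRouter, and the per-rank loads are assumed to arrive as plain lists of token counts.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DPLoadCache:
    """Hypothetical stand-in for the PR's DPLoadManager (assumed behavior)."""

    # worker URL -> token counts indexed by DP rank
    loads: Dict[str, List[int]] = field(default_factory=dict)

    def update(self, worker: str, per_rank_tokens: List[int]) -> None:
        # Overwrite the cached snapshot, e.g. after a periodic load check.
        self.loads[worker] = list(per_rank_tokens)

    def select_min_rank(self, worker: str) -> Optional[int]:
        # Return the rank with the fewest tokens, or None if nothing is cached.
        ranks = self.loads.get(worker)
        if not ranks:
            return None
        return min(range(len(ranks)), key=ranks.__getitem__)

    def increment(self, worker: str, rank: int, tokens: int) -> None:
        # Bump the cached load so back-to-back requests do not all land on
        # the same rank before the next snapshot arrives.
        self.loads[worker][rank] += tokens


def route(cache: DPLoadCache, prefill_worker: str, decode_worker: str,
          num_tokens: int) -> Dict[str, Optional[int]]:
    """Pick one rank for prefill and one for decode (assumed field names)."""
    prefill_rank = cache.select_min_rank(prefill_worker)
    decode_rank = cache.select_min_rank(decode_worker)
    if prefill_rank is not None:
        cache.increment(prefill_worker, prefill_rank, num_tokens)
    if decode_rank is not None:
        cache.increment(decode_worker, decode_rank, num_tokens)
    return {"data_parallel_rank": prefill_rank, "decode_dp_rank": decode_rank}


cache = DPLoadCache()
cache.update("prefill-0", [30, 10, 20])
cache.update("decode-0", [5, 7])
first = route(cache, "prefill-0", "decode-0", num_tokens=15)
second = route(cache, "prefill-0", "decode-0", num_tokens=15)
```

Incrementing the cached count right after selection is one way to keep a burst of requests from piling onto a single rank between load checks; whether the PR does exactly this is an assumption based on the summary's mention of "caching and incrementing".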

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces a new router scheduling strategy to select a data parallel (DP) group based on the one with the minimum number of tokens. This is a significant feature for load balancing in distributed setups. The changes are extensive, correctly propagating new configuration options and logic through both the Python and Rust codebases. The introduction of a reusable DPLoadManager in Rust is a good design choice for managing DP loads. I've identified a minor bug where a statement was likely intended to be a log message, and some unreachable code that can be simplified. Overall, this is a solid and valuable contribution.

gemini-code-assist · 2025-11-13T14:32:42Z

python/sglang/srt/managers/data_parallel_controller.py

+                self.workers[req.decode_dp_rank].send_pyobj(req)
+                return True
+            if req.data_parallel_rank is not None:
+                (f"Decode direct routing to DP rank {req.decode_dp_rank}, by data parallel rank")


This f-string is a statement with no effect and seems to be a bug. It was likely intended to be a logger.debug call. Additionally, it references req.decode_dp_rank while this code block routes based on req.data_parallel_rank, which could be confusing. I've corrected it to log req.data_parallel_rank to match the routing logic.

Suggested change

- (f"Decode direct routing to DP rank {req.decode_dp_rank}, by data parallel rank")
+ logger.debug(f"Decode direct routing to DP rank {req.data_parallel_rank}, by data parallel rank")
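To illustrate why the original line is a no-op, the following self-contained sketch compares a bare parenthesized f-string with a logger.debug call. The `Req` class, the logger name, and the two `route_*` functions are hypothetical and exist only for this demonstration; they are not taken from data_parallel_controller.py.

```python
import io
import logging
from dataclasses import dataclass
from typing import Optional

# Capture log output in memory so the difference is observable.
stream = io.StringIO()
logger = logging.getLogger("dp_controller_sketch")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))


@dataclass
class Req:
    """Hypothetical request carrying only the two rank fields discussed here."""

    data_parallel_rank: Optional[int] = None
    decode_dp_rank: Optional[int] = None


def route_bare(req: Req) -> None:
    if req.data_parallel_rank is not None:
        # The string is built and immediately discarded: nothing is logged.
        (f"Decode direct routing to DP rank {req.data_parallel_rank}, by data parallel rank")


def route_logged(req: Req) -> None:
    if req.data_parallel_rank is not None:
        # Same message, but passed to the logger so it is actually emitted.
        logger.debug(f"Decode direct routing to DP rank {req.data_parallel_rank}, by data parallel rank")


route_bare(Req(data_parallel_rank=3))
after_bare = stream.getvalue()

route_logged(Req(data_parallel_rank=3))
after_logged = stream.getvalue()
```

After `route_bare` the captured stream is still empty; only `route_logged` writes the message.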

gemini-code-assist · 2025-11-13T14:32:42Z

sgl-router/src/app_context.rs

+            if let Some(policy_registry) = &self.policy_registry {
+                policy_registry.enable_dp_minimum_tokens_scheduler();
+            } else {
+                info!("policy_registry is None")
+            }


Since self.policy_registry is assigned a Some(...) value on line 428, the if let Some(...) on line 430 will always succeed. This makes the else block unreachable dead code. You can simplify this block to a direct call.

self.policy_registry.as_ref().unwrap().enable_dp_minimum_tokens_scheduler();

jiashaokun-1 added 3 commits November 13, 2025 22:08

router select dp group with the minimum number of tokens 02

57cc99d

router select dp group with the minimum number of tokens 03

12484e1

router select dp group with the minimum number of tokens 05

5a705db

jiashaokun-1 requested review from ByronHsu, CatherineSue, JustinTong0323, Ying1123, hnyls2002, ispobock, key4ng, merrymercy, slin1237, xiezhq-hermann and zhyncs as code owners November 13, 2025 14:30

github-actions bot added the router label Nov 13, 2025

gemini-code-assist bot reviewed Nov 13, 2025

router select dp group with the minimum number of tokens 06

1a5228e

jiashaokun-1 force-pushed the main branch from 27a4896 to 1a5228e Compare November 13, 2025 14:44
