docs: disagg router docs update (#4093)

PeaBrane · web-flow · commit 17af7a0fef47 · 2025-11-05T05:34:29.000Z
Signed-off-by: PeaBrane <yanrpei@gmail.com>
diff --git a/benchmarks/router/README.md b/benchmarks/router/README.md
@@ -118,23 +118,27 @@ python -m dynamo.frontend --help
 
 For detailed explanations of router arguments (especially KV cache routing parameters), see the [KV Cache Routing documentation](../../docs/router/kv_cache_routing.md).
 
+> [!Note]
+> If you're unsure whether your backend engines correctly emit KV events for certain models (e.g., hybrid models like gpt-oss or Nemotron Nano 2), use the `--no-kv-events` flag to disable KV event tracking and use approximate KV indexing instead:
+>
+> ```bash
+> python -m dynamo.frontend \
+>     --router-mode kv \
+>     --http-port 8000 \
+>     --no-kv-events
+> ```
+
 #### Disaggregated Serving with Automatic Prefill Routing
 
 When you launch prefill workers using `run_engines.sh --prefill`, the frontend automatically detects them and activates an internal prefill router. This prefill router:
 - Automatically routes initial token processing to dedicated prefill workers
-- Uses KV-aware routing regardless of the frontend's `--router-mode` setting
+- Uses the same routing mode as the frontend's `--router-mode` setting
 - Seamlessly integrates with your decode workers for token generation
 
 No additional configuration is needed: simply launch both decode and prefill workers, and the system handles the rest. See the [KV Cache Routing documentation](../../docs/router/kv_cache_routing.md#disaggregated-serving-prefill-and-decode) for more details.
 
-**Note**: If you're unsure whether your backend engines correctly emit KV events for certain models (e.g., hybrid models like gpt-oss or nemotron nano 2), use the `--no-kv-events` flag to disable KV event tracking and use approximate KV indexing instead:
-
-```bash
-python -m dynamo.frontend \
-    --router-mode kv \
-    --http-port 8000 \
-    --no-kv-events
-```
+> [!Note]
+> The unified frontend with automatic prefill routing is currently enabled for the vLLM and TensorRT-LLM backends. For SGLang (work in progress), you must launch a separate standalone router that targets the prefill endpoints and acts as the prefill router. See the example script: [`examples/backends/sglang/launch/disagg_router.sh`](../../examples/backends/sglang/launch/disagg_router.sh).
 
 ### Step 3: Verify Setup
 
diff --git a/docs/router/kv_cache_routing.md b/docs/router/kv_cache_routing.md
@@ -65,13 +65,14 @@ The prefill router is automatically created when:
 2. A prefill worker is detected with the same model name and `ModelType.Prefill`
 
 **Key characteristics of the prefill router:**
-- **Always uses KV-aware routing** regardless of the frontend's `--router-mode` setting
 - **Always disables active block tracking** (`track_active_blocks=false`) since prefill workers don't perform decode
 - **Seamlessly integrated** into the request pipeline between preprocessing and decode routing
 - **Falls back gracefully** to decode-only mode if prefill fails or no prefill workers are available
 
 ### Setup Example
 
+When both workers are registered, requests are automatically routed to a prefill worker first and then to a decode worker.
+
 ```python
 # Decode worker registration (in your decode worker)
 await register_llm(
@@ -92,12 +93,8 @@ await register_llm(
 )
 ```
 
-When both workers are registered, requests are automatically routed:
-1. **Prefill phase** → Prefill router selects best prefill worker (KV-aware)
-2. **Decode phase** → Decode router selects decode worker (uses frontend's `--router-mode`)
-
 > [!Note]
-> **WIP**: Currently, the prefill router always uses KV routing. Future updates will provide more fine-grained control over prefill routing behavior to match user-specified frontend router modes.
+> The unified frontend with automatic prefill routing is currently enabled for the vLLM and TensorRT-LLM backends. For SGLang (work in progress), you must launch a separate standalone router that targets the prefill endpoints and acts as the prefill router. See the example script: [`examples/backends/sglang/launch/disagg_router.sh`](../../examples/backends/sglang/launch/disagg_router.sh).
 
 ## Overview
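To make the behavior change in this commit concrete, here is a minimal Python sketch of the documented prefill-router semantics. The prefill router inherits the frontend's `--router-mode`, always runs with `track_active_blocks=false`, and falls back to decode-only when no prefill worker is available or prefill fails. This is not Dynamo's implementation. `RouterMode` is assumed to mirror the `--router-mode` values (`round-robin`, `random`, `kv`). `Router`, `Worker`, `route`, `prefill_config_from`, and the prefix-overlap scoring are hypothetical simplifications; real KV-aware routing also weighs worker load.

```python
import itertools
import random
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class RouterMode(Enum):
    # Assumed to mirror the values accepted by the frontend's --router-mode flag.
    ROUND_ROBIN = "round-robin"
    RANDOM = "random"
    KV = "kv"


@dataclass
class RouterConfig:
    mode: RouterMode
    track_active_blocks: bool = True


@dataclass
class Worker:
    name: str
    # Hypothetical: prompt-block hashes this worker is assumed to have cached.
    cached_blocks: set = field(default_factory=set)


def prefill_config_from(frontend: RouterConfig) -> RouterConfig:
    # The prefill router inherits the frontend's mode but never tracks
    # active blocks, since prefill workers do not perform decode.
    return RouterConfig(mode=frontend.mode, track_active_blocks=False)


class Router:
    def __init__(self, config: RouterConfig, workers: list):
        self.config = config
        self.workers = workers
        self._rr = itertools.cycle(range(len(workers))) if workers else None

    def select(self, prompt_blocks: list) -> Optional[Worker]:
        if not self.workers:
            return None
        if self.config.mode is RouterMode.KV:
            # Simplified KV-aware choice: the longest cached prefix wins
            # (ties go to the first worker). Load is ignored here.
            def overlap(w: Worker) -> int:
                n = 0
                for b in prompt_blocks:
                    if b not in w.cached_blocks:
                        break
                    n += 1
                return n
            return max(self.workers, key=overlap)
        if self.config.mode is RouterMode.ROUND_ROBIN:
            return self.workers[next(self._rr)]
        return random.choice(self.workers)


def route(prefill: Router, decode: Router, prompt_blocks: list,
          run_prefill: Callable[[Worker], None]):
    """Return (prefill_worker or None, decode_worker)."""
    p = prefill.select(prompt_blocks)
    if p is not None:
        try:
            run_prefill(p)
        except Exception:
            p = None  # prefill failed: fall back to decode-only
    return p, decode.select(prompt_blocks)
```

With a `kv` frontend, the prefill router picks the prefill worker holding the longest cached prefix. With `round-robin`, it cycles through prefill workers instead, which is the behavior this commit documents in place of the previous "always KV-aware" rule.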