-
Notifications
You must be signed in to change notification settings - Fork 728
feat: unified tracing across dynamo + sglang #4248
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Adds `--enable-otel` command-line flag to four SGLang launch scripts for easy OpenTelemetry tracing enablement. When enabled, the flag: - Exports DYN_LOGGING_JSONL=true for JSON logging - Exports OTEL_EXPORT_ENABLED=1 to enable trace export - Sets OTEL_EXPORTER_OTLP_TRACES_ENDPOINT (defaults to http://localhost:4317) - Sets DYN_SYSTEM_PORT=8081 to enable system metrics server - Prefixes each process with OTEL_SERVICE_NAME for trace attribution Modified scripts: - examples/backends/sglang/launch/agg.sh - examples/backends/sglang/launch/disagg.sh - examples/backends/sglang/launch/multimodal_agg.sh - examples/backends/sglang/launch/multimodal_disagg.sh Usage: ./agg.sh --enable-otel Note: Uses DYN_SYSTEM_PORT instead of deprecated DYN_SYSTEM_ENABLED per #4082
Adds --enable-metrics flag to agg.sh and disagg.sh with automatic port allocation to avoid conflicts when multiple processes run on the same node. Changes: - Added --enable-metrics flag to control system metrics server - Automatic port allocation: - agg.sh: frontend (8080), worker (8081) - disagg.sh: frontend (8080), prefill (8081), decode (8082) - OTEL tracing automatically enables metrics on same ports when used - Ports only assigned when flags are enabled (empty otherwise) Usage: ./agg.sh --enable-metrics # Enable metrics only ./agg.sh --enable-otel # Enable tracing + metrics ./agg.sh --enable-metrics --enable-otel # Both explicit Note: Each worker process gets a unique DYN_SYSTEM_PORT when using CUDA_VISIBLE_DEVICES on the same node.
Changes metrics to be enabled by default rather than opt-in. This provides better observability out-of-the-box for all deployments. Changes: - Set ENABLE_METRICS=true by default in agg.sh and disagg.sh - Removed --enable-metrics flag (no longer needed) - Updated help text to document default port assignments - Port allocation remains the same: - agg.sh: frontend (8080), worker (8081) - disagg.sh: frontend (8080), prefill (8081), decode (8082) Usage: ./agg.sh # Metrics enabled by default ./agg.sh --enable-otel # Metrics + tracing enabled
Simplified the scripts by removing conditional metrics logic since metrics are now always enabled by default: Changes: - Removed ENABLE_METRICS variable (no longer needed) - Removed METRICS_ARGS variable and all conditionals - Hardcoded DYN_SYSTEM_PORT values directly (8080, 8081, 8082) - Always pass --enable-metrics to sglang commands - Cleaner, simpler code without unnecessary variables
- Add --enable-otel flag to agg_embed.sh, agg_router.sh, disagg_router.sh, and disagg_same_gpu.sh - Enable system metrics by default with proper port allocation for all workers - Add OTEL_SERVICE_NAME to each process for proper trace attribution - Remove deprecated disagg_dp_attn.sh launch script - Update help text to document system metrics ports 🤖 Generated with [Claude Code](https://claude.com/claude-code)
- Remove DYN_SYSTEM_PORT=8080 from all frontend commands across launch scripts - Update help text to remove frontend port from metrics documentation - Revert all OTEL changes to multimodal_agg.sh and multimodal_disagg.sh Frontend does not need DYN_SYSTEM_PORT as it doesn't expose system metrics. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
Revert disagg_same_gpu.sh to its original state before OTEL flag additions. 🤖 Generated with [Claude Code](https://claude.com/claude-code)
WalkthroughThe pull request introduces trace identifier extraction and propagation across request handlers, and enables tracing configuration in the aggregation deployment script with OTLP endpoint specification for observability. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes
Poem
Pre-merge checks❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py (1)
67-87: Critical: Trace ID extracted but not propagated to the engine.The trace_id is extracted from the context but never passed to the SGLang engine, breaking the unified tracing objective of this PR. In
decode_handler.py(line 178), the trace_id is correctly passed asrid=trace_idtoengine.async_generate.Apply this diff to propagate the trace_id:
trace_id = context.trace_id bootstrap_room = self._generate_bootstrap_room() bootstrap_info = { "bootstrap_host": self.bootstrap_host, "bootstrap_port": self.bootstrap_port, "bootstrap_room": bootstrap_room, } yield bootstrap_info input_param = self._get_input_param(request["request"]) results = await self.engine.async_generate( **input_param, sampling_params=request["sampling_params"], stream=True, bootstrap_host=self.bootstrap_host, bootstrap_port=self.bootstrap_port, bootstrap_room=bootstrap_room, + rid=trace_id, )
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py(2 hunks)components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py(1 hunks)examples/backends/sglang/launch/agg.sh(1 hunks)
🧰 Additional context used
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4248/merge) by ishandhanani.
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
[error] 67-67: Ruff: 'trace_id' assigned to but never used. (F841)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py
[warning] 1-1: Black formatting: file reformatted by Black during pre-commit.
🪛 Ruff (0.14.4)
components/src/dynamo/sglang/request_handlers/llm/prefill_handler.py
67-67: Local variable trace_id is assigned to but never used
Remove assignment to unused variable trace_id
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: trtllm (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: sglang (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/src/dynamo/sglang/request_handlers/llm/decode_handler.py (3)
115-115: Trace ID extracted correctly.The trace_id is properly extracted and will be used for trace propagation.
174-179: Trace ID correctly propagated in aggregated mode.The trace_id is correctly passed as
rid=trace_idto the engine, enabling unified tracing for aggregated serving mode.
158-165: I've examined the full decode_handler.py file. The review comment's observation is factually correct:
- DECODE mode (lines 158-165): Does NOT pass
rid=trace_idtoengine.async_generate- Aggregated mode (line 178): DOES pass
rid=trace_idHowever, this pattern is consistent throughout the codebase—other bootstrap-related calls in
multimodal/worker_handler.py,main.py, andprefill_handler.pyalso omit theridparameter.The inconsistency between the two modes in the same file (aggregated passes
rid, DECODE does not) suggests either: (1) theridparameter is unsupported/ineffective in bootstrap context, or (2) trace propagation in disaggregated mode is intentionally handled differently (via thecontextobject passed toprefill_client). Without access to SGLang engine internals or design documentation, this cannot be definitively resolved.
…ishan/unify-trace
biswapanda
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: ishandhanani <[email protected]>
Summary
Implements distributed trace propagation from Dynamo to SGLang, enabling end-to-end request tracing across the stack:
dynamo-frontend → dynamo-worker → sglang{agg,disagg}.Changes
Core Implementation
handler_base.py_propagate_trace_context_to_sglang()method to BaseWorkerHandlertrace_set_remote_propagate_context()decode_handler.py&prefill_handler.pytrace_idfromcontext.trace_id_propagate_trace_context_to_sglang(context, bootstrap_room)beforeengine.async_generate()rid=trace_idto SGLang engine (requires feat(engine): add rid parameter to methods in Engine class sgl-project/sglang#13095 to be released)How It Works
SGLang Trace Context Propagation
SGLang supports trace propagation via
trace_set_remote_propagate_context(), which stores trace context keyed bybootstrap_room:When SGLang's trace_req_start() runs, it checks remote_trace_contexts[str(bootstrap_room)] and uses our context as the parent span.
Trace Flow
Result
Example (disagg)