fix: TRT-LLM cache transceiver setting issue since 1.2.0rc3 #4703
Conversation
Walkthrough
Updated TensorRT-LLM cancellation test timings, reduced several timeouts, and removed an xfail marker.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)
39-65: Keep TRT-LLM token limits in sync via a single constant

You now have `16384` hard-coded three times (`--max-seq-len`, `--max-num-tokens`, and `max_tokens_in_buffer` in the YAML). To avoid future mismatches that would violate the "must match" requirement, consider centralizing this as a single constant and reusing it for both the CLI args and the YAML write:

```diff
-    command = [
+    max_tokens = "16384"
+
+    command = [
         "python3",
@@
-        "--max-seq-len",
-        "16384",
-        "--max-num-tokens",
-        "16384",
+        "--max-seq-len",
+        max_tokens,
+        "--max-num-tokens",
+        max_tokens,
@@
     if mode != "prefill_and_decode":
         with open("test_request_cancellation_trtllm_config.yaml", "w") as f:
-            f.write(
-                "cache_transceiver_config:\n  backend: DEFAULT\n  max_tokens_in_buffer: 16384\n"
-            )
+            f.write(
+                f"cache_transceiver_config:\n"
+                f"  backend: DEFAULT\n"
+                f"  max_tokens_in_buffer: {max_tokens}\n"
+            )
```

This keeps the test logic aligned with the TRT-LLM requirement that all three limits match while making future changes less error-prone.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: tests/fault_tolerance/cancellation/test_trtllm.py:4-204
Timestamp: 2025-09-25T00:54:01.369Z
Learning: The fault tolerance tests in tests/fault_tolerance/cancellation/ run in a controlled container environment where files written to /workspace are automatically cleaned up after test completion, and tests execute sequentially without concurrency concerns, so temporary file management for config files is not necessary.
📚 Learning: 2025-10-03T01:53:15.023Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-09-25T00:49:16.914Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py:106-147
Timestamp: 2025-09-25T00:49:16.914Z
Learning: In TensorRT-LLM cancellation implementation, there are two distinct request IDs: the external/user-facing request_id from the incoming request, and the internal_request_id from TRT-LLM's generation_result that's needed for executor.abort_request() calls.
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: vllm (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)
374-462: Confirm KV-transfer cancellation test stability after xfail removal

Now that `test_request_cancellation_trtllm_kv_transfer_cancel` runs as a normal test, please double-check it's stable given the strict 500 ms default timeout in `poll_for_pattern` for cancellation propagation. Inter-process log polling plus the new cache-transceiver config could still surface rare timing issues, so it's worth running this test repeatedly (and perhaps under load) to ensure it doesn't exhibit intermittent failures; a repetition harness is sketched below. Based on learnings, this timeout is intentionally tight for TRT-LLM cancellation.
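For flake-hunting, a rough repetition harness might look like the following (a sketch only; the loop count and invocation style are illustrative rather than project tooling, and it assumes it is run from the repo root inside the test container):

```python
# Re-run the single test many times to surface rare timing failures.
import sys

import pytest

RUNS = 50
failures = 0
for i in range(RUNS):
    rc = pytest.main(
        [
            "tests/fault_tolerance/cancellation/test_trtllm.py",
            "-k",
            "test_request_cancellation_trtllm_kv_transfer_cancel",
            "-x",  # stop this iteration at the first failure
        ]
    )
    if rc != 0:
        failures += 1
        print(f"run {i}: FAILED (exit code {int(rc)})", file=sys.stderr)

print(f"{failures}/{RUNS} runs failed")
```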
Force-push: 1c1c87c to a37800c
@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)
33-40: Update the marker comment: it contradicts the new `pre_merge` marker.

Line 38 says "post_merge to pinpoint failure commit" but the code is `pytest.mark.pre_merge`; this is confusing and (given marker enforcement) should be accurate. Based on learnings, marker correctness is intentionally enforced.
🧹 Nitpick comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)
67-93: Centralize the `16384` constant and use unique config filenames to prevent drift and parallel-execution collisions.

The code contains four hardcoded instances of `16384` that must stay in sync (`--max-seq-len`, `--max-num-tokens`, `max_tokens_in_buffer`, `kv_cache_config.max_tokens`), creating maintenance risk. Additionally, the fixed filename `test_request_cancellation_trtllm_config.yaml` will collide if tests run in parallel with `pytest-xdist`.

Recommended approach (see the sketch after this list):
- Define `MAX_TOKENS = 16384` at module level and reference it everywhere.
- Generate unique config filenames using test identity (e.g., with `request.node.name` and `mode`) to avoid collisions under parallel execution.
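A minimal sketch of that refactor (the helper name and wiring are illustrative, not the current test code):

```python
# Single source of truth for the TRT-LLM token limits.
MAX_TOKENS = 16384


def write_worker_config(test_name: str, mode: str) -> str:
    """Write a per-test engine config file and return its path."""
    # Deriving the filename from the test identity avoids collisions
    # when tests run in parallel under pytest-xdist.
    path = f"trtllm_config_{test_name}_{mode}.yaml"
    with open(path, "w") as f:
        f.write(
            "cache_transceiver_config:\n"
            "  backend: DEFAULT\n"
            f"  max_tokens_in_buffer: {MAX_TOKENS}\n"
        )
        f.write("disable_overlap_scheduler: true\n")
        f.write(f"kv_cache_config:\n  max_tokens: {MAX_TOKENS}\n")
    return path


# The CLI flags reuse the same constant, so all four limits move together:
worker_args = [
    "--max-seq-len", str(MAX_TOKENS),
    "--max-num-tokens", str(MAX_TOKENS),
]
```

In the test, the path could come from `write_worker_config(request.node.name, mode)` so each parametrized case gets its own file.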
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/fault_tolerance/cancellation/test_trtllm.py (7 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
Learnt from: bhuvan002
Repo: ai-dynamo/dynamo PR: 2702
File: components/backends/trtllm/src/dynamo/trtllm/logits_processing/adapter.py:7-13
Timestamp: 2025-08-28T20:43:54.701Z
Learning: CI import failures for optional dependencies like tensorrt_llm can be resolved by removing __init__.py files and restructuring imports, rather than wrapping imports in try/except blocks.
📚 Learning: 2025-10-03T01:53:15.023Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-09-25T00:54:01.369Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: tests/fault_tolerance/cancellation/test_trtllm.py:4-204
Timestamp: 2025-09-25T00:54:01.369Z
Learning: The fault tolerance tests in tests/fault_tolerance/cancellation/ run in a controlled container environment where files written to /workspace are automatically cleaned up after test completion, and tests execute sequentially without concurrency concerns, so temporary file management for config files is not necessary.
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-12-11T21:55:13.944Z
Learnt from: pvijayakrish
Repo: ai-dynamo/dynamo PR: 4812
File: scripts/report_pytest_markers.py:412-438
Timestamp: 2025-12-11T21:55:13.944Z
Learning: In scripts/report_pytest_markers.py, the unconditional failure when report.total_missing > 0 (tests missing required marker categories) is intentional behavior designed to catch missing markers at PR review and merge time, not a bug requiring phased rollout options.
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-09-25T00:49:16.914Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py:106-147
Timestamp: 2025-09-25T00:49:16.914Z
Learning: In TensorRT-LLM cancellation implementation, there are two distinct request IDs: the external/user-facing request_id from the incoming request, and the internal_request_id from TRT-LLM's generation_result that's needed for executor.abort_request() calls.
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: sglang (amd64)
- GitHub Check: operator (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: trtllm (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: operator (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)
168-168: Timeout reductions: please sanity-check against worst-case engine init + pre_merge load.

The reduced timeouts match the updated "3x average" intent, but since these are now `pre_merge`, verify they still pass under busy/shared runners and both request planes.

Also applies to: 255-255, 339-339, 431-431
Force-push: 9d5a402 to 7f025b1
Force-push: 7f025b1 to 1abc47f
Force-push: 1abc47f to e9707a4
Force-push: e9707a4 to c57ef93
Signed-off-by: Jacky <[email protected]>
Signed-off-by: Jacky <[email protected]>
Signed-off-by: Jacky <[email protected]>
Signed-off-by: Jacky <[email protected]>
Force-push: c57ef93 to 090f987
Force-push: 090f987 to 85621a2
@coderabbitai full review

✅ Actions performed: Full review triggered.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)
84-88: Cache configuration correctly implements PR objectives.

The addition of `max_tokens_in_buffer: 16384` and `kv_cache_config.max_tokens: 16384` properly aligns with the command-line flags `--max-seq-len` and `--max-num-tokens` (lines 76-78), which matches the PR's goal of ensuring cache transceiver settings are consistent with CLI parameters.

Consider extracting the hard-coded `16384` value to a module-level constant to improve maintainability:

```diff
+# TRT-LLM test configuration
+TEST_MAX_TOKENS = 16384
+
 class DynamoWorkerProcess(ManagedProcess):
     ...
     command = [
         ...
         "--max-seq-len",
-        "16384",
+        str(TEST_MAX_TOKENS),
         "--max-num-tokens",
-        "16384",
+        str(TEST_MAX_TOKENS),
         ...
     ]

     if mode != "prefill_and_decode":
         with open("test_request_cancellation_trtllm_config.yaml", "w") as f:
             f.write(
-                "cache_transceiver_config:\n  backend: DEFAULT\n  max_tokens_in_buffer: 16384\n"
+                f"cache_transceiver_config:\n  backend: DEFAULT\n  max_tokens_in_buffer: {TEST_MAX_TOKENS}\n"
             )
             f.write("disable_overlap_scheduler: true\n")
-            f.write("kv_cache_config:\n  max_tokens: 16384\n")
+            f.write(f"kv_cache_config:\n  max_tokens: {TEST_MAX_TOKENS}\n")
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/fault_tolerance/cancellation/test_trtllm.py (6 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
📚 Learning: 2025-10-03T01:53:15.023Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-09-25T00:54:01.369Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: tests/fault_tolerance/cancellation/test_trtllm.py:4-204
Timestamp: 2025-09-25T00:54:01.369Z
Learning: The fault tolerance tests in tests/fault_tolerance/cancellation/ run in a controlled container environment where files written to /workspace are automatically cleaned up after test completion, and tests execute sequentially without concurrency concerns, so temporary file management for config files is not necessary.
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-09-25T00:49:16.914Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py:106-147
Timestamp: 2025-09-25T00:49:16.914Z
Learning: In TensorRT-LLM cancellation implementation, there are two distinct request IDs: the external/user-facing request_id from the incoming request, and the internal_request_id from TRT-LLM's generation_result that's needed for executor.abort_request() calls.
Applied to files:
tests/fault_tolerance/cancellation/test_trtllm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: sglang (arm64)
- GitHub Check: vllm (amd64)
- GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)
168-168: LGTM! Timeout reductions align with deterministic KV cache allocation.

The timeout adjustments (135s for aggregated, 195s for disaggregated tests) appropriately reflect the performance improvements from switching to exact KV cache sizing. The 3x safety margin over documented average runtimes is reasonable.
Also applies to: 255-255, 339-339, 432-432
Overview:
Fix TRT-LLM cache transceiver setting issue since 1.2.0rc3
Details:
`max_tokens_in_buffer` needs to match `--max-seq-len` and `--max-num-tokens` (see the sketch after this description).

Other changes:
Where should the reviewer start?
N/A
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
N/A
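As a concrete illustration of the constraint above (the values come from this PR; the variable names are illustrative, not code from the repo):

```python
# Per this PR, all of these limits must carry the same value on TRT-LLM 1.2.0rc3+.
MAX_TOKENS = 16384

# Flags passed to the worker on the command line:
worker_flags = [
    "--max-seq-len", str(MAX_TOKENS),
    "--max-num-tokens", str(MAX_TOKENS),
]

# Matching engine config written to YAML; max_tokens_in_buffer uses the same value:
engine_config = (
    "cache_transceiver_config:\n"
    "  backend: DEFAULT\n"
    f"  max_tokens_in_buffer: {MAX_TOKENS}\n"
)
```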
Summary by CodeRabbit
Tests
Chores