
Conversation

@kthui
Contributor

@kthui kthui commented Dec 2, 2025

Overview:

Fix TRT-LLM cache transceiver setting issue since 1.2.0rc3

Details:

max_tokens_in_buffer needs to match --max-seq-len and --max-num-tokens (a configuration sketch follows the list below).

Other changes:

  • Allocate KV cache exactly for the max sequence length, instead of a percentage of free memory.
  • Update test timeout markers, accounting for the reduced KV cache space allocation.
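For illustration, here is a minimal sketch of the matching limits described above, using the 16384 value from this PR's test changes. The MAX_TOKENS constant and variable names are hypothetical, not the repo's actual code:

# Sketch only: MAX_TOKENS is a hypothetical constant; the test itself
# hard-codes 16384. Since 1.2.0rc3, all three limits must agree.
MAX_TOKENS = 16384

worker_args = [
    "--max-seq-len", str(MAX_TOKENS),
    "--max-num-tokens", str(MAX_TOKENS),
    # --free-gpu-memory-fraction is no longer passed: the KV cache is sized
    # exactly for the max sequence length instead of a fraction of free memory
]

engine_config_yaml = (
    "cache_transceiver_config:\n"
    "  backend: DEFAULT\n"
    f"  max_tokens_in_buffer: {MAX_TOKENS}\n"  # must match the flags above
    "disable_overlap_scheduler: true\n"
    "kv_cache_config:\n"
    f"  max_tokens: {MAX_TOKENS}\n"  # exact KV cache allocation
)

with open("test_request_cancellation_trtllm_config.yaml", "w") as f:
    f.write(engine_config_yaml)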

Where should the reviewer start?

N/A

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

N/A

Summary by CodeRabbit

  • Tests

    • Adjusted fault‑tolerance cancellation tests with shorter per‑case and aggregated durations and reduced timeouts for faster, more consistent runs.
    • Simplified expected‑failure handling for KV‑transfer cancellation to a lighter xfail (non‑strict).
    • Updated timing comments/docs to reflect recent runs.
  • Chores

    • Replaced a GPU memory startup argument with token/buffer and KV cache limits for improved buffering behavior.


@github-actions github-actions bot added the fix label Dec 2, 2025
@kthui kthui self-assigned this Dec 2, 2025
@kthui kthui marked this pull request as ready for review December 2, 2025 20:43
@kthui kthui requested review from a team as code owners December 2, 2025 20:43
@kthui kthui enabled auto-merge (squash) December 2, 2025 20:43
@coderabbitai
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

Updated TensorRT-LLM cancellation test timings, reduced several timeouts, removed free-gpu-memory-fraction, added max_tokens_in_buffer and KV-cache max_tokens for non-prefill_and_decode modes, simplified XFAIL for kv_transfer test, and adjusted test markers and doc comments.

Changes

  • TensorRT-LLM cancellation tests (tests/fault_tolerance/cancellation/test_trtllm.py): Reduced per-test durations and aggregated timeouts (e.g., per-case ~115s → ~65s; aggregated 140s → 135s; several 350s → 195s). Removed the free-gpu-memory-fraction arg. Added max_tokens_in_buffer: 16384 and kv_cache_config: max_tokens: 16384 when mode ≠ prefill_and_decode. Simplified XFAIL for test_request_cancellation_trtllm_kv_transfer_cancel. Updated the pytest mark and timing comments.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Verify timeout reductions are safe and not flaky in CI.
  • Confirm new config keys (max_tokens_in_buffer, kv_cache_config.max_tokens) are applied only when mode ≠ prefill_and_decode.
  • Search for any remaining free-gpu-memory-fraction references.
  • Check test_request_cancellation_trtllm_kv_transfer_cancel expectations after XFAIL simplification.
  • Ensure pytest mark change aligns with CI policies.

Poem

🐇 I hopped through tests at dawn’s first light,

Trimming waits, setting buffers tight,
No heavy X now slows the run,
KV caches ready, tokens one-six-three-eight-four, fun! 🎉

Pre-merge checks

✅ Passed checks (3 passed)
  • Title check — ✅ Passed: The title accurately summarizes the main fix: addressing a TRT-LLM cache transceiver configuration issue introduced in version 1.2.0rc3, which aligns with the PR's primary objective.
  • Description check — ✅ Passed: The description includes all required template sections (Overview, Details, Where to start, Related Issues) with substantive content explaining the fix and supporting changes, though the "Where should the reviewer start" section is marked N/A.
  • Docstring Coverage — ✅ Passed: Docstring coverage is 100.00%, which is sufficient; the required threshold is 80.00%.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)

39-65: Keep TRT‑LLM token limits in sync via a single constant

You now have 16384 hard‑coded three times (--max-seq-len, --max-num-tokens, and max_tokens_in_buffer in the YAML). To avoid future mismatches that would violate the “must match” requirement, consider centralizing this as a single constant and reusing it for both the CLI args and the YAML write:

-        command = [
+        max_tokens = "16384"
+
+        command = [
           "python3",
@@
-            "--max-seq-len",
-            "16384",
-            "--max-num-tokens",
-            "16384",
+            "--max-seq-len",
+            max_tokens,
+            "--max-num-tokens",
+            max_tokens,
@@
-        if mode != "prefill_and_decode":
+        if mode != "prefill_and_decode":
             with open("test_request_cancellation_trtllm_config.yaml", "w") as f:
-                f.write(
-                    "cache_transceiver_config:\n  backend: DEFAULT\n  max_tokens_in_buffer: 16384\n"
-                )
+                f.write(
+                    f"cache_transceiver_config:\n"
+                    f"  backend: DEFAULT\n"
+                    f"  max_tokens_in_buffer: {max_tokens}\n"
+                )

This keeps the test logic aligned with the TRT‑LLM requirement that all three limits match while making future changes less error‑prone.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 44e8600 and 8c37f8e.

📒 Files selected for processing (1)
  • tests/fault_tolerance/cancellation/test_trtllm.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: tests/fault_tolerance/cancellation/test_trtllm.py:4-204
Timestamp: 2025-09-25T00:54:01.369Z
Learning: The fault tolerance tests in tests/fault_tolerance/cancellation/ run in a controlled container environment where files written to /workspace are automatically cleaned up after test completion, and tests execute sequentially without concurrency concerns, so temporary file management for config files is not necessary.
📚 Learning: 2025-10-03T01:53:15.023Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-09-25T00:49:16.914Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py:106-147
Timestamp: 2025-09-25T00:49:16.914Z
Learning: In TensorRT-LLM cancellation implementation, there are two distinct request IDs: the external/user-facing request_id from the incoming request, and the internal_request_id from TRT-LLM's generation_result that's needed for executor.abort_request() calls.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: vllm (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)

374-462: Confirm KV‑transfer cancellation test stability after xfail removal

Now that test_request_cancellation_trtllm_kv_transfer_cancel runs as a normal test, please double‑check it’s stable given the strict 500 ms default timeout in poll_for_pattern for cancellation propagation. Inter‑process log polling plus the new cache‑transceiver config could still surface rare timing issues, so it’s worth running this test repeatedly (and perhaps under load) to ensure it doesn’t exhibit intermittent failures. Based on learnings, this timeout is intentionally tight for TRT‑LLM cancellation.
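A hedged way to probe for such intermittent failures is to re-run the test in fresh pytest processes (a sketch; the 20-iteration count is arbitrary, and the node ID may need a parameter suffix if the test is parametrized):

# Sketch: re-run the formerly-xfailed test in fresh pytest processes to
# surface rare timing failures under the tight 500 ms propagation window.
import subprocess
import sys

TEST = (
    "tests/fault_tolerance/cancellation/test_trtllm.py"
    "::test_request_cancellation_trtllm_kv_transfer_cancel"
)

for i in range(20):
    result = subprocess.run([sys.executable, "-m", "pytest", "-x", TEST])
    if result.returncode != 0:
        print(f"flaked on iteration {i + 1}")
        break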

@kthui kthui changed the title fix: Set max_tokens_in_buffer on TRT-LLM cache transceiver fix: TRT-LLM cache transceiver setting issue since 1.2.0rc3 Dec 2, 2025
@kthui kthui force-pushed the jacky-ft-cancel-kv-transfer-trtllm-config branch from 1c1c87c to a37800c on December 13, 2025 00:41
@kthui
Contributor Author

kthui commented Dec 13, 2025

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Dec 13, 2025

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)

33-40: Update the marker comment: it contradicts the new pre_merge marker.
Line 38 says “post_merge to pinpoint failure commit” but the code is pytest.mark.pre_merge; this is confusing and (given marker enforcement) should be accurate. Based on learnings, marker correctness is intentionally enforced.

🧹 Nitpick comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)

67-93: Centralize 16384 constant and use unique config filenames to prevent drift and parallel-execution collisions.

The code contains four hardcoded instances of 16384 that must stay in sync (--max-seq-len, --max-num-tokens, max_tokens_in_buffer, kv_cache_config.max_tokens), creating maintenance risk. Additionally, the fixed filename test_request_cancellation_trtllm_config.yaml will collide if tests run in parallel with pytest-xdist.

Recommended approach:

  • Define MAX_TOKENS = 16384 at module level and reference it everywhere.
  • Generate unique config filenames using test identity (e.g., with request.node.name and mode) to avoid collisions under parallel execution; a sketch follows.
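A minimal sketch of that recommendation, assuming a pytest request fixture is available in the test (MAX_TOKENS and write_engine_config are hypothetical names, not the actual test code):

# Hypothetical sketch of the suggestion above.
MAX_TOKENS = 16384  # single source of truth for all four token limits

def write_engine_config(request, mode: str) -> str:
    """Write a per-test engine config file and return its path."""
    # request.node.name is unique per parametrized test case, so parallel
    # pytest-xdist workers will not clobber each other's config files.
    config_path = f"{request.node.name}_{mode}_trtllm_config.yaml"
    with open(config_path, "w") as f:
        f.write(
            "cache_transceiver_config:\n"
            "  backend: DEFAULT\n"
            f"  max_tokens_in_buffer: {MAX_TOKENS}\n"
            "disable_overlap_scheduler: true\n"
            "kv_cache_config:\n"
            f"  max_tokens: {MAX_TOKENS}\n"
        )
    return config_path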
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45e881d and a37800c.

📒 Files selected for processing (1)
  • tests/fault_tolerance/cancellation/test_trtllm.py (7 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
Learnt from: bhuvan002
Repo: ai-dynamo/dynamo PR: 2702
File: components/backends/trtllm/src/dynamo/trtllm/logits_processing/adapter.py:7-13
Timestamp: 2025-08-28T20:43:54.701Z
Learning: CI import failures for optional dependencies like tensorrt_llm can be resolved by removing __init__.py files and restructuring imports, rather than wrapping imports in try/except blocks.
📚 Learning: 2025-10-03T01:53:15.023Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-09-25T00:54:01.369Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: tests/fault_tolerance/cancellation/test_trtllm.py:4-204
Timestamp: 2025-09-25T00:54:01.369Z
Learning: The fault tolerance tests in tests/fault_tolerance/cancellation/ run in a controlled container environment where files written to /workspace are automatically cleaned up after test completion, and tests execute sequentially without concurrency concerns, so temporary file management for config files is not necessary.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-12-11T21:55:13.944Z
Learnt from: pvijayakrish
Repo: ai-dynamo/dynamo PR: 4812
File: scripts/report_pytest_markers.py:412-438
Timestamp: 2025-12-11T21:55:13.944Z
Learning: In scripts/report_pytest_markers.py, the unconditional failure when report.total_missing > 0 (tests missing required marker categories) is intentional behavior designed to catch missing markers at PR review and merge time, not a bug requiring phased rollout options.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-09-25T00:49:16.914Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py:106-147
Timestamp: 2025-09-25T00:49:16.914Z
Learning: In TensorRT-LLM cancellation implementation, there are two distinct request IDs: the external/user-facing request_id from the incoming request, and the internal_request_id from TRT-LLM's generation_result that's needed for executor.abort_request() calls.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
  • GitHub Check: sglang (amd64)
  • GitHub Check: operator (arm64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: operator (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)

168-168: Timeout reductions: please sanity-check against worst-case engine init + pre_merge load.
The reduced timeouts match the updated “3x average” intent, but since these are now pre_merge, verify they still pass under busy/shared runners and both request planes.

Also applies to: 255-255, 339-339, 431-431

@kthui kthui force-pushed the jacky-ft-cancel-kv-transfer-trtllm-config branch from 1abc47f to e9707a4 on December 15, 2025 23:16
@kthui kthui force-pushed the jacky-ft-cancel-kv-transfer-trtllm-config branch from e9707a4 to c57ef93 on December 16, 2025 00:29
@kthui kthui force-pushed the jacky-ft-cancel-kv-transfer-trtllm-config branch from c57ef93 to 090f987 on December 16, 2025 19:26
@kthui kthui force-pushed the jacky-ft-cancel-kv-transfer-trtllm-config branch from 090f987 to 85621a2 on December 16, 2025 21:22
@kthui
Copy link
Contributor Author

kthui commented Dec 16, 2025

@coderabbitai full review

@coderabbitai
Contributor

coderabbitai bot commented Dec 16, 2025

✅ Actions performed

Full review triggered.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)

84-88: Cache configuration correctly implements PR objectives.

The addition of max_tokens_in_buffer: 16384 and kv_cache_config.max_tokens: 16384 properly aligns with the command-line flags --max-seq-len and --max-num-tokens (lines 76-78), which matches the PR's goal of ensuring cache transceiver settings are consistent with CLI parameters.

Consider extracting the hard-coded 16384 value to a module-level constant to improve maintainability:

+# TRT-LLM test configuration
+TEST_MAX_TOKENS = 16384
+
 class DynamoWorkerProcess(ManagedProcess):
     ...
         command = [
             ...
             "--max-seq-len",
-            "16384",
+            str(TEST_MAX_TOKENS),
             "--max-num-tokens",
-            "16384",
+            str(TEST_MAX_TOKENS),
             ...
         ]
         if mode != "prefill_and_decode":
             with open("test_request_cancellation_trtllm_config.yaml", "w") as f:
                 f.write(
-                    "cache_transceiver_config:\n  backend: DEFAULT\n  max_tokens_in_buffer: 16384\n"
+                    f"cache_transceiver_config:\n  backend: DEFAULT\n  max_tokens_in_buffer: {TEST_MAX_TOKENS}\n"
                 )
                 f.write("disable_overlap_scheduler: true\n")
-                f.write("kv_cache_config:\n  max_tokens: 16384\n")
+                f.write(f"kv_cache_config:\n  max_tokens: {TEST_MAX_TOKENS}\n")
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59e6873 and 85621a2.

📒 Files selected for processing (1)
  • tests/fault_tolerance/cancellation/test_trtllm.py (6 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/distributed/transfer.rs:6-6
Timestamp: 2025-09-18T21:47:44.143Z
Learning: For PR ai-dynamo/dynamo#2989, the ConnectorTransferBatcher architectural issues will be addressed in a follow-up PR by removing the duplicate batching logic and integrating distributed transfers with the existing TransferBatcher + LocalTransferManager pipeline, rather than adding bounded concurrency primitives like Semaphore.
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.
📚 Learning: 2025-10-03T01:53:15.023Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3391
File: tests/fault_tolerance/cancellation/utils.py:323-390
Timestamp: 2025-10-03T01:53:15.023Z
Learning: In `tests/fault_tolerance/cancellation/utils.py`, the `poll_for_pattern` function's default `max_wait_ms` of 500ms is intentionally set to detect failures in cancellation signal propagation to TRT-LLM. This timeout covers only the time for the cancellation signal to reach TRT-LLM (not any generation routine), and if cancellation takes longer than 0.5s to propagate, it should be considered a test failure.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-09-25T00:54:01.369Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: tests/fault_tolerance/cancellation/test_trtllm.py:4-204
Timestamp: 2025-09-25T00:54:01.369Z
Learning: The fault tolerance tests in tests/fault_tolerance/cancellation/ run in a controlled container environment where files written to /workspace are automatically cleaned up after test completion, and tests execute sequentially without concurrency concerns, so temporary file management for config files is not necessary.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
📚 Learning: 2025-09-25T00:49:16.914Z
Learnt from: kthui
Repo: ai-dynamo/dynamo PR: 3193
File: components/backends/trtllm/src/dynamo/trtllm/request_handlers/handler_base.py:106-147
Timestamp: 2025-09-25T00:49:16.914Z
Learning: In TensorRT-LLM cancellation implementation, there are two distinct request IDs: the external/user-facing request_id from the incoming request, and the internal_request_id from TRT-LLM's generation_result that's needed for executor.abort_request() calls.

Applied to files:

  • tests/fault_tolerance/cancellation/test_trtllm.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: sglang (arm64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (1)
tests/fault_tolerance/cancellation/test_trtllm.py (1)

168-168: LGTM! Timeout reductions align with deterministic KV cache allocation.

The timeout adjustments (135s for aggregated, 195s for disaggregated tests) appropriately reflect the performance improvements from switching to exact KV cache sizing. The 3x safety margin over documented average runtimes is reasonable.

Also applies to: 255-255, 339-339, 432-432

@kthui kthui merged commit 32eaecb into main Dec 16, 2025
29 checks passed
@kthui kthui deleted the jacky-ft-cancel-kv-transfer-trtllm-config branch December 16, 2025 22:02
