Commit c57ef93

tmp: Return to setting free gpu mem fraction
Signed-off-by: Jacky <[email protected]>
1 parent e709de8 commit c57ef93

1 file changed, 3 additions and 1 deletion

tests/fault_tolerance/cancellation/test_trtllm.py

Lines changed: 3 additions & 1 deletion
@@ -72,6 +72,8 @@ def __init__(
             FAULT_TOLERANCE_MODEL_NAME,
             "--disaggregation-mode",
             mode,
+            "--free-gpu-memory-fraction",
+            "0.2",
             "--max-seq-len",
             "16384",
             "--max-num-tokens",
@@ -85,7 +87,7 @@ def __init__(
                 "cache_transceiver_config:\n backend: DEFAULT\n max_tokens_in_buffer: 16384\n"
             )
             f.write("disable_overlap_scheduler: true\n")
-            f.write("kv_cache_config:\n max_tokens: 16384\n")
+            # f.write("kv_cache_config:\n max_tokens: 16384\n")
             command += [
                 "--extra-engine-args",
                 "test_request_cancellation_trtllm_config.yaml",

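For readers who want to see the net effect of the two hunks in one place, the following is a minimal sketch of the resulting argument list and YAML file, not the actual test code. The function name `build_worker_args`, the `workdir` parameter, and the placeholder model name are hypothetical. The `--max-num-tokens` argument is omitted because its value is not visible in the diff. The sketch writes the config into a caller-supplied directory and passes the full path, whereas the diff passes only the file name. It also assumes the `cache_transceiver_config` block is always written, which the diff does not show.

```python
import os
import tempfile

# Hypothetical placeholder; the real constant is defined elsewhere in the test suite.
FAULT_TOLERANCE_MODEL_NAME = "example-model"
CONFIG_NAME = "test_request_cancellation_trtllm_config.yaml"


def build_worker_args(mode: str, workdir: str) -> list[str]:
    """Sketch of the arguments and YAML config after this commit (assumed layout)."""
    command = [
        FAULT_TOLERANCE_MODEL_NAME,
        "--disaggregation-mode",
        mode,
        # Added by this commit: pass the memory fraction on the command line.
        "--free-gpu-memory-fraction",
        "0.2",
        "--max-seq-len",
        "16384",
        # "--max-num-tokens" follows in the real test; its value is not shown in the diff.
    ]

    config_path = os.path.join(workdir, CONFIG_NAME)
    with open(config_path, "w") as f:
        f.write(
            "cache_transceiver_config:\n backend: DEFAULT\n max_tokens_in_buffer: 16384\n"
        )
        f.write("disable_overlap_scheduler: true\n")
        # Disabled by this commit: the kv_cache_config max_tokens entry is no longer written.

    command += ["--extra-engine-args", config_path]
    return command


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(build_worker_args("example-mode", tmp))
```

The sketch keeps the two changes side by side: the memory fraction moves to a command-line flag, and the YAML file no longer carries a `kv_cache_config` entry.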