Commit 31f31e8

fix: remove small --gpu-memory-utilization to avoid OOM due to vllm upgrade (#4899)
Signed-off-by: Ziqi Fan <[email protected]>
1 parent 5e96c9a commit 31f31e8
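
For context, vLLM documents `--gpu-memory-utilization` as the fraction of GPU memory the engine may use, with a default of 0.9, so deleting the flag makes these workers fall back to that default. The sketch below is only a simplified budget model, not vLLM's actual accounting. The function name and every number in it (GPU size, weight size, overhead) are hypothetical and are not taken from this commit.

```python
def kv_cache_budget_gib(total_gib, utilization, weights_gib, overhead_gib):
    """Memory left for the KV cache under a simple, illustrative budget model.

    budget = total GPU memory * utilization fraction
    whatever remains after weights and other overhead is available for KV cache.
    A result at or below zero means the worker cannot allocate any KV cache.
    """
    return total_gib * utilization - weights_gib - overhead_gib


# Hypothetical values chosen only to show the trend.
TOTAL_GIB = 80.0      # assumed GPU memory size
WEIGHTS_GIB = 16.0    # assumed size of the model weights
OVERHEAD_GIB = 6.0    # assumed activations and runtime overhead

for utilization in (0.3, 0.45, 0.9):
    left = kv_cache_budget_gib(TOTAL_GIB, utilization, WEIGHTS_GIB, OVERHEAD_GIB)
    print(f"utilization={utilization}: {left:.1f} GiB left for KV cache")
```

Under this model, any growth in the overhead term (for example after an engine upgrade) consumes the whole headroom of a small fraction such as 0.3 much sooner than it does at the default.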

3 files changed, 0 additions, 10 deletions

examples/backends/vllm/deploy/agg_kvbm.yaml

Lines changed: 0 additions & 2 deletions
@@ -40,8 +40,6 @@ spec:
 args:
 - --model
 - Qwen/Qwen3-8B
-- --gpu-memory-utilization
-- "0.45"
 - --max-model-len
 - "32000"
 - --enforce-eager

examples/backends/vllm/deploy/disagg_kvbm.yaml

Lines changed: 0 additions & 4 deletions
@@ -33,8 +33,6 @@ spec:
 args:
 - --model
 - Qwen/Qwen3-8B
-- --gpu-memory-utilization
-- "0.3"
 - --max-model-len
 - "32000"
 - --enforce-eager
@@ -65,8 +63,6 @@ spec:
 - --model
 - Qwen/Qwen3-8B
 - --is-prefill-worker
-- --gpu-memory-utilization
-- "0.3"
 - --max-model-len
 - "32000"
 - --enforce-eager

examples/backends/vllm/deploy/disagg_kvbm_2p2d.yaml

Lines changed: 0 additions & 4 deletions
@@ -33,8 +33,6 @@ spec:
 args:
 - --model
 - Qwen/Qwen3-8B
-- --gpu-memory-utilization
-- "0.3"
 - --max-model-len
 - "32000"
 - --enforce-eager
@@ -65,8 +63,6 @@ spec:
 - --model
 - Qwen/Qwen3-8B
 - --is-prefill-worker
-- --gpu-memory-utilization
-- "0.3"
 - --max-model-len
 - "32000"
 - --enforce-eager
