+> When disk offloading is enabled, disk offload filtering is on by default to extend SSD lifespan. The current policy offloads a KV block from CPU to disk only if its frequency is at least `2`. A block's frequency starts at 1, doubles on each cache hit, and decreases by 1 at each time-decay step.
+>
+> To disable disk offload filtering, set `DYN_KVBM_DISABLE_DISK_OFFLOAD_FILTER` to `true` or `1`.
+
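The frequency policy described in the note can be sketched as follows. This is a minimal illustration, not the KVBM implementation: the names `BlockFrequency` and `OFFLOAD_MIN_FREQUENCY` are hypothetical, and the saturating arithmetic (capping at `u32::MAX`, flooring at 0) is an assumption the note does not state.

```rust
/// Minimum frequency a block needs before it may be offloaded to disk (from the note).
const OFFLOAD_MIN_FREQUENCY: u32 = 2;

/// Hypothetical per-block frequency counter; not a type from KVBM.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BlockFrequency(u32);

impl BlockFrequency {
    /// A newly tracked block starts with frequency 1.
    fn new() -> Self {
        BlockFrequency(1)
    }

    /// Each cache hit doubles the frequency (saturating is an assumption).
    fn on_cache_hit(&mut self) {
        self.0 = self.0.saturating_mul(2);
    }

    /// Each time-decay step lowers the frequency by 1 (flooring at 0 is an assumption).
    fn on_decay_step(&mut self) {
        self.0 = self.0.saturating_sub(1);
    }

    /// A block qualifies for CPU-to-disk offload only at frequency >= 2.
    fn should_offload_to_disk(&self) -> bool {
        self.0 >= OFFLOAD_MIN_FREQUENCY
    }
}
```

Under this sketch, a block that is never hit stays below the threshold and is not written to disk. One cache hit raises it to 2 and makes it eligible, and a single decay step without further hits makes it ineligible again.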
```bash
# write an example LLM API config
# Note: Disable partial reuse "enable_partial_reuse: false" in the LLM API config’s "kv_connector_config" to increase offloading cache hits.
```
docs/guides/run_kvbm_in_vllm.md
5 lines changed: 5 additions & 0 deletions
@@ -61,6 +61,11 @@ cd $DYNAMO_HOME/components/backends/vllm
> [!NOTE]
> `DYN_KVBM_CPU_CACHE_GB` must be set and `DYN_KVBM_DISK_CACHE_GB` is optional.

+> [!NOTE]
+> When disk offloading is enabled, disk offload filtering is on by default to extend SSD lifespan. The current policy offloads a KV block from CPU to disk only if its frequency is at least `2`. A block's frequency starts at 1, doubles on each cache hit, and decreases by 1 at each time-decay step.
+>
+> To disable disk offload filtering, set `DYN_KVBM_DISABLE_DISK_OFFLOAD_FILTER` to `true` or `1`.
+

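As an illustration of the opt-out described in the note, the check for `DYN_KVBM_DISABLE_DISK_OFFLOAD_FILTER` might look like the sketch below. The helper names `is_truthy` and `disk_offload_filter_disabled` are hypothetical. Trimming whitespace and ignoring case are assumptions, and the actual parsing in KVBM may differ.

```rust
use std::env;

/// Hypothetical helper: accepts "1" or "true" as documented in the note.
/// Trimming and case-insensitive matching are assumptions.
fn is_truthy(value: &str) -> bool {
    let v = value.trim();
    v == "1" || v.eq_ignore_ascii_case("true")
}

/// Returns true when the environment asks for disk offload filtering to be disabled.
/// An unset or unreadable variable leaves the filter enabled (the documented default).
fn disk_offload_filter_disabled() -> bool {
    env::var("DYN_KVBM_DISABLE_DISK_OFFLOAD_FILTER")
        .map(|v| is_truthy(&v))
        .unwrap_or(false)
}
```

With this sketch, leaving the variable unset keeps the filter enabled, matching the default described in the note.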
### Sample Request
```bash
# make a request to verify vLLM with KVBM is started up correctly
```
         return Err("Disk offload filter is not supported.".to_string());
+    }
+
+    let host_is_none = if let Some(host) = self.host.as_ref() {
+        host.is_none()
+    } else {
+        true
+    };
+
+    if host_is_none {
+        tracing::warn!(
+            "Host to Disk offload filter is not provided. All blocks in host will be offloaded to disk. This may result in excessive disk offloading and accelerated SSD degradation."