README.md: 2 additions & 0 deletions
@@ -34,9 +34,11 @@ In addition, it supports a subset of vLLM's Prometheus metrics. These metrics are

| vllm:time_to_first_token_seconds | Histogram of time to first token in seconds |
| vllm:time_per_output_token_seconds | Histogram of time per output token in seconds |
| vllm:request_generation_tokens | Number of generation tokens processed |
+| vllm:generation_tokens_total | Total number of generated tokens |
| vllm:max_num_generation_tokens | Maximum number of requested generation tokens. Currently the same as `vllm:request_generation_tokens`, since only one choice is ever returned |
| vllm:request_params_max_tokens | Histogram of the max_tokens request parameter |
| vllm:request_prompt_tokens | Number of prefill tokens processed |
+| vllm:prompt_tokens_total | Total number of prompt tokens processed |
| vllm:request_success_total | Count of successfully processed requests |
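For a quick check of which of these metrics the simulator actually emits, the Prometheus exposition can be scraped and filtered on the `vllm:` prefix. A minimal sketch, assuming the simulator listens on `localhost:8000` and serves the usual `/metrics` endpoint (both are assumptions; adjust to your deployment):

```python
# Minimal sketch: scrape the simulator's Prometheus endpoint and print the
# vllm:* series. BASE_URL and the /metrics path are assumptions; point them
# at wherever the simulator is actually listening.
import requests

BASE_URL = "http://localhost:8000"  # assumed simulator address


def vllm_metric_lines(base_url: str = BASE_URL) -> list[str]:
    """Return the raw Prometheus exposition lines for vllm:* metric samples."""
    resp = requests.get(f"{base_url}/metrics", timeout=5)
    resp.raise_for_status()
    # Sample lines start with the metric name; HELP/TYPE comments start with '#'.
    return [line for line in resp.text.splitlines() if line.startswith("vllm:")]


if __name__ == "__main__":
    for line in vllm_metric_lines():
        print(line)
```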
The simulated inference has no connection to the model or to the LoRA adapters specified via the command-line parameters or loaded via the /v1/load_lora_adapter HTTP REST endpoint. The /v1/models endpoint returns simulated results based on those same command-line parameters and on any adapters loaded via /v1/load_lora_adapter.
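A minimal sketch of exercising those two endpoints, assuming the simulator listens on `localhost:8000`, that `/v1/load_lora_adapter` accepts vLLM's `lora_name`/`lora_path` request body, and that `/v1/models` returns the usual OpenAI-style `data` list; the adapter name and path below are purely hypothetical:

```python
# Minimal sketch: load a LoRA adapter at runtime, then list what /v1/models
# reports. BASE_URL, the request-body fields, and the adapter name/path are
# assumptions for illustration only.
import requests

BASE_URL = "http://localhost:8000"  # assumed simulator address


def load_lora_adapter(name: str, path: str) -> None:
    """POST a dynamic LoRA load request (vLLM-style lora_name/lora_path body)."""
    resp = requests.post(
        f"{BASE_URL}/v1/load_lora_adapter",
        json={"lora_name": name, "lora_path": path},
        timeout=5,
    )
    resp.raise_for_status()


def list_models() -> list[str]:
    """Return the model/adapter IDs reported by the OpenAI-style /v1/models endpoint."""
    resp = requests.get(f"{BASE_URL}/v1/models", timeout=5)
    resp.raise_for_status()
    return [model["id"] for model in resp.json()["data"]]


if __name__ == "__main__":
    load_lora_adapter("my-adapter", "/path/to/adapter")  # hypothetical values
    print(list_models())
```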