The code in this repo was used to produce "How continuous batching enables 23x throughput in LLM inference while reducing p50 latency."