@@ -22,8 +22,8 @@ and running the CLI from within the latest corresponding `tritonserver`
 container image, which should have all necessary system dependencies installed.
 
 For vLLM and TRT-LLM, you can use their respective images:
-- `nvcr.io/nvidia/tritonserver:24.04-vllm-python-py3`
-- `nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3`
 
 If you decide to run the CLI on the host or in a custom image, please
 see this list of [additional dependencies](#additional-dependencies-for-custom-environments)
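To pre-fetch one of the images above before running the CLI, a plain `docker pull` is enough; a minimal sketch using the vLLM tag from this change:

```bash
# Pre-fetch the vLLM-enabled Triton image referenced above
docker pull nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3
```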
@@ -38,6 +38,7 @@ matrix below:
 
 | Triton CLI Version | TRT-LLM Version | Triton Container Tag |
 |:------------------:|:---------------:|:--------------------:|
+| 0.0.8 | v0.9.0 | 24.05 |
 | 0.0.7 | v0.9.0 | 24.04 |
 | 0.0.6 | v0.8.0 | 24.02, 24.03 |
 | 0.0.5 | v0.7.1 | 24.01 |
@@ -51,10 +52,10 @@ pip install git+https://github.com/triton-inference-server/triton_cli.git
 ```
 
 It is also possible to install from a specific branch name, a commit hash
-or a tag name. For example to install `triton_cli` with tag 0.0.7:
+or a tag name. For example, to install `triton_cli` with a specific tag:
 
 ```bash
-GIT_REF="0.0.7"
+GIT_REF="0.0.8"
 pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
 ```
 
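The same `@` suffix also accepts a branch name or a commit hash, as the prose above notes; a minimal sketch, where `main` and the placeholder SHA are assumptions rather than refs confirmed by this change:

```bash
# Install from a branch (assuming the default branch is named "main")
pip install git+https://github.com/triton-inference-server/triton_cli.git@main

# Or pin to an exact commit (replace the placeholder with a real SHA)
GIT_REF="<commit-sha>"
pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
```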
@@ -89,7 +90,7 @@ triton -h
 triton import -m gpt2
 
 # Start server pointing at the default model repository
-triton start --image nvcr.io/nvidia/tritonserver:24.04-vllm-python-py3
+triton start --image nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3
 
 # Infer with CLI
 triton infer -m gpt2 --prompt "machine learning is"
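Before calling `triton infer`, it can help to confirm the started server is actually ready; a minimal sketch using Triton's standard HTTP health endpoint, assuming the default HTTP port 8000:

```bash
# Returns HTTP 200 once the server and its models are ready (default port 8000)
curl -sf localhost:8000/v2/health/ready && echo "server is ready"
```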
@@ -143,11 +144,10 @@ docker run -ti \
   --shm-size=1g --ulimit memlock=-1 \
   -v ${HOME}/models:/root/models \
   -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-  nvcr.io/nvidia/tritonserver:24.04-vllm-python-py3
+  nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3
 
 # Install the Triton CLI
-GIT_REF="0.0.7"
-pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
+pip install git+https://github.com/triton-inference-server/[email protected]
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
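For non-interactive environments such as CI, the login can also be done without a prompt; a minimal sketch, assuming the access token has been exported as `HF_TOKEN` (the variable name here is just a convention):

```bash
# Non-interactive alternative to the prompt-based login above
huggingface-cli login --token "${HF_TOKEN}"
```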
@@ -213,11 +213,10 @@ docker run -ti \
   -v /tmp:/tmp \
   -v ${HOME}/models:/root/models \
   -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-  nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
+  nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
 
 # Install the Triton CLI
-GIT_REF="0.0.7"
-pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
+pip install git+https://github.com/triton-inference-server/[email protected]
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
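From inside the TRT-LLM container, the workflow should mirror the vLLM quickstart shown earlier; a minimal sketch, where the `--backend tensorrtllm` flag is an assumption (check `triton import -h` for the exact option):

```bash
# Build a TRT-LLM model repository for gpt2 (--backend flag assumed; verify with -h)
triton import -m gpt2 --backend tensorrtllm

# Start the server in this container and send a test prompt
triton start
triton infer -m gpt2 --prompt "machine learning is"
```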