@@ -22,8 +22,8 @@ and running the CLI from within the latest corresponding `tritonserver`
 container image, which should have all necessary system dependencies installed.

 For vLLM and TRT-LLM, you can use their respective images:
-- `nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3`
-- `nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:24.06-vllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3`

 If you decide to run the CLI on the host or in a custom image, please
 see this list of [additional dependencies](#additional-dependencies-for-custom-environments)
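For a quick sanity check of the retagged images, either can be pulled directly from NGC (assuming `nvcr.io` is reachable from the host):

```bash
# Pull the updated vLLM image from the list above
docker pull nvcr.io/nvidia/tritonserver:24.06-vllm-python-py3
```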
@@ -38,6 +38,7 @@ matrix below:
 
 | Triton CLI Version | TRT-LLM Version | Triton Container Tag |
 |:------------------:|:---------------:|:--------------------:|
+| 0.0.9              | v0.10.0         | 24.06                |
 | 0.0.8              | v0.9.0          | 24.05                |
 | 0.0.7              | v0.9.0          | 24.04                |
 | 0.0.6              | v0.8.0          | 24.02, 24.03         |
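Because the matrix pins CLI, TRT-LLM, and container versions together, the new row implies installing and pulling as a matched pair; a minimal sketch using only versions from the table above:

```bash
# Matched pair from the 0.0.9 row of the matrix above
pip install git+https://github.com/triton-inference-server/[email protected]
docker pull nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
```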
@@ -55,7 +56,7 @@ It is also possible to install from a specific branch name, a commit hash
 or a tag name. For example, to install `triton_cli` with a specific tag:
 
 ```bash
-GIT_REF="0.0.8"
+GIT_REF="0.0.9"
 pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
 ```
 
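After installing from a tag, a quick way to confirm the CLI landed on `PATH` is the same `triton -h` command used in the quickstart below:

```bash
# Sanity-check the install; prints the CLI's help text
triton -h
```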
@@ -90,7 +91,7 @@ triton -h
 triton import -m gpt2
 
 # Start server pointing at the default model repository
-triton start --image nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3
+triton start --image nvcr.io/nvidia/tritonserver:24.06-vllm-python-py3
 
 # Infer with CLI
 triton infer -m gpt2 --prompt "machine learning is"
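Between `triton start` and `triton infer`, server readiness can also be verified over Triton's standard HTTP health endpoint (assuming the default port 8000):

```bash
# Returns HTTP 200 once the server and its models are ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
```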
@@ -144,10 +145,10 @@ docker run -ti \
   --shm-size=1g --ulimit memlock=-1 \
   -v ${HOME}/models:/root/models \
   -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-  nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3
+  nvcr.io/nvidia/tritonserver:24.06-vllm-python-py3
 
 # Install the Triton CLI
-pip install git+https://github.com/triton-inference-server/[email protected]
+pip install git+https://github.com/triton-inference-server/[email protected]
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
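For non-interactive environments such as CI, the login step above can take a token directly (placeholder token shown; `--token` is a standard `huggingface-cli login` option):

```bash
# Non-interactive alternative to the interactive login above
huggingface-cli login --token "hf_..."
```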
@@ -213,10 +214,10 @@ docker run -ti \
   -v /tmp:/tmp \
   -v ${HOME}/models:/root/models \
   -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-  nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
+  nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3
 
 # Install the Triton CLI
-pip install git+https://github.com/triton-inference-server/[email protected]
+pip install git+https://github.com/triton-inference-server/[email protected]
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
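Inside the TRT-LLM container, importing a model mirrors the vLLM quickstart; a sketch, assuming the CLI's `--backend` flag accepts `tensorrtllm` for engine builds:

```bash
# Import gpt2 for the TRT-LLM backend, then start the server in-container
triton import -m gpt2 --backend tensorrtllm
triton start
```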