@@ -49,8 +49,8 @@ and running the CLI from within the latest corresponding `tritonserver`
 container image, which should have all necessary system dependencies installed.
 
 For vLLM and TRT-LLM, you can use their respective images:
-- `nvcr.io/nvidia/tritonserver:25.01-vllm-python-py3`
-- `nvcr.io/nvidia/tritonserver:25.01-trtllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:25.02-vllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:25.02-trtllm-python-py3`
 
 If you decide to run the CLI on the host or in a custom image, please
 see this list of [additional dependencies](#additional-dependencies-for-custom-environments)
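For instance, a minimal sketch of that workflow, reusing the image tag above and the pinned install command shown later in this README; the `--gpus all` flag is an assumption here, and the fuller `docker run` examples below add the required volume mounts:

```bash
# Start an interactive shell in the vLLM container (GPU flag assumed)
docker run -ti --gpus all \
  --shm-size=1g --ulimit memlock=-1 \
  nvcr.io/nvidia/tritonserver:25.02-vllm-python-py3

# Then, inside the container, install a pinned Triton CLI release
pip install git+https://github.com/triton-inference-server/triton_cli.git@0.1.3
```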
@@ -65,6 +65,7 @@ matrix below:
 
 | Triton CLI Version | TRT-LLM Version | Triton Container Tag |
 |:------------------:|:---------------:|:--------------------:|
+| 0.1.3              | v0.17.0.post1   | 25.02                |
 | 0.1.2              | v0.17.0.post1   | 25.01                |
 | 0.1.1              | v0.14.0         | 24.10                |
 | 0.1.0              | v0.13.0         | 24.09                |
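As a sketch of how to apply the matrix, pin both sides of a row together; the example below pairs CLI 0.1.1 with the 24.10 TRT-LLM image, assuming the container naming pattern above also holds for the older tags:

```bash
# Triton CLI 0.1.1 pairs with TRT-LLM v0.14.0, shipped in the 24.10 container
pip install git+https://github.com/triton-inference-server/triton_cli.git@0.1.1
docker pull nvcr.io/nvidia/tritonserver:24.10-trtllm-python-py3
```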
@@ -88,7 +89,7 @@ It is also possible to install from a specific branch name, a commit hash
 or a tag name. For example, to install `triton_cli` with a specific tag:
 
 ```bash
-GIT_REF="0.1.2"
+GIT_REF="0.1.3"
 pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
 ```
 
@@ -123,7 +124,7 @@ triton -h
 triton import -m gpt2
 
 # Start server pointing at the default model repository
-triton start --image nvcr.io/nvidia/tritonserver:25.01-vllm-python-py3
+triton start --image nvcr.io/nvidia/tritonserver:25.02-vllm-python-py3
 
 # Infer with CLI
 triton infer -m gpt2 --prompt "machine learning is"
@@ -209,10 +210,10 @@ docker run -ti \
   --shm-size=1g --ulimit memlock=-1 \
   -v ${HOME}/models:/root/models \
   -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-  nvcr.io/nvidia/tritonserver:25.01-vllm-python-py3
+  nvcr.io/nvidia/tritonserver:25.02-vllm-python-py3
 
 # Install the Triton CLI
-pip install git+https://github.com/triton-inference-server/triton_cli.git@0.1.2
+pip install git+https://github.com/triton-inference-server/triton_cli.git@0.1.3
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
@@ -277,7 +278,7 @@ docker run -ti \
   nvcr.io/nvidia/tritonserver:25.01-trtllm-python-py3
 
 # Install the Triton CLI
-pip install git+https://github.com/triton-inference-server/triton_cli.git@0.1.2
+pip install git+https://github.com/triton-inference-server/triton_cli.git@0.1.3
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
@@ -331,10 +332,10 @@ docker run -ti \
   -v /tmp:/tmp \
   -v ${HOME}/models:/root/models \
   -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-  nvcr.io/nvidia/tritonserver:25.01-trtllm-python-py3
+  nvcr.io/nvidia/tritonserver:25.02-trtllm-python-py3
 
 # Install the Triton CLI
-pip install git+https://github.com/triton-inference-server/triton_cli.git@main
+pip install git+https://github.com/triton-inference-server/triton_cli.git@0.1.3
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login