<!--
# Copyright 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions
# are met:
#  * Redistributions of source code must retain the above copyright
#    notice, this list of conditions and the following disclaimer.
#  * Redistributions in binary form must reproduce the above copyright
#    notice, this list of conditions and the following disclaimer in the
#    documentation and/or other materials provided with the distribution.
#  * Neither the name of NVIDIA CORPORATION nor the names of its
#    contributors may be used to endorse or promote products derived
#    from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
# EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
# PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
# EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
# PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
# PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
# OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. -->
27+
# Triton Command Line Interface (Triton CLI)

> [!NOTE]
> Triton CLI is currently in BETA. Its features and functionality are likely
> to change.

We recommend installing and running the CLI from within the latest
corresponding `tritonserver` container image, which should have all necessary
system dependencies installed.
For vLLM and TRT-LLM, you can use their respective images:
- `nvcr.io/nvidia/tritonserver:25.01-vllm-python-py3`
- `nvcr.io/nvidia/tritonserver:25.01-trtllm-python-py3`
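
For example, to fetch the vLLM image ahead of time so the first `docker run`
does not have to wait on a large download:

```bash
# Pull the vLLM-enabled Triton container image from NGC
docker pull nvcr.io/nvidia/tritonserver:25.01-vllm-python-py3
```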

If you decide to run the CLI on the host or in a custom image, please
see this list of [additional dependencies](#additional-dependencies-for-custom-environments).

Each release of Triton CLI targets specific TRT-LLM and Triton container
versions, as shown in the support matrix below:

| Triton CLI Version | TRT-LLM Version | Triton Container Tag |
|:------------------:|:---------------:|:--------------------:|
| 0.1.2 | v0.17.0.post1 | 25.01 |
| 0.1.1 | v0.14.0 | 24.10 |
| 0.1.0 | v0.13.0 | 24.09 |
| 0.0.11 | v0.12.0 | 24.08 |
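
To verify that the TRT-LLM version in your container matches the CLI release
you plan to install, you can check the installed wheel (a sketch; it assumes
the `tensorrt_llm` package is visible to `pip` inside the container):

```bash
# Inside the -trtllm container: compare against the support matrix above
pip show tensorrt_llm | grep -i version
```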

It is also possible to install from a specific branch name, a commit hash,
or a tag name. For example, to install `triton_cli` with a specific tag:

```bash
GIT_REF="0.1.2"
pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
```

Once installed, the basic workflow looks like this:

```bash
# Explore the available commands
triton -h

# Add a model to the default model repository
triton import -m gpt2

# Start server pointing at the default model repository
triton start --image nvcr.io/nvidia/tritonserver:25.01-vllm-python-py3

# Infer with CLI
triton infer -m gpt2 --prompt "machine learning is"
```
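
You can also exercise the running server over HTTP directly. A minimal sketch
using Triton's generate endpoint, assuming the server above is listening on
the default port 8000 and the model exposes the vLLM backend's standard
`text_input` and `parameters` fields:

```bash
# Send a generate request to the gpt2 model
curl -X POST localhost:8000/v2/models/gpt2/generate \
  -d '{"text_input": "machine learning is", "parameters": {"stream": false}}'
```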

> [!NOTE]
> Usage of certain restricted models like Llama models requires authentication
> with Huggingface through either `huggingface-cli login` or setting the
> `HF_TOKEN` environment variable.
>
> If your Huggingface cache is not located at `${HOME}/.cache/huggingface`,
> you can point to it by setting the `HF_HOME` environment variable, e.g.
> `export HF_HOME=path/to/your/huggingface/cache`.
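
For non-interactive environments such as CI, the same setup can be done with
environment variables alone; `<your-token>` below is a placeholder for your
own Huggingface access token:

```bash
# Authenticate without the interactive login prompt
export HF_TOKEN=<your-token>

# Optional: point at a non-default Huggingface cache location
export HF_HOME=/path/to/your/huggingface/cache
```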

### Model Sources

The `-m` flag on `triton import` accepts a set of known model names, such as
`gpt2` and `llama-3.1-8b-instruct`, whose weights are fetched from Huggingface
(see the authentication note above for restricted models).

### Serving a vLLM Model

```bash
# Start the vLLM-enabled Triton container in interactive mode
docker run -ti \
    --gpus all \
    --network=host \
    --shm-size=1g --ulimit memlock=-1 \
    -v ${HOME}/models:/root/models \
    -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
    nvcr.io/nvidia/tritonserver:25.01-vllm-python-py3

# Install the Triton CLI
pip install git+https://github.com/triton-inference-server/[email protected]

# Authenticate with huggingface for restricted models like Llama-2 and Llama-3
huggingface-cli login

# Generate a Triton model repository containing a vLLM model config
triton remove -m all
triton import -m llama-3.1-8b-instruct --backend vllm

# Start Triton pointing at the default model repository
triton start

# Interact with model
triton infer -m llama-3.1-8b-instruct --prompt "machine learning is"

# Profile model with GenAI-Perf
triton profile -m llama-3.1-8b-instruct --backend vllm
```
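
Between `triton start` and `triton infer`, you can confirm the server came up
by polling Triton's standard HTTP health endpoint (assuming the default port
8000):

```bash
# Returns HTTP 200 once the server is ready to serve requests
curl -v localhost:8000/v2/health/ready
```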

### Serving a TRT-LLM Model

```bash
# Start the TRT-LLM-enabled Triton container in interactive mode
docker run -ti \
    --gpus all \
    --network=host \
    --shm-size=1g --ulimit memlock=-1 \
    -v /tmp:/tmp \
    -v ${HOME}/models:/root/models \
    -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
    nvcr.io/nvidia/tritonserver:25.01-trtllm-python-py3

# Install the Triton CLI
pip install git+https://github.com/triton-inference-server/[email protected]

# Authenticate with huggingface for restricted models like Llama-2 and Llama-3
huggingface-cli login

# Build TRT-LLM engine and generate a Triton model repository pointing at it
triton remove -m all
triton import -m llama-3.1-8b-instruct --backend tensorrtllm

# Start Triton pointing at the default model repository
triton start

# Interact with model
triton infer -m llama-3.1-8b-instruct --prompt "machine learning is"

# Profile model with GenAI-Perf
triton profile -m llama-3.1-8b-instruct --backend tensorrtllm
```
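
Since `triton start` runs in the foreground and streams server logs, you will
likely want a second shell inside the same container for the `infer` and
`profile` steps; `<container-name>` below is a placeholder for the name shown
by `docker ps`:

```bash
# Find the running container's name or ID
docker ps

# Open a second shell inside it for the client-side commands
docker exec -ti <container-name> bash
```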

## Additional Dependencies for Custom Environments