Commit 8f577d3

build: Update CLI version references to 0.0.8 and Triton references to 24.05 (#72)

1 parent: db70fcb

File tree: 5 files changed, +17 −28 lines

.github/workflows/python-package.yaml
Lines changed: 1 addition & 1 deletion

@@ -36,7 +36,7 @@ jobs:
   build:
     runs-on: ${{ matrix.os }}
     container:
-      image: nvcr.io/nvidia/tritonserver:24.04-py3
+      image: nvcr.io/nvidia/tritonserver:24.05-py3
     strategy:
       fail-fast: false
       matrix:
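
The CI build job runs inside the Triton base container, so this image bump is what moves CI onto the 24.05 release. A quick local sanity check of the new tag (a sketch; assumes Docker is installed and NGC is reachable):

```bash
# Pull the updated Triton base image referenced by the CI workflow
docker pull nvcr.io/nvidia/tritonserver:24.05-py3
```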

README.md
Lines changed: 10 additions & 11 deletions

@@ -22,8 +22,8 @@ and running the CLI from within the latest corresponding `tritonserver`
 container image, which should have all necessary system dependencies installed.
 
 For vLLM and TRT-LLM, you can use their respective images:
-- `nvcr.io/nvidia/tritonserver:24.04-vllm-python-py3`
-- `nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3`
+- `nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3`
 
 If you decide to run the CLI on the host or in a custom image, please
 see this list of [additional dependencies](#additional-dependencies-for-custom-environments)

@@ -38,6 +38,7 @@ matrix below:
 
 | Triton CLI Version | TRT-LLM Version | Triton Container Tag |
 |:------------------:|:---------------:|:--------------------:|
+| 0.0.8              | v0.9.0          | 24.05                |
 | 0.0.7              | v0.9.0          | 24.04                |
 | 0.0.6              | v0.8.0          | 24.02, 24.03         |
 | 0.0.5              | v0.7.1          | 24.01                |

@@ -51,10 +52,10 @@ pip install git+https://github.com/triton-inference-server/triton_cli.git
 ```
 
 It is also possible to install from a specific branch name, a commit hash
-or a tag name. For example to install `triton_cli` with tag 0.0.7:
+or a tag name. For example to install `triton_cli` with a specific tag:
 
 ```bash
-GIT_REF="0.0.7"
+GIT_REF="0.0.8"
 pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
 ```
 

@@ -89,7 +90,7 @@ triton -h
 triton import -m gpt2
 
 # Start server pointing at the default model repository
-triton start --image nvcr.io/nvidia/tritonserver:24.04-vllm-python-py3
+triton start --image nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3
 
 # Infer with CLI
 triton infer -m gpt2 --prompt "machine learning is"

@@ -143,11 +144,10 @@ docker run -ti \
   --shm-size=1g --ulimit memlock=-1 \
   -v ${HOME}/models:/root/models \
   -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-  nvcr.io/nvidia/tritonserver:24.04-vllm-python-py3
+  nvcr.io/nvidia/tritonserver:24.05-vllm-python-py3
 
 # Install the Triton CLI
-GIT_REF="0.0.7"
-pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
+pip install git+https://github.com/triton-inference-server/[email protected]
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login

@@ -213,11 +213,10 @@ docker run -ti \
   -v /tmp:/tmp \
   -v ${HOME}/models:/root/models \
   -v ${HOME}/.cache/huggingface:/root/.cache/huggingface \
-  nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
+  nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
 
 # Install the Triton CLI
-GIT_REF="0.0.7"
-pip install git+https://github.com/triton-inference-server/triton_cli.git@${GIT_REF}
+pip install git+https://github.com/triton-inference-server/[email protected]
 
 # Authenticate with huggingface for restricted models like Llama-2 and Llama-3
 huggingface-cli login
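
Once `triton start` brings the server up, readiness can be checked over Triton's standard HTTP health endpoint. A minimal sketch, assuming the server's default HTTP port 8000 is reachable from where you run it:

```bash
# Prints HTTP 200 once the server is up and ready to serve inference requests
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
```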

pyproject.toml
Lines changed: 2 additions & 1 deletion

@@ -58,7 +58,8 @@ dependencies = [
     "psutil >= 5.9.5", # may remove later
     "rich == 13.5.2",
     # TODO: Test on cpu-only machine if [cuda] dependency is an issue,
-    "tritonclient[all] >= 2.46",
+    # Use explicit client version matching genai-perf version for tagged release
+    "tritonclient[all] == 2.46",
     "huggingface-hub >= 0.19.4",
     # Testing
     "pytest >= 8.1.1", # may remove later

src/triton_cli/__init__.py
Lines changed: 1 addition & 1 deletion

@@ -24,4 +24,4 @@
 # (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
 # OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 
-__version__ = "0.0.8dev"
+__version__ = "0.0.8"
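
With the `dev` suffix dropped, an installed package reports the tagged release directly. For example, reading the `__version__` attribute set above:

```bash
# Prints "0.0.8" for this tagged release
python -c "import triton_cli; print(triton_cli.__version__)"
```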

src/triton_cli/docker/Dockerfile
Lines changed: 3 additions & 14 deletions

@@ -1,20 +1,9 @@
-FROM nvcr.io/nvidia/tritonserver:24.04-trtllm-python-py3
+# TRT-LLM image contains engine building and runtime dependencies
+FROM nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
 
 # Setup vLLM Triton backend
 RUN mkdir -p /opt/tritonserver/backends/vllm && \
     wget -P /opt/tritonserver/backends/vllm https://raw.githubusercontent.com/triton-inference-server/vllm_backend/main/src/model.py
 
-# TRT-LLM engine build dependencies
-# NOTE: torch 2.2.0 has a symbol conflict, so WAR is to install 2.1.2
-RUN pip install \
-    "psutil" \
-    "pynvml>=11.5.0" \
-    --extra-index-url https://pypi.nvidia.com/ "tensorrt-llm==0.9.0"
-
 # vLLM runtime dependencies
-RUN pip install \
-    # Triton 24.04 vLLM containers comes with "vllm==0.4.0.post1", but this has
-    # incompatible dependencies with trtllm==0.9.0 around torch and transformers.
-    "vllm==0.4.1"
-
-# TODO: Install Triton CLI in this image
+RUN pip install "vllm==0.4.3"
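
The simplified Dockerfile now relies on the 24.05 TRT-LLM base image for the engine-build dependencies it previously installed by hand, and only adds the vLLM backend on top. A minimal build sketch from the repository root; the `triton-cli` image tag is just an illustration:

```bash
# Build the combined TRT-LLM + vLLM image (tag name is illustrative)
docker build -t triton-cli -f src/triton_cli/docker/Dockerfile .
```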
