Commit e905b95

merge: merge main into ryan/kvbm-nova-251204
Merged 238 commits from main branch to bring the feature branch up to date.

Key conflicts resolved:
- Removed lib/kvbm-kernels references (deleted in main)
- Kept nova/nova-backend/kvbm workspace members from feature branch
- Maintained v2 module API refactoring from feature branch
- Updated Cargo.lock files to reflect new dependencies

Major updates from main include:
- LoRA support for vLLM (#4810)
- Multimodal documentation (#4510)
- Scaling adapter features (#4699, #4825)
- Tool calling support (#4822, #4722)
- NIXL connect improvements (#4433)

Signed-off-by: Ryan Olson <[email protected]>
2 parents 2c1af84 + f57fd72 commit e905b95

File tree

661 files changed

+53160
-17859
lines changed

.devcontainer/README.md

Lines changed: 14 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -1,24 +1,26 @@
11
<!--
22
SPDX-FileCopyrightText: Copyright (c) 2024-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
33
SPDX-License-Identifier: Apache-2.0
4-
5-
Licensed under the Apache License, Version 2.0 (the "License");
6-
you may not use this file except in compliance with the License.
7-
You may obtain a copy of the License at
8-
9-
http://www.apache.org/licenses/LICENSE-2.0
10-
11-
Unless required by applicable law or agreed to in writing, software
12-
distributed under the License is distributed on an "AS IS" BASIS,
13-
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
14-
See the License for the specific language governing permissions and
15-
limitations under the License.
164
-->
175

186
# NVIDIA Dynamo Development Environment
197

208
> Warning: Dev Containers (aka `devcontainers`) is an evolving feature that is not tested in CI. Please report any problems or feedback through GitHub issues.
219
10+
## Known Issues
11+
12+
### Docker Version Compatibility
13+
14+
**Docker 29.x has compatibility issues with Dev Containers (by Anysphere):**
15+
- Docker Engine version **29.0.1** (released November 14, 2025) may cause Dev Containers to hang persistently, rendering them unusable
16+
- The Dev Containers extension (v1.0.26) and devcontainer CLI (v0.75.0) were not tested against Docker 29.x
17+
- Symptoms: Container builds successfully but connection hangs, requiring a manual reload to connect
18+
- This may be fixed in a later version of the Anysphere Dev Containers extension and/or Docker Engine patch
19+
20+
**Recommended Docker Version:**
21+
- Use Docker Engine **28.5.2** or earlier for stable Dev Container operation
22+
- Docker 27.x and 28.x series are confirmed working with current Dev Containers tooling
23+
2224
## Framework-Specific Devcontainers
2325

2426
This directory contains framework-specific devcontainer configurations generated from a Jinja2 template:
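
The Docker version guidance in the Known Issues section of this README could be checked before opening a dev container. The following is a minimal sketch, not part of the commit: the helper name `docker_version_ok` is hypothetical, and the threshold of 29 simply mirrors the note about Docker 29.x. It assumes the server version string is obtained with `docker version --format '{{.Server.Version}}'`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not in the repository): returns 0 when the major
# version is below 29, 1 when it is 29 or newer, 2 when it cannot be parsed.
docker_version_ok() {
    local version="$1"
    local major="${version%%.*}"      # keep everything before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;      # empty or non-numeric major version
    esac
    [ "$major" -lt 29 ]
}

# Usage against a live daemon (not executed in this sketch):
#   docker_version_ok "$(docker version --format '{{.Server.Version}}')" \
#       || echo "Docker 29.x detected; consider 28.5.2 or earlier"

# The two version strings named in the README serve as sample input.
for v in 28.5.2 29.0.1; do
    if docker_version_ok "$v"; then
        echo "$v: ok"
    else
        echo "$v: known issues"
    fi
done
```

Only the major version is compared, so a future 29.x patch release that fixes the hang would still be flagged.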

.devcontainer/post-create.sh

Lines changed: 21 additions & 30 deletions
@@ -48,7 +48,9 @@ show_and_run() {
4848
# Run commands with debug output shown
4949
set -x
5050
"$@"
51+
local exit_code=$?
5152
{ set +x; } 2>/dev/null
53+
return $exit_code
5254
}
5355

5456
set -x
@@ -62,28 +64,7 @@ pre-commit run --all-files || true # don't fail the build if pre-commit hooks fa
6264
export CARGO_TARGET_DIR=${CARGO_TARGET_DIR:-$WORKSPACE_DIR/target}
6365
mkdir -p $CARGO_TARGET_DIR
6466

65-
uv pip uninstall --yes ai-dynamo ai-dynamo-runtime 2>/dev/null || true
66-
67-
# Build project, with `dev` profile it will be saved at $CARGO_TARGET_DIR/debug
68-
cargo build --locked --profile dev --features dynamo-llm/block-manager
69-
70-
# install the python bindings
71-
72-
# Install maturin if not already installed.
73-
# TODO: Uncomment for SGLANG. Right now, the CI team is making a refactor
74-
# to the Dockerfile, so this is a temporary fix.
75-
# if ! command -v maturin &> /dev/null; then
76-
# echo "Installing maturin..."
77-
# retry uv pip install maturin[patchelf]
78-
# else
79-
# echo "maturin is already installed"
80-
# fi
81-
82-
# install ai-dynamo-runtime
83-
(cd $WORKSPACE_DIR/lib/bindings/python && retry maturin develop)
84-
85-
# install ai-dynamo
86-
cd $WORKSPACE_DIR && retry env DYNAMO_BIN_PATH=$CARGO_TARGET_DIR/debug uv pip install -e .
67+
# Note: Build steps moved to after sanity check - see instructions at the end
8768

8869
{ set +x; } 2>/dev/null
8970

@@ -114,16 +95,26 @@ else
11495
echo "⚠️ SSH agent forwarding not configured - SSH_AUTH_SOCK is not set"
11596
fi
11697

117-
show_and_run $WORKSPACE_DIR/deploy/sanity_check.py
98+
echo -e "\n🔍 Running sanity check..."
99+
if show_and_run $WORKSPACE_DIR/deploy/sanity_check.py; then
100+
SANITY_STATUS="✅ Sanity check passed"
101+
else
102+
SANITY_STATUS="⚠️ Sanity check failed - review output above"
103+
fi
118104

119105
cat <<EOF
120106
121-
✅ SUCCESS: Built cargo project, installed Python bindings, configured pre-commit hooks
107+
========================================
108+
$SANITY_STATUS
109+
✅ Pre-commit hooks configured
110+
111+
Now build the project:
112+
cargo build --locked --profile dev --features dynamo-llm/block-manager
113+
cd lib/bindings/python && maturin develop --uv
114+
DYNAMO_BIN_PATH=$CARGO_TARGET_DIR/debug uv pip install -e .
115+
116+
Optional: cd lib/bindings/kvbm && maturin develop --uv # For KVBM support
122117
123-
Example commands:
124-
cargo build --locked --profile dev # Build Rust project in $CARGO_TARGET_DIR
125-
cd lib/bindings/python && maturin develop --uv # Update Python bindings (if you changed them)
126-
cargo fmt && cargo clippy # Format and lint code before committing
127-
cargo doc --no-deps # Generate documentation
128-
uv pip install -e . # Install various Python packages Dynamo depends on
118+
If cargo build fails with a Cargo.lock error, try to update it with 'cargo update'
119+
========================================
129120
EOF
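
The two additions to `show_and_run` matter because `{ set +x; }` is itself a command: without capturing `$?` first, the function would return the status of `set +x` (always 0), and the new `if show_and_run ...; then` branch could never detect a failure. A minimal sketch of the pattern, using `false` as a stand-in for the sanity check script:

```shell
#!/usr/bin/env bash
# Trace a command, then return its exit status rather than that of `set +x`.
show_and_run() {
    set -x
    "$@"
    local exit_code=$?           # capture before any other command overwrites $?
    { set +x; } 2>/dev/null      # hide the trace line of `set +x` itself
    return $exit_code
}

# `false` stands in for $WORKSPACE_DIR/deploy/sanity_check.py here.
if show_and_run false; then
    SANITY_STATUS="Sanity check passed"
else
    SANITY_STATUS="Sanity check failed - review output above"
fi
echo "$SANITY_STATUS"    # prints: Sanity check failed - review output above
```

Because the call sits in an `if` condition, a failing check only changes the summary message and does not abort the rest of the post-create script.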

.github/actions/docker-build/action.yml

Lines changed: 151 additions & 37 deletions
@@ -16,9 +16,6 @@ inputs:
1616
image_tag:
1717
description: 'Custom image tag (optional, defaults to framework:latest)'
1818
required: false
19-
ngc_ci_access_token:
20-
description: 'NGC CI Access Token'
21-
required: false
2219
ci_token:
2320
description: 'CI Token'
2421
required: false
@@ -49,6 +46,12 @@ inputs:
4946
torch_backend:
5047
description: 'Optional override for TORCH_BACKEND build-arg (e.g., cu129)'
5148
required: false
49+
enable_kvbm:
50+
description: 'Enable KVBM support (optional)'
51+
required: false
52+
dynamo_base_image:
53+
description: 'Pre-built Dynamo base image to use instead of building from scratch'
54+
required: false
5255

5356
outputs:
5457
image_tag:
@@ -61,18 +64,9 @@ runs:
6164
- name: Set up Docker Buildx
6265
uses: docker/setup-buildx-action@e468171a9de216ec08956ac3ada2f0791b6bd435 #v3.11.1
6366
with:
64-
driver: docker
65-
- name: Login to ECR
66-
shell: bash
67-
env:
68-
ECR_HOSTNAME: ${{ inputs.aws_account_id }}.dkr.ecr.${{ inputs.aws_default_region }}.amazonaws.com
69-
run: |
70-
aws ecr get-login-password --region ${{ inputs.aws_default_region }} | docker login --username AWS --password-stdin ${ECR_HOSTNAME}
71-
- name: Login to NGC
72-
if: github.event.pull_request.head.repo.full_name == github.repository || github.event_name == 'push'
73-
shell: bash
74-
run: |
75-
echo "${{ inputs.ngc_ci_access_token }}" | docker login nvcr.io -u '$oauthtoken' --password-stdin
67+
driver: docker-container
68+
# Enable BuildKit for enhanced metadata
69+
buildkitd-flags: --debug
7670
- name: Cleanup
7771
if: always()
7872
shell: bash
@@ -88,7 +82,12 @@ runs:
8882
AWS_ACCESS_KEY_ID: ${{ inputs.aws_access_key_id }}
8983
AWS_SECRET_ACCESS_KEY: ${{ inputs.aws_secret_access_key }}
9084
PLATFORM: ${{ inputs.platform }}
85+
ECR_HOSTNAME: ${{ inputs.aws_account_id }}.dkr.ecr.${{ inputs.aws_default_region }}.amazonaws.com
86+
GITHUB_RUN_ID: ${{ github.run_id }}
87+
GITHUB_JOB: ${{ github.job }}
88+
GITHUB_REF_NAME: ${{ github.ref_name }}
9189
run: |
90+
set -x
9291
# Determine image tag
9392
if [ -n "${{ inputs.image_tag }}" ]; then
9493
IMAGE_TAG="${{ inputs.image_tag }}"
@@ -97,43 +96,89 @@ runs:
9796
fi
9897
9998
BUILD_START_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
100-
echo "🕐 Build started at: ${BUILD_START_TIME}"
10199
echo "BUILD_START_TIME=${BUILD_START_TIME}" >> $GITHUB_ENV
102100
103101
echo "image_tag=$IMAGE_TAG" >> $GITHUB_OUTPUT
102+
103+
# Create build logs directory
104+
mkdir -p build-logs
105+
BUILD_LOG_FILE="build-logs/build-${{ inputs.framework }}-$(echo '${{ inputs.platform }}' | sed 's/linux\///').log"
106+
echo "BUILD_LOG_FILE=${BUILD_LOG_FILE}" >> $GITHUB_ENV
107+
echo "📝 Build log will be saved to: ${BUILD_LOG_FILE}"
108+
104109
# Collect optional overrides provided by the workflow
110+
# Set base cache args and set --cache-to if this is a main commit
105111
EXTRA_ARGS=""
112+
EXTRA_ARGS="--cache-to type=inline "
113+
EXTRA_ARGS+="--cache-from type=registry,ref=${ECR_HOSTNAME}/ai-dynamo/dynamo:${{ inputs.framework }}-${PLATFORM##*/}-cache "
114+
EXTRA_ARGS+="--cache-from type=registry,ref=${ECR_HOSTNAME}/ai-dynamo/dynamo:main-${{ inputs.framework }}-${PLATFORM##*/} "
115+
if [[ "$GITHUB_REF_NAME" == "main" ]]; then
116+
EXTRA_ARGS+="--cache-to type=registry,ref=${ECR_HOSTNAME}/ai-dynamo/dynamo:${{ inputs.framework }}-${PLATFORM##*/}-cache,mode=max "
117+
fi
118+
119+
echo "$EXTRA_ARGS"
120+
# Collect optional overrides provided by the workflow
106121
if [ -n "${{ inputs.base_image_tag }}" ]; then
107-
EXTRA_ARGS+=" --base-image-tag ${{ inputs.base_image_tag }}"
122+
EXTRA_ARGS+="--base-image-tag ${{ inputs.base_image_tag }} "
108123
fi
109124
if [ -n "${{ inputs.runtime_image_tag }}" ]; then
110-
EXTRA_ARGS+=" --build-arg RUNTIME_IMAGE_TAG=${{ inputs.runtime_image_tag }}"
125+
EXTRA_ARGS+="--build-arg RUNTIME_IMAGE_TAG=${{ inputs.runtime_image_tag }} "
111126
fi
112127
if [ -n "${{ inputs.cuda_version }}" ]; then
113-
EXTRA_ARGS+=" --build-arg CUDA_VERSION=${{ inputs.cuda_version }}"
128+
EXTRA_ARGS+="--build-arg CUDA_VERSION=${{ inputs.cuda_version }} "
114129
fi
115130
if [ -n "${{ inputs.torch_backend }}" ]; then
116-
EXTRA_ARGS+=" --build-arg TORCH_BACKEND=${{ inputs.torch_backend }}"
131+
EXTRA_ARGS+="--build-arg TORCH_BACKEND=${{ inputs.torch_backend }} "
132+
fi
133+
if [ -n "${{ inputs.dynamo_base_image }}" ]; then
134+
EXTRA_ARGS+=" --dynamo-base-image ${{ inputs.dynamo_base_image }}"
135+
fi
136+
if [ -n "${{ inputs.enable_kvbm }}" ]; then
137+
EXTRA_ARGS+=" --build-arg ENABLE_KVBM=${{ inputs.enable_kvbm }}"
117138
fi
118139
140+
# Execute build and capture output (show on console AND save to file)
119141
./container/build.sh --tag "$IMAGE_TAG" \
120142
--target ${{ inputs.target }} \
121143
--vllm-max-jobs 10 \
122144
--framework ${{ inputs.framework }} \
123145
--platform ${{ inputs.platform }} \
124146
--use-sccache \
125147
--sccache-bucket "$SCCACHE_S3_BUCKET" \
126-
--sccache-region "$AWS_DEFAULT_REGION" $EXTRA_ARGS
148+
--sccache-region "$AWS_DEFAULT_REGION" $EXTRA_ARGS 2>&1 | tee "${BUILD_LOG_FILE}"
149+
150+
BUILD_EXIT_CODE=${PIPESTATUS[0]}
127151
128152
BUILD_END_TIME=$(date -u +%Y-%m-%dT%H:%M:%SZ)
129-
echo "🕐 Build ended at: ${BUILD_END_TIME}"
130153
echo "BUILD_END_TIME=${BUILD_END_TIME}" >> $GITHUB_ENV
131154
155+
# Exit with the build's exit code
156+
exit ${BUILD_EXIT_CODE}
157+
158+
- name: Run Sanity Check on Runtime Image
159+
if: inputs.target == 'runtime'
160+
shell: bash
161+
run: |
162+
IMAGE_TAG="${{ steps.build.outputs.image_tag }}"
163+
echo "Running sanity check on image: $IMAGE_TAG"
164+
165+
# Run the sanity check script inside the container
166+
# The script is located in /workspace/deploy/sanity_check.py in runtime containers
167+
set +e
168+
docker run --rm "$IMAGE_TAG" python /workspace/deploy/sanity_check.py --runtime-check --no-gpu-check
169+
SANITY_CHECK_EXIT_CODE=$?
170+
set -e
171+
if [ ${SANITY_CHECK_EXIT_CODE} -ne 0 ]; then
172+
echo "ERROR: Sanity check failed - ai-dynamo packages not properly installed"
173+
exit ${SANITY_CHECK_EXIT_CODE}
174+
else
175+
echo "✅ Sanity check passed"
176+
fi
177+
132178
- name: Capture Build Metrics
133179
id: metrics
134180
shell: bash
135181
run: |
136-
echo "📊 Capturing build metrics for ${{ inputs.framework }}..."
137182
138183
# Create metrics directory
139184
mkdir -p build-metrics
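
Piping the build through `tee` changes which exit status the shell reports: by default `$?` reflects the last command in the pipeline (`tee`), so a failed build would look successful. The hunk above handles this by reading `PIPESTATUS[0]`. A minimal sketch of the same pattern, where `fake_build` is a hypothetical stand-in for `./container/build.sh`:

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for ./container/build.sh: prints a line, exits with $1.
fake_build() {
    echo "building..."
    return "$1"
}

log_file="$(mktemp)"

# Show the output on the console AND save it to a file.
fake_build 7 2>&1 | tee "$log_file"
# PIPESTATUS must be read immediately; the next command overwrites it.
build_exit_code=${PIPESTATUS[0]}

echo "build exit code: $build_exit_code"    # prints: build exit code: 7
```

An alternative is `set -o pipefail`, which makes the pipeline itself report the failure, although it does not say which stage failed.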
@@ -162,21 +207,17 @@ runs:
162207
echo "⚠️ No image tag available"
163208
fi
164209
165-
echo "📊 Final metrics captured"
166-
167-
# Create consolidated metrics JSON file
168-
echo "🔍 Debug: inputs.platform = '${{ inputs.platform }}'"
169210
PLATFORM_ARCH=$(echo "${{ inputs.platform }}" | sed 's/linux\///')
170-
echo "🔍 Debug: PLATFORM_ARCH = '${PLATFORM_ARCH}'"
211+
echo " Architecture: ${PLATFORM_ARCH}"
171212
echo "PLATFORM_ARCH=${PLATFORM_ARCH}" >> $GITHUB_ENV
172213
JOB_KEY="${{ inputs.framework }}-${PLATFORM_ARCH}"
173-
echo "🔍 Debug: JOB_KEY = '${JOB_KEY}'"
214+
echo " Job Key: ${JOB_KEY}"
174215
175216
# Create job-specific metrics file
176217
mkdir -p build-metrics
177-
METRICS_FILE="build-metrics/metrics-${{ inputs.framework }}-${PLATFORM_ARCH}.json"
218+
METRICS_FILE="build-metrics/metrics-${{ inputs.framework }}-${PLATFORM_ARCH}-${{ github.run_id }}-${{ job.check_run_id }}.json"
178219
179-
# Create the job metrics file directly
220+
# Create the job metrics file
180221
cat > "$METRICS_FILE" << EOF
181222
{
182223
"framework": "${{ inputs.framework }}",
@@ -190,15 +231,88 @@ runs:
190231
}
191232
EOF
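
This action derives the architecture suffix in two ways: `sed 's/linux\///'` for the log and metrics file names, and the parameter expansion `${PLATFORM##*/}` for the registry cache references. For a value such as `linux/arm64` both yield the same result. The sketch below illustrates this and how the cache arguments are assembled; the registry host and framework value are placeholders, not values from the repository.

```shell
#!/usr/bin/env bash
# Example inputs; the registry host and framework are hypothetical placeholders.
PLATFORM="linux/arm64"
ECR_HOSTNAME="123456789012.dkr.ecr.us-west-2.amazonaws.com"
FRAMEWORK="vllm"
GITHUB_REF_NAME="main"

# Two ways to strip the "linux/" prefix.
arch_sed=$(echo "$PLATFORM" | sed 's/linux\///')
arch_expansion="${PLATFORM##*/}"    # removes the longest prefix ending in "/"

# Every build reads from the registry cache; only main writes to it.
cache_ref="${ECR_HOSTNAME}/ai-dynamo/dynamo:${FRAMEWORK}-${arch_expansion}-cache"
EXTRA_ARGS="--cache-from type=registry,ref=${cache_ref} "
if [[ "$GITHUB_REF_NAME" == "main" ]]; then
    EXTRA_ARGS+="--cache-to type=registry,ref=${cache_ref},mode=max "
fi

echo "$arch_sed $arch_expansion"    # prints: arm64 arm64
echo "$EXTRA_ARGS"
```

The parameter expansion avoids spawning `sed` and also works for any prefix, whereas the `sed` expression only strips a literal `linux/`.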
192233
193-
echo "📁 Created build metrics file for ${JOB_KEY}:"
194234
cat "$METRICS_FILE"
195235
196-
# Metrics captured and saved to JSON file
236+
- name: Generate Comprehensive Build Metrics
237+
id: comprehensive-metrics
238+
if: always()
239+
shell: bash
240+
run: |
241+
echo "=========================================="
242+
echo "📊 GENERATING COMPREHENSIVE BUILD METRICS"
243+
echo "=========================================="
244+
245+
# Create metrics directory
246+
mkdir -p build-metrics
247+
248+
PLATFORM_ARCH="${{ env.PLATFORM_ARCH }}"
249+
WORKFLOW_ID="${{ github.run_id }}"
250+
JOB_ID="${{ job.check_run_id }}"
251+
FRAMEWORK_LOWER=$(echo "${{ inputs.framework }}" | tr '[:upper:]' '[:lower:]')
252+
253+
# Make parser executable
254+
chmod +x .github/scripts/parse_buildkit_output.py
255+
256+
# Check for build logs and build stage arguments dynamically
257+
BUILD_LOG="build-logs/single-stage-build.log"
258+
259+
# Path to container metadata created in previous step
260+
CONTAINER_METADATA="build-metrics/metrics-${{ inputs.framework }}-${PLATFORM_ARCH}-${WORKFLOW_ID}-${JOB_ID}.json"
261+
262+
# Output single comprehensive JSON with all build stages
263+
COMPREHENSIVE_JSON="build-metrics/build-${{ inputs.framework }}-${PLATFORM_ARCH}-${WORKFLOW_ID}-${JOB_ID}.json"
264+
265+
echo "🚀 Parsing BuildKit outputs and merging with container metrics..."
197266
198-
# Upload job-specific build metrics as artifact
199-
- name: Upload Build Metrics
267+
# Build stage arguments dynamically based on which logs exist
268+
STAGE_ARGS=()
269+
270+
if [ -f "$BUILD_LOG" ]; then
271+
echo " ✓ Found base image log: ${BUILD_LOG}"
272+
STAGE_ARGS+=("runtime:${BUILD_LOG}")
273+
else
274+
echo " ℹ️ No image log found"
275+
fi
276+
277+
# Check for any additional stage logs (e.g., build-logs/stage3-*.log)
278+
for extra_log in build-logs/stage*.log; do
279+
if [ -f "$extra_log" ]; then
280+
stage_name=$(basename "$extra_log" .log)
281+
echo " ✓ Found additional stage log: ${extra_log} (${stage_name})"
282+
STAGE_ARGS+=("${stage_name}:${extra_log}")
283+
fi
284+
done
285+
286+
echo "Container Metadata: ${CONTAINER_METADATA}"
287+
echo "Output: ${COMPREHENSIVE_JSON}"
288+
echo ""
289+
290+
# Run parser with all discovered stages
291+
# Usage: parse_buildkit_output.py <output_json> <stage1_name:log_file> [stage2_name:log_file] ... [--metadata=<file>]
292+
set +e
293+
python3 .github/scripts/parse_buildkit_output.py \
294+
"$COMPREHENSIVE_JSON" \
295+
"${STAGE_ARGS[@]}" \
296+
"--metadata=${CONTAINER_METADATA}"
297+
PARSER_EXIT_CODE=$?
298+
set -e
299+
300+
echo ""
301+
echo "📊 Parser exit code: ${PARSER_EXIT_CODE}"
302+
303+
if [ ${PARSER_EXIT_CODE} -eq 0 ] && [ -f "$COMPREHENSIVE_JSON" ]; then
304+
echo "✅ Comprehensive build metrics generated successfully"
305+
echo "📄 Output file: ${COMPREHENSIVE_JSON}"
306+
else
307+
echo "⚠️ Metrics generation had issues but continuing..."
308+
fi
309+
310+
# Upload comprehensive build metrics as artifact
311+
- name: Upload Comprehensive Build Metrics
200312
uses: actions/upload-artifact@v4
313+
if: always()
201314
with:
202-
name: build-metrics-${{ inputs.framework }}-${{ env.PLATFORM_ARCH }}
203-
path: build-metrics/metrics-${{ inputs.framework }}-${{ env.PLATFORM_ARCH }}.json
315+
name: build-metrics-${{ inputs.framework }}-${{ inputs.target }}-${{ env.PLATFORM_ARCH }}-${{ github.run_id }}-${{ job.check_run_id }}
316+
path: build-metrics/build-${{ inputs.framework }}-${{ env.PLATFORM_ARCH }}-${{ github.run_id }}-${{ job.check_run_id }}.json
204317
retention-days: 7
318+
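
The comprehensive metrics step builds its parser arguments as a bash array of `name:path` pairs, adding an entry only for logs that exist. A minimal sketch of that discovery logic, run against a temporary directory; the file name `stage3-wheel.log` is a hypothetical example, and the parser call itself is omitted:

```shell
#!/usr/bin/env bash
log_dir="$(mktemp -d)"
touch "$log_dir/stage3-wheel.log"    # hypothetical extra stage log

stage_args=()

# The main log is optional; it is absent in this example.
main_log="$log_dir/single-stage-build.log"
if [ -f "$main_log" ]; then
    stage_args+=("runtime:${main_log}")
fi

# An unmatched glob stays literal in bash by default, so the -f test
# filters out the pattern itself when no stage logs exist.
for extra_log in "$log_dir"/stage*.log; do
    if [ -f "$extra_log" ]; then
        stage_name=$(basename "$extra_log" .log)
        stage_args+=("${stage_name}:${extra_log}")
    fi
done

echo "${#stage_args[@]} stage argument(s)"    # prints: 1 stage argument(s)
printf '%s\n' "${stage_args[@]}"
```

Passing `"${stage_args[@]}"` quoted keeps each pair a single argument even if a path contains spaces, which a plain string concatenation would not guarantee.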
