62 commits
6c038f9
Add modelopt/torch/_compress CODEOWNERS
kevalmorabia97 Oct 27, 2025
230cee1
Merge branch 'main' into feature/compress
kevalmorabia97 Oct 27, 2025
54c5f0f
Remove llm_ptq example tests from CICD
kevalmorabia97 Oct 27, 2025
9eeee25
E2E test for the experimental compress algorithm based on https://arx…
danielkorzekwa Oct 28, 2025
ad1d18e
Merge branch 'main' into feature/compress
kevalmorabia97 Oct 28, 2025
cef3655
Add convert_llama3_config_to_decilm_config + unit test (#465)
danielkorzekwa Oct 29, 2025
002b8b5
Implement nas.convert() api for the compress algorithm (#482)
danielkorzekwa Oct 31, 2025
1c12fd8
modelopt nas search() implementation for the compress algorithm (#490)
danielkorzekwa Nov 3, 2025
f7d547f
Add decilm modelling code (#505)
danielkorzekwa Nov 12, 2025
50a580c
Compress tutorial (PoC) (#492)
danielkorzekwa Nov 12, 2025
b121945
Add llama converter (no dependency on internal Nvidia code) - part 1/…
danielkorzekwa Nov 13, 2025
866e400
llama converter is self-contained now (no dependency on internal nvid…
danielkorzekwa Nov 14, 2025
0868f1c
Add integration test for attention pruning (#562)
danielkorzekwa Nov 14, 2025
69726cc
Merge branch 'main' into feature/compress
kevalmorabia97 Nov 15, 2025
07ca24d
Merge branch 'main' into feature/compress
kevalmorabia97 Nov 15, 2025
1dde209
Add score_pruning_activations (step 2/6) (#563)
danielkorzekwa Nov 18, 2025
2e559e7
Update README.md
kevalmorabia97 Nov 18, 2025
f10be0d
Add activation hooks used for pruning (#576)
danielkorzekwa Nov 20, 2025
194b532
Add sewing kit and utilities used for pruning scoring - pruning scori…
danielkorzekwa Nov 24, 2025
8c9cdd4
Add L2NormHook and use it in megatron.py (#599)
danielkorzekwa Nov 26, 2025
1f72466
Add pruning checkpoints for the compress algorithm (#607)
danielkorzekwa Nov 27, 2025
97fe7f0
Add build replacement library to the compress algorithm. (#616)
danielkorzekwa Dec 1, 2025
954103e
Add subblock stats to the compress algorithm (#623)
danielkorzekwa Dec 1, 2025
dcc425f
Add 1-block scoring to the compress algorithm (#625)
danielkorzekwa Dec 2, 2025
56d95de
Add checkpoint save/load to ForwardHook + add IterativeChannelContrib…
danielkorzekwa Dec 2, 2025
74aae83
Add MIP step to the compress algorithm (#627)
danielkorzekwa Dec 4, 2025
a1f63bc
Merge branch 'main' into feature/compress
kevalmorabia97 Dec 8, 2025
a99f503
Remove unused mip functions + fix multi-gpu test (#660)
kevalmorabia97 Dec 8, 2025
67489f4
Fix a bug in IterativeChannelContributionHook + tools for activation …
danielkorzekwa Dec 11, 2025
1d8bd20
Remove runtime.py and directly use torch dist utils + remove unused f…
kevalmorabia97 Dec 11, 2025
f7a0cb0
Use shared activation hooks component in the puzzle algorithm (#687)
danielkorzekwa Dec 17, 2025
db866d9
Clean up Puzzle Compress Tutorial (#711)
LianaMikael Dec 22, 2025
2e813bf
Two bug fixes: mix checkpointing and dtype (#718)
danielkorzekwa Dec 22, 2025
83ac3b1
Merge remote-tracking branch 'origin/main' into feature/compress
kevalmorabia97 Jan 13, 2026
0eecfc6
Fix test assertions for 2-gpu (#772)
kevalmorabia97 Jan 13, 2026
43b3cfa
Rename compress to puzzletron (#776)
kevalmorabia97 Jan 14, 2026
4c30bd5
Add NeMo Conversion Scripts to Puzzletron (#784)
LianaMikael Jan 15, 2026
96bb0ba
Merge branch 'main' into feature/compress
kevalmorabia97 Mar 3, 2026
8c84fee
[CI] Update to only run puzzletron tests
kevalmorabia97 Mar 3, 2026
5812777
Merge branch 'main' into feature/puzzletron
kevalmorabia97 Mar 3, 2026
5f77c81
Pin torchprofile==0.0.4 to fix CI
kevalmorabia97 Mar 10, 2026
82df595
Add anymodel-core to feature/puzzletron (#974)
danielkorzekwa Mar 11, 2026
4dc9932
Draft: anymodel activation scoring (#989)
danielkorzekwa Mar 12, 2026
d358eb3
Draft: Merge anymodel pruning (#990)
danielkorzekwa Mar 12, 2026
8e827f3
Draft: Merging anymodel:build_library_and_stats (#993)
danielkorzekwa Mar 12, 2026
eb4b210
Draft: merge any model calc one block scores (#994)
danielkorzekwa Mar 12, 2026
8fe318d
Draft: merge any_model: mip_and_realize_models (#995)
danielkorzekwa Mar 13, 2026
2fbdf0e
Update uv.lock for nspect puzzletron scanning
kevalmorabia97 Mar 13, 2026
1b42f0b
Dkorzekwa/any model other models (#1007)
danielkorzekwa Mar 17, 2026
67999eb
Dkorzekwa/anymodel gptoss (#1020)
danielkorzekwa Mar 17, 2026
660dc17
Merge any_model tutorial (#1035)
danielkorzekwa Mar 19, 2026
01cba6a
Merge mbridge distillation for any_model (#1036)
danielkorzekwa Mar 20, 2026
2b6572c
MR branch for the remaining difference between dkorzekwa/any_model an…
danielkorzekwa Mar 20, 2026
110316a
Dkorzekwa/decilm hf code cleanup (#1071)
danielkorzekwa Mar 23, 2026
4190275
Dkorzekwa/decilm hf code cleanup 2 (#1073)
danielkorzekwa Mar 23, 2026
0708ca2
Dkorzekwa/anymodel subblock stats (#1085)
danielkorzekwa Mar 24, 2026
3193f30
Dkorzekwa/anymodel subblock stats nodecilm (#1102)
danielkorzekwa Mar 24, 2026
928036e
Dkorzekwa/decilm cleanup post subblockstats (#1103)
danielkorzekwa Mar 24, 2026
e508b76
code clean up (#1110)
danielkorzekwa Mar 24, 2026
f460d16
Merge branch 'main' into feature/puzzletron
kevalmorabia97 Mar 25, 2026
2f55c73
Dkorzekwa/puzzletron use importance hooks from prune (#1115)
danielkorzekwa Mar 25, 2026
c5ec50b
Merge remote-tracking branch 'origin/main' into feature/puzzletron
kevalmorabia97 Mar 25, 2026
1 change: 1 addition & 0 deletions .github/CODEOWNERS
@@ -24,6 +24,7 @@ modelopt/torch/nas @NVIDIA/modelopt-torch-nas-prune-codeowners
modelopt/torch/opt @NVIDIA/modelopt-torch-opt-codeowners
modelopt/torch/peft @NVIDIA/modelopt-torch-peft-codeowners
modelopt/torch/prune @NVIDIA/modelopt-torch-nas-prune-codeowners
modelopt/torch/puzzletron @NVIDIA/modelopt-torch-puzzletron-codeowners
modelopt/torch/quantization @NVIDIA/modelopt-torch-quantization-codeowners
modelopt/torch/sparsity @NVIDIA/modelopt-torch-sparsity-codeowners
modelopt/torch/speculative @NVIDIA/modelopt-torch-speculative-codeowners
102 changes: 5 additions & 97 deletions .github/workflows/example_tests.yml
@@ -56,83 +56,21 @@ jobs:
match_pattern: "^DCO$|^linux$" # Wait for DCO and Unit tests / linux to pass
delay: 300s

##### PyTorch Example Tests (speculative_decoding requires 26.01 image) #####
torch-pr:
needs: [check-file-changes, wait-checks]
if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
strategy: &torch_strategy
fail-fast: false
matrix:
example: [llm_distill, llm_qat, llm_sparsity]
include:
- example: speculative_decoding
docker_image: "26.01"
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/pytorch:${{ matrix.docker_image || '26.01' }}-py3"
example: ${{ matrix.example }}
timeout_minutes: 30
pip_install_extras: "[hf,dev-test]"
runner: linux-amd64-gpu-h100-latest-1

torch-non-pr:
if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
strategy: *torch_strategy
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/pytorch:${{ matrix.docker_image || '26.01' }}-py3"
example: ${{ matrix.example }}
timeout_minutes: 30
pip_install_extras: "[hf,dev-test]"
runner: linux-amd64-gpu-rtxpro6000-latest-2

##### TensorRT-LLM Example Tests #####
trtllm-pr:
needs: [check-file-changes, wait-checks]
if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
strategy:
fail-fast: false
matrix:
example: [llm_ptq, vlm_ptq]
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc5"
example: ${{ matrix.example }}
pip_install_extras: "[hf,dev-test]"
runner: linux-amd64-gpu-rtxpro6000-latest-1

trtllm-non-pr:
if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
strategy:
fail-fast: false
matrix:
example: [llm_autodeploy, llm_eval, llm_ptq, vlm_ptq]
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/tensorrt-llm/release:1.3.0rc5"
example: ${{ matrix.example }}
pip_install_extras: "[hf,dev-test]"
runner: linux-amd64-gpu-rtxpro6000-latest-2

##### NeMo Example Tests #####
nemo-pr:
needs: [check-file-changes, wait-checks]
if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
strategy: &nemo_strategy
fail-fast: false
matrix:
example: [megatron_bridge]
example: [megatron_bridge, puzzletron]
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/nemo:26.02"
example: ${{ matrix.example }}
timeout_minutes: 30
pip_install_extras: "[hf,dev-test]"
pip_install_extras: "[hf,puzzletron,dev-test]"
runner: linux-amd64-gpu-rtxpro6000-latest-1

nemo-non-pr:
@@ -144,50 +82,20 @@ jobs:
docker_image: "nvcr.io/nvidia/nemo:26.02"
example: ${{ matrix.example }}
timeout_minutes: 30
pip_install_extras: "[hf,dev-test]"
runner: linux-amd64-gpu-rtxpro6000-latest-2

##### ONNX/TensorRT Example Tests #####
onnx-pr:
needs: [check-file-changes, wait-checks]
if: startsWith(github.ref, 'refs/heads/pull-request/') && needs.check-file-changes.outputs.any_changed == 'true'
strategy: &onnx_strategy
fail-fast: false
matrix:
example: [diffusers, torch_onnx]
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/tensorrt:26.02-py3"
example: ${{ matrix.example }}
pip_install_extras: "[all,dev-test]"
runner: linux-amd64-gpu-l4-latest-1

onnx-non-pr:
if: ${{ !startsWith(github.ref, 'refs/heads/pull-request/') }}
strategy: *onnx_strategy
uses: ./.github/workflows/_example_tests_runner.yml
secrets: inherit
with:
docker_image: "nvcr.io/nvidia/tensorrt:26.02-py3"
example: ${{ matrix.example }}
pip_install_extras: "[all,dev-test]"
pip_install_extras: "[hf,puzzletron,dev-test]"
runner: linux-amd64-gpu-rtxpro6000-latest-2

##### Required Check for PR #####
example-pr-required-check:
# Run even if example tests are skipped
if: ${{ startsWith(github.ref, 'refs/heads/pull-request/') && always() }}
needs: [check-file-changes, torch-pr, trtllm-pr, nemo-pr, onnx-pr]
needs: [check-file-changes, nemo-pr]
runs-on: ubuntu-latest
steps:
- name: Required GPU tests did not succeed
if: |
needs.check-file-changes.result != 'success' ||
(needs.check-file-changes.outputs.any_changed == 'true' && (
needs.torch-pr.result != 'success' ||
needs.trtllm-pr.result != 'success' ||
needs.nemo-pr.result != 'success' ||
needs.onnx-pr.result != 'success'
needs.nemo-pr.result != 'success'
))
run: exit 1
20 changes: 11 additions & 9 deletions .github/workflows/gpu_tests.yml
@@ -62,16 +62,16 @@ jobs:
fail-fast: false
matrix:
include:
- example: gpu
timeout: 45
container_image: pytorch:26.01-py3
- example: gpu-megatron
timeout: 45
container_image: pytorch:26.01-py3
- example: gpu-trtllm
- example: gpu-puzzletron
timeout: 30
container_image: tensorrt-llm/release:1.3.0rc5
runs-on: linux-amd64-gpu-rtxpro6000-latest-1
container_image: pytorch:26.01-py3
# - example: gpu-megatron
# timeout: 45
# container_image: pytorch:26.01-py3
# - example: gpu-trtllm
# timeout: 30
# container_image: tensorrt-llm/release:1.3.0rc5
runs-on: linux-amd64-gpu-rtxpro6000-latest-2
timeout-minutes: ${{ matrix.timeout }}
container: &gpu_container
image: nvcr.io/nvidia/${{ matrix.container_image }}
@@ -85,6 +85,8 @@
- name: Setup environment variables
run: |
echo "LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/include:/usr/lib/x86_64-linux-gnu" >> $GITHUB_ENV
- name: Install dependencies for mip
run: apt-get update && apt-get install -y libffi-dev
- name: Run gpu tests
run: pip install tox-current-env && tox -e cuda13-${{ matrix.example }} --current-env
gpu-tests-non-pr:
16 changes: 14 additions & 2 deletions .pre-commit-config.yaml
@@ -25,9 +25,18 @@ repos:
hooks:
- id: ruff-check
args: [--fix, --exit-non-zero-on-fix]
exclude: ^examples/specdec_bench/specdec_bench/datasets/speed\.py$
# See: commit hooks modifies block_config.py leading to test_puzzletron.py failing (#25) · Issues · omniml / modelopt · GitLab
exclude: >
(?x)^(
^examples/specdec_bench/specdec_bench/datasets/speed\.py$|
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config\.py
)$
- id: ruff-format
exclude: ^examples/specdec_bench/specdec_bench/datasets/speed\.py$
exclude: >
(?x)^(
^examples/specdec_bench/specdec_bench/datasets/speed\.py$|
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config\.py
)$
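The verbose-mode (`(?x)`) exclude patterns above can be sanity-checked locally with Python's `re` module. The sketch below assumes pre-commit-style semantics (the pattern is matched against the repo-relative file path); the paths are taken from the diff above:

```python
import re

# Verbose regex mirroring the pre-commit `exclude` pattern added in this PR.
# (?x) lets us split alternatives across lines; whitespace is ignored.
pattern = re.compile(r"""(?x)^(
    examples/specdec_bench/specdec_bench/datasets/speed\.py|
    modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config\.py
)$""")

# Excluded file matches; any other path does not.
print(bool(pattern.match(
    "modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py")))  # True
print(bool(pattern.match("modelopt/torch/puzzletron/other.py")))  # False
```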

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v1.17.1
@@ -95,10 +104,13 @@ repos:
examples/llm_eval/modeling.py|
examples/llm_qat/main.py|
examples/llm_sparsity/weight_sparsity/finetune.py|
examples/puzzletron/evaluation/lm_eval_anymodel.py|
examples/specdec_bench/specdec_bench/models/specbench_medusa.py|
examples/speculative_decoding/main.py|
examples/speculative_decoding/medusa_utils.py|
examples/speculative_decoding/server_generate.py|
modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py|
experimental/dms/models/qwen3/configuration_qwen3_dms.py|
experimental/dms/models/qwen3/modeling_qwen3_dms.py|
)$
1 change: 1 addition & 0 deletions examples/pruning/README.md
@@ -7,6 +7,7 @@ Pruning can involve removal (prune) of Linear and Conv layers; and Transformer a
This section focuses on applying Model Optimizer's state-of-the-art complementary pruning modes to enable you to search for the best subnet architecture from your provided base model:

1. [Minitron](https://arxiv.org/pdf/2408.11796): A pruning method developed by NVIDIA Research for pruning GPT (and later extended to Mamba, MoE, and Hybrid Transformer Mamba) models in the NVIDIA Megatron-LM (M-LM) or Megatron-Bridge (M-Bridge) framework. It uses activation magnitudes to prune the embedding hidden size; MLP FFN hidden size; transformer attention heads; Mamba heads and head dimension; MoE number of experts, FFN hidden size, and shared expert intermediate size; and number of layers of the model.
1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA that uses a Mixed Integer Programming (MIP)-based NAS search algorithm.
1. FastNAS: A pruning method recommended for Computer Vision models. Given a pretrained model, FastNAS finds the subnet which maximizes the score function while meeting the given constraints.
1. GradNAS: A light-weight pruning method recommended for language models like Hugging Face BERT, GPT-J. It uses the gradient information to prune the model's linear layers and attention heads to meet the given constraints.
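The MIP-based search behind Puzzletron can be illustrated with a toy selection problem: pick one pruned variant per layer so that total quality score is maximized under a parameter budget. The numbers and the exhaustive search below are purely illustrative (the real algorithm runs a MIP solver over measured subblock statistics):

```python
# Toy sketch of the per-layer variant selection a MIP-based NAS search solves.
# Scores and parameter counts are made-up; exhaustive search stands in for a solver.
from itertools import product

# (score, num_params) per candidate variant, per layer.
layers = [
    [(1.00, 100), (0.90, 60), (0.70, 30)],  # layer 0: full / pruned / heavily pruned
    [(1.00, 100), (0.85, 50), (0.60, 25)],  # layer 1
]
budget = 130  # maximum total parameters allowed

best = max(
    (combo for combo in product(*layers)
     if sum(p for _, p in combo) <= budget),
    key=lambda combo: sum(s for s, _ in combo),
)
print(best)  # ((0.9, 60), (0.85, 50)) -- best total score (1.75) within budget
```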

14 changes: 14 additions & 0 deletions examples/puzzletron/GPTOSS.md
@@ -0,0 +1,14 @@

## GptOss

In this release, the Puzzle algorithm supports only expert removal for `Gpt-Oss`.

This model ships as a quantized checkpoint, i.e. its MoE expert matrices are stored in the _MXFP4_ format.
During the pruning steps, Puzzle uses the decompressed model (converted back to BF16) to compute statistics and scores.
This means that during conversion to the Puzzle format, the model is decompressed and stored in BF16.
Once pruning is done, i.e. the experts to remove have been identified and the process has finished, you may want to restore the checkpoint to the _MXFP4_ format.
An additional script takes the original and the pruned checkpoints and outputs the pruned checkpoint in _MXFP4_ format:

```bash
python -m modelopt.torch.puzzletron.anymodel.models.gpt_oss.gpt_oss_pruned_to_mxfp4 \
    --student-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/solution_0/ \
    --original-path /workspaces/source_model_checkpoints/openai_gpt-oss-20b/ \
    --output-path /workspaces/any_model_gpt_oss/mip/puzzle_solutions/stats_num_params_18014757184/solutions--checkpoints/mxfp4-ckpt/ \
    --num-layers 24
```
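For intuition on what the BF16 decompression undoes, here is an illustrative decode of MXFP4 elements: FP4 (E2M1) values sharing one power-of-two (E8M0) scale, normally 32 elements per scale block. This is a sketch of the format only, not the script's actual implementation; `decode_mxfp4_block` is a hypothetical helper:

```python
# The 8 non-negative E2M1 magnitudes; bit 3 of each 4-bit code is the sign.
_E2M1 = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def decode_mxfp4_block(codes: list[int], scale_exp: int) -> list[float]:
    """Map 4-bit FP4 codes to floats and apply the block's shared 2**scale_exp scale."""
    out = []
    for c in codes:
        mag = _E2M1[c & 0b0111]            # low 3 bits select the magnitude
        sign = -1.0 if c & 0b1000 else 1.0  # high bit is the sign
        out.append(sign * mag * (2.0 ** scale_exp))
    return out

# A weight of 6.0 stored as code 0b0111, and -6.0 as 0b1111, with scale 2**0:
print(decode_mxfp4_block([0b0111, 0b1111], 0))  # [6.0, -6.0]
```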