
Merge puzzletron compression algorithm #1121

Open

danielkorzekwa wants to merge 62 commits into main from feature/puzzletron

Conversation

@danielkorzekwa (Contributor) commented Mar 25, 2026

What does this PR do?

Implement puzzletron compression algorithm based on Puzzle paper (https://arxiv.org/abs/2411.19146)

The list of reviewed and merged MRs that resulted in the feature/puzzletron branch:

Merging dkorzekwa/any_model to feature/puzzletron

Add anymodel directories to feature/puzzletron by danielkorzekwa · Pull Request #974 · NVIDIA/Model-Optimizer - merged

Draft: anymodel activation scoring by danielkorzekwa · Pull Request #989 · NVIDIA/Model-Optimizer - merged

Draft: Merge anymodel pruning by danielkorzekwa · Pull Request #990 · NVIDIA/Model-Optimizer - merged

Draft: Merging anymodel:build_library_and_stats by danielkorzekwa · Pull Request #993 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/any model calc one block scores by danielkorzekwa · Pull Request #994 · NVIDIA/Model-Optimizer - merged

Draft: merge any_model: mip_and_realize_models by danielkorzekwa · Pull Request #995 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/any model other models by danielkorzekwa · Pull Request #1007 · NVIDIA/Model-Optimizer - merged

PR to 1007: #1039 - merged

Dkorzekwa/anymodel gptoss by danielkorzekwa · Pull Request #1020 · NVIDIA/Model-Optimizer - merged

Merge any_model tutorial by danielkorzekwa · Pull Request #1035 · NVIDIA/Model-Optimizer - merged

Merge mbridge distillation for any_model by danielkorzekwa · Pull Request #1036 · NVIDIA/Model-Optimizer - merged

MR branch for the remaining difference between dkorzekwa/any_model an… by danielkorzekwa · Pull Request #1047 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/decilm hf code cleanup by danielkorzekwa · Pull Request #1071 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request #1073 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/anymodel subblock stats by danielkorzekwa · Pull Request #1085 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/anymodel subblock stats nodecilm by danielkorzekwa · Pull Request #1102 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/decilm cleanup post subblockstats by danielkorzekwa · Pull Request #1103 · NVIDIA/Model-Optimizer - merged

code clean up by danielkorzekwa · Pull Request #1110 · NVIDIA/Model-Optimizer - merged

Merging into main:

Activation hooks redesign (reuse hooks component across both minitron and puzzletron) by danielkorzekwa · Pull Request #1022 · NVIDIA/Model-Optimizer - merged

Dkorzekwa/puzzletron use importance hooks from prune by danielkorzekwa · Pull Request #1115 · NVIDIA/Model-Optimizer - merged

Usage

Puzzletron tutorial:
https://github.com/NVIDIA/Model-Optimizer/tree/feature/puzzletron/examples/puzzletron

Testing

The main e2e test for compressing 9 models with Puzzletron:
https://github.com/NVIDIA/Model-Optimizer/blob/feature/puzzletron/tests/gpu/torch/puzzletron/test_puzzletron.py

Before your PR is "Ready for review"

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: ❌

Summary by CodeRabbit

  • New Features

    • Added Puzzletron, an experimental LLM compression algorithm using Mixed Integer Programming-based Neural Architecture Search, supporting heterogeneous model modifications including FFN intermediate pruning, KV-head reduction, and expert removal.
    • Added support for multiple model architectures (Llama, Mistral, Qwen, GPT-OSS, NeMo) with configurable compression workflows.
    • Added knowledge distillation and inference evaluation utilities including Megatron-Bridge export support and lm-eval integration.
  • Documentation

    • Added comprehensive guides and example configurations for running the Puzzletron compression pipeline.
  • Chores

    • Updated workflows and build configuration to support new compression tooling.
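The summary above mentions Mixed Integer Programming-based Neural Architecture Search. The core selection problem can be sketched in plain Python (names and numbers below are illustrative, not the Puzzletron API): per layer, pick one replacement variant so that total quality cost is minimized under a memory budget. A real implementation would use an integer-programming solver; for a tiny search space, exhaustive enumeration gives the same answer.

```python
# Illustrative sketch of the per-layer replacement-selection problem that a
# MIP solver handles at scale; brute force is exact for small search spaces.
from itertools import product

# Each layer offers variants of (memory_gb, quality_cost); hypothetical numbers.
layer_variants = [
    [(2.0, 0.00), (1.2, 0.03), (0.8, 0.10)],  # layer 0: original, pruned-FFN, pruned-KV
    [(2.0, 0.00), (1.0, 0.05)],               # layer 1: original, expert-removed
    [(2.0, 0.00), (1.5, 0.01), (0.6, 0.20)],  # layer 2
]

def search(variants, memory_budget_gb):
    """Return per-layer choice indices minimizing summed quality cost."""
    best_cost, best_choice = float("inf"), None
    for choice in product(*(range(len(v)) for v in variants)):
        mem = sum(variants[i][c][0] for i, c in enumerate(choice))
        cost = sum(variants[i][c][1] for i, c in enumerate(choice))
        if mem <= memory_budget_gb and cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_choice, best_cost

choice, cost = search(layer_variants, memory_budget_gb=4.5)
```

The MIP formulation scales this idea to many layers and variants, where enumeration is infeasible.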

kevalmorabia97 and others added 30 commits October 27, 2025 17:42
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…iv.org/abs/2411.19146 (#464)

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

Add decilm modelling code

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Compress tutorial (PoC) + compress cli app.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Signed-off-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Liana Mikaelyan <lmikaelyan@nvidia.com>
Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
#545)

## What does this PR do?

Add llama converter (no dependency on internal Nvidia code) - part 1/2
- change top-level dependencies in convert_llama3_to_decilm.py from
puzzle_tools.... to modelopt.....
- added modelopt.torch._compress.tools module
- remove tokenization_mistral.py - not used

scope of 2/2 part (will come once part 1/2 is merged):
- change all deeper dependencies from puzzle_tools.... to
modelopt....
- test_convert_llama3_config_to_decilm_config.py should run without any
internal nvidia dependencies

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ia code) (#552)

## What does this PR do?
llama converter is self-contained now (no dependency on internal nvidia
code)

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?
Make score pruning activations self-contained (no dependency on internal
Nvidia code) - 1/6
- add integration test for attention pruning (pre-step to make adding
activation scoring safer)

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?
- Add score_pruning_activations.py

Notes:
- validate_model.py still depends on Nvidia internal code (will be
changed in the subsequent MR)
- sharded_checkpoint_utils.py - for now it needs to use DeciLM from
internal Nvidia code, to be changed in the next MR

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

Add activation hooks used for pruning

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ng is self-contained now (#584)

## What does this PR do?

Add sewing kit and utilities used for pruning scoring - pruning scoring
is self-contained now - no dependency on internal Nvidia code.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?
- Add L2NormHook and use it in megatron.py
- Using L2NormHook removes code duplication between
_DynamicSelfAttention and _DynamicMLP

This is the first step towards reusing activation scores logic across
Minitron and Puzzle. Next steps:
- complete redesign of megatron.py - move other activation hooks logic
to hooks.py
- then combine that hooks.py with the similar hooks.py functionality in
puzzle
(modelopt/torch/_compress/activation_scoring/activation_hooks/hooks.py)

Questions:
- why, both before and after this redesign, does the code store temp
variables in two ways (_register_temp_attribute and self.hook_handle)?

```
self._register_temp_attribute("_activation_hook", activation_hook)
# TODO: confusion: why hook_handle is removed manually in export() and not using _register_temp_attribute?
self.hook_handle = self.linear_fc2.register_forward_hook(activation_hook)
```
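The question above concerns the hook-handle lifecycle. The pattern can be shown torch-free (a minimal mimic of `register_forward_hook`, not PyTorch's actual implementation): registration returns a handle whose `remove()` detaches the hook, which is why code that registers a hook must keep the handle around.

```python
# Torch-free sketch of the forward-hook pattern: registering returns a
# handle; the handle's remove() detaches the hook from the module.
class HookHandle:
    def __init__(self, hooks, key):
        self._hooks, self._key = hooks, key

    def remove(self):
        self._hooks.pop(self._key, None)

class Module:
    def __init__(self):
        self._forward_hooks = {}
        self._next_key = 0

    def register_forward_hook(self, fn):
        key = self._next_key
        self._next_key += 1
        self._forward_hooks[key] = fn
        return HookHandle(self._forward_hooks, key)

    def forward(self, x):
        out = x * 2  # stand-in computation
        for fn in list(self._forward_hooks.values()):
            fn(self, x, out)  # same (module, input, output) shape as torch hooks
        return out

# Accumulate a simple statistic from activations, like an L2-norm hook would.
seen = []
m = Module()
handle = m.register_forward_hook(lambda mod, inp, out: seen.append(out))
m.forward(3)      # hook fires
handle.remove()
m.forward(4)      # hook no longer fires
```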

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

Add pruning checkpoints for the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add build replacement library to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add subblock stats to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
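The subblock stats mentioned above include KV-cache memory. A minimal sketch of the kind of per-layer estimate involved (the formula is the standard one for grouped-query attention; the concrete numbers are illustrative, not from this PR):

```python
# Per-layer KV-cache memory estimate, the kind of statistic a
# subblock-stats step computes.
def kv_cache_bytes_per_layer(batch, seq_len, num_kv_heads, head_dim, dtype_bytes=2):
    # K and V each store (batch, seq_len, num_kv_heads, head_dim) values.
    return 2 * batch * seq_len * num_kv_heads * head_dim * dtype_bytes

# Example: 8 KV heads, head_dim 128, 4k context, bf16.
per_layer = kv_cache_bytes_per_layer(batch=1, seq_len=4096, num_kv_heads=8, head_dim=128)
```

This also shows why KV-head reduction is attractive: halving `num_kv_heads` halves the per-layer KV-cache footprint.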
## What does this PR do?

Add 1-block scoring to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…utionHook (#610)

## What does this PR do?

Add checkpoint save/load to ForwardHook + add
IterativeChannelContributionHook.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do?

Add MIP step to the compress algorithm.

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?

**Type of change:** Improvement <!-- Use one of the following: Bug fix,
new feature, new example, new tests, documentation. -->

- Fix tests for 2-gpu: Some places hard-coded cpu device for distributed
communications which was causing this issue
- Remove unused constrain_search_space.py
- Remove `is_multi_layer_puzzle: False` case
- Remove `use_greedy_search: False` case
- Remove knapsack mip case
- Remove unused `num_solutions` and `minimal_diversity` flags

## Testing
<!-- Mention how have you tested your change if applicable. -->

- GH CICD test passing
- Tested on 2-gpu setup locally as well


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

# Release Notes

* **Refactor**
  * Optimized solver implementation with improved library integration.
* Simplified model compression configuration by removing deprecated
search options.
  * Consolidated optimization paths for streamlined processing.

* **Chores**
  * Updated dependencies for improved compatibility.

* **Documentation**
  * Clarified Model-Optimizer installation instructions in examples.

<sub>✏️ Tip: You can customize this high-level summary in your review
settings.</sub>

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…hooks analysis (#670)

## What does this PR do?

Fix a bug in IterativeChannelContributionHook + tools for activation
hooks analysis

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…unctions (#667)

## What does this PR do?

- Remove `runtime.py` and directly use
`modelopt.torch.utils.distributed` functions
- Read model_dtype and autocast_dtype from validate_model_defaults.yaml
instead of runtime object
- Remove more unused functions
- Remove unnecessary parse_args for intermediate steps, and improve
docstrings

## Testing
<!-- Mention how have you tested your change if applicable. -->

- No change in functionality
- Existing tests passing

---------

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
danielkorzekwa and others added 18 commits March 12, 2026 17:51
### What does this PR do?

Merging dkorzekwa/anymodel_build_library_and_stats into
dkorzekwa/anymodel_pruning - this MR is only for reviewing. Ultimately
dkorzekwa/anymodel_build_library_and_stats should be merged into
feature/puzzletron once dkorzekwa/anymodel_pruning scoring is merged
there.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Integrated language-model descriptors to improve compatibility and
produce model-specific stats.

* **Bug Fixes**
  * Re‑enabled the single‑process library build and stats workflow step.

* **Documentation**
* Expanded usage docs and CLI guidance for the replacement library
builder.

* **Refactor**
* Improved typing/annotations and normalized key‑value head handling,
affecting attention memory/parameter calculations and textual outputs.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do?

Merging dkorzekwa/any_model_calc_one_block_scores into
dkorzekwa/anymodel_build_library_and_stats - this MR is only for
reviewing. Ultimately dkorzekwa/any_model_calc_one_block_scores should
be merged into feature/puzzletron once
dkorzekwa/anymodel_build_library_and_stats is merged there.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Enabled distributed scoring step in puzzletron workflow for improved
performance.

* **Improvements**
* Extended model loading support to accommodate additional model formats
and configurations.
* Optimized GPU memory management during validation with improved cache
synchronization.
  * Streamlined validation function signatures for easier usage.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do?

Merging dkorzekwa/mip_and_realize_models into
dkorzekwa/any_model_calc_one_block_scores - this MR is only for
reviewing. Ultimately dkorzekwa/mip_and_realize_models should be merged
into feature/puzzletron once dkorzekwa/any_model_calc_one_block_scores
is merged there.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **New Features**
* Enabled model realization step during compression workflow after
scoring phase completes.

* **Bug Fixes**
* Fixed key-value head calculation in attention configuration sourcing.

* **Tests**
* Strengthened validation checks for compression artifacts and output
directories; added rank-aware assertions for model compression
expectations.

* **Chores**
  * Minor documentation formatting updates.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
### What does this PR do?

Merging dkorzekwa/any_model_other_models into
dkorzekwa/mip_and_realize_models - this MR is only for reviewing.
Ultimately dkorzekwa/any_model_other_models should be merged into
feature/puzzletron once dkorzekwa/mip_and_realize_models is merged
there.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added support for multiple model architectures: Mistral Small,
Nemotron H, Nemotron H v2, Qwen2, Qwen3 8B, and Qwen3 VL 30B.
* Introduced new pruning configurations and optimization pipelines for
supported models.
* Added comprehensive model descriptor framework enabling automated
weight conversion and configuration handling.
* Extended support for Mixture of Experts (MoE) models with expert
removal pruning capabilities.

* **Tests**
* Enhanced test coverage with parametrized configurations for multiple
model variants.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
### What does this PR do?

Merging dkorzekwa/anymodel_gptoss into dkorzekwa/any_model_other_models
- this MR is only for reviewing. Ultimately dkorzekwa/anymodel_gptoss
should be merged into feature/puzzletron once
dkorzekwa/any_model_other_models is merged there.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * Added support for GPT-OSS, Nemotron V2, Qwen2, and Qwen3 models.
  * Enabled MXFP4 quantization for GPT-OSS model compression.
  * Added expert removal pruning for mixture-of-experts (MoE) models.

* **Bug Fixes**
* Fixed padding token validation to ensure it doesn't exceed vocabulary
size.

* **Tests**
  * Enabled test coverage for GPT-OSS model workflows.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
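The expert-removal pruning mentioned above can be sketched with a router-based scoring idea (illustrative only; the PR's actual hooks live under modelopt/torch/prune/importance_hooks/): count how often the router selects each expert over a calibration set and mark the least-used experts for removal.

```python
# Hedged sketch of router-based expert-removal scoring for an MoE layer.
from collections import Counter

def score_experts(topk_expert_ids):
    """topk_expert_ids: per-token lists of router-selected expert indices."""
    return Counter(e for token_choice in topk_expert_ids for e in token_choice)

def experts_to_remove(topk_expert_ids, num_experts, num_to_remove):
    counts = score_experts(topk_expert_ids)
    # Least-selected experts are the removal candidates.
    ranked = sorted(range(num_experts), key=lambda e: counts.get(e, 0))
    return ranked[:num_to_remove]

routing = [[0, 2], [0, 1], [2, 3], [0, 2]]  # top-2 routing for 4 tokens
drop = experts_to_remove(routing, num_experts=4, num_to_remove=1)
```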
### What does this PR do?

Merge any_model tutorial for Puzzletron.


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
  * MIP sweep mode for multi-rate memory-compression searches
* Triton-ready HuggingFace deployable, lm-eval adapter, and
Ray-compatible inference pathways
  * Megatron-Bridge distillation CLI/workflow with optional HF export
* **New Configurations**
* Extensive pruning/memory-sweep profiles for GPT‑Oss, Llama, Mistral,
Nemotron, Qwen families
* **Documentation**
* GptOss guide and conversion example, expanded READMEs, MIP Quick
Start, NeMo evaluator notes
* **Chores**
  * Requirements updated (lm-eval bumped; math-verify, ray added)
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do?

Merge anymodel mbridge distillation


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
- Heterogeneous-layer support for Puzzletron AnyModel → Megatron-Core
conversions
  - Knowledge-distillation workflow support and tooling
  - Export of trained models to HuggingFace format (new export helper)
  - Bridge support extended for Llama and Qwen3 architectures
- New CLI options to control HuggingFace export destination and template
model

* **Tests**
  - Added integration test validating the distillation + export flow
  - Removed an obsolete trivial test

* **Documentation**
  - Updated README to document HF export flags

* **Chores**
- CI: switched example-tests workflow to use NeMo and updated pip
invocation
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
#1047)

### What does this PR do?
The remaining difference between dkorzekwa/any_model and
dkorzekwa/anymodel_mbridgedist



<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added an end-to-end MIP sweep workflow that runs sweeps, collects
metrics, and writes CSV reports.

* **Improvements**
* Simplified attention implementation selection and clarified attention
display wording.
  * Realization pipeline now returns realized model solution paths.
  * Caching behavior adjusted for token ID retrieval.
  * Test model generation scales layer count with distributed size.

* **Bug Fixes**
  * Checkpoint hook state restored using the correct load API.

* **Chores**
  * Extended pre-commit license hook exclusion.

* **Tests**
  * Added GPU/distributed test fixtures and dtype/checkout helpers.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
### What does this PR do?
- Delete unused decilm code

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Removals**
  * Removed model conversion utilities for Llama-Nemotron format
* Removed DeciLM model classes, tokenizer implementations, and
configuration utilities
  * Removed checkpoint import/export functionality
* Removed heterogeneous transformer layer specifications and
configuration builders

* **Updates**
  * Updated pre-commit configuration for additional file exclusions
  * Updated imports across modules to reflect removed dependencies

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do?

Delete unused decilm code

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Removed DeciLM-specific components including decoder layers, attention
implementations, and specialized cache utilities, streamlining the
codebase
* Updated replacement library to use generic model configurations
instead of DeciLM-specific types, improving compatibility with diverse
architectures
* Cleaned up internal utilities for attention masking, flash attention
compatibility, and rotary position embeddings

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do?

Integration tests for subblock stats (memory + num_of_params)

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Refactor**
* Improved teacher model configuration loading and statistics
computation for enhanced accuracy in memory and parameter calculations.

* **Tests**
* Added comprehensive tests validating teacher model memory and
parameter statistics calculations with strict accuracy tolerances.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do?

Refactoring of subblock stats to stop using DeciLM code and use anymodel
instead.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Refactor**
* Restructured internal estimation logic for model memory and parameters
to support broader model architectures.
  * Updated model initialization utilities.

* **Tests**
  * Updated baseline metrics for validation tests.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do?

Removing unused code from
modelopt/torch/puzzletron/decilm/deci_lm_hf_code - completed.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

* **Chores**
* Removed vendored Transformers modules including activation functions,
cache utilities, and configuration classes.
  * Removed custom DeciLM configuration class.
* Updated internal type signatures from DeciLMConfig to standard Hugging
Face PretrainedConfig.
* Simplified internal function signatures by removing unused parameters.
  * Enhanced code quality checks via pre-commit configuration updates.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do?

Remove hardcoded trust_remote_code=true

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

## Release Notes

* **Documentation**
* Clarified documentation for parameter counting in model layer
calculations.

* **Bug Fixes**
* Remote code execution is now disabled by default in model
configuration loading. Explicitly enable `trust_remote_code` when using
custom modeling code.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
### What does this PR do?

Use pruning importance hooks from prune/importance_hooks/ in
modelopt.torch.puzzletron

---------

Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
@coderabbitai

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Introduces Puzzletron, a comprehensive LLM model compression algorithm featuring model conversion to a heterogeneous block-config format, activation scoring, multi-stage pruning (FFN/attention/expert/hidden-dim), MIP-based architecture search, and model realization. Includes AnyModel framework for supporting multiple model families, distributed execution infrastructure, evaluation tools, and extensive configuration examples for multiple model variants.

Changes

Cohort / File(s) Summary
Workflow & Configuration
.github/workflows/example_tests.yml, .github/workflows/gpu_tests.yml, .github/CODEOWNERS, .pre-commit-config.yaml
Updated CI/CD workflows to support puzzletron example tests, changed GPU test matrix from multiple frameworks to single gpu-puzzletron entry, added Puzzletron CODEOWNERS entry, excluded Puzzletron files from linting checks.
Documentation & Examples
examples/pruning/README.md, examples/puzzletron/*
Added Puzzletron end-to-end README documenting compression workflow, GPTOSS support, evaluation via lm-eval and NeMo Evaluator, MIP sweep mode, knowledge distillation guide, and multiple model config examples (Llama, Mistral, Qwen variants, Nemotron, GPT-OSS).
AnyModel Framework
modelopt/torch/puzzletron/anymodel/
Introduced extensible model conversion framework with converter factory, model descriptor registry, support for multiple HF models, and model-specific layer handling (GPT-OSS, Llama, Mistral, Qwen, Nemotron variants).
Pruning & Initialization
modelopt/torch/puzzletron/pruning/, modelopt/torch/puzzletron/tools/bypassed_training/
Added pluggable pruning mixins (FFN intermediate, KV-heads, expert removal, hidden dimension), initialization modes, and child checkpoint creation from parent via config overrides and pruning strategies.
Activation Scoring & Hooks
modelopt/torch/puzzletron/activation_scoring/, modelopt/torch/prune/importance_hooks/
Introduced activation hook registration utilities, expert removal importance hooks for MoE pruning, and checkpoint save/resume for scored activations.
Block Configuration
modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py
Defined frozen dataclass schema for heterogeneous block configs (attention/FFN/MoE/Mamba with per-subblock flags and validation).
Model Conversion Utilities
modelopt/torch/puzzletron/tools/checkpoint_utils.py, modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py, modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
Added checkpoint loading/saving utilities supporting safetensors sharding, config overrides, dynamic module caching, multi-threaded weight I/O, and per-rank shard construction.
Replacement Library & MIP
modelopt/torch/puzzletron/replacement_library/, modelopt/torch/puzzletron/mip/
Implemented replacement library building (deduplication, block/subblock organization), MIP solver for multi-layer replacement selection, sweep mode for memory compression rates, and solution validation/realization orchestration.
Subblock Statistics
modelopt/torch/puzzletron/subblock_stats/
Added computation of per-subblock memory (including KV-cache, attention, FFN, MoE) and parameter counts with dtype/batch/sequence-length parametrization.
Scoring & Validation
modelopt/torch/puzzletron/scoring/
Introduced solution validation orchestration, metrics gathering from multiple solution/checkpoint variants, and scoring entrypoint for hub-to-validation workflow.
Module Stitching Framework
modelopt/torch/puzzletron/sewing_kit/
Defined graph-based module composition system supporting internal/external/remote targets, activity context, passage-based I/O capture/override, and dependency-driven execution.
Evaluation & Distillation
examples/puzzletron/evaluation/, examples/puzzletron/mbridge_distillation/, modelopt/torch/puzzletron/export/mbridge/
Added HuggingFace deployment wrapper for Triton/Ray inference, lm-eval integration, Megatron-Bridge distillation script, and heterogeneous bridge providers for HF-to-Megatron export.
Utilities & Infrastructure
modelopt/torch/puzzletron/tools/hydra_utils.py, modelopt/torch/puzzletron/tools/logger.py, modelopt/torch/puzzletron/tools/robust_json.py, modelopt/torch/puzzletron/dataset/
Added Hydra config initialization, distributed logging with rank-selective output, robust JSON serialization for dataclasses/Path/Enum, and dataset preparation utilities.
Main Entrypoints
examples/puzzletron/main.py, modelopt/torch/puzzletron/puzzletron.py, modelopt/torch/puzzletron/build_library_and_stats.py, modelopt/torch/puzzletron/nas/plugins/puzzletron_nas_plugin.py
Defined CLI orchestration for full vs. MIP-only modes, unified library+stats building, and NAS mode registration with searcher/converter/restore callbacks.
Importance Hook Extensions
modelopt/torch/prune/importance_hooks/expert_removal_hooks.py, modelopt/torch/prune/importance_hooks/__init__.py, modelopt/torch/prune/importance_hooks/base_hooks.py
Added expert removal importance hooks via independent/router-based strategies, ranked-choice routing metadata, and minor fixes to activation hook kwargs handling and KV-head grouping logic.
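The walkthrough above mentions robust JSON serialization for dataclasses, Path, and Enum values. A minimal stdlib version of that idea (illustrative; the actual helper is modelopt/torch/puzzletron/tools/robust_json.py):

```python
# Stdlib sketch: json.dumps with a default= fallback that handles
# dataclasses, pathlib.Path, and Enum values.
import dataclasses
import json
from enum import Enum
from pathlib import Path

def _default(obj):
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return dataclasses.asdict(obj)
    if isinstance(obj, Path):
        return str(obj)
    if isinstance(obj, Enum):
        return obj.value
    raise TypeError(f"Not JSON serializable: {type(obj).__name__}")

def robust_dumps(obj, **kwargs):
    return json.dumps(obj, default=_default, **kwargs)

class Mode(Enum):
    PRUNE = "prune"

@dataclasses.dataclass
class BlockConfig:  # hypothetical dataclass, for illustration
    name: str
    checkpoint: Path
    mode: Mode

cfg = BlockConfig("ffn_0.5", Path("/tmp/ckpt"), Mode.PRUNE)
s = robust_dumps(cfg)
```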

Sequence Diagram(s)

sequenceDiagram
    participant CLI as Main Entrypoint
    participant Convert as Model Converter
    participant Score as Activation Scoring
    participant Prune as Pruning
    participant Library as Replacement Library
    participant MIP as MIP Optimizer
    participant Realize as Model Realization
    participant Evaluate as Evaluation

    CLI->>Convert: 1. Convert HF model to Puzzletron format
    Convert->>Convert: Apply ModelDescriptor, create block_configs
    Convert-->>CLI: Converted heterogeneous checkpoint
    
    CLI->>Score: 2. Score pruning activations
    Score->>Score: Collect forward activations with hooks
    Score-->>CLI: Activation logs per layer
    
    CLI->>Prune: 3. Generate pruned child checkpoints
    Prune->>Prune: For each pruning strategy (FFN/attention/expert)
    Prune->>Prune: Initialize children from parent via mode
    Prune-->>CLI: Multiple pruned checkpoint variants
    
    CLI->>Library: 4. Build replacement library
    Library->>Library: Deduplicate subblocks, organize by layer
    Library->>Library: Create layer replacement mappings
    Library-->>CLI: Replacement library + single sequence solution
    
    CLI->>MIP: 5. Run MIP optimization
    MIP->>MIP: Gather subblock stats + metrics
    MIP->>MIP: Solve multi-layer selection (memory constraint)
    MIP-->>CLI: Ranked solutions with objectives/costs
    
    CLI->>Realize: 6. Realize models from solutions
    Realize->>Realize: Load replacement library, select configs
    Realize->>Realize: Assemble heterogeneous model
    Realize-->>CLI: Final pruned HF checkpoints
    
    CLI->>Evaluate: 7. Evaluate (lm-eval or NeMo Evaluator)
    Evaluate->>Evaluate: Benchmark accuracy/throughput
    Evaluate-->>CLI: Metrics (loss, token accuracy, speed)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes

The PR introduces a substantial, multi-component compression pipeline with significant new abstractions (factory patterns, descriptor-based extensibility, MIP solving, graph stitching), distributed execution logic, and intricate initialization/pruning strategies. While some configurations are repetitive, the core infrastructure (converters, model descriptors, pruning mixins, MIP, replacement library, checkpoint utilities) contains dense logic and heterogeneous change patterns requiring separate reasoning per module. The high file count, variety of concerns (conversion, scoring, optimization, distillation, evaluation), and deep integration with existing modelopt patterns elevate complexity beyond simple refactoring.

Possibly related PRs

  • PR #1035: Overlaps significantly in Puzzletron example files and configuration structure, indicating parallel or iterative work on the same compression pipeline and model family support.

@github-actions

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1121/

Built to branch gh-pages at 2026-03-25 19:18 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 12

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟡 Minor comments (15)
modelopt/torch/puzzletron/tools/logger.py-164-168 (1)

164-168: ⚠️ Potential issue | 🟡 Minor

Docstring says "rank -1 in node 0" but implementation may not match.

The lprint docstring says "Last rank prints only (rank -1 in node 0)" but the implementation checks local_rank != world_size - 1. If the intent is the last rank globally, this logic appears incorrect. If the intent is the last local rank within a node, the comparison should be against local world size, not global world size.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/logger.py` around lines 164 - 168, The
docstring for lprint mismatches its rank-check logic; update either the
docstring or the implementation in lprint to match the intended behavior: if you
want "last rank in node 0" (last local rank) change the check to compare
local_rank against local_world_size - 1 (use the local world size variable used
elsewhere in logger or from torch/distributed) before calling logger.dist_log,
or if you intended the global last rank adjust the docstring to "last global
rank" and ensure the check compares local_rank against world_size - 1
accordingly; reference the lprint function and logger.dist_log to locate where
to change the comparison and/or docstring.
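Whichever intent wins, the rank comparison should pair matching scopes. The two candidate behaviors can be written as plain predicates (an illustrative sketch; the `should_print_*` names are hypothetical, not from logger.py):

```python
def should_print_last_global(rank: int, world_size: int) -> bool:
    """Only the last rank across all nodes prints."""
    return rank == world_size - 1


def should_print_last_local(local_rank: int, local_world_size: int) -> bool:
    """Only the last rank within a node prints: compare against the
    local world size, never the global one."""
    return local_rank == local_world_size - 1
```

The current check mixes the two scopes (`local_rank` against the global `world_size - 1`), which matches neither predicate.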
modelopt/torch/puzzletron/tools/logger.py-62-66 (1)

62-66: ⚠️ Potential issue | 🟡 Minor

Error message is inconsistent with valid choices.

The error message on line 65 lists ['all', 'main', 'local_main'] but 'last' is also a valid choice per line 62.

🐛 Proposed fix
         if ranks not in ["all", "main", "local_main", "last"]:
             raise NotImplementedError(
                 f"Could not broadcast msg {msg} - "
-                f"ranks parameters choices are ['all', 'main', 'local_main']. Got {ranks}"
+                f"ranks parameters choices are ['all', 'main', 'local_main', 'last']. Got {ranks}"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/logger.py` around lines 62 - 66, The
NotImplementedError message for the ranks check mentions the wrong set of valid
choices; update the f-string in the ranks validation (the raise
NotImplementedError block that references the variable ranks and msg) to include
'last' in the displayed list (i.e., ["all", "main", "local_main", "last"]) so
the error accurately reflects the valid options.
examples/pruning/README.md-10-10 (1)

10-10: ⚠️ Potential issue | 🟡 Minor

Fix compound adjective hyphenation in docs.

At Line 10, use “MIP-based” (not “MIP based”) for correct technical phrasing.

✏️ Proposed edit
-1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA using Mixed Integer Programming (MIP) based NAS search algorithm.
+1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA using a Mixed Integer Programming (MIP)-based NAS search algorithm.

As per coding guidelines, "**/*.{py,cpp,h,hpp,md}: All code (Python, C++, Markdown, etc.) must adhere to coding standards as checked by pre-commit hooks".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/pruning/README.md` at line 10, Update the compound adjective "MIP
based" in the Puzzletron README entry to the hyphenated form "MIP-based" so the
phrase reads "MIP-based NAS search algorithm"; locate the text string
"Puzzletron" / the line containing "MIP based NAS search algorithm" and replace
it with "MIP-based" to comply with the Markdown/style pre-commit checks.
examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml-14-17 (1)

14-17: ⚠️ Potential issue | 🟡 Minor

Fix the mismatch between target_memory and its comment.

16_000 MiB is about 15.6 GiB; 45 GiB would be 46_080 MiB. One of these is wrong, and the current ambiguity can make the example optimize for the wrong budget.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml`
around lines 14 - 17, The mip.human_constraints.target_memory value
(target_memory: 16_000) and its comment (“45 GiB”) disagree; update one to match
the other: either set target_memory to 46_080 (MiB) if you mean 45 GiB, or
change the comment to match 16_000 MiB (~15.6 GiB). Locate the target_memory key
under mip -> human_constraints and make the numeric value and the GiB comment
consistent.
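The MiB/GiB arithmetic behind this check is binary, not decimal; a tiny conversion helper makes it explicit (illustrative only, the YAML configs themselves take plain integers):

```python
def gib_to_mib(gib: float) -> int:
    """Binary units: 1 GiB = 1024 MiB."""
    return round(gib * 1024)


def mib_to_gib(mib: float) -> float:
    return mib / 1024


# 45 GiB is 46_080 MiB, while the configured 16_000 MiB is only 15.625 GiB.
```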
examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml-14-17 (1)

14-17: ⚠️ Potential issue | 🟡 Minor

Align the MiB value with the GiB comment.

90_000 MiB is about 87.9 GiB, not 90 GiB. If the intended cap is 90 GiB, this should be 92_160; otherwise update the comment so the search target is unambiguous.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml`
around lines 14 - 17, The target_memory under mip.human_constraints is
inconsistent with the "90 GiB" comment; update the value of target_memory to
92_160 (MiB) to represent 90 GiB, or alternatively change the comment to the
actual GiB represented by 90_000 MiB (~87.9 GiB); edit the target_memory key in
this YAML (mip -> human_constraints -> target_memory) accordingly to keep value
and comment aligned.
examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml-17-17 (1)

17-17: ⚠️ Potential issue | 🟡 Minor

Memory unit comment is inaccurate.

Line 17 uses 45_000 MiB, which is ~43.95 GiB, not 45 GiB. Please update either the value or the comment to avoid misconfigured constraints.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml`
at line 17, The comment next to the target_memory value is incorrect:
target_memory: 45_000 is in MiB (~43.95 GiB), not 45 GiB; either set
target_memory to 46_080 (45 * 1024) to represent 45 GiB exactly or update the
inline comment to the correct GiB approximation (≈43.95 GiB); locate the
target_memory entry and apply one of these fixes so the value and its comment
match.
modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py-36-52 (1)

36-52: ⚠️ Potential issue | 🟡 Minor

Return type annotation mismatch: method declares List[BlockConfig] but returns List[dict].

The method signature declares a return type of List[BlockConfig], but line 50 calls .to_dict() on each BlockConfig instance, causing the method to actually return List[dict]. Update the return type annotation to match the actual behavior.

🔧 Suggested fix
     `@staticmethod`
-    def create_block_configs_from_main_config(config: LlamaConfig) -> List[BlockConfig]:
+    def create_block_configs_from_main_config(config: LlamaConfig) -> List[dict]:
         """Create uniform block configs for all Llama layers.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py` around
lines 36 - 52, The function create_block_configs_from_main_config currently
builds BlockConfig objects but calls .to_dict(), so its declared return type
List[BlockConfig] is incorrect; update the signature to return List[dict] (or
List[MutableMapping] if you prefer an abstract type) to match the actual
returned value, and ensure the typing import (List) is present/adjusted
accordingly; reference create_block_configs_from_main_config, LlamaConfig,
BlockConfig, and the .to_dict() call when making the change.
modelopt/torch/puzzletron/anymodel/converter/converter_factory.py-43-47 (1)

43-47: ⚠️ Potential issue | 🟡 Minor

Handle potential None from inspect.getmodule().

inspect.getmodule(ref) can return None if the module cannot be determined, which would cause an AttributeError when accessing .__name__.

🛡️ Proposed fix
-                raise KeyError(
-                    f"Could not register `{cls_name}`: {cls_type}, "
-                    f"`{cls_name}` is already registered and points to "
-                    f"`{inspect.getmodule(ref).__name__}.{ref.__name__}`"
-                )
+                module = inspect.getmodule(ref)
+                module_name = module.__name__ if module else "<unknown>"
+                raise KeyError(
+                    f"Could not register `{cls_name}`: {cls_type}, "
+                    f"`{cls_name}` is already registered and points to "
+                    f"`{module_name}.{ref.__name__}`"
+                )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/anymodel/converter/converter_factory.py` around
lines 43 - 47, The KeyError message in converter_factory.py assumes
inspect.getmodule(ref) always returns a module; update the error construction in
the registration failure branch (where cls_name, cls_type and ref are used) to
guard against None from inspect.getmodule(ref) by using a safe fallback (e.g.,
module = inspect.getmodule(ref); module_name = module.__name__ if module is not
None else getattr(ref, "__module__", "unknown")) and use module_name rather than
inspect.getmodule(ref).__name__; ensure ref.__name__ (or getattr(ref,
"__name__", repr(ref))) is similarly accessed safely so the f-string does not
raise AttributeError.
modelopt/torch/puzzletron/puzzletron.py-60-74 (1)

60-74: ⚠️ Potential issue | 🟡 Minor

Step numbering is inconsistent.

Comments jump from "Step 2" to "Step 4", skipping "Step 3". This appears to be a typo or leftover from a refactor. Consider renumbering for clarity:

📝 Proposed fix
     # Step 2: pruning_ckpts (single process)
     if dist.is_master():
         pruning_ckpts.launch_prune_ckpt(hydra_cfg)
     dist.barrier()

-    # Step 4: build_library_and_stats (single process)
+    # Step 3: build_library_and_stats (single process)
     if dist.is_master():
         build_library_and_stats.launch_build_library_and_stats(hydra_cfg)
     dist.barrier()

-    # Step 5: calc_one_block_scores (distributed processing)
+    # Step 4: calc_one_block_scores (distributed processing)
     scoring.launch_scoring(hydra_cfg)

-    # Step 6: mip_and_realize_models (distributed processing)
+    # Step 5: mip_and_realize_models (distributed processing)
     mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/puzzletron.py` around lines 60 - 74, The step
comments are misnumbered (jump from "Step 2" to "Step 4"); update the inline
step comments in puzzletron.py to use a consistent sequence (e.g., Step 2 → Step
3 → Step 4 etc.) around the blocks that call pruning_ckpts.launch_prune_ckpt,
build_library_and_stats.launch_build_library_and_stats, scoring.launch_scoring,
and mip_and_realize_models.launch_mip_and_realize_model so the step numbers
reflect the actual order of execution.
examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml-20-31 (1)

20-31: ⚠️ Potential issue | 🟡 Minor

Remove unused ffn_list definition or align with intermediate_size_list usage.

The file serves as a defaults template for Hydra configuration, but there's an inconsistency: ffn_list is defined on line 21 with no value, yet the actual FFN pruning configuration (ffn_pruning.yaml) populates and uses intermediate_size_list instead. Either rename ffn_list to intermediate_size_list for consistency, or remove the unused ffn_list definition.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml`
around lines 20 - 31, The defaults file defines an unused key ffn_list while the
FFN pruning logic uses intermediate_size_list (see ffn_pruning.yaml); fix by
either removing the ffn_list entry entirely or renaming it to
intermediate_size_list and ensuring the init mode key (mlp_init_mode) matches
the expected name used by ffn_pruning.yaml so the Hydra defaults align with the
pruning config.
modelopt/torch/puzzletron/tools/hydra_utils.py-29-36 (1)

29-36: ⚠️ Potential issue | 🟡 Minor

Validate block and mbs before division in warmup_steps.

block=0 or mbs=0 causes a runtime failure. Add explicit positive-value checks and fail early with a clear error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/hydra_utils.py` around lines 29 - 36, The
warmup_steps function currently divides by block and mbs which will raise on
zero; update warmup_steps to validate that block and mbs are positive integers
(e.g., block > 0 and mbs > 0) at the top of the function and raise a clear
ValueError if not; keep the existing logic (compute steps =
(tokens//block)//mbs, compute w = pct * steps, return max(1, round(w))) but fail
fast with an explanatory error mentioning the parameter names block and mbs so
callers get a clear message when invalid values are passed.
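A guarded version of the computation described above might look like this (a sketch assuming the signature `warmup_steps(tokens, block, mbs, pct)`; the real function's parameter names may differ):

```python
def warmup_steps(tokens: int, block: int, mbs: int, pct: float) -> int:
    """Warmup step count with fail-fast validation of the divisors."""
    if block <= 0 or mbs <= 0:
        raise ValueError(
            f"block and mbs must be positive, got block={block}, mbs={mbs}"
        )
    steps = (tokens // block) // mbs
    return max(1, round(pct * steps))
```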
modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py-46-47 (1)

46-47: ⚠️ Potential issue | 🟡 Minor

Avoid mutating caller-owned activation_hooks_kwargs.

This function modifies the input dict in place (`activation_hooks_kwargs["model"] = model`), which can leak state across calls. Copy it first before enrichment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py`
around lines 46 - 47, The code mutates the caller-owned dict
activation_hooks_kwargs by setting activation_hooks_kwargs["model"] = model;
instead, make a shallow copy of activation_hooks_kwargs (e.g., new_kwargs =
activation_hooks_kwargs.copy() or dict(activation_hooks_kwargs)), add the
"model" key to the copy, and use/return that copy so the original dict isn’t
mutated; update any return/usage to use the new_kwargs variable.
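The copy-before-enrich pattern is short (a hypothetical sketch; the actual function name and return shape in utils.py may differ):

```python
def with_model(activation_hooks_kwargs: dict, model) -> dict:
    """Shallow-copy the caller's kwargs before adding the model reference."""
    new_kwargs = dict(activation_hooks_kwargs)
    new_kwargs["model"] = model
    return new_kwargs
```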
examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml-13-13 (1)

13-13: ⚠️ Potential issue | 🟡 Minor

Minor typo in comment.

ppath should be path.

Proposed fix
-dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2
+dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml`
at line 13, Fix the typo in the YAML comment for the dataset key: change the
comment after the dataset_path entry so it reads "path to
Nemotron-Post-Training-Dataset-v2" instead of "ppath to
Nemotron-Post-Training-Dataset-v2" so the comment for the dataset_path key is
correct and clear.
modelopt/torch/puzzletron/tools/validate_model.py-138-140 (1)

138-140: ⚠️ Potential issue | 🟡 Minor

Potential issue: validation_full_iters could be zero.

If args.eval_samples is less than args.micro_batch_size, the integer division results in validation_full_iters = 0. This could cause unexpected behavior in downstream code that expects at least one iteration.

🛡️ Proposed fix: Add validation or use ceiling division
     validation_full_iters = (
-        args.eval_samples // args.micro_batch_size
+        max(1, args.eval_samples // args.micro_batch_size)
     )  # model pipeline, single data rank

Or add an assertion:

     validation_full_iters = (
         args.eval_samples // args.micro_batch_size
     )  # model pipeline, single data rank
+    assert validation_full_iters > 0, f"eval_samples ({args.eval_samples}) must be >= micro_batch_size ({args.micro_batch_size})"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_model.py` around lines 138 - 140,
The calculation of validation_full_iters can yield zero when args.eval_samples <
args.micro_batch_size; update the logic around validation_full_iters in
validate_model.py to ensure at least one validation iteration by either
validating inputs (raise/exit if args.eval_samples < args.micro_batch_size) or
computing a ceiling division and then taking max(1, ...); reference the variable
validation_full_iters and the CLI args args.eval_samples and
args.micro_batch_size when making the change so downstream loops that expect >=1
iteration do not break.
modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py-316-321 (1)

316-321: ⚠️ Potential issue | 🟡 Minor

Actually emit the missing-MoE-stats warning.

Warning(...) only instantiates an object; it does not notify the caller. Right now a missing moe_stats.json silently changes active_params from MoE-aware counts to total params. Use warnings.warn(...) or mprint(...) here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines
316 - 321, The code creates a pathlib object moe_stats_file and on missing file
only instantiates Warning(...) which does nothing; update the block around
moe_stats_file (using variables moe_stats_file, moe_stats_filename,
master_puzzle_dir and the downstream active_params logic) to actually emit a
warning by calling warnings.warn(...) (or mprint(...)) with a clear message like
f"MOE stats file {moe_stats_file} does not exist, can't calculate num active
params", and add `import warnings` at the top if using warnings.warn; keep
setting moe_stats_file = None so the fallback to total params remains.
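For reference, `warnings.warn` actually emits the message, unlike a bare `Warning(...)` instantiation (a minimal sketch; `load_moe_stats_path` is an illustrative name, not the function in calc_subblock_stats.py):

```python
import warnings
from pathlib import Path


def load_moe_stats_path(moe_stats_file: Path):
    """Return the stats path if it exists; otherwise warn and return None
    so callers fall back to total-parameter counts."""
    if not moe_stats_file.exists():
        warnings.warn(
            f"MoE stats file {moe_stats_file} does not exist, "
            "can't calculate num active params; using total params instead"
        )
        return None
    return moe_stats_file
```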

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 31341e54-9727-48f6-b865-606bb7607464

📥 Commits

Reviewing files that changed from the base of the PR and between 291498b and c5ec50b.

⛔ Files ignored due to path filters (2)
  • examples/puzzletron/mip_sweep_example.png is excluded by !**/*.png
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (212)
  • .github/CODEOWNERS
  • .github/workflows/example_tests.yml
  • .github/workflows/gpu_tests.yml
  • .pre-commit-config.yaml
  • examples/pruning/README.md
  • examples/puzzletron/GPTOSS.md
  • examples/puzzletron/README.md
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/gptoss-20b_remove_experts_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/Llama-3_2-3B.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/Mistral-Small-24B.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/mistral-small-24b-instruct-2501_pruneffn_memory.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/mistral-small-24b-instruct-2501_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/validate_model_defaults.yaml
  • examples/puzzletron/configs/nemotron-nano-12b-v2/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/qwen2_5_7b_instruct.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/qwen2_5_7b_instruct_pruneffn_memory.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/qwen2_5_7b_instruct_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/attn_pruning.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/ffn_pruning.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/hidden_dim_pruning.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/pruning_defaults.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/qwen3_8b_pruneffn_memory.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_model_defaults.yaml
  • examples/puzzletron/configs/qwen3-8b_pruneffn_memory/validate_solutions_defaults.yaml
  • examples/puzzletron/evaluation/hf_deployable_anymodel.py
  • examples/puzzletron/evaluation/lm_eval_anymodel.py
  • examples/puzzletron/evaluation/nemo_evaluator_instructions.md
  • examples/puzzletron/main.py
  • examples/puzzletron/mbridge_distillation/README.md
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • examples/puzzletron/requirements.txt
  • modelopt/torch/prune/importance_hooks/__init__.py
  • modelopt/torch/prune/importance_hooks/base_hooks.py
  • modelopt/torch/prune/importance_hooks/expert_removal_hooks.py
  • modelopt/torch/puzzletron/README.md
  • modelopt/torch/puzzletron/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/activation_hooks/__init__.py
  • modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py
  • modelopt/torch/puzzletron/activation_scoring/score_pruning_activations.py
  • modelopt/torch/puzzletron/anymodel/README.md
  • modelopt/torch/puzzletron/anymodel/__init__.py
  • modelopt/torch/puzzletron/anymodel/converter/__init__.py
  • modelopt/torch/puzzletron/anymodel/converter/convert_any_model.py
  • modelopt/torch/puzzletron/anymodel/converter/converter.py
  • modelopt/torch/puzzletron/anymodel/converter/converter_factory.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/__init__.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/model_descriptor/model_descriptor_factory.py
  • modelopt/torch/puzzletron/anymodel/models/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_converter.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/gpt_oss/gpt_oss_pruned_to_mxfp4.py
  • modelopt/torch/puzzletron/anymodel/models/llama/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py
  • modelopt/torch/puzzletron/anymodel/models/llama/llama_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_converter.py
  • modelopt/torch/puzzletron/anymodel/models/mistral_small/mistral_small_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_converter.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_converter.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_8b/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_8b/qwen3_8b_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_8b/qwen3_8b_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl_30b_a3b_instruct/__init__.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl_30b_a3b_instruct/qwen3_vl_30b_a3b_instruct_converter.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl_30b_a3b_instruct/qwen3_vl_30b_a3b_instruct_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/__init__.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/no_op.py
  • modelopt/torch/puzzletron/anymodel/puzzformer/utils.py
  • modelopt/torch/puzzletron/build_library_and_stats.py
  • modelopt/torch/puzzletron/dataset/__init__.py
  • modelopt/torch/puzzletron/dataset/prepare_dataset.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/__init__.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/block_config.py
  • modelopt/torch/puzzletron/decilm/deci_lm_hf_code/modeling_decilm.py
  • modelopt/torch/puzzletron/export/mbridge/__init__.py
  • modelopt/torch/puzzletron/export/mbridge/base.py
  • modelopt/torch/puzzletron/export/mbridge/distillation_provider.py
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
  • modelopt/torch/puzzletron/export/mbridge/llama.py
  • modelopt/torch/puzzletron/export/mbridge/qwen3.py
  • modelopt/torch/puzzletron/mip/mip_and_realize_models.py
  • modelopt/torch/puzzletron/mip/mip_with_multi_layer_replacements.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/mip/sweep.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/nas/plugins/puzzletron_nas_plugin.py
  • modelopt/torch/puzzletron/pruning/expert_removal_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/ffn_intermediate_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/kv_heads_pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/pruning_ckpts.py
  • modelopt/torch/puzzletron/pruning/pruning_mixin.py
  • modelopt/torch/puzzletron/pruning/pruning_utils.py
  • modelopt/torch/puzzletron/puzzletron.py
  • modelopt/torch/puzzletron/replacement_library/build_replacement_library.py
  • modelopt/torch/puzzletron/replacement_library/replacement_library.py
  • modelopt/torch/puzzletron/replacement_library/replacement_utils.py
  • modelopt/torch/puzzletron/scoring/scoring.py
  • modelopt/torch/puzzletron/sewing_kit/__init__.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/sewing_kit/passage/__init__.py
  • modelopt/torch/puzzletron/sewing_kit/passage/core.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_params_and_memory.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/__init__.py
  • modelopt/torch/puzzletron/tools/bypassed_training/child_init.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils.py
  • modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py
  • modelopt/torch/puzzletron/tools/common.py
  • modelopt/torch/puzzletron/tools/hydra_utils.py
  • modelopt/torch/puzzletron/tools/kd_model.py
  • modelopt/torch/puzzletron/tools/logger.py
  • modelopt/torch/puzzletron/tools/post_init_sparse.py
  • modelopt/torch/puzzletron/tools/robust_json.py
  • modelopt/torch/puzzletron/tools/sharded_checkpoint_utils.py
  • modelopt/torch/puzzletron/tools/validate_model.py
  • modelopt/torch/puzzletron/tools/validate_puzzle_with_multi_replacements.py
  • modelopt/torch/puzzletron/tools/validation_utils.py
  • modelopt/torch/puzzletron/utils/checkpoint_manager.py
  • modelopt/torch/puzzletron/utils/data/dataloaders.py
  • modelopt/torch/puzzletron/utils/data/dataset.py
  • modelopt/torch/puzzletron/utils/dummy_modules.py
  • modelopt/torch/puzzletron/utils/parsing.py
  • modelopt/torch/puzzletron/utils/utils.py
  • modelopt/torch/puzzletron/utils/validate_runtime_pipeline.py
  • modelopt/torch/puzzletron/utils/validation.py
  • modelopt/torch/utils/robust_json.py
  • pyproject.toml
  • tests/_test_utils/torch/puzzletron/utils.py
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
  • tests/gpu/torch/conftest.py
  • tests/gpu/torch/puzzletron/conftest.py
  • tests/gpu/torch/puzzletron/nas/plugins/test_nas_convert.py
  • tests/gpu/torch/puzzletron/nas/plugins/test_nas_search.py
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen2.5-7B-Instruct/Qwen2.5-7B-Instruct.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen2.5-7B-Instruct/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-8B/Qwen3-8B.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-8B/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-VL-30B-A3B-Instruct/Qwen3-VL-30B-A3B-Instruct.yaml
  • tests/gpu/torch/puzzletron/resources/configs/Qwen/Qwen3-VL-30B-A3B-Instruct/pruning/expert_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/Llama-3.1-8B-Instruct-attn-pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/Llama-3.1-8B-Instruct.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/pruning/attn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.1-8B-Instruct/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.2-3B-Instruct/Llama-3.2-3B-Instruct.yaml
  • tests/gpu/torch/puzzletron/resources/configs/meta-llama/Llama-3.2-3B-Instruct/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/mistralai/Mistral-Small-24B-Instruct-2501/Mistral-Small-24B-Instruct-2501.yaml
  • tests/gpu/torch/puzzletron/resources/configs/mistralai/Mistral-Small-24B-Instruct-2501/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16.yaml
  • tests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/pruning/expert_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-Nano-12B-v2/NVIDIA-Nemotron-Nano-12B-v2.yaml
  • tests/gpu/torch/puzzletron/resources/configs/nvidia/NVIDIA-Nemotron-Nano-12B-v2/pruning/ffn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/openai/gpt-oss-20b/gpt-oss-20b.yaml
  • tests/gpu/torch/puzzletron/resources/configs/openai/gpt-oss-20b/pruning/expert_removal.yaml
  • tests/gpu/torch/puzzletron/resources/configs/pruning/attn_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/pruning/ffn_pruning_base.yaml
  • tests/gpu/torch/puzzletron/resources/configs/pruning/hidden_dim_pruning.yaml
  • tests/gpu/torch/puzzletron/resources/configs/pruning/pruning_defaults.yaml
  • tests/gpu/torch/puzzletron/resources/configs/validate_model_defaults.yaml
  • tests/gpu/torch/puzzletron/resources/configs/validate_solutions_defaults.yaml
  • tests/gpu/torch/puzzletron/resources/tokenizer/special_tokens_map.json
  • tests/gpu/torch/puzzletron/resources/tokenizer/tokenizer.json
  • tests/gpu/torch/puzzletron/resources/tokenizer/tokenizer_config.json
  • tests/gpu/torch/puzzletron/resources/tokenizer/truncate_tokenizer.py
  • tests/gpu/torch/puzzletron/test_puzzletron.py
  • tox.ini

Comment on lines +5 to +10
solutions_to_validate:
skip_validation: false
save_models: false
bigger_is_better: false
sort_solutions_by:
calculate_full_score_ablations: false

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python -m pip install --quiet pyyaml
python - <<'PY'
from pathlib import Path
import yaml

p = Path("examples/puzzletron/configs/nemotron-nano-12b-v2/validate_solutions_defaults.yaml")
data = yaml.safe_load(p.read_text())

print("Top-level keys:", list(data.keys()))
print("solutions_to_validate value:", data.get("solutions_to_validate"))
for k in ["skip_validation", "save_models", "bigger_is_better", "sort_solutions_by"]:
    print(f"top-level has '{k}':", k in data)
PY

Repository: NVIDIA/Model-Optimizer

Length of output: 413


Indentation bug: solutions_to_validate block is not nested.

Lines 6-10 define keys at the top level instead of under solutions_to_validate, which will cause the solutions_to_validate key to be null rather than a dictionary. This breaks any config consumption expecting solutions_to_validate.skip_validation etc.

🐛 Proposed fix
 solutions_to_validate:
-skip_validation: false
-save_models: false
-bigger_is_better: false
-sort_solutions_by:
-calculate_full_score_ablations: false
+  skip_validation: false
+  save_models: false
+  bigger_is_better: false
+  sort_solutions_by:
+    calculate_full_score_ablations: false
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
solutions_to_validate:
skip_validation: false
save_models: false
bigger_is_better: false
sort_solutions_by:
calculate_full_score_ablations: false
solutions_to_validate:
  skip_validation: false
  save_models: false
  bigger_is_better: false
  sort_solutions_by:
    calculate_full_score_ablations: false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@examples/puzzletron/configs/nemotron-nano-12b-v2/validate_solutions_defaults.yaml`
around lines 5 - 10, The YAML file has an indentation bug: the keys
skip_validation, save_models, bigger_is_better, sort_solutions_by, and
calculate_full_score_ablations must be nested under the solutions_to_validate
mapping; update the block so that solutions_to_validate is a mapping containing
those keys (i.e., indent those five keys one level under solutions_to_validate)
so that code referencing solutions_to_validate.skip_validation etc. sees a
dictionary rather than null.
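The failure mode is easy to reproduce in isolation. A minimal sketch (assumes PyYAML is available; the keys mirror the config above):

```python
import yaml

# Flat: the keys are siblings of solutions_to_validate, not children.
flat = """\
solutions_to_validate:
skip_validation: false
save_models: false
"""

# Nested: the keys are children of solutions_to_validate.
nested = """\
solutions_to_validate:
  skip_validation: false
  save_models: false
"""

flat_data = yaml.safe_load(flat)
nested_data = yaml.safe_load(nested)

# A mapping key with no indented children parses to None.
print(flat_data["solutions_to_validate"])    # None
print(nested_data["solutions_to_validate"])  # {'skip_validation': False, 'save_models': False}
```

Any consumer doing `cfg["solutions_to_validate"]["skip_validation"]` on the flat form hits a `TypeError` on the `None` value.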

Comment on lines +395 to +423
def _assert_valid_config(args, puzzle_profile):
required_args = (
"subblock_stats_path",
"objective",
"output_path",
)
missing_args = [arg for arg in required_args if arg not in args or getattr(args, arg) is None]
if missing_args:
mprint(f"error: The following arguments are required: {', '.join(missing_args)}")
exit(1)

# Make sure we have specified subblock_stats_args
if "subblock_stats_args" not in args and "subblock_stats_args" not in puzzle_profile:
mprint(
"error: Must specify `subblock_stats_arrs` in either puzzle_profile or as a commandline arg."
)
exit(1)

# Make sure we have specified constraints
if (
"mip_constraints" not in args
and "human_constraints" not in args
and "mip_constraints" not in puzzle_profile
and "human_constraints" not in puzzle_profile
):
mprint(
"error: Must specify either `mip_constraints` or `human_constraints` in one of puzzle_profile or as a commandline argument."
)
exit(1)

⚠️ Potential issue | 🔴 Critical

The CLI validation path crashes on argparse.Namespace.

These membership checks use arg in args, but Namespace is not iterable. Running the script via __main__ therefore throws TypeError before any friendly validation message is emitted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/mip/run_puzzle.py` around lines 395 - 423, The
validation uses membership checks like "subblock_stats_args" in args and
"mip_constraints" in args but args is an argparse.Namespace (not iterable),
which raises TypeError; update _assert_valid_config to use getattr(args,
'<name>', None) checks instead of "in args" (e.g., replace
`"subblock_stats_args" not in args` with `getattr(args, "subblock_stats_args",
None) is None`) and similarly for `"mip_constraints"` and `"human_constraints"`;
keep the existing checks against puzzle_profile (which may be a dict) unchanged
and ensure the missing_args logic still uses getattr as it already does.

Comment on lines +624 to +677
process_group = node.target.process_group
peers = node.target.peer_rank
if not isinstance(peers, Sequence):
peers = [peers]

if len(values_to_node) > 0:
items_to_send = list(self.create_output_overrides(values_to_node).items())

data_descriptors: list[RemoteDataDescriptor] = []
tensors_to_send: list[torch.Tensor] = []

for key, value in items_to_send:
if isinstance(value, torch.Tensor):
if value.is_cuda:
tensor_device = "cuda"
elif value.is_cpu:
tensor_device = "cpu"
else:
raise RuntimeError(
f"Invalid tensor device to send to remote target: {value.device}"
)

data_descriptor = RemoteTensorDataDescriptor(
key=key,
device=tensor_device,
dtype=value.dtype,
shape=value.shape,
)
tensors_to_send.append(value)

else:
data_descriptor = RemotePythonDataDescriptor(
key=key,
value=value,
)

data_descriptors.append(data_descriptor)

works: list[Optional[torch.distributed.Work]] = []
for peer in peers:
if process_group is not None:
peer = torch.distributed.get_global_rank(process_group, peer)

peer_works = distributed_isend_obj(data_descriptors, dst=peer)
works.extend(peer_works)

for tensor in tensors_to_send:
work = torch.distributed.isend(tensor, dst=peer)
works.append(work)

if node.target.blocking:
for work in works:
if work is not None:
work.wait()

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# First, explore the file structure and locate the target file
find . -name "core.py" -path "*puzzletron/sewing_kit*" | head -20

Repository: NVIDIA/Model-Optimizer

Length of output: 167


🏁 Script executed:

# Check the file size and read the relevant sections
wc -l modelopt/torch/puzzletron/sewing_kit/core.py

Repository: NVIDIA/Model-Optimizer

Length of output: 114


🏁 Script executed:

# Read the specific lines mentioned in the review (624-677 and 687-718)
sed -n '620,730p' modelopt/torch/puzzletron/sewing_kit/core.py | cat -n

Repository: NVIDIA/Model-Optimizer

Length of output: 6346


🏁 Script executed:

# Search for all uses of distributed_isend_obj, distributed_recv_obj in the file
rg "distributed_(isend_obj|recv_obj|isend|irecv)" modelopt/torch/puzzletron/sewing_kit/core.py -B2 -A2

Repository: NVIDIA/Model-Optimizer

Length of output: 698


🏁 Script executed:

# Check the function signatures - find where these are imported from
rg "from.*distributed_isend_obj|import.*distributed_isend_obj" modelopt/torch/puzzletron/sewing_kit/ -B2 -A2

Repository: NVIDIA/Model-Optimizer

Length of output: 445


🏁 Script executed:

# Check if there's a utils module or similar that defines distributed_isend_obj
find modelopt/ -name "*.py" -type f -exec rg -l "def distributed_isend_obj|def distributed_recv_obj" {} \;

Repository: NVIDIA/Model-Optimizer

Length of output: 111


🏁 Script executed:

# Check the function signatures in utils.py
sed -n '1,200p' modelopt/torch/puzzletron/sewing_kit/utils.py | grep -A 15 "def distributed_isend_obj\|def distributed_recv_obj"

Repository: NVIDIA/Model-Optimizer

Length of output: 48


🏁 Script executed:

# Get full definitions of both functions
rg "def distributed_(isend_obj|recv_obj)" modelopt/torch/puzzletron/sewing_kit/utils.py -A 20

Repository: NVIDIA/Model-Optimizer

Length of output: 1582


🏁 Script executed:

# Check PyTorch distributed documentation - search for torch.distributed.isend signature
rg "torch.distributed.isend|torch.distributed.irecv" modelopt/torch/puzzletron/sewing_kit/ -B2 -A8

Repository: NVIDIA/Model-Optimizer

Length of output: 3244


🏁 Script executed:

# Search for any examples in the codebase that use group parameter with these functions
rg "distributed_isend_obj.*group|torch.distributed.isend.*group|torch.distributed.irecv.*group" modelopt/

Repository: NVIDIA/Model-Optimizer

Length of output: 652


🏁 Script executed:

# Search for any other potential P2P call sites in the file that might have similar issues
rg "distributed.isend|distributed.irecv|distributed.send|distributed.recv|distributed_isend|distributed_recv" modelopt/torch/puzzletron/sewing_kit/core.py -n

Repository: NVIDIA/Model-Optimizer

Length of output: 483


Pass group=process_group to all distributed P2P calls.

The code extracts process_group and converts local peer ranks to global ranks when process_group is not None (lines 646 and 688 via torch.distributed.get_global_rank()), but the actual distributed_isend_obj(), isend(), distributed_recv_obj(), and irecv() calls at lines 667, 671, 690, and 713 do not receive the group parameter. This causes all communication to use the default process group instead of the specified subgroup, resulting in hangs or communication with the wrong peers.

🛠️ Suggested fix
-                            peer_works = distributed_isend_obj(data_descriptors, dst=peer)
+                            peer_works = distributed_isend_obj(
+                                data_descriptors,
+                                dst=peer,
+                                group=process_group,
+                            )
@@
-                                work = torch.distributed.isend(tensor, dst=peer)
+                                work = torch.distributed.isend(
+                                    tensor,
+                                    dst=peer,
+                                    group=process_group,
+                                )
@@
-                        data_descriptors = distributed_recv_obj(src=peer)
+                        data_descriptors = distributed_recv_obj(
+                            src=peer,
+                            group=process_group,
+                        )
@@
-                            work = torch.distributed.irecv(tensor, src=peer)
+                            work = torch.distributed.irecv(
+                                tensor,
+                                src=peer,
+                                group=process_group,
+                            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/sewing_kit/core.py` around lines 624 - 677, The P2P
calls are missing the process group argument so they use the default group;
update all distributed send/recv calls to pass group=process_group where
available: add group=process_group to calls to distributed_isend_obj(...) and
distributed_recv_obj(...), and to torch.distributed.isend(...) and
torch.distributed.irecv(...); keep the existing logic that maps local->global
ranks via torch.distributed.get_global_rank(process_group, peer) and only pass
group when process_group is not None so subgroup communication uses the correct
group.

Comment on lines +405 to +416
def _load_subblock_configs_from_subblock_library(master_puzzle_dir: Path) -> list[SubblockConfig]:
subblocks_df = pd.read_json(master_puzzle_dir / "subblock_library.json")
subblocks_df["attention_config"] = subblocks_df["attention_config"].apply(
partial(_dataclass_from_dict, cls=AttentionConfig)
)
subblocks_df["ffn_config"] = subblocks_df["ffn_config"].apply(
partial(_dataclass_from_dict, cls=FFNConfig)
)
attention_configs = subblocks_df["attention_config"].dropna().drop_duplicates().tolist()
ffn_configs = subblocks_df["ffn_config"].dropna().drop_duplicates().tolist()
subblock_configs = attention_configs + ffn_configs
return subblock_configs

⚠️ Potential issue | 🔴 Critical

Return the same indexed shape from the fallback loader.

This fallback returns bare AttentionConfig/FFNConfig instances, but calculate_subblock_stats() later indexes each item as ["subblock_config"] and ["parent_layer_indices"]. When replacement_library.json is absent, this path fails immediately instead of producing stats.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines
405 - 416, The fallback loader _load_subblock_configs_from_subblock_library
currently returns raw AttentionConfig/FFNConfig objects but callers
(calculate_subblock_stats) expect list entries with keys/fields
"subblock_config" and "parent_layer_indices" (i.e. SubblockConfig-shaped items);
change the function to construct and return SubblockConfig instances (or dicts
matching SubblockConfig) for each unique attention/ffn config, populating
subblock_config with the AttentionConfig/FFNConfig and setting
parent_layer_indices to a sensible default (e.g. empty list) or the original
parent indices if present in the JSON, so the returned list matches the same
indexed shape used by calculate_subblock_stats. Ensure the function name
_load_subblock_configs_from_subblock_library and the
SubblockConfig/AttentionConfig/FFNConfig symbols are used to locate and
instantiate the correct dataclass.
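The "same indexed shape" the caller expects can be sketched with stand-in dataclasses (hypothetical fields; the real AttentionConfig/FFNConfig carry more state):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AttentionConfig:  # stand-in for the real dataclass
    n_heads: int

@dataclass(frozen=True)
class FFNConfig:  # stand-in for the real dataclass
    intermediate_size: int

def to_indexed_entries(configs):
    # Wrap bare configs in the dict shape that calculate_subblock_stats()
    # indexes via ["subblock_config"] and ["parent_layer_indices"].
    return [
        {"subblock_config": cfg, "parent_layer_indices": []}
        for cfg in configs
    ]

entries = to_indexed_entries([AttentionConfig(32), FFNConfig(11008)])
print(entries[0]["subblock_config"])       # AttentionConfig(n_heads=32)
print(entries[0]["parent_layer_indices"])  # []
```

The empty `parent_layer_indices` default is the assumption here; if the JSON records the original parent indices, those should be carried through instead.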

Comment on lines +425 to +443
# Prepare arguments for parallel processing
process_layer_partial = partial(
_process_single_layer,
pruning_mixin=pruning_mixin,
descriptor=descriptor,
parent_state_dict=original_state_dict,
new_state_dict=new_state_dict,
original_config=original_config,
new_config=new_config,
gqa_init_mode=gqa_init_mode,
mlp_init_mode=mlp_init_mode,
mlp_init_config=mlp_init_config,
linear_init_mode=linear_init_mode,
ignored_keys=ignored_keys,
keys=keys,
is_original_mha=is_original_mha,
head_size=head_size,
hidden_size=hidden_size,
)

⚠️ Potential issue | 🔴 Critical

Don't share a mutable keys map across worker threads.

create_child_state_dict() passes the same keys dict into every worker, and _init_mlp() mutates it with keys.pop(...). That introduces a real race in the fallback path: workers can remove each other's entries, yielding nondeterministic KeyErrors or copying the wrong leftovers into the final state dict.

Also applies to: 452-455, 694-696

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/bypassed_training/child_init.py` around lines
425 - 443, The issue is that the same mutable keys dict is shared across worker
tasks causing races when _init_mlp mutates it; update each partial so it
receives a worker-local copy of keys (e.g., pass keys.copy() or dict(keys))
instead of the original dict when creating process_layer_partial (and the other
partials at the other sites noted), and ensure
create_child_state_dict/_process_single_layer/_init_mlp keep using their
existing names (create_child_state_dict, _process_single_layer, _init_mlp) but
operate on the copied keys to avoid cross-thread pops and nondeterministic
KeyError or wrong-state copying.
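The per-worker-copy fix can be sketched independently of the checkpoint code (hypothetical worker; the real `_init_mlp` pops per-layer weight keys):

```python
from concurrent.futures import ThreadPoolExecutor

def worker(layer_idx: int, keys: dict) -> int:
    # Destructive lookup, mirroring the keys.pop(...) inside _init_mlp.
    return keys.pop(f"layer.{layer_idx}.weight")

shared = {f"layer.{i}.weight": i * 10 for i in range(8)}

# Safe variant: each task receives its own dict(shared) copy, so one
# worker's pop can never remove an entry another worker still needs.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda i: worker(i, dict(shared)), range(8)))

print(results)      # [0, 10, 20, 30, 40, 50, 60, 70]
print(len(shared))  # 8 -- the original map is untouched
```

Passing `shared` itself would instead leave the dict half-drained, and any fallback path that copies "leftover" keys into the final state dict would then see a nondeterministic remainder.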

Comment on lines +335 to +360
def _write_file_process_safe(
content: Any,
path: Path | str,
write_fn: Callable[[Any, BinaryIO], None] = _write_text,
) -> None:
"""
Write a file in a multi-process safe way.
If another process tries to write the same file using this method, the current process
"gives up" and assumes that the matter is being taken care of by another process.

write_fn is a function that receives file contents and a binary file object,
and writes the content to the file. It can be _write_text (defined above), or torch.save,
or a similar function (not safetensors.torch.save_file since it expects a path).
"""
with open(path, "wb") as f:
# Try to acquire an exclusive, non-blocking lock
try:
fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
except BlockingIOError:
return # Exit immediately if the lock is not acquired

write_fn(content, f) # Write the content if lock is acquired
f.flush() # Ensure data is written to disk

# Release the lock
fcntl.flock(f, fcntl.LOCK_UN)

⚠️ Potential issue | 🔴 Critical

Acquire the file lock before truncating the index file.

open(path, "wb") truncates the file before flock() runs. If another process already owns the lock, this contender still zeroes the file and then returns, so the writer you intended to protect can lose or corrupt the index.

🛠️ Proposed fix
-    with open(path, "wb") as f:
+    with open(path, "ab+") as f:
         # Try to acquire an exclusive, non-blocking lock
         try:
             fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
         except BlockingIOError:
             return  # Exit immediately if the lock is not acquired
-
+        f.seek(0)
+        f.truncate()
         write_fn(content, f)  # Write the content if lock is acquired
         f.flush()  # Ensure data is written to disk
+        os.fsync(f.fileno())
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 335 -
360, The current _write_file_process_safe opens the file with "wb" which
truncates before acquiring the flock, risking data loss; change the open/lock
sequence so you acquire the exclusive non-blocking lock before
truncating/writing: open the file in a non-truncating mode that still creates
the file if missing (e.g., "a+b"), immediately acquire fcntl.flock(f,
fcntl.LOCK_EX | fcntl.LOCK_NB), return on BlockingIOError, then seek to start
and f.truncate(0) (or otherwise reset) before calling write_fn(content, f),
flushing, and finally releasing the lock in _write_file_process_safe. Ensure you
still support write_fn semantics and file creation behavior.
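The lock-then-truncate ordering looks like this on POSIX. A sketch under stated assumptions (`fcntl` is Unix-only; function name is illustrative):

```python
import fcntl
import os
import tempfile

def write_file_process_safe(content: bytes, path: str) -> bool:
    # "ab+" creates the file if missing but never truncates on open.
    with open(path, "ab+") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another process owns the write; give up
        f.seek(0)
        f.truncate()      # safe to reset now: we hold the lock
        f.write(content)
        f.flush()
        os.fsync(f.fileno())
        fcntl.flock(f, fcntl.LOCK_UN)
    return True

fd, path = tempfile.mkstemp()
os.close(fd)
write_file_process_safe(b"old much longer payload", path)
write_file_process_safe(b"new", path)
print(open(path, "rb").read())  # b'new' -- no stale tail bytes
```

With the original `open(path, "wb")`, the second contender would have zeroed the file before discovering the lock was taken.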

Comment on lines +38 to +45
warnings.filterwarnings("ignore", "You are using `torch.load` with `weights_only=False`*.")


def load_state_dict(checkpoint_dir: Path | str) -> dict[str, torch.Tensor]:
checkpoint_dir = _normalize_checkpoint_dir(checkpoint_dir)

if (state_dict_path := checkpoint_dir / STATE_DICT_FILE_NAME).exists():
return torch.load(state_dict_path, map_location="cpu", weights_only=False)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

cat -n modelopt/torch/puzzletron/tools/checkpoint_utils.py | head -100

Repository: NVIDIA/Model-Optimizer

Length of output: 4528


🏁 Script executed:

cat pyproject.toml | grep -A 5 "torch"

Repository: NVIDIA/Model-Optimizer

Length of output: 1584


🏁 Script executed:

rg "trust_remote_code" modelopt/torch/puzzletron/tools/checkpoint_utils.py -A 2

Repository: NVIDIA/Model-Optimizer

Length of output: 810


Do not force pickle-based torch.load() for state dicts.

These paths explicitly set weights_only=False and then globally suppress the warning. That re-enables arbitrary code execution for untrusted checkpoints even though these files are plain tensor state dicts. Use weights_only=True; if a specific path truly needs pickle, document why that file is trusted inline instead of hiding the warning.

Suggested fix
-warnings.filterwarnings("ignore", "You are using `torch.load` with `weights_only=False`*.")
-
-
 def load_state_dict(checkpoint_dir: Path | str) -> dict[str, torch.Tensor]:
@@
     if (state_dict_path := checkpoint_dir / STATE_DICT_FILE_NAME).exists():
-        return torch.load(state_dict_path, map_location="cpu", weights_only=False)
+        return torch.load(state_dict_path, map_location="cpu", weights_only=True)
@@
     if len(torch_paths) != 0:
-        load_fn = partial(torch.load, map_location="cpu", weights_only=False)
+        load_fn = partial(torch.load, map_location="cpu", weights_only=True)
         file_paths = torch_paths

As per coding guidelines, "Do not use torch.load(..., weights_only=False) unless a documented exception is provided."

Also applies to: 74-80

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py` around lines 38 - 45,
The code is forcing pickle-based torch.load by setting weights_only=False and
globally suppressing the warning; update all torch.load calls in this file
(e.g., load_state_dict and the similar optimizer/state loaders around lines
74-80) to use weights_only=True for plain tensor state dicts, remove the global
warnings.filterwarnings line, and only allow weights_only=False on a per-call
basis with an inline, documented justification (no global suppression) when a
trusted file genuinely requires pickle semantics.
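The risk `weights_only=True` closes off is ordinary pickle deserialization, which invokes arbitrary callables during load. A benign stdlib illustration (no torch required; the callable here is `list`, but it could equally be `os.system`):

```python
import pickle

class NotATensor:
    # pickle honors __reduce__: unpickling calls the returned callable
    # with the given arguments, whatever they are.
    def __reduce__(self):
        return (list, ("pwned",))

payload = pickle.dumps(NotATensor())
result = pickle.loads(payload)  # executes list("pwned") during load
print(result)                   # ['p', 'w', 'n', 'e', 'd']
```

`weights_only=True` restricts unpickling to tensor and container types, so a payload like this is rejected instead of executed.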

Comment on lines +161 to +177
def copy_tokenizer(
source_dir_or_tokenizer_name: Path | str,
target_dir: Path | str,
on_failure: Literal["raise", "warn"] = "raise",
) -> None:
"""Prefer loading the tokenizer from huggingface hub (when tokenizer_name.txt file is available)
to avoid collision between transformers versions.
"""
source_tokenizer_name_path = Path(source_dir_or_tokenizer_name) / "tokenizer_name.txt"
if source_tokenizer_name_path.exists():
source_dir_or_tokenizer_name = source_tokenizer_name_path.read_text().strip()

tokenizer = None
try:
tokenizer = AutoTokenizer.from_pretrained(
source_dir_or_tokenizer_name, trust_remote_code=True
)

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find all calls to copy_tokenizer() to understand its usage pattern
rg "copy_tokenizer" --type py -B 3 -A 3

Repository: NVIDIA/Model-Optimizer

Length of output: 3915


🏁 Script executed:

# Check the full function signature and surrounding context
fd "checkpoint_utils.py" --type py | head -1 | xargs cat -n | sed -n '155,185p'

Repository: NVIDIA/Model-Optimizer

Length of output: 235


🏁 Script executed:

# Search for ModelDescriptor classes in puzzletron to understand the pattern
fd "descriptor" --type py | xargs rg "class.*Descriptor|requires_trust_remote_code" -B 2 -A 5 2>/dev/null | head -100

Repository: NVIDIA/Model-Optimizer

Length of output: 7792


🏁 Script executed:

# Get the full function definition and docstring
cat -n modelopt/torch/puzzletron/tools/checkpoint_utils.py | sed -n '161,210p'

Repository: NVIDIA/Model-Optimizer

Length of output: 1690


🏁 Script executed:

# Check if copy_tokenizer is ever passed a descriptor or trust_remote_code-related parameter
rg "copy_tokenizer\(" --context 5

Repository: NVIDIA/Model-Optimizer

Length of output: 3576


copy_tokenizer() should not hard-code trust_remote_code=True.

This helper accepts arbitrary local paths or Hub IDs without any guarantee of trustworthiness, enabling execution of repository-provided Python during tokenizer loading. Add a trust_remote_code: bool = False parameter and pass the descriptor's setting from callers, matching the pattern used by load_model_config() in the same codebase.

Suggested fix
 def copy_tokenizer(
     source_dir_or_tokenizer_name: Path | str,
     target_dir: Path | str,
     on_failure: Literal["raise", "warn"] = "raise",
+    trust_remote_code: bool = False,
 ) -> None:
@@
     try:
         tokenizer = AutoTokenizer.from_pretrained(
-            source_dir_or_tokenizer_name, trust_remote_code=True
+            source_dir_or_tokenizer_name, trust_remote_code=trust_remote_code
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py` around lines 161 - 177,
The copy_tokenizer function currently hard-codes
AutoTokenizer.from_pretrained(..., trust_remote_code=True); change the signature
of copy_tokenizer to accept a trust_remote_code: bool = False parameter and pass
that variable into AutoTokenizer.from_pretrained instead of True, and update any
callers to forward the descriptor's trust_remote_code setting (same pattern used
by load_model_config) so tokenizer loading honors the caller-provided trust flag
rather than always allowing remote code execution.

Comment on lines +72 to +77
elif (
(ranks == "main" and self.global_rank != 0)
or (ranks == "last" and self.local_rank != self.world_size - 1)
or (ranks == "local_main" and self.local_rank != 0)
):
return

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for how lprint/last rank logging is used in the codebase
rg -n "lprint|ranks.*last" --type=py

Repository: NVIDIA/Model-Optimizer

Length of output: 568


🏁 Script executed:

cd modelopt/torch/puzzletron/tools && head -100 logger.py | tail -60

Repository: NVIDIA/Model-Optimizer

Length of output: 2338


🏁 Script executed:

cat -n modelopt/torch/puzzletron/tools/logger.py | sed -n '50,90p'

Repository: NVIDIA/Model-Optimizer

Length of output: 1838


🏁 Script executed:

rg -n "lprint\(" --type=py -A 2 -B 2 | head -50

Repository: NVIDIA/Model-Optimizer

Length of output: 380


🏁 Script executed:

rg -n "local_world_size|LOCAL_WORLD_SIZE" --type=py

Repository: NVIDIA/Model-Optimizer

Length of output: 160


Fix "last" rank logic to use global_rank instead of local_rank.

Line 74 compares self.local_rank against self.world_size - 1, which is incorrect. The docstring states "last" should log from "rank -1 in node 0" (the last global rank), and the lprint() function's docstring confirms this intent. This requires comparing self.global_rank != self.world_size - 1 to match the semantics of "main" (which checks global_rank != 0). The current code mixes local_rank (rank within a node) with world_size (global count), causing the "last" rank logic to fail.

-            or (ranks == "last" and self.local_rank != self.world_size - 1)
+            or (ranks == "last" and self.global_rank != self.world_size - 1)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/logger.py` around lines 72 - 77, The "last"
rank check uses self.local_rank against self.world_size - 1 which mixes per-node
rank with global world size; update the condition in the logger method (the
branch handling ranks == "last") to compare self.global_rank to self.world_size
- 1 (i.e., change the check from self.local_rank != self.world_size - 1 to
self.global_rank != self.world_size - 1) so the "last" semantics match the
"main"/global checks used by lprint/logger.

Comment on lines +239 to +241
tokenizer = AutoTokenizer.from_pretrained(
tokenizer_name or args.model_name_or_path, trust_remote_code=True
)

⚠️ Potential issue | 🔴 Critical

CRITICAL: Hardcoded trust_remote_code=True violates security guidelines.

The AutoTokenizer.from_pretrained call hardcodes trust_remote_code=True, which enables arbitrary Python code execution from untrusted model sources. Per coding guidelines, this should be exposed as a caller-configurable parameter defaulting to False, or derived from the ModelDescriptor pattern (as the descriptor already provides requires_trust_remote_code()).

🔒 Proposed fix: Use descriptor's trust_remote_code setting
+from modelopt.torch.puzzletron.anymodel.model_descriptor import ModelDescriptorFactory
+
 def prepare_dataloader(
-    args: DictConfig, tokenizer: PreTrainedTokenizerBase | None = None
+    args: DictConfig, tokenizer: PreTrainedTokenizerBase | None = None, trust_remote_code: bool = False
 ) -> DataLoader:
     if tokenizer is None:
         tokenizer_name = getattr(args, "tokenizer_name", None)
         assert (tokenizer_name is not None) or (args.model_name_or_path is not None)
         tokenizer = AutoTokenizer.from_pretrained(
-            tokenizer_name or args.model_name_or_path, trust_remote_code=True
+            tokenizer_name or args.model_name_or_path, trust_remote_code=trust_remote_code
         )

Then in validate_model, derive trust_remote_code from the descriptor:

trust_remote_code = descriptor.requires_trust_remote_code()
val_dataloader = prepare_dataloader(args, tokenizer, trust_remote_code=trust_remote_code) if dist.is_master() else None

Based on learnings: "trust_remote_code should be determined by the ModelDescriptor pattern, not by a parameter passed to convert_model()."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/puzzletron/tools/validate_model.py` around lines 239 - 241,
Replace the hardcoded trust_remote_code=True in the
AutoTokenizer.from_pretrained call inside validate_model by deriving the flag
from the ModelDescriptor (call descriptor.requires_trust_remote_code()); assign
that result to a local trust_remote_code variable and pass it into
AutoTokenizer.from_pretrained and any downstream callers such as
prepare_dataloader (e.g., prepare_dataloader(...,
trust_remote_code=trust_remote_code)) so that trust_remote_code is determined by
the descriptor and not hardcoded.
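A minimal sketch of the proposed plumbing, with the flag derived from the descriptor rather than hardcoded; DummyDescriptor and prepare_tokenizer_kwargs are illustrative stand-ins for the ModelDescriptor pattern and the tokenizer call site, not the actual Model-Optimizer API:

```python
class DummyDescriptor:
    """Stand-in for a ModelDescriptor exposing requires_trust_remote_code()."""

    def requires_trust_remote_code(self) -> bool:
        # A real descriptor would decide this per model family; default to
        # the safe value.
        return False


def prepare_tokenizer_kwargs(name: str, trust_remote_code: bool = False) -> dict:
    """Build the kwargs that would be forwarded to AutoTokenizer.from_pretrained.

    trust_remote_code is caller-configurable and defaults to False, so remote
    code execution is opt-in rather than silently enabled.
    """
    return {
        "pretrained_model_name_or_path": name,
        "trust_remote_code": trust_remote_code,
    }


descriptor = DummyDescriptor()
kwargs = prepare_tokenizer_kwargs(
    "some/model",
    trust_remote_code=descriptor.requires_trust_remote_code(),
)
```

The point of the pattern is that the security-sensitive decision lives in one place (the descriptor) and every call site merely forwards it.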

@codecov

codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.19%. Comparing base (291498b) to head (c5ec50b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1121      +/-   ##
==========================================
- Coverage   70.21%   70.19%   -0.02%     
==========================================
  Files         228      228              
  Lines       25952    25956       +4     
==========================================
  Hits        18221    18221              
- Misses       7731     7735       +4     

☔ View full report in Codecov by Sentry.
