Merge puzzletron compression algorithm (#1121)
Conversation
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…iv.org/abs/2411.19146 (#464) Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do? Add decilm modelling code --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Compress tutorial (PoC) + compress cli app. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Signed-off-by: Liana Mikaelyan <lmikaelyan@nvidia.com> Signed-off-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Liana Mikaelyan <lmikaelyan@nvidia.com> Co-authored-by: Liana Mikaelyan <45925959+LianaMikael@users.noreply.github.com>
#545) ## What does this PR do? Add llama converter (no dependency on internal Nvidia code) - part 1/2 - change top-level dependencies in convert_llama3_to_decilm.py from puzzle_tools.... to modelopt..... - added modelopt.torch._compress.tools module - remove tokenization_mistral.py (not used). Scope of part 2/2 (will come once part 1/2 is merged): - change all deeper dependencies from puzzle_tools.... to modelopt.... - test_convert_llama3_config_to_decilm_config.py should run without any internal nvidia dependencies --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ia code) (#552) ## What does this PR do? llama converter is self-contained now (no dependency on internal nvidia code) --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Make score pruning activations self-contained (no dependency on internal Nvidia code) - 1/6 - add integration test for attention pruning (pre-step to make adding activation scoring safer) --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? - Add score_pruning_activations.py Notes: - validate_model.py still depends on Nvidia internal code (will be changed in the subsequent MR) - sharded_checkpoint_utils.py - for now it needs to use DeciLM from internal Nvidia code, to be changed in the next MR --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do? Add activation hooks used for pruning --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
…ng is self-contained now (#584) ## What does this PR do? Add sewing kit and utilities used for pruning scoring - pruning scoring is self-contained now - no dependency on internal Nvidia code. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do?
- Add L2NormHook and use it in megatron.py
- Using L2NormHook removes code duplication between
_DynamicSelfAttention and _DynamicMLP
This is the first step towards reusing activation scores logic across
Minitron and Puzzle. Next steps:
- complete redesign of megatron.py - move other activation hooks logic
to hooks.py
- then combine those hooks.py with the similar hooks.py functionality in
puzzle
(modelopt/torch/_compress/activation_scoring/activation_hooks/hooks.py)
Questions:
- why, both before and after this redesign, do we store temp variables
in two ways (_register_temp_attribute and self.hook_handle)?
```python
self._register_temp_attribute("_activation_hook", activation_hook)
# TODO: confusion: why hook_handle is removed manually in export() and not using _register_temp_attribute?
self.hook_handle = self.linear_fc2.register_forward_hook(activation_hook)
```
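For context on the question above: a forward-hook registration returns a removable handle, which is one reason the handle is stored separately and removed explicitly in `export()`. A minimal pure-Python stand-in for the pattern (not torch itself; names illustrative):

```python
# Minimal stand-in illustrating why hook registration returns a handle
# that must be removed explicitly (as torch's RemovableHandle is); this
# is NOT the real torch API, just a sketch of the pattern.
class RemovableHandle:
    def __init__(self, hooks, hook):
        self._hooks, self._hook = hooks, hook

    def remove(self):
        # detach the hook so later forward passes no longer call it
        if self._hook in self._hooks:
            self._hooks.remove(self._hook)


class Linear:
    """Hypothetical stand-in for an nn.Module supporting forward hooks."""

    def __init__(self):
        self._forward_hooks = []

    def register_forward_hook(self, hook):
        self._forward_hooks.append(hook)
        return RemovableHandle(self._forward_hooks, hook)

    def forward(self, x):
        out = f"out({x})"  # dummy computation
        for hook in self._forward_hooks:
            hook(self, (x,), out)
        return out


calls = []
layer = Linear()
handle = layer.register_forward_hook(lambda m, i, o: calls.append(o))
layer.forward(1)    # hook fires
handle.remove()     # e.g. done in export(), before serialization
layer.forward(2)    # hook no longer fires
print(calls)        # ['out(1)']
```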
---------
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com>
Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
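The shared-hook idea described in this PR can be sketched as follows: a hedged, pure-Python illustration of an L2-norm activation hook, not the actual modelopt implementation (`TinyModule` is a hypothetical stand-in for an `nn.Module`):

```python
# Illustrative sketch of the L2NormHook idea: one hook accumulates
# per-channel L2 norms of activations, so attention and MLP modules can
# share a single implementation instead of duplicating scoring logic.
# All names here are hypothetical; real hooks use torch modules/tensors.
import math


class L2NormHook:
    def __init__(self, num_channels):
        self.sum_sq = [0.0] * num_channels  # running sum of squares per channel
        self.count = 0

    def __call__(self, module, inputs, output):
        # output: per-channel activation values for one forward pass
        for i, x in enumerate(output):
            self.sum_sq[i] += x * x
        self.count += 1

    def scores(self):
        # per-channel L2 norm, averaged over observed forward passes
        return [math.sqrt(s / max(self.count, 1)) for s in self.sum_sq]


class TinyModule:
    """Minimal stand-in for an nn.Module that supports forward hooks."""

    def __init__(self):
        self._hooks = []

    def register_forward_hook(self, hook):
        self._hooks.append(hook)

    def forward(self, x):
        out = [v * 2.0 for v in x]  # dummy computation
        for hook in self._hooks:
            hook(self, (x,), out)
        return out


mlp = TinyModule()
hook = L2NormHook(num_channels=3)
mlp.register_forward_hook(hook)
mlp.forward([1.0, 2.0, 3.0])
mlp.forward([1.0, 2.0, 3.0])
print(hook.scores())  # [2.0, 4.0, 6.0]
```

The same hook instance could be attached to either an attention or an MLP module, which is the duplication-removal point made above.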
## What does this PR do? Add pruning checkpoints for the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add build replacement library to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add subblock stats to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add 1-block scoring to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…utionHook (#610) ## What does this PR do? Add checkpoint save/load to ForwardHook + add IterativeChannelContributionHook. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
## What does this PR do? Add MIP step to the compress algorithm. --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
## What does this PR do? **Type of change:** Improvement <!-- Use one of the following: Bug fix, new feature, new example, new tests, documentation. --> - Fix tests for 2-gpu: Some places hard-coded cpu device for distributed communications which was causing this issue - Remove unused constrain_search_space.py - Remove `is_multi_layer_puzzle: False` case - Remove `use_greedy_search: False` case - Remove knapsack mip case - Remove unused `num_solutions` and `minimal_diversity` flags ## Testing <!-- Mention how have you tested your change if applicable. --> - GH CICD test passing - Tested on 2-gpu setup locally as well <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit # Release Notes * **Refactor** * Optimized solver implementation with improved library integration. * Simplified model compression configuration by removing deprecated search options. * Consolidated optimization paths for streamlined processing. * **Chores** * Updated dependencies for improved compatibility. * **Documentation** * Clarified Model-Optimizer installation instructions in examples. <sub>✏️ Tip: You can customize this high-level summary in your review settings.</sub> <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…hooks analysis (#670) ## What does this PR do? Fix a bug in IterativeChannelContributionHook + tools for activation hooks analysis --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Daniel Korzekwa <daniel.korzekwa@gmail.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
…unctions (#667) ## What does this PR do? - Remove `runtime.py` and directly use `modelopt.torch.utils.distributed` functions - Read model_dtype and autocast_dtype from validate_model_defaults.yaml instead of runtime object - Remove more unused functions - Remove unnecessary parse_args for intermediate steps, and improve docstrings ## Testing <!-- Mention how have you tested your change if applicable. --> - No change in functionality - Existing tests passing --------- Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
### What does this PR do? Merging dkorzekwa/anymodel_build_library_and_stats into dkorzekwa/anymodel_pruning - this MR is only for reviewing. Ultimately dkorzekwa/anymodel_build_library_and_stats should be merged into feature/puzzletron once dkorzekwa/anymodel_pruning scoring is merged there. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Integrated language-model descriptors to improve compatibility and produce model-specific stats. * **Bug Fixes** * Re‑enabled the single‑process library build and stats workflow step. * **Documentation** * Expanded usage docs and CLI guidance for the replacement library builder. * **Refactor** * Improved typing/annotations and normalized key‑value head handling, affecting attention memory/parameter calculations and textual outputs. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do? Merging dkorzekwa/any_model_calc_one_block_scores into dkorzekwa/anymodel_build_library_and_stats - this MR is only for reviewing. Ultimately dkorzekwa/any_model_calc_one_block_scores should be merged into feature/puzzletron once dkorzekwa/anymodel_build_library_and_stats is merged there. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * Enabled distributed scoring step in puzzletron workflow for improved performance. * **Improvements** * Extended model loading support to accommodate additional model formats and configurations. * Optimized GPU memory management during validation with improved cache synchronization. * Streamlined validation function signatures for easier usage. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do? Merging dkorzekwa/mip_and_realize_models into dkorzekwa/any_model_calc_one_block_scores - this MR is only for reviewing. Ultimately dkorzekwa/mip_and_realize_models should be merged into feature/puzzletron once dkorzekwa/any_model_calc_one_block_scores is merged there. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **New Features** * Enabled model realization step during compression workflow after scoring phase completes. * **Bug Fixes** * Fixed key-value head calculation in attention configuration sourcing. * **Tests** * Strengthened validation checks for compression artifacts and output directories; added rank-aware assertions for model compression expectations. * **Chores** * Minor documentation formatting updates. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
### What does this PR do? Merging dkorzekwa/any_model_other_models into dkorzekwa/mip_and_realize_models - this MR is only for reviewing. Ultimately dkorzekwa/any_model_other_models should be merged into feature/puzzletron once dkorzekwa/mip_and_realize_models is merged there. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added support for multiple model architectures: Mistral Small, Nemotron H, Nemotron H v2, Qwen2, Qwen3 8B, and Qwen3 VL 30B. * Introduced new pruning configurations and optimization pipelines for supported models. * Added comprehensive model descriptor framework enabling automated weight conversion and configuration handling. * Extended support for Mixture of Experts (MoE) models with expert removal pruning capabilities. * **Tests** * Enhanced test coverage with parametrized configurations for multiple model variants. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
### What does this PR do? Merging dkorzekwa/anymodel_gptoss into dkorzekwa/any_model_other_models - this MR is only for reviewing. Ultimately dkorzekwa/anymodel_gptoss should be merged into feature/puzzletron once dkorzekwa/any_model_other_models is merged there. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added support for GPT-OSS, Nemotron V2, Qwen2, and Qwen3 models. * Enabled MXFP4 quantization for GPT-OSS model compression. * Added expert removal pruning for mixture-of-experts (MoE) models. * **Bug Fixes** * Fixed padding token validation to ensure it doesn't exceed vocabulary size. * **Tests** * Enabled test coverage for GPT-OSS model workflows. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do? Merge any_model tutorial for Puzzletron. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * MIP sweep mode for multi-rate memory-compression searches * Triton-ready HuggingFace deployable, lm-eval adapter, and Ray-compatible inference pathways * Megatron-Bridge distillation CLI/workflow with optional HF export * **New Configurations** * Extensive pruning/memory-sweep profiles for GPT‑Oss, Llama, Mistral, Nemotron, Qwen families * **Documentation** * GptOss guide and conversion example, expanded READMEs, MIP Quick Start, NeMo evaluator notes * **Chores** * Requirements updated (lm-eval bumped; math-verify, ray added) <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do? Merge anymodel mbridge distillation <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** - Heterogeneous-layer support for Puzzletron AnyModel → Megatron-Core conversions - Knowledge-distillation workflow support and tooling - Export of trained models to HuggingFace format (new export helper) - Bridge support extended for Llama and Qwen3 architectures - New CLI options to control HuggingFace export destination and template model * **Tests** - Added integration test validating the distillation + export flow - Removed an obsolete trivial test * **Documentation** - Updated README to document HF export flags * **Chores** - CI: switched example-tests workflow to use NeMo and updated pip invocation <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
#1047) ### What does this PR do? The remaining difference between dkorzekwa/any_model and dkorzekwa/anymodel_mbridgedist <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added an end-to-end MIP sweep workflow that runs sweeps, collects metrics, and writes CSV reports. * **Improvements** * Simplified attention implementation selection and clarified attention display wording. * Realization pipeline now returns realized model solution paths. * Caching behavior adjusted for token ID retrieval. * Test model generation scales layer count with distributed size. * **Bug Fixes** * Checkpoint hook state restored using the correct load API. * **Chores** * Extended pre-commit license hook exclusion. * **Tests** * Added GPU/distributed test fixtures and dtype/checkout helpers. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com> Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com> Co-authored-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
### What does this PR do? - Delete unused decilm code <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **Removals** * Removed model conversion utilities for Llama-Nemotron format * Removed DeciLM model classes, tokenizer implementations, and configuration utilities * Removed checkpoint import/export functionality * Removed heterogeneous transformer layer specifications and configuration builders * **Updates** * Updated pre-commit configuration for additional file exclusions * Updated imports across modules to reflect removed dependencies <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do? Delete unused decilm code <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Removed DeciLM-specific components including decoder layers, attention implementations, and specialized cache utilities, streamlining the codebase * Updated replacement library to use generic model configurations instead of DeciLM-specific types, improving compatibility with diverse architectures * Cleaned up internal utilities for attention masking, flash attention compatibility, and rotary position embeddings <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do? Integration tests for subblock stats (memory + num_of_params) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Improved teacher model configuration loading and statistics computation for enhanced accuracy in memory and parameter calculations. * **Tests** * Added comprehensive tests validating teacher model memory and parameter statistics calculations with strict accuracy tolerances. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do? Refactoring of subblock stats to stop using DeciLM code and use anymodel instead. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Refactor** * Restructured internal estimation logic for model memory and parameters to support broader model architectures. * Updated model initialization utilities. * **Tests** * Updated baseline metrics for validation tests. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do? Removing unused code from modelopt/torch/puzzletron/decilm/deci_lm_hf_code - completed. <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Chores** * Removed vendored Transformers modules including activation functions, cache utilities, and configuration classes. * Removed custom DeciLM configuration class. * Updated internal type signatures from DeciLMConfig to standard Hugging Face PretrainedConfig. * Simplified internal function signatures by removing unused parameters. * Enhanced code quality checks via pre-commit configuration updates. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
### What does this PR do? Remove hardcoded trust_remote_code=true <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit ## Release Notes * **Documentation** * Clarified documentation for parameter counting in model layer calculations. * **Bug Fixes** * Remote code execution is now disabled by default in model configuration loading. Explicitly enable `trust_remote_code` when using custom modeling code. <!-- end of auto-generated comment: release notes by coderabbit.ai --> Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
### What does this PR do? Use pruning importance hooks from prune/importance_hooks/ in modelopt.torch.puzzletron --------- Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
📝 Walkthrough

Introduces Puzzletron, a comprehensive LLM model compression algorithm featuring model conversion to a heterogeneous block-config format, activation scoring, multi-stage pruning (FFN/attention/expert/hidden-dim), MIP-based architecture search, and model realization. Includes the AnyModel framework for supporting multiple model families, distributed execution infrastructure, evaluation tools, and extensive configuration examples for multiple model variants.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant CLI as Main Entrypoint
    participant Convert as Model Converter
    participant Score as Activation Scoring
    participant Prune as Pruning
    participant Library as Replacement Library
    participant MIP as MIP Optimizer
    participant Realize as Model Realization
    participant Evaluate as Evaluation
    CLI->>Convert: 1. Convert HF model to Puzzletron format
    Convert->>Convert: Apply ModelDescriptor, create block_configs
    Convert-->>CLI: Converted heterogeneous checkpoint
    CLI->>Score: 2. Score pruning activations
    Score->>Score: Collect forward activations with hooks
    Score-->>CLI: Activation logs per layer
    CLI->>Prune: 3. Generate pruned child checkpoints
    Prune->>Prune: For each pruning strategy (FFN/attention/expert)
    Prune->>Prune: Initialize children from parent via mode
    Prune-->>CLI: Multiple pruned checkpoint variants
    CLI->>Library: 4. Build replacement library
    Library->>Library: Deduplicate subblocks, organize by layer
    Library->>Library: Create layer replacement mappings
    Library-->>CLI: Replacement library + single sequence solution
    CLI->>MIP: 5. Run MIP optimization
    MIP->>MIP: Gather subblock stats + metrics
    MIP->>MIP: Solve multi-layer selection (memory constraint)
    MIP-->>CLI: Ranked solutions with objectives/costs
    CLI->>Realize: 6. Realize models from solutions
    Realize->>Realize: Load replacement library, select configs
    Realize->>Realize: Assemble heterogeneous model
    Realize-->>CLI: Final pruned HF checkpoints
    CLI->>Evaluate: 7. Evaluate (lm-eval or NeMo Evaluator)
    Evaluate->>Evaluate: Benchmark accuracy/throughput
    Evaluate-->>CLI: Metrics (loss, token accuracy, speed)
```

**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~75 minutes

The PR introduces a substantial, multi-component compression pipeline with significant new abstractions (factory patterns, descriptor-based extensibility, MIP solving, graph stitching), distributed execution logic, and intricate initialization/pruning strategies. While some configurations are repetitive, the core infrastructure (converters, model descriptors, pruning mixins, MIP, replacement library, checkpoint utilities) contains dense logic and heterogeneous change patterns requiring separate reasoning per module. The high file count, variety of concerns (conversion, scoring, optimization, distillation, evaluation), and deep integration with existing modelopt patterns elevate complexity beyond simple refactoring.
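The MIP step (5) in the diagram above selects one replacement per layer under a memory budget. A brute-force toy version of that selection, with made-up candidate numbers (the real pipeline uses a MIP solver; this is only an illustration of the objective):

```python
# Toy illustration of the MIP step's core idea: pick one replacement per
# layer to maximize a quality score subject to a total memory budget.
# Candidate names, scores, and sizes are invented for the sketch.
from itertools import product

# per layer: list of (name, quality_score, memory_mib) candidates
layers = [
    [("full_attn", 10.0, 400), ("pruned_attn", 8.5, 250), ("no_op", 0.0, 0)],
    [("full_ffn", 9.0, 600), ("pruned_ffn", 7.0, 300)],
]
budget_mib = 700

best = None
for combo in product(*layers):  # exhaustive; a MIP solver scales this
    mem = sum(c[2] for c in combo)
    score = sum(c[1] for c in combo)
    if mem <= budget_mib and (best is None or score > best[0]):
        best = (score, [c[0] for c in combo])

print(best)  # (17.0, ['full_attn', 'pruned_ffn'])
```

Exhaustive search is exponential in the number of layers, which is why the actual pipeline formulates this as a mixed-integer program.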
Actionable comments posted: 12
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟡 Minor comments (15)
modelopt/torch/puzzletron/tools/logger.py-164-168 (1)
**164-168: ⚠️ Potential issue | 🟡 Minor — Docstring says "rank -1 in node 0" but implementation may not match.**

The `lprint` docstring says "Last rank prints only (rank -1 in node 0)" but the implementation checks `local_rank != world_size - 1`. If the intent is the last rank globally, this logic appears incorrect. If the intent is the last local rank within a node, the comparison should be against local world size, not global world size.

modelopt/torch/puzzletron/tools/logger.py-62-66 (1)
**62-66: ⚠️ Potential issue | 🟡 Minor — Error message is inconsistent with valid choices.**

The error message on line 65 lists `['all', 'main', 'local_main']`, but `'last'` is also a valid choice per line 62.

🐛 Proposed fix

```diff
     if ranks not in ["all", "main", "local_main", "last"]:
         raise NotImplementedError(
             f"Could not broadcast msg {msg} - "
-            f"ranks parameters choices are ['all', 'main', 'local_main']. Got {ranks}"
+            f"ranks parameters choices are ['all', 'main', 'local_main', 'last']. Got {ranks}"
         )
```

examples/pruning/README.md-10-10 (1)
**10-10: ⚠️ Potential issue | 🟡 Minor — Fix compound adjective hyphenation in docs.**

At line 10, use "MIP-based" (not "MIP based") for correct technical phrasing.

✏️ Proposed edit

```diff
-1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA using Mixed Integer Programming (MIP) based NAS search algorithm.
+1. [Puzzletron](../puzzletron/README.md): An advanced pruning method by NVIDIA using a Mixed Integer Programming (MIP)-based NAS search algorithm.
```

As per coding guidelines, "`**/*.{py,cpp,h,hpp,md}`: All code (Python, C++, Markdown, etc.) must adhere to coding standards as checked by pre-commit hooks".

examples/puzzletron/configs/gptoss-20b_remove_experts_memory/gptoss-20b_remove_experts_memory.yaml-14-17 (1)
**14-17: ⚠️ Potential issue | 🟡 Minor — Fix the mismatch between `target_memory` and its comment.**

`16_000` MiB is about 15.6 GiB; 45 GiB would be `46_080` MiB. One of these is wrong, and the current ambiguity can make the example optimize for the wrong budget.

examples/puzzletron/configs/nemotron-nano-12b-v2/nemotron_nano_12b_v2_pruneffn_memory.yaml-14-17 (1)
**14-17: ⚠️ Potential issue | 🟡 Minor — Align the MiB value with the GiB comment.**

`90_000` MiB is about 87.9 GiB, not 90 GiB. If the intended cap is 90 GiB, this should be `92_160`; otherwise update the comment so the search target is unambiguous.

examples/puzzletron/configs/llama-3_2-3B_pruneffn_memory/llama-3_2-3B_pruneffn_memory.yaml-17-17 (1)
**17-17: ⚠️ Potential issue | 🟡 Minor — Memory unit comment is inaccurate.**

Line 17 uses `45_000` MiB, which is ~43.95 GiB, not 45 GiB. Please update either the value or the comment to avoid misconfigured constraints.

modelopt/torch/puzzletron/anymodel/models/llama/llama_converter.py-36-52 (1)
**36-52: ⚠️ Potential issue | 🟡 Minor — Return type annotation mismatch: method declares `List[BlockConfig]` but returns `List[dict]`.**

The method signature declares a return type of `List[BlockConfig]`, but line 50 calls `.to_dict()` on each BlockConfig instance, causing the method to actually return `List[dict]`. Update the return type annotation to match the actual behavior.

🔧 Suggested fix

```diff
     @staticmethod
-    def create_block_configs_from_main_config(config: LlamaConfig) -> List[BlockConfig]:
+    def create_block_configs_from_main_config(config: LlamaConfig) -> List[dict]:
         """Create uniform block configs for all Llama layers.
```

modelopt/torch/puzzletron/anymodel/converter/converter_factory.py-43-47 (1)
43-47: ⚠️ Potential issue | 🟡 Minor

Handle potential `None` from `inspect.getmodule()`.

`inspect.getmodule(ref)` can return `None` if the module cannot be determined, which would cause an `AttributeError` when accessing `.__name__`.

🛡️ Proposed fix

```diff
-raise KeyError(
-    f"Could not register `{cls_name}`: {cls_type}, "
-    f"`{cls_name}` is already registered and points to "
-    f"`{inspect.getmodule(ref).__name__}.{ref.__name__}`"
-)
+module = inspect.getmodule(ref)
+module_name = module.__name__ if module else "<unknown>"
+raise KeyError(
+    f"Could not register `{cls_name}`: {cls_type}, "
+    f"`{cls_name}` is already registered and points to "
+    f"`{module_name}.{ref.__name__}`"
+)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/anymodel/converter/converter_factory.py` around lines 43-47: the `KeyError` message assumes `inspect.getmodule(ref)` always returns a module. Update the error construction in the registration failure branch (where `cls_name`, `cls_type`, and `ref` are used) to guard against `None` from `inspect.getmodule(ref)` by using a safe fallback (e.g., `module = inspect.getmodule(ref); module_name = module.__name__ if module is not None else getattr(ref, "__module__", "unknown")`) and use `module_name` rather than `inspect.getmodule(ref).__name__`; ensure `ref.__name__` (or `getattr(ref, "__name__", repr(ref))`) is similarly accessed safely so the f-string does not raise `AttributeError`.

modelopt/torch/puzzletron/puzzletron.py-60-74 (1)
60-74: ⚠️ Potential issue | 🟡 Minor

Step numbering is inconsistent.

Comments jump from "Step 2" to "Step 4", skipping "Step 3". This appears to be a typo or leftover from a refactor. Consider renumbering for clarity:

📝 Proposed fix

```diff
     # Step 2: pruning_ckpts (single process)
     if dist.is_master():
         pruning_ckpts.launch_prune_ckpt(hydra_cfg)
     dist.barrier()

-    # Step 4: build_library_and_stats (single process)
+    # Step 3: build_library_and_stats (single process)
     if dist.is_master():
         build_library_and_stats.launch_build_library_and_stats(hydra_cfg)
     dist.barrier()

-    # Step 5: calc_one_block_scores (distributed processing)
+    # Step 4: calc_one_block_scores (distributed processing)
     scoring.launch_scoring(hydra_cfg)

-    # Step 6: mip_and_realize_models (distributed processing)
+    # Step 5: mip_and_realize_models (distributed processing)
     mip_and_realize_models.launch_mip_and_realize_model(hydra_cfg)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/puzzletron.py` around lines 60-74: the step comments are misnumbered (they jump from "Step 2" to "Step 4"). Update the inline step comments to use a consistent sequence (e.g., Step 2 → Step 3 → Step 4, etc.) around the blocks that call `pruning_ckpts.launch_prune_ckpt`, `build_library_and_stats.launch_build_library_and_stats`, `scoring.launch_scoring`, and `mip_and_realize_models.launch_mip_and_realize_model` so the step numbers reflect the actual order of execution.

examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml-20-31 (1)
20-31: ⚠️ Potential issue | 🟡 Minor

Remove unused `ffn_list` definition or align with `intermediate_size_list` usage.

The file serves as a defaults template for Hydra configuration, but there's an inconsistency: `ffn_list` is defined on line 21 with no value, yet the actual FFN pruning configuration (ffn_pruning.yaml) populates and uses `intermediate_size_list` instead. Either rename `ffn_list` to `intermediate_size_list` for consistency, or remove the unused `ffn_list` definition.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/configs/nemotron-nano-12b-v2/pruning/pruning_defaults.yaml` around lines 20-31: the defaults file defines an unused key `ffn_list` while the FFN pruning logic uses `intermediate_size_list` (see ffn_pruning.yaml). Fix by either removing the `ffn_list` entry entirely or renaming it to `intermediate_size_list`, and ensure the init mode key (`mlp_init_mode`) matches the expected name used by ffn_pruning.yaml so the Hydra defaults align with the pruning config.

modelopt/torch/puzzletron/tools/hydra_utils.py-29-36 (1)
29-36: ⚠️ Potential issue | 🟡 Minor

Validate `block` and `mbs` before division in `warmup_steps`.

`block=0` or `mbs=0` causes a runtime failure. Add explicit positive-value checks and fail early with a clear error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/hydra_utils.py` around lines 29-36: the `warmup_steps` function currently divides by `block` and `mbs`, which will raise on zero. Update `warmup_steps` to validate that `block` and `mbs` are positive integers (e.g., `block > 0` and `mbs > 0`) at the top of the function and raise a clear `ValueError` if not; keep the existing logic (compute `steps = (tokens // block) // mbs`, compute `w = pct * steps`, return `max(1, round(w))`) but fail fast with an explanatory error mentioning the parameter names `block` and `mbs` so callers get a clear message when invalid values are passed.

modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py-46-47 (1)
46-47: ⚠️ Potential issue | 🟡 Minor

Avoid mutating caller-owned `activation_hooks_kwargs`.

This function modifies the input dict in place (`activation_hooks_kwargs["model"] = model`), which can leak state across calls. Copy it first before enrichment.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/activation_scoring/activation_hooks/utils.py` around lines 46-47: the code mutates the caller-owned dict `activation_hooks_kwargs` by setting `activation_hooks_kwargs["model"] = model`. Instead, make a shallow copy of `activation_hooks_kwargs` (e.g., `new_kwargs = activation_hooks_kwargs.copy()` or `dict(activation_hooks_kwargs)`), add the `"model"` key to the copy, and use/return that copy so the original dict isn't mutated; update any return/usage to use the `new_kwargs` variable.

examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml-13-13 (1)
13-13: ⚠️ Potential issue | 🟡 Minor

Minor typo in comment: `ppath` should be `path`.

Proposed fix

```diff
-dataset_path: ??? # ppath to Nemotron-Post-Training-Dataset-v2
+dataset_path: ??? # path to Nemotron-Post-Training-Dataset-v2
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/configs/llama-3_1-8B_pruneffn_memory/Llama-3_1-8B.yaml` at line 13: fix the typo in the YAML comment for the `dataset_path` entry so it reads "path to Nemotron-Post-Training-Dataset-v2" instead of "ppath to Nemotron-Post-Training-Dataset-v2".

modelopt/torch/puzzletron/tools/validate_model.py-138-140 (1)
138-140: ⚠️ Potential issue | 🟡 Minor

Potential issue: `validation_full_iters` could be zero.

If `args.eval_samples` is less than `args.micro_batch_size`, the integer division results in `validation_full_iters = 0`. This could cause unexpected behavior in downstream code that expects at least one iteration.

🛡️ Proposed fix: Add validation or use ceiling division

```diff
 validation_full_iters = (
-    args.eval_samples // args.micro_batch_size
+    max(1, args.eval_samples // args.micro_batch_size)
 )  # model pipeline, single data rank
```

Or add an assertion:

```diff
 validation_full_iters = (
     args.eval_samples // args.micro_batch_size
 )  # model pipeline, single data rank
+assert validation_full_iters > 0, f"eval_samples ({args.eval_samples}) must be >= micro_batch_size ({args.micro_batch_size})"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/tools/validate_model.py` around lines 138-140: the calculation of `validation_full_iters` can yield zero when `args.eval_samples < args.micro_batch_size`. Update the logic around `validation_full_iters` to ensure at least one validation iteration, either by validating inputs (raise/exit if `args.eval_samples < args.micro_batch_size`) or by computing a ceiling division and then taking `max(1, ...)`; reference the variable `validation_full_iters` and the CLI args `args.eval_samples` and `args.micro_batch_size` when making the change so downstream loops that expect >= 1 iteration do not break.

modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py-316-321 (1)
316-321: ⚠️ Potential issue | 🟡 Minor

Actually emit the missing-MoE-stats warning.

`Warning(...)` only instantiates an exception object; it does not notify the caller. Right now a missing `moe_stats.json` silently changes `active_params` from MoE-aware counts to total params. Use `warnings.warn(...)` or `mprint(...)` here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines 316 - 321, The code creates a pathlib object moe_stats_file and on missing file only instantiates Warning(...) which does nothing; update the block around moe_stats_file (using variables moe_stats_file, moe_stats_filename, master_puzzle_dir and the downstream active_params logic) to actually emit a warning by calling warnings.warn(...) (or mprint(...)) with a clear message like f"MOE stats file {moe_stats_file} does not exist, can't calculate num active params", and add `import warnings` at the top if using warnings.warn; keep setting moe_stats_file = None so the fallback to total params remains.
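The fix described in the prompt above can be sketched in isolation. The helper below is hypothetical (the real code in calc_subblock_stats.py is structured differently); it only demonstrates why `warnings.warn(...)` is needed where a bare `Warning(...)` expression would be silently discarded:

```python
import warnings
from pathlib import Path


def load_moe_stats(moe_stats_file: Path):
    """Return the raw MoE stats text, or None with a visible warning when missing."""
    if not moe_stats_file.exists():
        # Warning(...) alone would only build an exception object and drop it;
        # warnings.warn(...) actually surfaces the message to the caller.
        warnings.warn(
            f"MOE stats file {moe_stats_file} does not exist, "
            "can't calculate num active params; falling back to total params."
        )
        return None
    return moe_stats_file.read_text()
```

With this pattern the fallback to total params still happens (the function returns `None`), but the caller is audibly notified instead of silently getting different numbers.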
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 31341e54-9727-48f6-b865-606bb7607464
⛔ Files ignored due to path filters (2)
- `examples/puzzletron/mip_sweep_example.png` is excluded by `!**/*.png`
- `uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (212)
```yaml
solutions_to_validate:
skip_validation: false
save_models: false
bigger_is_better: false
sort_solutions_by:
calculate_full_score_ablations: false
```
🧩 Analysis chain
🏁 Script executed:
```shell
#!/bin/bash
python -m pip install --quiet pyyaml
python - <<'PY'
from pathlib import Path
import yaml

p = Path("examples/puzzletron/configs/nemotron-nano-12b-v2/validate_solutions_defaults.yaml")
data = yaml.safe_load(p.read_text())
print("Top-level keys:", list(data.keys()))
print("solutions_to_validate value:", data.get("solutions_to_validate"))
for k in ["skip_validation", "save_models", "bigger_is_better", "sort_solutions_by"]:
    print(f"top-level has '{k}':", k in data)
PY
```

Repository: NVIDIA/Model-Optimizer
Length of output: 413
Indentation bug: `solutions_to_validate` block is not nested.

Lines 6-10 define keys at the top level instead of under `solutions_to_validate`, which will cause the `solutions_to_validate` key to be null rather than a dictionary. This breaks any config consumption expecting `solutions_to_validate.skip_validation` etc.
🐛 Proposed fix
```diff
 solutions_to_validate:
-skip_validation: false
-save_models: false
-bigger_is_better: false
-sort_solutions_by:
-calculate_full_score_ablations: false
+  skip_validation: false
+  save_models: false
+  bigger_is_better: false
+  sort_solutions_by:
+  calculate_full_score_ablations: false
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```yaml
solutions_to_validate:
  skip_validation: false
  save_models: false
  bigger_is_better: false
  sort_solutions_by:
  calculate_full_score_ablations: false
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@examples/puzzletron/configs/nemotron-nano-12b-v2/validate_solutions_defaults.yaml`
around lines 5 - 10, The YAML file has an indentation bug: the keys
skip_validation, save_models, bigger_is_better, sort_solutions_by, and
calculate_full_score_ablations must be nested under the solutions_to_validate
mapping; update the block so that solutions_to_validate is a mapping containing
those keys (i.e., indent those five keys one level under solutions_to_validate)
so that code referencing solutions_to_validate.skip_validation etc. sees a
dictionary rather than null.
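The nesting difference that the verification script checks for can be reproduced in isolation. This is a minimal sketch using PyYAML (the same library the script above installs), contrasting the buggy flat layout with the proposed nested one:

```python
import yaml

# Flat keys, as in the buggy file: solutions_to_validate gets no value.
flat = """\
solutions_to_validate:
skip_validation: false
save_models: false
"""

# Indented keys, as in the proposed fix: they become children of the mapping.
nested = """\
solutions_to_validate:
  skip_validation: false
  save_models: false
"""

flat_cfg = yaml.safe_load(flat)
nested_cfg = yaml.safe_load(nested)

print(flat_cfg["solutions_to_validate"])    # None: the key has no value
print(nested_cfg["solutions_to_validate"])  # {'skip_validation': False, 'save_models': False}
```

In the flat form the child keys leak to the top level, so any consumer reading `solutions_to_validate.skip_validation` sees `None` instead of a mapping.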
```python
def _assert_valid_config(args, puzzle_profile):
    required_args = (
        "subblock_stats_path",
        "objective",
        "output_path",
    )
    missing_args = [arg for arg in required_args if arg not in args or getattr(args, arg) is None]
    if missing_args:
        mprint(f"error: The following arguments are required: {', '.join(missing_args)}")
        exit(1)

    # Make sure we have specified subblock_stats_args
    if "subblock_stats_args" not in args and "subblock_stats_args" not in puzzle_profile:
        mprint(
            "error: Must specify `subblock_stats_arrs` in either puzzle_profile or as a commandline arg."
        )
        exit(1)

    # Make sure we have specified constraints
    if (
        "mip_constraints" not in args
        and "human_constraints" not in args
        and "mip_constraints" not in puzzle_profile
        and "human_constraints" not in puzzle_profile
    ):
        mprint(
            "error: Must specify either `mip_constraints` or `human_constraints` in one of puzzle_profile or as a commandline argument."
        )
        exit(1)
```
The CLI validation path crashes on `argparse.Namespace`.

These membership checks use `arg in args`, but `Namespace` is not iterable. Running the script via `__main__` therefore throws `TypeError` before any friendly validation message is emitted.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/mip/run_puzzle.py` around lines 395 - 423, The
validation uses membership checks like "subblock_stats_args" in args and
"mip_constraints" in args but args is an argparse.Namespace (not iterable),
which raises TypeError; update _assert_valid_config to use getattr(args,
'<name>', None) checks instead of "in args" (e.g., replace
`"subblock_stats_args" not in args` with `getattr(args, "subblock_stats_args",
None) is None`) and similarly for `"mip_constraints"` and `"human_constraints"`;
keep the existing checks against puzzle_profile (which may be a dict) unchanged
and ensure the missing_args logic still uses getattr as it already does.
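The getattr-based check the prompt recommends can be sketched as a standalone helper (`has_option` is a hypothetical name; the real `_assert_valid_config` inlines these checks). One nuance worth verifying against the actual type of `args`: a plain `argparse.Namespace` does implement `__contains__`, so `in` works there; the `getattr` form is still the more defensive choice because it also treats an attribute explicitly set to `None` as missing:

```python
from argparse import Namespace


def has_option(args, name: str) -> bool:
    """True when `name` is set to a non-None value on `args`."""
    return getattr(args, name, None) is not None


args = Namespace(subblock_stats_path="stats.json", objective=None)

print(has_option(args, "subblock_stats_path"))  # True: set to a real value
print(has_option(args, "objective"))            # False: present but None
print(has_option(args, "mip_constraints"))      # False: absent entirely
# Namespace supports `in` via __contains__, but it cannot
# distinguish "explicitly None" from "meaningfully set":
print("objective" in args)                      # True
```

Either way, routing all presence checks through one helper keeps the required-args logic and the constraints logic consistent.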
```python
process_group = node.target.process_group
peers = node.target.peer_rank
if not isinstance(peers, Sequence):
    peers = [peers]

if len(values_to_node) > 0:
    items_to_send = list(self.create_output_overrides(values_to_node).items())

    data_descriptors: list[RemoteDataDescriptor] = []
    tensors_to_send: list[torch.Tensor] = []

    for key, value in items_to_send:
        if isinstance(value, torch.Tensor):
            if value.is_cuda:
                tensor_device = "cuda"
            elif value.is_cpu:
                tensor_device = "cpu"
            else:
                raise RuntimeError(
                    f"Invalid tensor device to send to remote target: {value.device}"
                )

            data_descriptor = RemoteTensorDataDescriptor(
                key=key,
                device=tensor_device,
                dtype=value.dtype,
                shape=value.shape,
            )
            tensors_to_send.append(value)

        else:
            data_descriptor = RemotePythonDataDescriptor(
                key=key,
                value=value,
            )

        data_descriptors.append(data_descriptor)

    works: list[Optional[torch.distributed.Work]] = []
    for peer in peers:
        if process_group is not None:
            peer = torch.distributed.get_global_rank(process_group, peer)

        peer_works = distributed_isend_obj(data_descriptors, dst=peer)
        works.extend(peer_works)

        for tensor in tensors_to_send:
            work = torch.distributed.isend(tensor, dst=peer)
            works.append(work)

    if node.target.blocking:
        for work in works:
            if work is not None:
                work.wait()
```
🧩 Analysis chain
🏁 Script executed:
```shell
# First, explore the file structure and locate the target file
find . -name "core.py" -path "*puzzletron/sewing_kit*" | head -20
```

Repository: NVIDIA/Model-Optimizer
Length of output: 167
🏁 Script executed:
```shell
# Check the file size and read the relevant sections
wc -l modelopt/torch/puzzletron/sewing_kit/core.py
```

Repository: NVIDIA/Model-Optimizer
Length of output: 114
🏁 Script executed:
```shell
# Read the specific lines mentioned in the review (624-677 and 687-718)
sed -n '620,730p' modelopt/torch/puzzletron/sewing_kit/core.py | cat -n
```

Repository: NVIDIA/Model-Optimizer
Length of output: 6346
🏁 Script executed:
```shell
# Search for all uses of distributed_isend_obj, distributed_recv_obj in the file
rg "distributed_(isend_obj|recv_obj|isend|irecv)" modelopt/torch/puzzletron/sewing_kit/core.py -B2 -A2
```

Repository: NVIDIA/Model-Optimizer
Length of output: 698
🏁 Script executed:
```shell
# Check the function signatures - find where these are imported from
rg "from.*distributed_isend_obj|import.*distributed_isend_obj" modelopt/torch/puzzletron/sewing_kit/ -B2 -A2
```

Repository: NVIDIA/Model-Optimizer
Length of output: 445
🏁 Script executed:
```shell
# Check if there's a utils module or similar that defines distributed_isend_obj
find modelopt/ -name "*.py" -type f -exec rg -l "def distributed_isend_obj|def distributed_recv_obj" {} \;
```

Repository: NVIDIA/Model-Optimizer
Length of output: 111
🏁 Script executed:
# Check the function signatures in utils.py
sed -n '1,200p' modelopt/torch/puzzletron/sewing_kit/utils.py | grep -A 15 "def distributed_isend_obj\|def distributed_recv_obj"Repository: NVIDIA/Model-Optimizer
Length of output: 48
🏁 Script executed:
# Get full definitions of both functions
rg "def distributed_(isend_obj|recv_obj)" modelopt/torch/puzzletron/sewing_kit/utils.py -A 20Repository: NVIDIA/Model-Optimizer
Length of output: 1582
🏁 Script executed:
# Check PyTorch distributed documentation - search for torch.distributed.isend signature
rg "torch.distributed.isend|torch.distributed.irecv" modelopt/torch/puzzletron/sewing_kit/ -B2 -A8Repository: NVIDIA/Model-Optimizer
Length of output: 3244
🏁 Script executed:
# Search for any examples in the codebase that use group parameter with these functions
rg "distributed_isend_obj.*group|torch.distributed.isend.*group|torch.distributed.irecv.*group" modelopt/Repository: NVIDIA/Model-Optimizer
Length of output: 652
🏁 Script executed:
# Search for any other potential P2P call sites in the file that might have similar issues
rg "distributed.isend|distributed.irecv|distributed.send|distributed.recv|distributed_isend|distributed_recv" modelopt/torch/puzzletron/sewing_kit/core.py -nRepository: NVIDIA/Model-Optimizer
Length of output: 483
Pass group=process_group to all distributed P2P calls.
The code extracts process_group and converts local peer ranks to global ranks when process_group is not None (lines 646 and 688 via torch.distributed.get_global_rank()), but the actual distributed_isend_obj(), isend(), distributed_recv_obj(), and irecv() calls at lines 667, 671, 690, and 713 do not receive the group parameter. This causes all communication to use the default process group instead of the specified subgroup, resulting in hangs or communication with the wrong peers.
🛠️ Suggested fix
```diff
-peer_works = distributed_isend_obj(data_descriptors, dst=peer)
+peer_works = distributed_isend_obj(
+    data_descriptors,
+    dst=peer,
+    group=process_group,
+)
@@
-work = torch.distributed.isend(tensor, dst=peer)
+work = torch.distributed.isend(
+    tensor,
+    dst=peer,
+    group=process_group,
+)
@@
-data_descriptors = distributed_recv_obj(src=peer)
+data_descriptors = distributed_recv_obj(
+    src=peer,
+    group=process_group,
+)
@@
-work = torch.distributed.irecv(tensor, src=peer)
+work = torch.distributed.irecv(
+    tensor,
+    src=peer,
+    group=process_group,
+)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/sewing_kit/core.py` around lines 624 - 677, The P2P
calls are missing the process group argument so they use the default group;
update all distributed send/recv calls to pass group=process_group where
available: add group=process_group to calls to distributed_isend_obj(...) and
distributed_recv_obj(...), and to torch.distributed.isend(...) and
torch.distributed.irecv(...); keep the existing logic that maps local->global
ranks via torch.distributed.get_global_rank(process_group, peer) and only pass
group when process_group is not None so subgroup communication uses the correct
group.
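The fix above can be sketched with stubbed callables (`send_tensors`, `isend`, and `get_global_rank` here are hypothetical stand-ins for illustration, not the actual core.py API): the `group` used for the local-to-global rank translation must also reach every P2P call.

```python
def send_tensors(tensors, peers, process_group, *, isend, get_global_rank):
    """Thread `process_group` through both rank translation and the sends."""
    works = []
    for peer in peers:
        if process_group is not None:
            # Translate the subgroup-local rank to a global rank...
            peer = get_global_rank(process_group, peer)
        for tensor in tensors:
            # ...and keep the send on that same subgroup. Omitting group=
            # silently falls back to the default world group.
            works.append(isend(tensor, dst=peer, group=process_group))
    return works
```

With stub callables recording their arguments, every recorded send carries both the translated rank and the subgroup, which is exactly what the review asks the real calls to do.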
```python
def _load_subblock_configs_from_subblock_library(master_puzzle_dir: Path) -> list[SubblockConfig]:
    subblocks_df = pd.read_json(master_puzzle_dir / "subblock_library.json")
    subblocks_df["attention_config"] = subblocks_df["attention_config"].apply(
        partial(_dataclass_from_dict, cls=AttentionConfig)
    )
    subblocks_df["ffn_config"] = subblocks_df["ffn_config"].apply(
        partial(_dataclass_from_dict, cls=FFNConfig)
    )
    attention_configs = subblocks_df["attention_config"].dropna().drop_duplicates().tolist()
    ffn_configs = subblocks_df["ffn_config"].dropna().drop_duplicates().tolist()
    subblock_configs = attention_configs + ffn_configs
    return subblock_configs
```
Return the same indexed shape from the fallback loader.
This fallback returns bare AttentionConfig/FFNConfig instances, but calculate_subblock_stats() later indexes each item as ["subblock_config"] and ["parent_layer_indices"]. When replacement_library.json is absent, this path fails immediately instead of producing stats.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py` around lines
405 - 416, The fallback loader _load_subblock_configs_from_subblock_library
currently returns raw AttentionConfig/FFNConfig objects but callers
(calculate_subblock_stats) expect list entries with keys/fields
"subblock_config" and "parent_layer_indices" (i.e. SubblockConfig-shaped items);
change the function to construct and return SubblockConfig instances (or dicts
matching SubblockConfig) for each unique attention/ffn config, populating
subblock_config with the AttentionConfig/FFNConfig and setting
parent_layer_indices to a sensible default (e.g. empty list) or the original
parent indices if present in the JSON, so the returned list matches the same
indexed shape used by calculate_subblock_stats. Ensure the function name
_load_subblock_configs_from_subblock_library and the
SubblockConfig/AttentionConfig/FFNConfig symbols are used to locate and
instantiate the correct dataclass.
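A minimal sketch of the shape the caller expects (plain dicts stand in for the real SubblockConfig dataclass, and `wrap_subblock_configs` is a hypothetical helper name): each bare config gets wrapped so callers can index `["subblock_config"]` and `["parent_layer_indices"]` uniformly, with an empty default for the parent indices.

```python
def wrap_subblock_configs(attention_configs, ffn_configs):
    # Wrap each unique config in the indexed shape that
    # calculate_subblock_stats() expects from the replacement-library path.
    return [
        {"subblock_config": cfg, "parent_layer_indices": []}
        for cfg in attention_configs + ffn_configs
    ]
```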
```python
# Prepare arguments for parallel processing
process_layer_partial = partial(
    _process_single_layer,
    pruning_mixin=pruning_mixin,
    descriptor=descriptor,
    parent_state_dict=original_state_dict,
    new_state_dict=new_state_dict,
    original_config=original_config,
    new_config=new_config,
    gqa_init_mode=gqa_init_mode,
    mlp_init_mode=mlp_init_mode,
    mlp_init_config=mlp_init_config,
    linear_init_mode=linear_init_mode,
    ignored_keys=ignored_keys,
    keys=keys,
    is_original_mha=is_original_mha,
    head_size=head_size,
    hidden_size=hidden_size,
)
```
Don't share a mutable keys map across worker threads.
create_child_state_dict() passes the same keys dict into every worker, and _init_mlp() mutates it with keys.pop(...). That introduces a real race in the fallback path: workers can remove each other's entries, yielding nondeterministic KeyErrors or copying the wrong leftovers into the final state dict.
Also applies to: 452-455, 694-696
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/bypassed_training/child_init.py` around lines
425 - 443, The issue is that the same mutable keys dict is shared across worker
tasks causing races when _init_mlp mutates it; update each partial so it
receives a worker-local copy of keys (e.g., pass keys.copy() or dict(keys))
instead of the original dict when creating process_layer_partial (and the other
partials at the other sites noted), and ensure
create_child_state_dict/_process_single_layer/_init_mlp keep using their
existing names (create_child_state_dict, _process_single_layer, _init_mlp) but
operate on the copied keys to avoid cross-thread pops and nondeterministic
KeyError or wrong-state copying.
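The race and its fix can be illustrated with a toy fallback path (`init_layer` is a hypothetical stand-in for `_init_mlp`, not the repo's API): when each worker receives its own copy of the dict, every `pop` succeeds, whereas sharing one dict would let the first worker's pop make later workers fail nondeterministically.

```python
from concurrent.futures import ThreadPoolExecutor

def init_layer(layer_idx, keys):
    # Stand-in for the fallback path: consumes an entry from `keys`.
    return keys.pop("mlp.weight")

keys = {"mlp.weight": "w"}

# Safe variant: each worker operates on its own copy, so pops cannot race.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(lambda i: init_layer(i, dict(keys)), range(4)))

# Passing the shared `keys` directly instead would raise KeyError for
# every worker after the first one to pop the entry.
```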
```python
def _write_file_process_safe(
    content: Any,
    path: Path | str,
    write_fn: Callable[[Any, BinaryIO], None] = _write_text,
) -> None:
    """
    Write a file in a multi-process safe way.
    If another process tries to write the same file using this method, the current process
    "gives up" and assumes that the matter is being taken care of by another process.

    write_fn is a function that receives file contents and a binary file object,
    and writes the content to the file. It can be _write_text (defined above), or torch.save,
    or a similar function (not safetensors.torch.save_file since it expects a path).
    """
    with open(path, "wb") as f:
        # Try to acquire an exclusive, non-blocking lock
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return  # Exit immediately if the lock is not acquired

        write_fn(content, f)  # Write the content if lock is acquired
        f.flush()  # Ensure data is written to disk

        # Release the lock
        fcntl.flock(f, fcntl.LOCK_UN)
```
Acquire the file lock before truncating the index file.
open(path, "wb") truncates the file before flock() runs. If another process already owns the lock, this contender still zeroes the file and then returns, so the writer you intended to protect can lose or corrupt the index.
🛠️ Proposed fix
```diff
-    with open(path, "wb") as f:
+    with open(path, "ab+") as f:
         # Try to acquire an exclusive, non-blocking lock
         try:
             fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
         except BlockingIOError:
             return  # Exit immediately if the lock is not acquired
-
+        f.seek(0)
+        f.truncate()
         write_fn(content, f)  # Write the content if lock is acquired
         f.flush()  # Ensure data is written to disk
+        os.fsync(f.fileno())
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/checkpoint_utils_hf.py` around lines 335 -
360, The current _write_file_process_safe opens the file with "wb" which
truncates before acquiring the flock, risking data loss; change the open/lock
sequence so you acquire the exclusive non-blocking lock before
truncating/writing: open the file in a non-truncating mode that still creates
the file if missing (e.g., "a+b"), immediately acquire fcntl.flock(f,
fcntl.LOCK_EX | fcntl.LOCK_NB), return on BlockingIOError, then seek to start
and f.truncate(0) (or otherwise reset) before calling write_fn(content, f),
flushing, and finally releasing the lock in _write_file_process_safe. Ensure you
still support write_fn semantics and file creation behavior.
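A runnable sketch of the lock-then-truncate ordering (`write_file_locked` is an illustrative name, not the repo's helper; POSIX-only because of `fcntl`): opening with `"ab+"` creates the file if missing without destroying its contents, so a contender that fails to take the lock leaves the current writer's data intact.

```python
import fcntl
import os

def write_file_locked(content: bytes, path: str) -> bool:
    # "ab+" does NOT truncate on open, unlike "wb".
    with open(path, "ab+") as f:
        try:
            fcntl.flock(f, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # another process owns the file; give up
        f.seek(0)
        f.truncate()  # safe to truncate only once the lock is held
        f.write(content)
        f.flush()
        os.fsync(f.fileno())  # push the data to disk before unlocking
        fcntl.flock(f, fcntl.LOCK_UN)
    return True
```

Note that `flock` is advisory: it only protects against other writers that also take the lock, which matches the "give up if someone else is writing" semantics described in the docstring.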
```python
warnings.filterwarnings("ignore", "You are using `torch.load` with `weights_only=False`*.")


def load_state_dict(checkpoint_dir: Path | str) -> dict[str, torch.Tensor]:
    checkpoint_dir = _normalize_checkpoint_dir(checkpoint_dir)

    if (state_dict_path := checkpoint_dir / STATE_DICT_FILE_NAME).exists():
        return torch.load(state_dict_path, map_location="cpu", weights_only=False)
```
Do not force pickle-based torch.load() for state dicts.
These paths explicitly set weights_only=False and then globally suppress the warning. That re-enables arbitrary code execution for untrusted checkpoints even though these files are plain tensor state dicts. Use weights_only=True; if a specific path truly needs pickle, document why that file is trusted inline instead of hiding the warning.
Suggested fix
```diff
-warnings.filterwarnings("ignore", "You are using `torch.load` with `weights_only=False`*.")
-
-
 def load_state_dict(checkpoint_dir: Path | str) -> dict[str, torch.Tensor]:
@@
     if (state_dict_path := checkpoint_dir / STATE_DICT_FILE_NAME).exists():
-        return torch.load(state_dict_path, map_location="cpu", weights_only=False)
+        return torch.load(state_dict_path, map_location="cpu", weights_only=True)
@@
     if len(torch_paths) != 0:
-        load_fn = partial(torch.load, map_location="cpu", weights_only=False)
+        load_fn = partial(torch.load, map_location="cpu", weights_only=True)
         file_paths = torch_paths
```

As per coding guidelines, "Do not use torch.load(..., weights_only=False) unless a documented exception is provided."
Also applies to: 74-80
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py` around lines 38 - 45,
The code is forcing pickle-based torch.load by setting weights_only=False and
globally suppressing the warning; update all torch.load calls in this file
(e.g., load_state_dict and the similar optimizer/state loaders around lines
74-80) to use weights_only=True for plain tensor state dicts, remove the global
warnings.filterwarnings line, and only allow weights_only=False on a per-call
basis with an inline, documented justification (no global suppression) when a
trusted file genuinely requires pickle semantics.
```python
def copy_tokenizer(
    source_dir_or_tokenizer_name: Path | str,
    target_dir: Path | str,
    on_failure: Literal["raise", "warn"] = "raise",
) -> None:
    """Prefer loading the tokenizer from huggingface hub (when tokenizer_name.txt file is available)
    to avoid collision between transformers versions.
    """
    source_tokenizer_name_path = Path(source_dir_or_tokenizer_name) / "tokenizer_name.txt"
    if source_tokenizer_name_path.exists():
        source_dir_or_tokenizer_name = source_tokenizer_name_path.read_text().strip()

    tokenizer = None
    try:
        tokenizer = AutoTokenizer.from_pretrained(
            source_dir_or_tokenizer_name, trust_remote_code=True
        )
```
copy_tokenizer() should not hard-code trust_remote_code=True.
This helper accepts arbitrary local paths or Hub IDs without any guarantee of trustworthiness, enabling execution of repository-provided Python during tokenizer loading. Add a trust_remote_code: bool = False parameter and pass the descriptor's setting from callers, matching the pattern used by load_model_config() in the same codebase.
Suggested fix
```diff
 def copy_tokenizer(
     source_dir_or_tokenizer_name: Path | str,
     target_dir: Path | str,
     on_failure: Literal["raise", "warn"] = "raise",
+    trust_remote_code: bool = False,
 ) -> None:
@@
     try:
         tokenizer = AutoTokenizer.from_pretrained(
-            source_dir_or_tokenizer_name, trust_remote_code=True
+            source_dir_or_tokenizer_name, trust_remote_code=trust_remote_code
         )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/checkpoint_utils.py` around lines 161 - 177,
The copy_tokenizer function currently hard-codes
AutoTokenizer.from_pretrained(..., trust_remote_code=True); change the signature
of copy_tokenizer to accept a trust_remote_code: bool = False parameter and pass
that variable into AutoTokenizer.from_pretrained instead of True, and update any
callers to forward the descriptor's trust_remote_code setting (same pattern used
by load_model_config) so tokenizer loading honors the caller-provided trust flag
rather than always allowing remote code execution.
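The descriptor-driven pattern can be sketched with stubs (`load_tokenizer`, the stub descriptor, and the injected `from_pretrained` callable are illustrative names, not the repo's API; only `requires_trust_remote_code()` comes from the review itself): the flag defaults to False and is derived from the descriptor rather than hard-coded.

```python
def load_tokenizer(name, descriptor, from_pretrained, trust_remote_code=None):
    # Default to the descriptor's setting instead of hard-coding True.
    if trust_remote_code is None:
        trust_remote_code = descriptor.requires_trust_remote_code()
    return from_pretrained(name, trust_remote_code=trust_remote_code)
```

A caller that passes an untrusted Hub ID then gets `trust_remote_code=False` unless the descriptor explicitly opts in.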
```python
        elif (
            (ranks == "main" and self.global_rank != 0)
            or (ranks == "last" and self.local_rank != self.world_size - 1)
            or (ranks == "local_main" and self.local_rank != 0)
        ):
            return
```
Fix "last" rank logic to use global_rank instead of local_rank.
Line 74 compares self.local_rank against self.world_size - 1, which is incorrect. The docstring states "last" should log from "rank -1 in node 0" (the last global rank), and the lprint() function's docstring confirms this intent. This requires comparing self.global_rank != self.world_size - 1 to match the semantics of "main" (which checks global_rank != 0). The current code mixes local_rank (rank within a node) with world_size (global count), causing the "last" rank logic to fail.
```diff
-            or (ranks == "last" and self.local_rank != self.world_size - 1)
+            or (ranks == "last" and self.global_rank != self.world_size - 1)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/logger.py` around lines 72 - 77, The "last"
rank check uses self.local_rank against self.world_size - 1 which mixes per-node
rank with global world size; update the condition in the logger method (the
branch handling ranks == "last") to compare self.global_rank to self.world_size
- 1 (i.e., change the check from self.local_rank != self.world_size - 1 to
self.global_rank != self.world_size - 1) so the "last" semantics match the
"main"/global checks used by lprint/logger.
```python
        tokenizer = AutoTokenizer.from_pretrained(
            tokenizer_name or args.model_name_or_path, trust_remote_code=True
        )
```
CRITICAL: Hardcoded trust_remote_code=True violates security guidelines.
The AutoTokenizer.from_pretrained call hardcodes trust_remote_code=True, which enables arbitrary Python code execution from untrusted model sources. Per coding guidelines, this should be exposed as a caller-configurable parameter defaulting to False, or derived from the ModelDescriptor pattern (as the descriptor already provides requires_trust_remote_code()).
🔒 Proposed fix: Use descriptor's trust_remote_code setting
```diff
+from modelopt.torch.puzzletron.anymodel.model_descriptor import ModelDescriptorFactory
+
 def prepare_dataloader(
-    args: DictConfig, tokenizer: PreTrainedTokenizerBase | None = None
+    args: DictConfig, tokenizer: PreTrainedTokenizerBase | None = None, trust_remote_code: bool = False
 ) -> DataLoader:
     if tokenizer is None:
         tokenizer_name = getattr(args, "tokenizer_name", None)
         assert (tokenizer_name is not None) or (args.model_name_or_path is not None)
         tokenizer = AutoTokenizer.from_pretrained(
-            tokenizer_name or args.model_name_or_path, trust_remote_code=True
+            tokenizer_name or args.model_name_or_path, trust_remote_code=trust_remote_code
         )
```

Then in validate_model, derive trust_remote_code from the descriptor:

```python
trust_remote_code = descriptor.requires_trust_remote_code()
val_dataloader = prepare_dataloader(args, tokenizer, trust_remote_code=trust_remote_code) if dist.is_master() else None
```

Based on learnings: "trust_remote_code should be determined by the ModelDescriptor pattern, not by a parameter passed to convert_model()."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/puzzletron/tools/validate_model.py` around lines 239 - 241,
Replace the hardcoded trust_remote_code=True in the
AutoTokenizer.from_pretrained call inside validate_model by deriving the flag
from the ModelDescriptor (call descriptor.requires_trust_remote_code()); assign
that result to a local trust_remote_code variable and pass it into
AutoTokenizer.from_pretrained and any downstream callers such as
prepare_dataloader (e.g., prepare_dataloader(...,
trust_remote_code=trust_remote_code)) so that trust_remote_code is determined by
the descriptor and not hardcoded.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1121      +/-   ##
==========================================
- Coverage   70.21%   70.19%   -0.02%
==========================================
  Files         228      228
  Lines       25952    25956       +4
==========================================
  Hits        18221    18221
- Misses       7731     7735       +4
```

☔ View full report in Codecov by Sentry.
What does this PR do?
Implement puzzletron compression algorithm based on Puzzle paper (https://arxiv.org/abs/2411.19146)
The list of reviewed and merged MRs that resulted in the feature/puzzletron branch:
Merging dkorzekwa/any_model to feature/puzzletron
Add anymodel directories to feature/puzzletron by danielkorzekwa · Pull Request #974 · NVIDIA/Model-Optimizer - merged
Draft: anymodel activation scoring by danielkorzekwa · Pull Request #989 · NVIDIA/Model-Optimizer - merged
Draft: Merge anymodel pruning by danielkorzekwa · Pull Request #990 · NVIDIA/Model-Optimizer - merged
Draft: Merging anymodel:build_library_and_stats by danielkorzekwa · Pull Request #993 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/any model calc one block scores by danielkorzekwa · Pull Request #994 · NVIDIA/Model-Optimizer - merged
Draft: merge any_model: mip_and_realize_models by danielkorzekwa · Pull Request #995 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/any model other models by danielkorzekwa · Pull Request #1007 · NVIDIA/Model-Optimizer - merged
PR to 1007: #1039 - merged
Dkorzekwa/anymodel gptoss by danielkorzekwa · Pull Request #1020 · NVIDIA/Model-Optimizer - merged
Merge any_model tutorial by danielkorzekwa · Pull Request #1035 · NVIDIA/Model-Optimizer - merged
Merge mbridge distillation for any_model by danielkorzekwa · Pull Request #1036 · NVIDIA/Model-Optimizer - merged
MR branch for the remaining difference between dkorzekwa/any_model an… by danielkorzekwa · Pull Request #1047 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/decilm hf code cleanup by danielkorzekwa · Pull Request #1071 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/decilm hf code cleanup 2 by danielkorzekwa · Pull Request #1073 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/anymodel subblock stats by danielkorzekwa · Pull Request #1085 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/anymodel subblock stats nodecilm by danielkorzekwa · Pull Request #1102 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/decilm cleanup post subblockstats by danielkorzekwa · Pull Request #1103 · NVIDIA/Model-Optimizer - merged
code clean up by danielkorzekwa · Pull Request #1110 · NVIDIA/Model-Optimizer - merged
Merging into main:
Activation hooks redesign (reuse hooks component across both minitron and puzzletron) by danielkorzekwa · Pull Request #1022 · NVIDIA/Model-Optimizer - merged
Dkorzekwa/puzzletron use importance hooks from prune by danielkorzekwa · Pull Request #1115 · NVIDIA/Model-Optimizer - merged
Usage
Puzzletron tutorial:
https://github.com/NVIDIA/Model-Optimizer/tree/feature/puzzletron/examples/puzzletron
Testing
The main e2e test for compressing 9 models with Puzzletron:
https://github.com/NVIDIA/Model-Optimizer/blob/feature/puzzletron/tests/gpu/torch/puzzletron/test_puzzletron.py
Before your PR is "Ready for review"
CONTRIBUTING.md: ✅

Summary by CodeRabbit
New Features
Documentation
Chores