
Remove custom DistillationProvider and simplify mbridge distillation and hf export#1122

Open
kevalmorabia97 wants to merge 5 commits into feature/puzzletron from
kmorabia/puzzletron-remove-distillation-provider

Conversation


@kevalmorabia97 kevalmorabia97 commented Mar 25, 2026

  • The nemo:26.02.01 container already includes the DistillationProvider fix for homogeneous models. That fix appears sufficient for heterogeneous models as well, so the copied DistillationProvider is removed to simplify the code.
  • Make the Qwen3 and Qwen3-VL descriptors generic, like the other descriptors, so they can also run with different variants, e.g. Qwen3-0.6B.
  • Replace the hacky Megatron-to-HF export logic with a correct implementation.
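As a minimal sketch of the simplified export flow, assuming the upstream AutoBridge API shown in the review snippets further down (`from_hf_pretrained`, `export_ckpt`); `megatron.bridge` is assumed to be available only in the container, so its import is deferred, and `final_iter_dir` / `export_distilled_to_hf` are hypothetical helper names introduced here for illustration:

```python
# Hedged sketch, not the actual script: mirrors the AutoBridge usage quoted in
# this PR's review comments. megatron.bridge and its AutoBridge import path are
# assumptions; the import is deferred so the sketch loads without the container.
import os


def final_iter_dir(checkpoint_dir: str, train_iters: int) -> str:
    # Megatron saves iteration checkpoints as zero-padded 7-digit directories.
    return os.path.join(checkpoint_dir, f"iter_{train_iters:07d}")


def export_distilled_to_hf(
    checkpoint_dir: str,
    train_iters: int,
    student_hf_path: str,
    hf_export_path: str,
    trust_remote_code: bool = False,
) -> None:
    from megatron.bridge import AutoBridge  # assumed upstream import path

    # The student architecture is inferred from its HF checkpoint, so no
    # separate --hf_model argument is needed anymore.
    bridge = AutoBridge.from_hf_pretrained(
        student_hf_path, trust_remote_code=trust_remote_code
    )
    # Pre-create the directory export_ckpt writes subblock safetensors into,
    # otherwise safetensors saving fails.
    os.makedirs(os.path.join(hf_export_path, "subblocks_safetensors"), exist_ok=True)
    bridge.export_ckpt(
        megatron_path=final_iter_dir(checkpoint_dir, train_iters),
        hf_path=hf_export_path,
        show_progress=True,
        strict=True,
    )
```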

Summary by CodeRabbit

  • Refactor
    • Reworked distillation and HuggingFace export flow to use upstream bridge/export APIs, removed local monkey-patching and extra exception logging, and simplified distributed cleanup.
  • Chores
    • Consolidated and renamed Qwen3 / Qwen3-VL model and converter registrations; updated pruning configs, CLI export flags, and packaging lint/dependency settings.
  • Tests
    • Updated integration tests to use Qwen3 checkpoints and adjusted export verification.
  • Documentation
    • Updated README example to reflect new CLI usage.

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested review from a team as code owners March 25, 2026 19:13
@kevalmorabia97 kevalmorabia97 requested a review from realAsma March 25, 2026 19:13

coderabbitai bot commented Mar 25, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.
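For reference, a minimal `.coderabbit.yaml` fragment for the pause setting mentioned above; the nesting is assumed from the dotted key path in the bot message, and the value is illustrative:

```yaml
# Assumed nesting for reviews.auto_review.auto_pause_after_reviewed_commits;
# the key path comes from the bot message above, the value is illustrative.
reviews:
  auto_review:
    auto_pause_after_reviewed_commits: 30  # pause after this many reviewed commits
```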


Walkthrough

Removed local Megatron-Bridge distillation/export helpers and monkey-patching; switched to upstream Megatron-Bridge APIs for distillation and HF export. Renamed multiple Qwen3/Qwen3-VL converter and model descriptor classes; updated pruning configs, AnyModel imports, examples, and tests to match. Miscellaneous small cleanups and lint/dependency edits.

Changes

Cohort / File(s) Summary
Distillation example & tests
examples/puzzletron/mbridge_distillation/distill_hf.py, examples/puzzletron/mbridge_distillation/README.md, tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
Removed local monkey-patching and local HF-export helper; now use megatron.bridge imports and AutoBridge.from_hf_pretrained(...).export_ckpt(...). CLI arg changes (removed required --hf_model), tests updated to Qwen3 checkpoints and Path usage.
Deleted mbridge utilities
modelopt/torch/puzzletron/export/mbridge/distillation_provider.py (deleted), modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py (deleted)
Removed local DistillationProvider implementation and export_to_hf_and_copy_config; upstream Megatron-Bridge provider/bridge used instead.
AnyModel package index
modelopt/torch/puzzletron/anymodel/models/__init__.py
Switched top-level imports to load qwen3 and qwen3_vl modules (replacing older 8B/30B-specific modules) so updated factories register on import.
Qwen3 renames
modelopt/torch/puzzletron/anymodel/models/qwen3/__init__.py, .../qwen3/qwen3_converter.py, .../qwen3/qwen3_model_descriptor.py
Renamed converter and descriptor classes from 8B-specific names to generalized Qwen3* identifiers and updated module exports/registrations.
Qwen3‑VL renames & cleanup
modelopt/torch/puzzletron/anymodel/models/qwen3_vl/__init__.py, .../qwen3_vl/qwen3_vl_converter.py, .../qwen3_vl/qwen3_vl_model_descriptor.py
Renamed VL converter/descriptor and several layer descriptor dataclasses to Qwen3VL*; removed an unused torch.nn import and tightened a default factory.
Pruning config updates
examples/puzzletron/configs/qwen3-8b_pruneffn_memory/pruning/ffn_pruning.yaml, tests/gpu/.../Qwen3-8B/pruning/ffn_pruning.yaml, tests/gpu/.../Qwen3-VL-30B-A3B-Instruct/pruning/expert_pruning.yaml
Updated _target_ entries to reference the renamed/generalized layer descriptor classes and made minor inline-comment spacing tweaks.
Typing & small cleanups
modelopt/torch/puzzletron/sewing_kit/utils.py, modelopt/torch/puzzletron/sewing_kit/core.py, modelopt/torch/puzzletron/mip/run_puzzle.py, modelopt/torch/puzzletron/mip/utils.py, modelopt/torch/puzzletron/anymodel/models/nemotron_h/..., .../nemotron_h_v2/..., .../qwen2/..., modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py, modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
Minor edits: removed redundant pass statements, switched runtime imports to TYPE_CHECKING, adjusted typing casts, replaced exit(1) with sys.exit(1), and simplified an assertion.
Build metadata / lint config
pyproject.toml
Removed exact omegaconf==2.3.0 pin from puzzletron extras and removed several Ruff ignore codes from extend-ignore and per-file ignore lists.
Test resources
tests/gpu/torch/puzzletron/resources/configs/...
Mirrored pruning config updates in test resources to reference renamed descriptor classes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 36.36%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and specifically describes the main changes: removing the custom DistillationProvider and simplifying mbridge distillation/HF export logic.
  • Security Anti-Patterns: ✅ Passed. A comprehensive security scan found no violations of security anti-patterns: no unsafe torch.load/numpy.load calls, no hardcoded trust_remote_code=True, no eval/exec on external input, and no nosec comments.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


github-actions bot commented Mar 25, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1122/

Built to branch gh-pages at 2026-03-25 21:45 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)

29-29: Make Puzzletron bridge registration explicit.

After switching these imports to upstream Megatron-Bridge, the script now relies on the unrelated export_to_hf_and_copy_config import at Line 49 to pull in modelopt.torch.puzzletron.export.mbridge.__init__, which is what registers the Puzzletron adapters. That hidden dependency is easy to break later by moving the export helper behind --hf_export_path or removing it. Please register the mbridge package explicitly near provider setup so AutoBridge.from_hf_pretrained() does not depend on an export-only import.

♻️ One way to make the dependency explicit
 import argparse
+import importlib
 import os
 import traceback
@@
 def main(args: argparse.Namespace):
+    # Register Puzzletron Megatron-Bridge adapters before resolving HF providers.
+    importlib.import_module("modelopt.torch.puzzletron.export.mbridge")
+
     checkpoint_dir = os.path.join(args.output_dir, "checkpoints")

Also applies to: 43-43

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/distill_hf.py` at line 29, The
script currently relies on a side-effect import (export_to_hf_and_copy_config)
to register the Puzzletron mbridge adapters, which is brittle; explicitly import
the mbridge registration module near the provider setup so
AutoBridge.from_hf_pretrained() sees the adapters. Add a direct import of
modelopt.torch.puzzletron.export.mbridge (or its __init__) alongside
convert_to_distillation_provider before calling convert_to_distillation_provider
/ AutoBridge.from_hf_pretrained to ensure Puzzletron adapters are registered
regardless of export helper usage.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ddbcd8d7-8874-4595-8696-cbb5fb8f2ebe

📥 Commits

Reviewing files that changed from the base of the PR and between c5ec50b and 3358605.

📒 Files selected for processing (2)
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • modelopt/torch/puzzletron/export/mbridge/distillation_provider.py
💤 Files with no reviewable changes (1)
  • modelopt/torch/puzzletron/export/mbridge/distillation_provider.py


codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.17%. Comparing base (c5ec50b) to head (cd83b43).

Additional details and impacted files
@@                  Coverage Diff                   @@
##           feature/puzzletron    #1122      +/-   ##
======================================================
- Coverage               70.19%   70.17%   -0.03%     
======================================================
  Files                     228      228              
  Lines                   25956    25956              
======================================================
- Hits                    18221    18215       -6     
- Misses                   7735     7741       +6     

☔ View full report in Codecov by Sentry.


Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner March 25, 2026 21:05
@kevalmorabia97 kevalmorabia97 changed the title from "Remove custom DistillationProvider as not needed" to "Remove custom DistillationProvider and simplify mbridge distillation and hf export" on Mar 25, 2026

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
examples/puzzletron/mbridge_distillation/README.md (1)

92-92: ⚠️ Potential issue | 🟡 Minor

Stale documentation: --hf-model / --hf_model argument no longer exists.

This line still references --hf-model / --hf_model which was removed from the script. The documentation should be updated to reflect that only --hf_export_path is needed for export, and the student model architecture is inferred from --student_hf_path.

📝 Suggested fix
-- Add `--hf-export-path` (or `--hf_export_path`) to automatically export the final checkpoint to HuggingFace format after distillation. When exporting, you must also provide `--hf-model` / `--hf_model` as the HuggingFace model ID for the export template (e.g., `meta-llama/Llama-3.1-8B-Instruct`). It should match the base architecture of the student model. The exported model can be evaluated for accuracy using the evaluation tools described in the main [README.md](../README.md#evaluation).
+- Add `--hf_export_path` to automatically export the final checkpoint to HuggingFace format after distillation. The export uses the student model architecture from `--student_hf_path`. The exported model can be evaluated for accuracy using the evaluation tools described in the main [README.md](../README.md#evaluation).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/README.md` at line 92, Update the
README line that mentions export flags to remove the obsolete `--hf-model` /
`--hf_model` parameter and clarify that only `--hf-export-path` (or
`--hf_export_path`) is required to export the final checkpoint to HuggingFace
format; also state that the student model architecture is inferred from the
provided `--student_hf_path` rather than requiring a separate HuggingFace model
ID, and ensure any example invocation and the sentence about evaluation still
point readers to the main README's evaluation section.
🧹 Nitpick comments (1)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)

270-270: Clarify why subblocks_safetensors/ directory is pre-created.

The purpose of creating subblocks_safetensors/ manually is unclear. If this is a workaround for an AutoBridge.export_ckpt limitation, consider adding a brief comment explaining why it's needed.

💡 Suggested clarification
         bridge = AutoBridge.from_hf_pretrained(
             args.student_hf_path, trust_remote_code=args.trust_remote_code
         )
+        # Create subblocks_safetensors dir required by AutoBridge.export_ckpt
         os.makedirs(os.path.join(args.hf_export_path, "subblocks_safetensors"), exist_ok=True)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/distill_hf.py` at line 270, Add a
brief explanatory comment immediately above the os.makedirs call that creates
"subblocks_safetensors" in distill_hf.py describing that this pre-creation is a
deliberate workaround for AutoBridge.export_ckpt (or the downstream export
routine) which expects that directory to already exist when it writes subblock
safetensor files; mention the function name AutoBridge.export_ckpt and the
reason (preventing a missing-directory error during export) so future
maintainers know why the directory is created and can remove the workaround if
export behavior is fixed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@examples/puzzletron/mbridge_distillation/distill_hf.py`:
- Around line 256-276: Add a pre-check and clear error handling before calling
bridge.export_ckpt: verify the expected checkpoint path (checkpoint_dir +
f"iter_{args.train_iters:07d}") exists when args.hf_export_path is set and only
proceed with dist.cleanup()/bridge.export_ckpt if the path is present; otherwise
raise or log a descriptive error mentioning the missing checkpoint path and the
expected location so users know to check train_iters/save_interval. Wrap the
export call (bridge.export_ckpt) in a try/except to catch exceptions and emit a
helpful message including the checkpoint path and original exception before
re-raising or exiting.

In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py`:
- Line 35: The test currently overwrites the pytest-provided tmp_path with a
hardcoded Path("/tmp/test_distill_hf"); revert to using the fixture-provided
tmp_path variable instead (remove the assignment to tmp_path and use the
existing tmp_path in test_distill_hf.py) or, if durable artifacts are required,
switch to tmp_path_factory to create a named directory; update any references in
the test to use the fixture tmp_path (or tmp_path_factory-created path) so test
isolation and cleanup are preserved.


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 50bcee85-fd58-4bac-aed8-960d6c3c234d

📥 Commits

Reviewing files that changed from the base of the PR and between 0e076e7 and 6d45b3a.

📒 Files selected for processing (5)
  • examples/puzzletron/mbridge_distillation/README.md
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
  • pyproject.toml
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (2)
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
  • pyproject.toml

Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/puzzletron-remove-distillation-provider branch from 6d45b3a to 64d4099 on March 25, 2026 21:38
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
@kevalmorabia97 kevalmorabia97 force-pushed the kmorabia/puzzletron-remove-distillation-provider branch from 64d4099 to cd83b43 on March 25, 2026 21:40

@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (2)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)

256-277: ⚠️ Potential issue | 🟡 Minor

Validate the final checkpoint before tearing down distributed state.

Line 273 assumes iter_{args.train_iters:07d} exists. If that directory is missing, the script tears down the process group and then fails deep inside the export path with a much less actionable error. Check the checkpoint directory first on all ranks, then wrap the rank-0 export with a targeted exception.

🐛 Proposed fix
     # Export to HuggingFace format if hf_export_path is provided
     if args.hf_export_path:
+        export_ckpt_dir = os.path.join(checkpoint_dir, f"iter_{args.train_iters:07d}")
+        if not os.path.isdir(export_ckpt_dir):
+            raise FileNotFoundError(
+                "Expected distilled checkpoint at "
+                f"{export_ckpt_dir} before HF export. "
+                "Check --train_iters / --eval_interval and confirm training completed."
+            )
+
         # Save rank before destroying process group (dist.rank() won't work after destruction)
         is_rank_0 = dist.rank() == 0
 
         # Destroy process group on all ranks - export_ckpt will create its own temporary one
         # This prevents cleanup from hanging (cleanup tries to barrier, but rank 0 would be gone)
@@
         if is_rank_0:
             bridge = AutoBridge.from_hf_pretrained(
                 args.student_hf_path, trust_remote_code=args.trust_remote_code
             )
             # Create subblocks_safetensors directory else safetensors saving will fail
             os.makedirs(os.path.join(args.hf_export_path, "subblocks_safetensors"), exist_ok=True)
-            bridge.export_ckpt(
-                megatron_path=f"{checkpoint_dir}/iter_{args.train_iters:07d}",
-                hf_path=args.hf_export_path,
-                show_progress=True,
-                strict=True,
-            )
+            try:
+                bridge.export_ckpt(
+                    megatron_path=export_ckpt_dir,
+                    hf_path=args.hf_export_path,
+                    show_progress=True,
+                    strict=True,
+                )
+            except Exception as exc:
+                raise RuntimeError(
+                    f"Failed to export HF checkpoint from {export_ckpt_dir} "
+                    f"to {args.hf_export_path}"
+                ) from exc
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/puzzletron/mbridge_distillation/distill_hf.py` around lines 256 -
277, Before calling dist.cleanup(), verify that the final checkpoint directory
(f"{checkpoint_dir}/iter_{args.train_iters:07d}") exists on all ranks (e.g., use
os.path.isdir and a collective check) and abort early with a clear log if
missing; then after teardown, keep the rank-0 export
(AutoBridge.from_hf_pretrained + bridge.export_ckpt) wrapped in a targeted
try/except that logs the exact exception and context (checkpoint path,
args.train_iters, hf_export_path) and re-raises or exits cleanly so failures
during bridge.export_ckpt produce actionable messages rather than failing
silently after process group destruction.
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py (1)

35-35: ⚠️ Potential issue | 🟠 Major

Keep the pytest-provided tmp_path.

Line 35 replaces pytest's isolated temp dir with a shared /tmp/test_distill_hf. That makes reruns and parallel jobs share checkpoints/export artifacts, so this test can pass or fail based on stale files instead of the current run.

🐛 Proposed fix
-    tmp_path = Path("/tmp/test_distill_hf")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py` at line
35, The test overwrites the pytest-provided tmp_path by assigning tmp_path =
Path("/tmp/test_distill_hf"), causing shared state between runs; remove that
assignment so the pytest tmp_path fixture is used (or, if you need a named
subdirectory, replace the assignment with tmp_path = tmp_path /
"test_distill_hf" and create it), i.e., locate the tmp_path assignment in
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py and stop
overriding the tmp_path fixture.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: dd51ba9d-fc48-4011-8d50-aa1cb63cd88d

📥 Commits

Reviewing files that changed from the base of the PR and between 6d45b3a and 64d4099.

📒 Files selected for processing (15)
  • examples/puzzletron/mbridge_distillation/README.md
  • examples/puzzletron/mbridge_distillation/distill_hf.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • pyproject.toml
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (8)
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • pyproject.toml
  • modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
✅ Files skipped from review due to trivial changes (1)
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/puzzletron/mbridge_distillation/README.md


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py (1)

118-145: ⚠️ Potential issue | 🟠 Major

This fixture still doesn’t exercise the heterogeneous distillation path.

Both checkpoints are created from the same Qwen/Qwen3-0.6B template and both keep hybrid_override_pattern=None, so after convert_model(...) this test is still validating a homogeneous setup. That leaves the PR’s key claim—removing the custom DistillationProvider is still safe for heterogeneous models—untested here. Please make one side genuinely heterogeneous before conversion (for example by varying per-layer FFN or block configs).

As per coding guidelines, "Write tests using pytest for all new features and examples; organize tests into tests/unit (fast CPU-based), tests/gpu (fast GPU-based), tests/gpu_megatron (Megatron-Core), tests/gpu_trtllm (TensorRT-LLM), and tests/examples (integration tests)" and "All test coverage checks in PRs must pass for new features and examples."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py` around
lines 118 - 145, The test currently creates two identical checkpoints (both via
create_and_save_small_hf_model with hf_model_name="Qwen/Qwen3-0.6B" and
hybrid_override_pattern=None) so it never exercises heterogeneous distillation;
update the fixture so one side is genuinely different before conversion (e.g.,
call create_and_save_small_hf_model for teacher_hf_dir with a different
hf_model_name or supply a non-None hybrid_override_pattern that alters per-layer
FFN/block configs), keep student_hf_dir unchanged, then run convert_model on
both student_hf_dir and teacher_hf_dir as before so convert_model and subsequent
distillation see heterogeneous inputs (refer to create_and_save_small_hf_model,
hybrid_override_pattern, convert_model, student_hf_dir, teacher_hf_dir).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 91a932d2-8d5f-4071-9418-bf2c9352d03d

📥 Commits

Reviewing files that changed from the base of the PR and between 64d4099 and cd83b43.

📒 Files selected for processing (12)
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • modelopt/torch/puzzletron/sewing_kit/utils.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
  • pyproject.toml
  • tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (7)
  • modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
  • modelopt/torch/puzzletron/mip/utils.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
  • modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
  • modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
  • modelopt/torch/puzzletron/sewing_kit/core.py
  • pyproject.toml
✅ Files skipped from review due to trivial changes (2)
  • modelopt/torch/puzzletron/mip/run_puzzle.py
  • modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
