Remove custom DistillationProvider and simplify mbridge distillation and hf export (#1122)
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review.
📝 Walkthrough

Removed local Megatron-Bridge distillation/export helpers and monkey-patching; switched to upstream Megatron-Bridge APIs for distillation and HF export. Renamed multiple Qwen3/Qwen3-VL converter and model descriptor classes; updated pruning configs, AnyModel imports, examples, and tests to match. Miscellaneous small cleanups and lint/dependency edits.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ❌ Failed checks (1 warning) | ✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
🧹 Nitpick comments (1)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)
29-29: Make Puzzletron bridge registration explicit.

After switching these imports to upstream Megatron-Bridge, the script now relies on the unrelated `export_to_hf_and_copy_config` import at line 49 to pull in `modelopt.torch.puzzletron.export.mbridge.__init__`, which is what registers the Puzzletron adapters. That hidden dependency is easy to break later by moving the export helper behind `--hf_export_path` or removing it. Please register the `mbridge` package explicitly near provider setup so `AutoBridge.from_hf_pretrained()` does not depend on an export-only import.

♻️ One way to make the dependency explicit

```diff
 import argparse
+import importlib
 import os
 import traceback
@@
 def main(args: argparse.Namespace):
+    # Register Puzzletron Megatron-Bridge adapters before resolving HF providers.
+    importlib.import_module("modelopt.torch.puzzletron.export.mbridge")
+
     checkpoint_dir = os.path.join(args.output_dir, "checkpoints")
```

Also applies to: 43-43
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/mbridge_distillation/distill_hf.py` at line 29, The script currently relies on a side-effect import (export_to_hf_and_copy_config) to register the Puzzletron mbridge adapters, which is brittle; explicitly import the mbridge registration module near the provider setup so AutoBridge.from_hf_pretrained() sees the adapters. Add a direct import of modelopt.torch.puzzletron.export.mbridge (or its __init__) alongside convert_to_distillation_provider before calling convert_to_distillation_provider / AutoBridge.from_hf_pretrained to ensure Puzzletron adapters are registered regardless of export helper usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@examples/puzzletron/mbridge_distillation/distill_hf.py`:
- Line 29: The script currently relies on a side-effect import
(export_to_hf_and_copy_config) to register the Puzzletron mbridge adapters,
which is brittle; explicitly import the mbridge registration module near the
provider setup so AutoBridge.from_hf_pretrained() sees the adapters. Add a
direct import of modelopt.torch.puzzletron.export.mbridge (or its __init__)
alongside convert_to_distillation_provider before calling
convert_to_distillation_provider / AutoBridge.from_hf_pretrained to ensure
Puzzletron adapters are registered regardless of export helper usage.
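The explicit-registration pattern the prompt describes can be sketched in isolation. Everything below is a stand-in: `fake_mbridge` only approximates how a package like `modelopt.torch.puzzletron.export.mbridge` registers adapters at import time, and is not the real modelopt code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for a package whose __init__ registers an adapter as a side effect
# of running its top-level code at import time.
MODULE_SRC = """
REGISTRY = []

def _register():
    REGISTRY.append("puzzletron-adapter")

_register()  # runs at import time, like a package __init__
"""

with tempfile.TemporaryDirectory() as d:
    Path(d, "fake_mbridge.py").write_text(MODULE_SRC)
    sys.path.insert(0, d)
    try:
        # Explicit registration: the import is spelled out where providers are
        # set up, so deleting an unrelated export helper cannot break it.
        mbridge = importlib.import_module("fake_mbridge")
    finally:
        sys.path.remove(d)

print(mbridge.REGISTRY)  # → ['puzzletron-adapter']
```

The point of `importlib.import_module` over a bare side-effect import is that the dependency is named at the call site, so a refactor that drops the export helper leaves the registration intact.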
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ddbcd8d7-8874-4595-8696-cbb5fb8f2ebe
📒 Files selected for processing (2)

- examples/puzzletron/mbridge_distillation/distill_hf.py
- modelopt/torch/puzzletron/export/mbridge/distillation_provider.py
💤 Files with no reviewable changes (1)
- modelopt/torch/puzzletron/export/mbridge/distillation_provider.py
Codecov Report: ✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@           Coverage Diff            @@
##   feature/puzzletron   #1122   +/-  ##
======================================================
- Coverage     70.19%   70.17%   -0.03%
======================================================
  Files           228      228
  Lines         25956    25956
======================================================
- Hits          18221    18215       -6
- Misses         7735     7741       +6
======================================================
```

☔ View full report in Codecov by Sentry.
…ants Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
examples/puzzletron/mbridge_distillation/README.md (1)
92-92: ⚠️ Potential issue | 🟡 Minor: Stale documentation. The `--hf-model` / `--hf_model` argument no longer exists.

This line still references `--hf-model` / `--hf_model`, which was removed from the script. The documentation should be updated to reflect that only `--hf_export_path` is needed for export, and the student model architecture is inferred from `--student_hf_path`.

📝 Suggested fix

```diff
-- Add `--hf-export-path` (or `--hf_export_path`) to automatically export the final checkpoint to HuggingFace format after distillation. When exporting, you must also provide `--hf-model` / `--hf_model` as the HuggingFace model ID for the export template (e.g., `meta-llama/Llama-3.1-8B-Instruct`). It should match the base architecture of the student model. The exported model can be evaluated for accuracy using the evaluation tools described in the main [README.md](../README.md#evaluation).
+- Add `--hf_export_path` to automatically export the final checkpoint to HuggingFace format after distillation. The export uses the student model architecture from `--student_hf_path`. The exported model can be evaluated for accuracy using the evaluation tools described in the main [README.md](../README.md#evaluation).
```
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/mbridge_distillation/README.md` at line 92, Update the README line that mentions export flags to remove the obsolete `--hf-model` / `--hf_model` parameter and clarify that only `--hf-export-path` (or `--hf_export_path`) is required to export the final checkpoint to HuggingFace format; also state that the student model architecture is inferred from the provided `--student_hf_path` rather than requiring a separate HuggingFace model ID, and ensure any example invocation and the sentence about evaluation still point readers to the main README's evaluation section.
🧹 Nitpick comments (1)

examples/puzzletron/mbridge_distillation/distill_hf.py (1)

270-270: Clarify why the `subblocks_safetensors/` directory is pre-created.

The purpose of creating `subblocks_safetensors/` manually is unclear. If this is a workaround for an `AutoBridge.export_ckpt` limitation, consider adding a brief comment explaining why it's needed.

💡 Suggested clarification

```diff
 bridge = AutoBridge.from_hf_pretrained(
     args.student_hf_path, trust_remote_code=args.trust_remote_code
 )
+# Create subblocks_safetensors dir required by AutoBridge.export_ckpt
 os.makedirs(os.path.join(args.hf_export_path, "subblocks_safetensors"), exist_ok=True)
```
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/mbridge_distillation/distill_hf.py` at line 270, Add a brief explanatory comment immediately above the os.makedirs call that creates "subblocks_safetensors" in distill_hf.py describing that this pre-creation is a deliberate workaround for AutoBridge.export_ckpt (or the downstream export routine) which expects that directory to already exist when it writes subblock safetensor files; mention the function name AutoBridge.export_ckpt and the reason (preventing a missing-directory error during export) so future maintainers know why the directory is created and can remove the workaround if export behavior is fixed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@examples/puzzletron/mbridge_distillation/distill_hf.py`:
- Around line 256-276: Add a pre-check and clear error handling before calling
bridge.export_ckpt: verify the expected checkpoint path (checkpoint_dir +
f"iter_{args.train_iters:07d}") exists when args.hf_export_path is set and only
proceed with dist.cleanup()/bridge.export_ckpt if the path is present; otherwise
raise or log a descriptive error mentioning the missing checkpoint path and the
expected location so users know to check train_iters/save_interval. Wrap the
export call (bridge.export_ckpt) in a try/except to catch exceptions and emit a
helpful message including the checkpoint path and original exception before
re-raising or exiting.
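The pre-check-then-export shape this prompt asks for can be sketched in miniature. The `do_export` callable below stands in for `bridge.export_ckpt`, and `export_checkpoint` is a hypothetical helper, not the actual distill_hf.py code:

```python
import os
import tempfile

def export_checkpoint(checkpoint_dir: str, train_iters: int, do_export) -> str:
    """Validate the expected checkpoint path before any teardown or export."""
    ckpt_path = os.path.join(checkpoint_dir, f"iter_{train_iters:07d}")
    if not os.path.isdir(ckpt_path):
        # Fail early with an actionable message instead of deep in the export path.
        raise FileNotFoundError(
            f"Expected checkpoint at {ckpt_path}; check --train_iters/--save_interval."
        )
    try:
        do_export(ckpt_path)
    except Exception as exc:
        raise RuntimeError(f"Export failed for {ckpt_path}") from exc
    return ckpt_path

# Usage against a throwaway directory:
with tempfile.TemporaryDirectory() as d:
    os.makedirs(os.path.join(d, "iter_0000100"))
    exported = []
    export_checkpoint(d, 100, exported.append)
```

Doing the `os.path.isdir` check before tearing down the process group means a missing checkpoint surfaces as a clear `FileNotFoundError` rather than a failure inside the export machinery.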
In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py`:
- Line 35: The test currently overwrites the pytest-provided tmp_path with a
hardcoded Path("/tmp/test_distill_hf"); revert to using the fixture-provided
tmp_path variable instead (remove the assignment to tmp_path and use the
existing tmp_path in test_distill_hf.py) or, if durable artifacts are required,
switch to tmp_path_factory to create a named directory; update any references in
the test to use the fixture tmp_path (or tmp_path_factory-created path) so test
isolation and cleanup are preserved.
---
Outside diff comments:
In `@examples/puzzletron/mbridge_distillation/README.md`:
- Line 92: Update the README line that mentions export flags to remove the
obsolete `--hf-model` / `--hf_model` parameter and clarify that only
`--hf-export-path` (or `--hf_export_path`) is required to export the final
checkpoint to HuggingFace format; also state that the student model architecture
is inferred from the provided `--student_hf_path` rather than requiring a
separate HuggingFace model ID, and ensure any example invocation and the
sentence about evaluation still point readers to the main README's evaluation
section.
---
Nitpick comments:
In `@examples/puzzletron/mbridge_distillation/distill_hf.py`:
- Line 270: Add a brief explanatory comment immediately above the os.makedirs
call that creates "subblocks_safetensors" in distill_hf.py describing that this
pre-creation is a deliberate workaround for AutoBridge.export_ckpt (or the
downstream export routine) which expects that directory to already exist when it
writes subblock safetensor files; mention the function name
AutoBridge.export_ckpt and the reason (preventing a missing-directory error
during export) so future maintainers know why the directory is created and can
remove the workaround if export behavior is fixed.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 50bcee85-fd58-4bac-aed8-960d6c3c234d
📒 Files selected for processing (5)

- examples/puzzletron/mbridge_distillation/README.md
- examples/puzzletron/mbridge_distillation/distill_hf.py
- modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
- pyproject.toml
- tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (2)
- modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
- pyproject.toml
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py (outdated review comment, resolved)
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Force-pushed 6d45b3a to 64d4099 (compare)
Signed-off-by: Keval Morabia <28916987+kevalmorabia97@users.noreply.github.com>
Force-pushed 64d4099 to cd83b43 (compare)
♻️ Duplicate comments (2)
examples/puzzletron/mbridge_distillation/distill_hf.py (1)
256-277: ⚠️ Potential issue | 🟡 Minor: Validate the final checkpoint before tearing down distributed state.

Line 273 assumes `iter_{args.train_iters:07d}` exists. If that directory is missing, the script tears down the process group and then fails deep inside the export path with a much less actionable error. Check the checkpoint directory first on all ranks, then wrap the rank-0 export with a targeted exception.

🐛 Proposed fix

```diff
     # Export to HuggingFace format if hf_export_path is provided
     if args.hf_export_path:
+        export_ckpt_dir = os.path.join(checkpoint_dir, f"iter_{args.train_iters:07d}")
+        if not os.path.isdir(export_ckpt_dir):
+            raise FileNotFoundError(
+                "Expected distilled checkpoint at "
+                f"{export_ckpt_dir} before HF export. "
+                "Check --train_iters / --eval_interval and confirm training completed."
+            )
+
         # Save rank before destroying process group (dist.rank() won't work after destruction)
         is_rank_0 = dist.rank() == 0
         # Destroy process group on all ranks - export_ckpt will create its own temporary one
         # This prevents cleanup from hanging (cleanup tries to barrier, but rank 0 would be gone)
@@
         if is_rank_0:
             bridge = AutoBridge.from_hf_pretrained(
                 args.student_hf_path, trust_remote_code=args.trust_remote_code
             )
             # Create subblocks_safetensors directory else safetensors saving will fail
             os.makedirs(os.path.join(args.hf_export_path, "subblocks_safetensors"), exist_ok=True)
-            bridge.export_ckpt(
-                megatron_path=f"{checkpoint_dir}/iter_{args.train_iters:07d}",
-                hf_path=args.hf_export_path,
-                show_progress=True,
-                strict=True,
-            )
+            try:
+                bridge.export_ckpt(
+                    megatron_path=export_ckpt_dir,
+                    hf_path=args.hf_export_path,
+                    show_progress=True,
+                    strict=True,
+                )
+            except Exception as exc:
+                raise RuntimeError(
+                    f"Failed to export HF checkpoint from {export_ckpt_dir} "
+                    f"to {args.hf_export_path}"
+                ) from exc
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/puzzletron/mbridge_distillation/distill_hf.py` around lines 256-277, Before calling dist.cleanup(), verify that the final checkpoint directory (f"{checkpoint_dir}/iter_{args.train_iters:07d}") exists on all ranks (e.g., use os.path.isdir and a collective check) and abort early with a clear log if missing; then after teardown, keep the rank-0 export (AutoBridge.from_hf_pretrained + bridge.export_ckpt) wrapped in a targeted try/except that logs the exact exception and context (checkpoint path, args.train_iters, hf_export_path) and re-raises or exits cleanly so failures during bridge.export_ckpt produce actionable messages rather than failing silently after process group destruction.

tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py (1)
35-35: ⚠️ Potential issue | 🟠 Major: Keep the pytest-provided `tmp_path`.

Line 35 replaces pytest's isolated temp dir with a shared `/tmp/test_distill_hf`. That makes reruns and parallel jobs share checkpoints/export artifacts, so this test can pass or fail based on stale files instead of the current run.

🐛 Proposed fix

```diff
-    tmp_path = Path("/tmp/test_distill_hf")
```
Verify each finding against the current code and only fix it if needed. In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py` at line 35, The test overwrites the pytest-provided tmp_path by assigning tmp_path = Path("/tmp/test_distill_hf"), causing shared state between runs; remove that assignment so the pytest tmp_path fixture is used (or, if you need a named subdirectory, replace the assignment with tmp_path = tmp_path / "test_distill_hf" and create it), i.e., locate the tmp_path assignment in tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py and stop overriding the tmp_path fixture.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@examples/puzzletron/mbridge_distillation/distill_hf.py`:
- Around line 256-277: Before calling dist.cleanup(), verify that the final
checkpoint directory (f"{checkpoint_dir}/iter_{args.train_iters:07d}") exists on
all ranks (e.g., use os.path.isdir and a collective check) and abort early with
a clear log if missing; then after teardown, keep the rank-0 export
(AutoBridge.from_hf_pretrained + bridge.export_ckpt) wrapped in a targeted
try/except that logs the exact exception and context (checkpoint path,
args.train_iters, hf_export_path) and re-raises or exits cleanly so failures
during bridge.export_ckpt produce actionable messages rather than failing
silently after process group destruction.
In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py`:
- Line 35: The test overwrites the pytest-provided tmp_path by assigning
tmp_path = Path("/tmp/test_distill_hf"), causing shared state between runs;
remove that assignment so the pytest tmp_path fixture is used (or, if you need a
named subdirectory, replace the assignment with tmp_path = tmp_path /
"test_distill_hf" and create it), i.e., locate the tmp_path assignment in
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py and stop
overriding the tmp_path fixture.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: dd51ba9d-fc48-4011-8d50-aa1cb63cd88d
📒 Files selected for processing (15)

- examples/puzzletron/mbridge_distillation/README.md
- examples/puzzletron/mbridge_distillation/distill_hf.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
- modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
- modelopt/torch/puzzletron/mip/run_puzzle.py
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/sewing_kit/core.py
- modelopt/torch/puzzletron/sewing_kit/utils.py
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
- pyproject.toml
- tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (8)
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/sewing_kit/core.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
- pyproject.toml
- modelopt/torch/puzzletron/export/mbridge/export_mbridge_to_hf.py
✅ Files skipped from review due to trivial changes (1)
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
🚧 Files skipped from review as they are similar to previous changes (1)
- examples/puzzletron/mbridge_distillation/README.md
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py (1)
118-145: ⚠️ Potential issue | 🟠 Major: This fixture still doesn't exercise the heterogeneous distillation path.

Both checkpoints are created from the same `Qwen/Qwen3-0.6B` template and both keep `hybrid_override_pattern=None`, so after `convert_model(...)` this test is still validating a homogeneous setup. That leaves the PR's key claim (removing the custom `DistillationProvider` is still safe for heterogeneous models) untested here. Please make one side genuinely heterogeneous before conversion (for example by varying per-layer FFN or block configs).

As per coding guidelines, "Write tests using pytest for all new features and examples; organize tests into `tests/unit` (fast CPU-based), `tests/gpu` (fast GPU-based), `tests/gpu_megatron` (Megatron-Core), `tests/gpu_trtllm` (TensorRT-LLM), and `tests/examples` (integration tests)" and "All test coverage checks in PRs must pass for new features and examples."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py` around lines 118 - 145, The test currently creates two identical checkpoints (both via create_and_save_small_hf_model with hf_model_name="Qwen/Qwen3-0.6B" and hybrid_override_pattern=None) so it never exercises heterogeneous distillation; update the fixture so one side is genuinely different before conversion (e.g., call create_and_save_small_hf_model for teacher_hf_dir with a different hf_model_name or supply a non-None hybrid_override_pattern that alters per-layer FFN/block configs), keep student_hf_dir unchanged, then run convert_model on both student_hf_dir and teacher_hf_dir as before so convert_model and subsequent distillation see heterogeneous inputs (refer to create_and_save_small_hf_model, hybrid_override_pattern, convert_model, student_hf_dir, teacher_hf_dir).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py`:
- Around line 118-145: The test currently creates two identical checkpoints
(both via create_and_save_small_hf_model with hf_model_name="Qwen/Qwen3-0.6B"
and hybrid_override_pattern=None) so it never exercises heterogeneous
distillation; update the fixture so one side is genuinely different before
conversion (e.g., call create_and_save_small_hf_model for teacher_hf_dir with a
different hf_model_name or supply a non-None hybrid_override_pattern that alters
per-layer FFN/block configs), keep student_hf_dir unchanged, then run
convert_model on both student_hf_dir and teacher_hf_dir as before so
convert_model and subsequent distillation see heterogeneous inputs (refer to
create_and_save_small_hf_model, hybrid_override_pattern, convert_model,
student_hf_dir, teacher_hf_dir).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 91a932d2-8d5f-4071-9418-bf2c9352d03d
📒 Files selected for processing (12)

- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/qwen3_vl/qwen3_vl_model_descriptor.py
- modelopt/torch/puzzletron/mip/run_puzzle.py
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/sewing_kit/core.py
- modelopt/torch/puzzletron/sewing_kit/utils.py
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py
- pyproject.toml
- tests/examples/puzzletron/mbridge_distillation/test_distill_hf.py
💤 Files with no reviewable changes (7)
- modelopt/torch/puzzletron/anymodel/models/qwen2/qwen2_model_descriptor.py
- modelopt/torch/puzzletron/mip/utils.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h/nemotron_h_model_descriptor.py
- modelopt/torch/puzzletron/anymodel/models/nemotron_h_v2/nemotron_h_v2_model_descriptor.py
- modelopt/torch/puzzletron/subblock_stats/calc_subblock_stats.py
- modelopt/torch/puzzletron/sewing_kit/core.py
- pyproject.toml
✅ Files skipped from review due to trivial changes (2)
- modelopt/torch/puzzletron/mip/run_puzzle.py
- modelopt/torch/puzzletron/tools/bypassed_training/init_child_from_parent.py