
Add custom MoE quantization guide for HuggingFace models#1118

Open
cjluo-nv wants to merge 1 commit into main from chenjiel/moe

Conversation

Collaborator

@cjluo-nv cjluo-nv commented Mar 25, 2026

What does this PR do?

Type of change: documentation

Adds a new guide (examples/llm_ptq/moe.md) that explains how model developers can add quantization support for custom MoE architectures from HuggingFace checkpoints. The guide covers all existing MoE patterns in ModelOpt's huggingface.py plugin and provides step-by-step instructions for patching new ones.

Covered patterns:

  • Default HF SparseMoE (auto-detected Mixtral/Qwen3Moe/NemotronH-style)
  • Llama4-style BMM MoE with transposed quantization
  • GPT-OSS BMM MoE with functional interception
  • DBRX MoE with two-level weight expansion
  • Qwen3VL MoE with fused weight expansion + transpose
  • Qwen3.5 MoE with fused weight expansion (no transpose)
  • Stepfun Step3p5 MoE with expand-then-reconstruct for vLLM compatibility
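
The "functional interception" pattern mentioned for GPT-OSS can be illustrated with a minimal, self-contained sketch. All names below (`moe_ops`, `fused_expert_matmul`, `intercept`) are hypothetical stand-ins, not ModelOpt or HuggingFace API; the idea is simply to temporarily swap a module-level op for a wrapper that can observe (or, in a real hook, fake-quantize) its inputs, then restore the original:

```python
import contextlib
import types

# Hypothetical stand-in for a fused batched expert op (e.g. a bmm-style call
# inside an MoE forward); NOT the real ModelOpt or HF implementation.
moe_ops = types.SimpleNamespace()
moe_ops.fused_expert_matmul = lambda x, w: [
    [sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x
]

@contextlib.contextmanager
def intercept(ns, name, make_wrapper):
    """Temporarily replace ns.<name> with make_wrapper(original)."""
    original = getattr(ns, name)
    setattr(ns, name, make_wrapper(original))
    try:
        yield
    finally:
        setattr(ns, name, original)  # always restore the original op

calls = []

def recording(original):
    def wrapped(x, w):
        calls.append((len(x), len(w[0])))  # record shapes seen at the call site
        return original(x, w)              # a real hook would quantize x/w here
    return wrapped

with intercept(moe_ops, "fused_expert_matmul", recording):
    out = moe_ops.fused_expert_matmul([[1, 2]], [[3], [4]])

print(out, calls)  # [[11]] [(1, 1)]
```

Because the swap happens inside a context manager, the original function is restored even if the forward pass raises, which is why this style is attractive for patching ops that live in third-party modules.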

Also adds a link to the new guide from examples/llm_ptq/README.md.
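Several of the patterns above (DBRX, Qwen3VL, Qwen3.5) revolve around "weight expansion": splitting a fused expert weight tensor into per-expert submodules so each expert can be quantized independently. A minimal sketch of that idea, assuming PyTorch; `expand_fused_experts` is a hypothetical helper, not ModelOpt's actual code:

```python
import torch
import torch.nn as nn

def expand_fused_experts(fused_weight: torch.Tensor, transpose: bool = False) -> nn.ModuleList:
    """Split a fused [num_experts, out, in] weight tensor (or [num_experts, in, out]
    when transpose=True) into per-expert nn.Linear modules, so each expert becomes
    an individually quantizable submodule. Hypothetical helper for illustration."""
    experts = nn.ModuleList()
    for expert_w in fused_weight.unbind(dim=0):
        w = expert_w.t().contiguous() if transpose else expert_w
        out_features, in_features = w.shape
        linear = nn.Linear(in_features, out_features, bias=False)
        with torch.no_grad():
            linear.weight.copy_(w)  # each expert gets its own weight tensor
        experts.append(linear)
    return experts

fused = torch.randn(4, 16, 8)  # 4 experts, out_features=16, in_features=8
experts = expand_fused_experts(fused)
print(len(experts), tuple(experts[0].weight.shape))  # 4 (16, 8)
```

Once experts live in standard `nn.Linear` modules, the usual per-layer quantizer insertion applies; the `transpose` flag corresponds to the "with transpose" variants listed above, where the checkpoint stores weights in [in, out] layout.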

Testing

N/A — documentation only.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive guide for extending quantization support for custom Mixture-of-Experts architectures, including registration approaches and step-by-step implementation instructions.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv requested a review from a team as a code owner March 25, 2026 00:14
@cjluo-nv cjluo-nv requested a review from sugunav14 March 25, 2026 00:14
Contributor

coderabbitai bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ea47c7f7-3c40-4eec-8cef-4c707358e698

📥 Commits

Reviewing files that changed from the base of the PR and between 3431f8b and 5874df3.

📒 Files selected for processing (2)
  • examples/llm_ptq/README.md
  • examples/llm_ptq/moe.md

📝 Walkthrough

Walkthrough

This change adds comprehensive documentation for extending ModelOpt's Torch quantization capabilities to custom Mixture-of-Experts (MoE) architectures. A new guide file is created describing supported MoE quantization patterns and implementation approaches, with a reference added to the main README.

Changes

Cohort / File(s): MoE Quantization Documentation (examples/llm_ptq/README.md, examples/llm_ptq/moe.md)
Summary: Added a new documentation guide explaining how to extend quantization support for custom MoE architectures, including details on existing supported patterns, registration approaches, and step-by-step implementation instructions. README updated with a reference to the new guide.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately summarizes the main change (adding documentation for custom MoE quantization support for HuggingFace models), which aligns with both the README update and the new moe.md guide.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Security Anti-Patterns ✅ Passed: Pull request contains only documentation changes (moe.md and README.md updates) with no executable code; no security anti-patterns detected (no torch.load, eval, exec, trust_remote_code=True, or # nosec comments).


@cjluo-nv cjluo-nv changed the title from "Add documentations about how to support customized MOE" to "Add custom MoE quantization guide for HuggingFace models" Mar 25, 2026
@cjluo-nv cjluo-nv requested a review from Edwardf0t1 March 25, 2026 00:16
@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1118/

Built to branch gh-pages at 2026-03-25 00:19 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.21%. Comparing base (3431f8b) to head (5874df3).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1118      +/-   ##
==========================================
- Coverage   70.22%   70.21%   -0.02%     
==========================================
  Files         228      228              
  Lines       25952    25952              
==========================================
- Hits        18224    18221       -3     
- Misses       7728     7731       +3     

☔ View full report in Codecov by Sentry.

