
Add custom MoE quantization guide for HuggingFace models#1118

Open
cjluo-nv wants to merge 1 commit into main from chenjiel/moe

Conversation

Collaborator

@cjluo-nv cjluo-nv commented Mar 25, 2026

What does this PR do?

Type of change: documentation

Adds a new guide (examples/llm_ptq/moe.md) that explains how model developers can add quantization support for custom MoE architectures from HuggingFace checkpoints. The guide covers all existing MoE patterns in ModelOpt's huggingface.py plugin and provides step-by-step instructions for patching new ones.

Covered patterns:

  • Default HF SparseMoE (auto-detected Mixtral/Qwen3Moe/NemotronH-style)
  • Llama4-style BMM MoE with transposed quantization
  • GPT-OSS BMM MoE with functional interception
  • DBRX MoE with two-level weight expansion
  • Qwen3VL MoE with fused weight expansion + transpose
  • Qwen3.5 MoE with fused weight expansion (no transpose)
  • Stepfun Step3p5 MoE with expand-then-reconstruct for vLLM compatibility
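
The "functional interception" pattern mentioned for GPT-OSS can be illustrated with a minimal, self-contained sketch. All names below (`moe_ops`, `fused_expert_matmul`, `intercept`) are hypothetical stand-ins, not ModelOpt or HuggingFace API; the idea is simply to temporarily swap a module-level op for a wrapper that can observe (or, in a real hook, fake-quantize) its inputs, then restore the original:

```python
import contextlib
import types

# Hypothetical stand-in for a fused batched expert op (e.g. a bmm-style call
# inside an MoE forward); NOT the real ModelOpt or HF implementation.
moe_ops = types.SimpleNamespace()
moe_ops.fused_expert_matmul = lambda x, w: [
    [sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x
]

@contextlib.contextmanager
def intercept(ns, name, make_wrapper):
    """Temporarily replace ns.<name> with make_wrapper(original)."""
    original = getattr(ns, name)
    setattr(ns, name, make_wrapper(original))
    try:
        yield
    finally:
        setattr(ns, name, original)  # always restore the original op

calls = []

def recording(original):
    def wrapped(x, w):
        calls.append((len(x), len(w[0])))  # record shapes seen at the call site
        return original(x, w)              # a real hook would quantize x/w here
    return wrapped

with intercept(moe_ops, "fused_expert_matmul", recording):
    out = moe_ops.fused_expert_matmul([[1, 2]], [[3], [4]])

print(out, calls)  # [[11]] [(1, 1)]
```

Because the swap happens inside a context manager, the original function is restored even if the forward pass raises, which is why this style is attractive for patching ops that live in third-party modules.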

Also adds a link to the new guide from examples/llm_ptq/README.md.
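Several of the patterns above (DBRX, Qwen3VL, Qwen3.5) revolve around "weight expansion": splitting a fused expert weight tensor into per-expert submodules so each expert can be quantized independently. A minimal sketch of that idea, assuming PyTorch; `expand_fused_experts` is a hypothetical helper, not ModelOpt's actual code:

```python
import torch
import torch.nn as nn

def expand_fused_experts(fused_weight: torch.Tensor, transpose: bool = False) -> nn.ModuleList:
    """Split a fused [num_experts, out, in] weight tensor (or [num_experts, in, out]
    when transpose=True) into per-expert nn.Linear modules, so each expert becomes
    an individually quantizable submodule. Hypothetical helper for illustration."""
    experts = nn.ModuleList()
    for expert_w in fused_weight.unbind(dim=0):
        w = expert_w.t().contiguous() if transpose else expert_w
        out_features, in_features = w.shape
        linear = nn.Linear(in_features, out_features, bias=False)
        with torch.no_grad():
            linear.weight.copy_(w)  # each expert gets its own weight tensor
        experts.append(linear)
    return experts

fused = torch.randn(4, 16, 8)  # 4 experts, out_features=16, in_features=8
experts = expand_fused_experts(fused)
print(len(experts), tuple(experts[0].weight.shape))  # 4 (16, 8)
```

Once experts live in standard `nn.Linear` modules, the usual per-layer quantizer insertion applies; the `transpose` flag corresponds to the "with transpose" variants listed above, where the checkpoint stores weights in [in, out] layout.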

Testing

N/A — documentation only.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: N/A
  • Did you update Changelog?: N/A

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive guide for extending quantization support for custom Mixture-of-Experts architectures, including registration approaches and step-by-step implementation instructions.

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
@cjluo-nv cjluo-nv requested a review from a team as a code owner March 25, 2026 00:14
@cjluo-nv cjluo-nv requested a review from sugunav14 March 25, 2026 00:14
Contributor

coderabbitai bot commented Mar 25, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ea47c7f7-3c40-4eec-8cef-4c707358e698

📥 Commits

Reviewing files that changed from the base of the PR and between 3431f8b and 5874df3.

📒 Files selected for processing (2)
  • examples/llm_ptq/README.md
  • examples/llm_ptq/moe.md

📝 Walkthrough

Walkthrough

This change adds comprehensive documentation for extending ModelOpt's Torch quantization capabilities to custom Mixture-of-Experts (MoE) architectures. A new guide file is created describing supported MoE quantization patterns and implementation approaches, with a reference added to the main README.

Changes

Cohort / File(s): MoE Quantization Documentation (examples/llm_ptq/README.md, examples/llm_ptq/moe.md)
Summary: Added a new documentation guide explaining how to extend quantization support for custom MoE architectures, including details on existing supported patterns, registration approaches, and step-by-step implementation instructions. README updated with a reference to the new guide.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
  • Description Check ✅ Passed: Check skipped; CodeRabbit’s high-level summary is enabled.
  • Title Check ✅ Passed: The title accurately summarizes the main change (adding documentation for custom MoE quantization support for HuggingFace models), which aligns with both the README update and the new moe.md guide.
  • Docstring Coverage ✅ Passed: No functions found in the changed files to evaluate docstring coverage; check skipped.
  • Security Anti-Patterns ✅ Passed: Pull request contains only documentation changes (moe.md and README.md updates) with no executable code; no security anti-patterns detected (no torch.load, eval, exec, trust_remote_code=True, or # nosec comments).


@cjluo-nv cjluo-nv changed the title from "Add documentations about how to support customized MOE" to "Add custom MoE quantization guide for HuggingFace models" Mar 25, 2026
@cjluo-nv cjluo-nv requested a review from Edwardf0t1 March 25, 2026 00:16
@github-actions
Contributor

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1118/

Built to branch gh-pages at 2026-03-25 00:19 UTC.
Preview will be ready when the GitHub Pages deployment is complete.


codecov bot commented Mar 25, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 70.21%. Comparing base (3431f8b) to head (5874df3).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1118      +/-   ##
==========================================
- Coverage   70.22%   70.21%   -0.02%     
==========================================
  Files         228      228              
  Lines       25952    25952              
==========================================
- Hits        18224    18221       -3     
- Misses       7728     7731       +3     

☔ View full report in Codecov by Sentry.

