Add nvfp4_mse and nvfp4_local_hessian options to the ptq script. #1113

Merged
kevalmorabia97 merged 3 commits into NVIDIA:main from bkartal-dev:fix-example-script
Apr 18, 2026

Conversation

Contributor

@bkartal-dev bkartal-dev commented Mar 24, 2026

What does this PR do?

Type of change: Bugfix

Add the recently added quant configs (nvfp4_mse and nvfp4_local_hessian) to the example PTQ script.

Testing

I ran auto_quantize locally with these two quant configs and successfully exported HF artifacts for both.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅ / ❌ / N/A
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: ✅ / ❌ / N/A
  • Did you write any new necessary tests?: ✅ / ❌ / N/A
  • Did you update Changelog?: ✅ / ❌ / N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Added support for two new quantization formats: nvfp4_mse and nvfp4_local_hessian, expanding export options available when using auto-quantize.
  • Bug Fixes / UX

    • Updated the invalid-quantization error message to list the newly accepted format identifiers.

@bkartal-dev bkartal-dev requested a review from a team as a code owner March 24, 2026 18:42
@bkartal-dev bkartal-dev requested a review from Edwardf0t1 March 24, 2026 18:42

copy-pr-bot bot commented Mar 24, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Contributor

coderabbitai bot commented Mar 24, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 75c87cf9-d695-4469-9649-3ec414b34459

📥 Commits

Reviewing files that changed from the base of the PR and between f8ee452 and 97551cd.

📒 Files selected for processing (2)
  • examples/llm_ptq/hf_ptq.py
  • examples/llm_ptq/scripts/huggingface_example.sh
✅ Files skipped from review due to trivial changes (1)
  • examples/llm_ptq/scripts/huggingface_example.sh
🚧 Files skipped from review as they are similar to previous changes (1)
  • examples/llm_ptq/hf_ptq.py

📝 Walkthrough

The Python auto-quantize whitelist now accepts nvfp4_local_hessian. The shell example's case whitelist and its "Unknown quant argument" message were expanded to include nvfp4_mse and nvfp4_local_hessian.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Python quantization whitelist (examples/llm_ptq/hf_ptq.py) | Added nvfp4_local_hessian to the hardcoded qformat_list used by auto_quantize() so it passes the assertion and can be resolved via QUANT_CFG_CHOICES. |
| Shell example whitelist & message (examples/llm_ptq/scripts/huggingface_example.sh) | Extended the case $qformat in whitelist to include nvfp4_mse and nvfp4_local_hessian; updated the "Unknown quant argument" error output to include these identifiers in the list of valid options. |
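The whitelist check described in the walkthrough can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in hf_ptq.py: the function name `resolve_auto_quantize_formats`, the placeholder config values, the comma-separated input format, and the shortened format list are all assumptions made for the example. Only the names `QUANT_CFG_CHOICES`, `nvfp4_mse`, and `nvfp4_local_hessian` come from the PR.

```python
# Hypothetical stand-in for the registry in hf_ptq.py: the real
# QUANT_CFG_CHOICES maps each format name to a quantization config object.
QUANT_CFG_CHOICES = {
    "fp8": {"label": "fp8 placeholder"},
    "nvfp4": {"label": "nvfp4 placeholder"},
    "nvfp4_mse": {"label": "nvfp4_mse placeholder"},
    "nvfp4_local_hessian": {"label": "nvfp4_local_hessian placeholder"},
}

# Illustrative subset of the auto-quantize whitelist (not the full list).
AUTO_QUANTIZE_QFORMATS = ["fp8", "nvfp4", "nvfp4_mse", "nvfp4_local_hessian"]


def resolve_auto_quantize_formats(qformat_arg):
    """Validate a comma-separated format string and look up each config.

    Assumes formats arrive as one comma-separated string.
    """
    requested = [name.strip() for name in qformat_arg.split(",") if name.strip()]
    unsupported = [name for name in requested if name not in AUTO_QUANTIZE_QFORMATS]
    # Fail fast on any format that is missing from the whitelist.
    assert not unsupported, (
        f"Unsupported formats for auto_quantize: {unsupported}; "
        f"expected a subset of {AUTO_QUANTIZE_QFORMATS}"
    )
    # Every whitelisted name must also exist in the registry to be resolvable.
    return [QUANT_CFG_CHOICES[name] for name in requested]
```

With a check of this shape, a format that exists in the registry but is missing from the whitelist is rejected before any quantization runs, which is the failure mode this PR removes for the two new formats.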

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and specifically describes the main change: adding two new quantization format options (nvfp4_mse and nvfp4_local_hessian) to the PTQ example script. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Security Anti-Patterns | ✅ Passed | The pull request contains only configuration changes extending quantization format validation whitelists by adding two new format strings to assertion and case statement without introducing security-sensitive patterns. |

Signed-off-by: Bilal Kartal <bkartal@nvidia.com>
@realAsma realAsma requested a review from cjluo-nv March 24, 2026 19:14
Contributor

@Edwardf0t1 Edwardf0t1 left a comment

Please fix the conflict.

Signed-off-by: bkartal-dev <bkartal@nvidia.com>

codecov bot commented Mar 24, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 77.21%. Comparing base (76b6fd5) to head (97551cd).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1113      +/-   ##
==========================================
- Coverage   77.22%   77.21%   -0.02%     
==========================================
  Files         459      459              
  Lines       48975    48975              
==========================================
- Hits        37822    37815       -7     
- Misses      11153    11160       +7     
| Flag | Coverage Δ |
|---|---|
| examples | 41.18% <ø> (-0.16%) ⬇️ |
| unit | 52.28% <ø> (-0.02%) ⬇️ |

@bkartal-dev
Contributor Author

The merge conflict is resolved.

@kevalmorabia97 kevalmorabia97 added the cherry-pick-0.44 label (After code freeze, cherry-pick into release branch for next rc. Only for bug fixes and doc updates) Mar 25, 2026
@kevalmorabia97
Collaborator

/ok to test c50a346

@kevalmorabia97 kevalmorabia97 enabled auto-merge (squash) March 25, 2026 16:59
@cjluo-nv
Collaborator

Could you update the README to explain these options? Or is this change only meant to enable the end-to-end flow, without promoting these two methods yet?

@bkartal-dev
Contributor Author

This only updates the end2end example script so that it does not fail when already supported features are selected.

Collaborator

@cjluo-nv cjluo-nv left a comment

Summary: Adds nvfp4_mse and nvfp4_local_hessian to the allowed quantization format whitelists in the example PTQ script (hf_ptq.py) and its shell wrapper (huggingface_example.sh), so these already-supported engine configs can be used end-to-end without the scripts rejecting them.

Issues Found:

  1. [Readability] huggingface_example.sh:56 — The case pattern and error message are a single very long line that is becoming hard to scan. The new formats are appended at different positions (e.g., nvfp4_local_hessian tacked on at the end, nvfp4_mse inserted mid-list), making future diffs messy. Not blocking, but consider alphabetical or grouped ordering, or a multi-line case pattern using \ continuation for readability as this list grows.

  2. [Correctness] huggingface_example.sh:56 vs hf_ptq.py:270-286 — The shell script whitelist now includes nvfp4_svdquant and fp8_pc_pt which are not in the Python auto_quantize whitelist. This means a user could pass these formats through the shell script but hit the Python assertion when --export_fmt hf triggers auto_quantize. This is a pre-existing issue (not introduced by this PR), but worth noting as the lists continue to diverge.

  3. [Tests] No tests are added, but this is acceptable — the change only extends validation whitelists in example scripts, and the author confirms local end-to-end testing of both new configs. Codecov confirms no coverage regression.
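The divergence noted in issue 2 could be caught mechanically by comparing the two whitelists as sets. The sketch below assumes both lists are available as Python sequences; the helper `whitelist_drift` is hypothetical, and the two lists are abbreviated illustrations that contain only formats named in this review, not the full contents of either script.

```python
def whitelist_drift(shell_formats, python_formats):
    """Return the formats accepted by only one of the two whitelists."""
    shell_set, python_set = set(shell_formats), set(python_formats)
    return {
        "shell_only": sorted(shell_set - python_set),
        "python_only": sorted(python_set - shell_set),
    }


# Abbreviated, illustrative lists (assumption: not the scripts' full lists).
SHELL_FORMATS = ["nvfp4_mse", "nvfp4_local_hessian", "nvfp4_svdquant", "fp8_pc_pt"]
PYTHON_FORMATS = ["nvfp4_mse", "nvfp4_local_hessian"]

drift = whitelist_drift(SHELL_FORMATS, PYTHON_FORMATS)
print(drift)
# {'shell_only': ['fp8_pc_pt', 'nvfp4_svdquant'], 'python_only': []}
```

Any entry under `shell_only` is a format the shell wrapper would accept but the Python assertion would later reject, which matches the pre-existing gap the reviewer describes.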

Suggestions:

  • Consider extracting the supported format list into a single source of truth (e.g., a shared constant or config) to prevent the Python and shell whitelists from drifting further apart.
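One possible shape for such a single source of truth is a Python constant from which both the shell `case` pattern and the error message are generated. This is only a sketch of the idea: `SUPPORTED_QFORMATS`, `shell_case_pattern`, and `unknown_quant_message` are hypothetical names, the format tuple is abbreviated, and the message wording after "Unknown quant argument" is invented for illustration.

```python
# Hypothetical shared constant; abbreviated to formats named in this PR.
SUPPORTED_QFORMATS = ("nvfp4_mse", "nvfp4_local_hessian", "nvfp4_svdquant", "fp8_pc_pt")


def shell_case_pattern(formats):
    """Build an alphabetically ordered pattern for a shell `case ... in` arm."""
    return " | ".join(sorted(formats))


def unknown_quant_message(qformat, formats):
    """Build an error message that lists the same formats as the pattern."""
    valid = ", ".join(sorted(formats))
    return f"Unknown quant argument: {qformat}. Expected one of: {valid}"


print(shell_case_pattern(SUPPORTED_QFORMATS))
# fp8_pc_pt | nvfp4_local_hessian | nvfp4_mse | nvfp4_svdquant
```

Sorting the names also addresses the readability point in issue 1, since a new format always lands in a predictable position and produces a one-line diff.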

Overall Assessment: Clean, minimal change that unblocks already-supported quantization formats in the example scripts. No correctness risk from the changes themselves.

@kevalmorabia97 kevalmorabia97 removed the cherry-pick-0.44 label (After code freeze, cherry-pick into release branch for next rc. Only for bug fixes and doc updates) Apr 18, 2026
@kevalmorabia97
Collaborator

/ok to test 97551cd

@kevalmorabia97 kevalmorabia97 merged commit 92622a9 into NVIDIA:main Apr 18, 2026
41 of 42 checks passed

5 participants