[OMNIML-3689] PTQ quant_cfg semantic correction. Design in doc _quant_cfg.rst#1094
shengliangxu wants to merge 34 commits into main
Conversation
📝 Walkthrough
Refactors quantization configuration from a wildcard-keyed dict schema to an ordered list-of-entry schema.

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Right now the quant_cfg is a dict, but we are using the quant_cfg as if it is a list. When we apply the quant_cfg, we enumerate the items in the dict and apply the configs one by one in modelopt/torch/quantization/conversion.py. This implementation actually has the semantic that later configs have higher precedence than earlier configs. However, dicts do not have reliable ordering. Therefore, we make quant_cfg a list of patterns:

1. Later config patterns have higher precedence. A later config in the list overrides a former config if they target the same module.
2. The config for each module is atomic; each config provides the full information. We do not compose a quantized module's config from multiple config lines.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
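The "later entry wins" precedence described above can be sketched as follows. This is an illustrative toy resolver (the name `resolve_cfg` is hypothetical, not modelopt's API), assuming fnmatch-style wildcard matching of quantizer paths:

```python
from fnmatch import fnmatch


def resolve_cfg(quant_cfg_list, quantizer_name):
    """Return the LAST entry whose pattern matches the quantizer name.

    Each entry is atomic: the winning entry is taken whole, never merged
    with earlier matches.
    """
    matched = None
    for entry in quant_cfg_list:  # later entries override earlier ones
        if fnmatch(quantizer_name, entry["quantizer_path"]):
            matched = entry
    return matched


quant_cfg = [
    {"quantizer_path": "*", "enable": False},
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_path": "lm_head.*", "enable": False},
]

# The weight rule wins for ordinary layers; the later lm_head rule wins there.
assert resolve_cfg(quant_cfg, "layers.0.weight_quantizer")["cfg"] == {"num_bits": 8, "axis": 0}
assert resolve_cfg(quant_cfg, "lm_head.weight_quantizer")["enable"] is False
```

With a dict-keyed schema the same behavior depended on insertion order; the list makes the precedence explicit.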
…s_partial

set_quantizer_attributes_full updates the full set of quantizer attributes; it has the atomic semantic. set_quantizer_attributes_partial updates just a partial set of quantizer attributes; it has the merge semantic.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
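A minimal sketch of the two semantics, using plain dicts as illustrative stand-ins for the real quantizer objects:

```python
def set_quantizer_attributes_full(quantizer: dict, attrs: dict) -> None:
    """Atomic semantic: the new attributes replace everything."""
    quantizer.clear()
    quantizer.update(attrs)


def set_quantizer_attributes_partial(quantizer: dict, attrs: dict) -> None:
    """Merge semantic: only the given attributes change."""
    quantizer.update(attrs)


q = {"num_bits": 8, "axis": 0}
set_quantizer_attributes_partial(q, {"num_bits": 4})
assert q == {"num_bits": 4, "axis": 0}  # axis survives the merge

set_quantizer_attributes_full(q, {"num_bits": 4})
assert q == {"num_bits": 4}  # axis is gone: full update is atomic
```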
Force-pushed from 192ea05 to fb3bb07.
Codecov Report
❌ Patch coverage check failed.

Additional details and impacted files:

@@ Coverage Diff @@
## main #1094 +/- ##
==========================================
- Coverage 70.21% 70.18% -0.03%
==========================================
Files 228 229 +1
Lines 25952 26113 +161
==========================================
+ Hits 18221 18327 +106
- Misses 7731 7786 +55

☔ View full report in Codecov by Sentry.
Force-pushed from 4f38294 to 5115452.
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
examples/llm_autodeploy/run_auto_quantize.py (1)
134-143: ⚠️ Potential issue | 🔴 Critical: Remove hardcoded `trust_remote_code=True` in HF model and tokenizer loading.
Both `AutoModelForCausalLM.from_pretrained()` (line 134) and `AutoTokenizer.from_pretrained()` (line 142) hardcode `trust_remote_code=True`, creating a critical RCE vector from untrusted model repositories. Expose this as a user-controlled parameter defaulting to `False` in the `modelopt_ptq()` function signature.
🔒 Proposed fix
 def modelopt_ptq(
     model_path: str,
     output_dir: str,
     qformat: str | None = None,
     num_samples: int = 512,
     auto_quantize_bits: float | None = None,
     calib_dataset: str = "cnn_dailymail",
     calib_batch_size: int = 8,
+    trust_remote_code: bool = False,
 ) -> torch.nn.Module:
     """Quantize the model with modelopt."""
     model = AutoModelForCausalLM.from_pretrained(
-        model_path, trust_remote_code=True, torch_dtype="auto", device_map="auto"
+        model_path,
+        trust_remote_code=trust_remote_code,
+        torch_dtype="auto",
+        device_map="auto",
     )
     model.eval()
     tokenizer = AutoTokenizer.from_pretrained(
         model_path,
         model_max_length=2048,
         padding_side="left",
-        trust_remote_code=True,
+        trust_remote_code=trust_remote_code,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_autodeploy/run_auto_quantize.py` around lines 134-143: the code currently hardcodes trust_remote_code=True when calling AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained, creating an RCE risk; update the modelopt_ptq(...) function signature to accept a new parameter (e.g., trust_remote_code: bool = False) and replace the hardcoded True with that parameter for both AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained so callers can opt in explicitly while defaulting to False.

examples/llm_ptq/example_utils.py (1)
208-247: ⚠️ Potential issue | 🟠 Major: Normalize `quant_cfg["quant_cfg"]` before mutating it.
Lines 210 and 244-247 now assume the new list schema. If a caller still passes a legacy dict-style config, the AWQ branch will fail before `mtq.quantize()` reaches the backward-compat normalizer, and the `phi4mm` branch will hit `.append()` on a dict. Even in list form, walking from the front can rewrite a shadowed `*weight_quantizer` rule instead of the effective last match.
💡 Suggested fix
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
 @@
 def build_quant_cfg(
 @@
 ) -> dict[str, Any]:
     quant_cfg = copy.deepcopy(quant_cfg)
+    quant_cfg["quant_cfg"] = normalize_quant_cfg_list(quant_cfg["quant_cfg"])
     if "awq" in str(quant_cfg.get("algorithm")):
         weight_quantizer_entry = next(
             e
-            for e in quant_cfg["quant_cfg"]
-            if isinstance(e, dict) and e.get("quantizer_path") == "*weight_quantizer"
+            for e in reversed(quant_cfg["quant_cfg"])
+            if e.get("quantizer_path") == "*weight_quantizer"
         )
         weight_quantizer = weight_quantizer_entry.get("cfg", {})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_ptq/example_utils.py` around lines 208-247: the code mutates quant_cfg["quant_cfg"] assuming it is a list; normalize quant_cfg["quant_cfg"] to the list schema up front (convert the legacy dict form into the new list form) before any mutations so the weight_quantizer lookup and .append() calls are safe; when locating the weight quantizer, use a reverse scan (i.e., search from the end) to find the effective "*weight_quantizer" entry referenced by weight_quantizer_entry so the awq_block_size handling (weight_quantizer_entry / weight_quantizer) and the model_type == "phi4mm" .append() calls operate on the correct, normalized list.

modelopt/onnx/llm_export_utils/quantization_utils.py (1)
57-111: ⚠️ Potential issue | 🟠 Major: Copy the preset before rewriting `quant_cfg`.
Line 111 mutates `mtq.FP8_DEFAULT_CFG`/`mtq.NVFP4_DEFAULT_CFG`/`mtq.INT4_AWQ_CFG` in place. With list-based rules, repeated calls can now retain or duplicate previously appended `lm_head` entries, so `get_quant_config()` becomes order-dependent across calls in the same process.
💡 Suggested fix
+import copy
 import time
 @@
     if precision == "fp8":
-        quant_cfg = mtq.FP8_DEFAULT_CFG
+        quant_cfg = copy.deepcopy(mtq.FP8_DEFAULT_CFG)
     elif precision == "nvfp4":
-        quant_cfg = mtq.NVFP4_DEFAULT_CFG
+        quant_cfg = copy.deepcopy(mtq.NVFP4_DEFAULT_CFG)
     elif precision == "int4_awq":
-        quant_cfg = mtq.INT4_AWQ_CFG
+        quant_cfg = copy.deepcopy(mtq.INT4_AWQ_CFG)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/quantization_utils.py` around lines 57-111: get_quant_config currently mutates the shared presets mtq.FP8_DEFAULT_CFG / mtq.NVFP4_DEFAULT_CFG / mtq.INT4_AWQ_CFG in place, which causes duplicated lm_head entries on repeated calls; fix by making a deep copy of the chosen preset (e.g., import copy and do quant_cfg = copy.deepcopy(mtq.FP8_DEFAULT_CFG) / copy.deepcopy(mtq.NVFP4_DEFAULT_CFG) / copy.deepcopy(mtq.INT4_AWQ_CFG) inside get_quant_config) before building quant_cfg_list and appending lm_head entries so the original mtq presets remain unchanged.

modelopt/torch/quantization/conversion.py (1)
423-454: ⚠️ Potential issue | 🟠 Major: Reject sequential cfgs here or restore full module topology.
This context manager snapshots only `TensorQuantizer` properties. If a temporary config contains a list `cfg`, `set_quantizer_by_cfg()` can replace a `TensorQuantizer` with a `SequentialQuantizer`, and the exit path never converts it back. The docstring already says sequential cfg lists are unsupported here, but that isn't enforced.
🧯 Suggested fix
     quant_cfg = normalize_quant_cfg_list(quant_cfg)
+    if any(isinstance(entry["cfg"], list) for entry in quant_cfg):
+        raise ValueError("set_quantizer_by_cfg_context does not support sequential cfg lists.")
     original_attributes = {}

modelopt/torch/quantization/config.py (1)
541-545: ⚠️ Potential issue | 🔴 Critical: Remove `enable` from `_nvfp4_cfg_bs32`.
The nested `"enable": True` in `_nvfp4_cfg_bs32` (line 545) causes a duplicate keyword argument error at runtime. When `_nvfp4_selective_quant_cfg()` passes this dict as the `cfg` field in `QuantizerCfgEntry` (line 559), the conversion code later unpacks it with `QuantizerAttributeConfig(**cfg, enable=enable)` (conversion.py line 266). This passes `enable` twice and raises TypeError before conversion can complete.
Suggested fix
 _nvfp4_cfg_bs32 = {
     "num_bits": (2, 1),
     "block_sizes": {-1: 32, "type": "dynamic", "scale_bits": (4, 3)},
-    "enable": True,
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 541 - 545, The dict _nvfp4_cfg_bs32 contains an internal "enable" key which causes a duplicate keyword when later unpacked into QuantizerAttributeConfig; remove the "enable" entry from _nvfp4_cfg_bs32 so it only contains its attribute settings (num_bits, block_sizes, scale_bits/type entries) and let the caller (_nvfp4_selective_quant_cfg -> QuantizerCfgEntry) supply the enable flag separately; specifically edit the _nvfp4_cfg_bs32 constant to drop the "enable": True key so that conversion code that does QuantizerAttributeConfig(**cfg, enable=enable) no longer passes enable twice.
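The failure mode above is plain Python keyword unpacking. A standalone demo, with a hypothetical `AttrConfig` standing in for `QuantizerAttributeConfig`:

```python
class AttrConfig:
    """Stand-in for QuantizerAttributeConfig: takes `enable` as a keyword."""

    def __init__(self, num_bits=8, enable=True):
        self.num_bits = num_bits
        self.enable = enable


cfg = {"num_bits": 4, "enable": True}  # "enable" nested inside the cfg dict
try:
    AttrConfig(**cfg, enable=True)  # `enable` is passed twice
    raised = False
except TypeError:
    raised = True
assert raised  # duplicate keyword argument raises TypeError

cfg.pop("enable")  # the suggested fix: drop the nested key
obj = AttrConfig(**cfg, enable=True)
assert obj.num_bits == 4 and obj.enable is True
```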
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/guides/_pytorch_quantization.rst`:
- Around line 260-272: The example references _default_disabled_quantizer_cfg
but doesn't define or import it; fix by either adding an import for it (e.g.,
from mtq.config import _default_disabled_quantizer_cfg) at the top of the
snippet or qualify the reference inline as
mtq.config._default_disabled_quantizer_cfg so the example is self-contained and
runnable; update the code block containing "quant_cfg" to use the qualified name
or add the import near the example.
In `@docs/source/guides/_quant_cfg.rst`:
- Around line 123-143: The docs currently state that _base_disable_all (a.k.a.
the deny-all entry) is prepended to every built-in config; update the text to
accurately say that the deny-all entry (referenced as _base_disable_all or
mtq.config._default_disabled_quantizer_cfg) is included in most built-in configs
but not all, and call out that some specialized recipes (e.g., the KV-only
configs in the quantization config set) intentionally omit the deny-all prelude;
adjust the example wording and the note to avoid the absolute "every built-in
config" claim and add a short parenthetical pointer to the specialized recipes
that omit it (referencing their config names such as the KV-only configs).
In `@examples/diffusers/quantization/config.py`:
- Around line 106-109: The loop that injects trt_high_precision_dtype into
quant_config["quant_cfg"] only handles when cfg is a dict (variable p) and skips
the case where cfg is a list of attribute dicts; update the logic in the loop
that iterates quant_config["quant_cfg"] to detect if p is a list and, when so,
iterate its elements and for each element that is a dict and contains "num_bits"
but not "trt_high_precision_dtype", set element["trt_high_precision_dtype"] =
trt_high_precision_dtype; keep the existing dict-handling branch for backward
compatibility so both dict and list cfg forms are supported.
- Around line 128-147: The current append in reset_set_int8_config causes
duplicate entries in quant_config["quant_cfg"]; instead, make the operation
idempotent by searching quant_config["quant_cfg"] for an existing dict whose
"quantizer_path" equals aq_name and replace that dict in-place (or update its
"cfg") if found, otherwise append the new entry; modify the code around the
append that builds the payload (the block referencing aq_name,
quant_config["quant_cfg"], and the PercentileCalibrator tuple) to perform a
find-and-replace by quantizer_path rather than always appending.
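An idempotent upsert along those lines, sketched (`upsert_entry` is a hypothetical helper, not an existing modelopt function):

```python
def upsert_entry(quant_cfg_list, quantizer_path, cfg):
    """Replace the entry with this quantizer_path in place, or append it."""
    new_entry = {"quantizer_path": quantizer_path, "cfg": cfg}
    for i, entry in enumerate(quant_cfg_list):
        if isinstance(entry, dict) and entry.get("quantizer_path") == quantizer_path:
            quant_cfg_list[i] = new_entry  # found: replace, do not duplicate
            return
    quant_cfg_list.append(new_entry)  # not found: first-time append


rules = [{"quantizer_path": "*", "enable": False}]
upsert_entry(rules, "*input_quantizer", {"num_bits": 8})
upsert_entry(rules, "*input_quantizer", {"num_bits": 4})  # repeated call
assert len(rules) == 2  # no duplicate entry
assert rules[1]["cfg"] == {"num_bits": 4}  # latest cfg won
```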
In `@examples/llm_eval/quantization_utils.py`:
- Around line 36-49: The loop that detects BMM-attention entries fails because
mtq_cfg["quant_cfg"] is now a list of dicts; update _quantize_model_with_dataset
to handle both list and dict formats by inspecting each entry: if an entry is a
dict, read its "quantizer_path" (or keys for legacy string entries) and test
whether the path string contains "bmm_quantizer" or matches "*bmm_quantizer"; if
mtq_cfg["quant_cfg"] is still a dict or contains raw strings, preserve the old
membership check. Ensure this detection is used before calling
register_hf_attentions_on_the_fly(net) so KV-attention quantizers targeted via
"quantizer_path": "*bmm_quantizer" are not skipped.
In `@examples/llm_ptq/hf_ptq.py`:
- Line 332: The code is using attribute access on plain dicts in
_default_disabled_quantizer_cfg which will raise AttributeError; update the
disabled_layers construction to access the dict key "quantizer_path" for each
entry (i.e., replace entry.quantizer_path with entry["quantizer_path"]) so that
disabled_layers is built from the correct dictionary key.
In `@examples/llm_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb`:
- Around line 193-198: The code fails because quant_cfg["quant_cfg"] contains
dict entries, not 2-tuples; change the extraction to find the dict whose
quantizer_path (or appropriate key) equals "*weight_quantizer" and then take its
"cfg" field (e.g. use next(entry["cfg"] for entry in quant_cfg["quant_cfg"] if
entry.get("quantizer_path") == "*weight_quantizer")), then handle the case where
weight_quantizer is a list as before and set weight_quantizer["block_sizes"][-1]
= 128; update references to quant_cfg, weight_quantizer, and
quant_cfg["quant_cfg"] accordingly.
In `@examples/vllm_serve/fakequant_worker.py`:
- Around line 173-184: The code currently finds only the first kv rule
(kv_entry) and appends minimal clones at the end losing order and full fields;
instead iterate over kv_quant_cfg with indexes, find every entry where
isinstance(e, dict) and e.get("quantizer_path") == "*[kv]_bmm_quantizer", and
for each matching entry insert (not append) two full deep-copied clones of that
entire entry right after the original entry with quantizer_path set to
"*kv_c_bmm_quantizer" and "*k_pe_bmm_quantizer" respectively so you preserve
list order and all other fields (enable, parent_class, etc.) — operate on
kv_quant_cfg and use copy.deepcopy to clone the whole entry before changing
quantizer_path.
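A sketch of that insert-with-deepcopy expansion; the entries are illustrative, and only the quantizer_path values come from the comment above:

```python
import copy


def expand_kv_rules(kv_quant_cfg):
    """After each '*[kv]_bmm_quantizer' rule, insert two full clones of it."""
    out = []
    for entry in kv_quant_cfg:
        out.append(entry)
        if isinstance(entry, dict) and entry.get("quantizer_path") == "*[kv]_bmm_quantizer":
            for path in ("*kv_c_bmm_quantizer", "*k_pe_bmm_quantizer"):
                clone = copy.deepcopy(entry)  # keeps enable, parent_class, etc.
                clone["quantizer_path"] = path
                out.append(clone)  # lands right after the original entry
    return out


cfg = [
    {"quantizer_path": "*[kv]_bmm_quantizer", "enable": True, "cfg": {"num_bits": (4, 3)}},
    {"quantizer_path": "*", "enable": False},
]
expanded = expand_kv_rules(cfg)
assert [e["quantizer_path"] for e in expanded] == [
    "*[kv]_bmm_quantizer",
    "*kv_c_bmm_quantizer",
    "*k_pe_bmm_quantizer",
    "*",
]
assert expanded[1]["cfg"] == {"num_bits": (4, 3)}  # full fields preserved
```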
In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`:
- Around line 260-271: The quant_cfg list currently uses tuple entries and
embeds "enable" inside the quantizer config dict (_nvfp4_cfg), which fails
validation in normalize_quant_cfg_list(); change quant_cfg to a list of
QuantizerCfgEntry dicts where each entry is e.g. {"quantizer_path": "<pattern>",
"cfg": <cfg_dict>} for enabled entries (move "enable" out of _nvfp4_cfg) and
{"quantizer_path": "<pattern>", "enable": False} for disabled patterns (use
SENSITIVE_LAYER_PATTERNS and the f"*transformer_blocks.{i}.*" patterns for
exclude_blocks), ensuring _nvfp4_cfg no longer contains a top-level "enable" key
and only used as the "cfg" value.
In `@modelopt/torch/quantization/algorithms.py`:
- Around line 1333-1344: The loop in get_auto_quantize_config is calling
_match_quantizer_cfg with only the short names
("input_quantizer"/"weight_quantizer"), so path- or parent_class-scoped
selectors never match and enable-only matches are dropped; change the call to
pass the fully qualified quantizer name (e.g.,
f"{module_name}.{quantizer_attr}") to _match_quantizer_cfg and ensure the caller
appends an entry when either matched_cfg is not None or matched_enable is True
(so enable-only matches are preserved); apply the same fix for the analogous
block that starts at the later occurrence (the other loop referencing
_match_quantizer_cfg, module_name, and quantizer_attr).
- Around line 96-98: The code currently drops the entry-level "enable" flag and
estimates compression from every entry's "cfg"; change the selection to only
include cfgs from QuantizerCfgEntry items where e.get("enable", True) is truthy
and the "cfg" is not None. Concretely, in the block using quant_cfg.quant_cfg
and calling estimate_quant_compression_for_quantizer, filter entries by
e.get("enable", True) before extracting e.get("cfg") (and still skip None cfgs)
so estimate_quant_compression_for_quantizer receives only enabled cfgs.
In `@modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py`:
- Around line 100-116: The availability probe currently uses next(...) without a
default and assertions on input_cfg/weight_cfg which can raise
StopIteration/AssertionError; change the two next(...) calls that produce
input_cfg and weight_cfg to include a default (e.g., next(..., None)) and
replace the assert isinstance(...) checks with explicit conditional checks that
return False from the probe when input_cfg or weight_cfg is missing or not a
dict; ensure you reference FP8_DEFAULT_CFG["quant_cfg"], quant_cfg_list,
input_cfg and weight_cfg in the probe logic so malformed or missing default
entries cause the function to return False instead of throwing.
In `@modelopt/torch/quantization/backends/nvfp4_gemm.py`:
- Around line 216-231: The availability check currently uses next(...) without
defaults and asserts which can raise instead of returning False; update the
_nvfp4_availability_check implementation to retrieve input_cfg and weight_cfg
from quant_cfg_list using next(..., None) (or equivalent), verify they are dicts
with proper "cfg" entries and that their "quantizer_path" matches
"*input_quantizer" / "*weight_quantizer", and if either is missing or not a dict
return False instead of asserting or allowing StopIteration; reference the
symbols quant_cfg_list, input_cfg, weight_cfg and the existing next(...)/assert
lines to locate and change the lookup and validation to defensive checks that
return False on malformed/missing configs.
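The defensive shape of such a probe, sketched (the real availability check also involves hardware and kernel probes not shown here):

```python
def availability_check(quant_cfg_list) -> bool:
    """Return False, rather than raising, on missing or malformed entries."""

    def find(path):
        return next(
            (
                e
                for e in quant_cfg_list
                if isinstance(e, dict) and e.get("quantizer_path") == path
            ),
            None,  # default avoids StopIteration on missing entries
        )

    input_cfg = find("*input_quantizer")
    weight_cfg = find("*weight_quantizer")
    if input_cfg is None or weight_cfg is None:
        return False  # missing entry: report unavailable
    if not isinstance(input_cfg.get("cfg"), dict) or not isinstance(weight_cfg.get("cfg"), dict):
        return False  # malformed entry: report unavailable
    return True


assert availability_check([]) is False  # previously this raised StopIteration
assert availability_check(
    [
        {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": (2, 1)}},
        {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": (2, 1)}},
    ]
)
```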
In `@modelopt/torch/quantization/config.py`:
- Around line 1625-1631: The before-validator normalize_quant_cfg currently
returns non-list inputs unchanged causing legacy dict-style quant_cfg to bypass
normalization; update normalize_quant_cfg to detect dict inputs and pass them
through normalize_quant_cfg_list (which handles dict -> entry conversion via
_dict_to_entry) so legacy {"*weight_quantizer": {...}} shapes are converted to
the expected list form; ensure this change preserves existing behavior for lists
and other non-dict types so downstream callers like need_calibration (which
iterates entries) receive normalized entry lists.
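One possible shape of that passthrough, sketched; `_dict_to_entry` here is a simplified hypothetical, not the real converter:

```python
def _dict_to_entry(pattern, value):
    """Convert one legacy wildcard-keyed item to the entry schema."""
    if isinstance(value, dict) and set(value) == {"enable"}:
        return {"quantizer_path": pattern, "enable": value["enable"]}
    return {"quantizer_path": pattern, "cfg": value}


def normalize_quant_cfg(quant_cfg):
    """Before-validator: route legacy dicts through entry conversion."""
    if isinstance(quant_cfg, dict):  # legacy {"*pattern": {...}} schema
        return [_dict_to_entry(k, v) for k, v in quant_cfg.items()]
    return quant_cfg  # lists (and anything else) pass through unchanged


legacy = {"*weight_quantizer": {"num_bits": 8, "axis": 0}, "default": {"enable": False}}
normalized = normalize_quant_cfg(legacy)
assert normalized == [
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_path": "default", "enable": False},
]
assert normalize_quant_cfg(normalized) == normalized  # lists unchanged
```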
In `@modelopt/torch/quantization/conversion.py`:
- Around line 341-356: The current logic in _match_quantizer branch mutates
existing SequentialQuantizer instances instead of fully replacing them when the
incoming attributes change shape; to fix, ensure full replacement instead of
partial assignment: when attributes is a list, if module is not a
SequentialQuantizer create a new SequentialQuantizer (as before) OR if module is
a SequentialQuantizer but len(attributes) != len(module) replace the existing
module with a new SequentialQuantizer of the correct length before calling
set_from_attribute_config; conversely, when attributes is not a list and module
is a SequentialQuantizer, replace it with a fresh TensorQuantizer instance and
then call set_from_attribute_config on that new instance (use
quant_model.get_submodule/name.rpartition to locate parent_module and setattr to
swap the module).
In `@modelopt/torch/quantization/model_quant.py`:
- Around line 183-191: The docstring for quantize() is out of sync: it still
describes quant_cfg as a dict mapping wildcard keys, but the implementation and
example use a list-of-dicts schema. Update the prose in the quantize() doc block
to describe quant_cfg as a list where each item is a dict with keys like
"quantizer_path" (wildcard pattern), optional "enable" (bool), optional "cfg"
(dict, e.g., {"num_bits": int, "axis": int}), and that multiple entries are
applied in order; also mention the top-level "algorithm" field (e.g., "max") and
how entries such as
{"quantizer_path":"*weight_quantizer","cfg":{"num_bits":8,"axis":0}} and
{"quantizer_path":"*input_quantizer","cfg":{"num_bits":8,"axis":-1}} are
interpreted by quantize()/quant_cfg.
In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 835-836: The code is appending dict.items() tuples into
quant_cfg["quant_cfg"], creating mixed tuples instead of the expected
list[QuantizerCfgEntry] and can error when quant_cfg["quant_cfg"] is a dict;
normalize both sources into proper QuantizerCfgEntry dicts: ensure inner is a
list (if quant_cfg.get("quant_cfg") is a dict, convert it to a list of entries),
and transform kv_cache_quant_cfg (the dict of pattern->cfg) into a list of
{"quantizer_path": pattern, ...cfg} or at least {"quantizer_path": pattern,
"enable": cfg.get("enable", False)} entries before concatenating; assign the
concatenated list back to quant_cfg["quant_cfg"] so downstream consumers only
see a homogeneous list[QuantizerCfgEntry].
In `@tests/_test_utils/torch/quantization/quantize_common.py`:
- Around line 50-61: The loop over config["quant_cfg"] looks for a
"*weight_quantizer" entry and mutates its "block_sizes" but silently does
nothing if no match is found; add a fail-fast check by tracking whether a
matching entry was found (e.g., set a boolean flag when pat ==
"*weight_quantizer") and after the loop raise an explicit error (ValueError or
AssertionError) if no match was found so tests fail early rather than silently
proceeding without applying block_size to the intended entry.
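The fail-fast version of that loop, sketched (the helper name and error text are illustrative):

```python
def set_weight_block_size(config, block_size):
    """Patch every '*weight_quantizer' entry; fail fast if none exists."""
    found = False
    for entry in config["quant_cfg"]:
        if entry.get("quantizer_path") == "*weight_quantizer":
            entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
            found = True
    if not found:
        raise ValueError("no '*weight_quantizer' entry in quant_cfg")


cfg = {"quant_cfg": [{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4}}]}
set_weight_block_size(cfg, 128)
assert cfg["quant_cfg"][0]["cfg"]["block_sizes"] == {-1: 128}

try:
    set_weight_block_size({"quant_cfg": []}, 128)
    silently_passed = True
except ValueError:
    silently_passed = False
assert not silently_passed  # the test now fails early instead of silently
```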
In `@tests/gpu/torch/quantization/test_quantize_cuda.py`:
- Around line 133-135: The test is mutating the shared constant
mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG by editing config["quant_cfg"] in place;
instead, create a deep copy of that constant (e.g., via copy.deepcopy) into the
local variable config before patching so changes to entry.setdefault("cfg",
{})["block_sizes"] = {-1: 8, -2: 8} only affect this test; locate the code that
assigns config from mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG and replace it with a
deep-copied version so modifications to quant_cfg/ block_sizes/quantizer_path do
not leak to other parametrized cases.
In `@tests/unit/torch/quantization/test_autoquant.py`:
- Around line 492-507: The current assertion allows tuple-style payloads by
falling back to entry[0]/entry[1]; restrict the check to dict-style
QuantizerCfgEntry entries only by iterating over config["quant_cfg"] and
filtering with isinstance(entry, dict), then assert that some dict has
entry.get("quantizer_path") == "*" and entry.get("enable") is False; update the
assertion around the quant_cfg check in
tests/unit/torch/quantization/test_autoquant.py accordingly (referencing
config["quant_cfg"] and the dict keys "quantizer_path" and "enable").
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fcb69c76-5a46-4dd0-82f2-905613c6cc83
📒 Files selected for processing (53)
docs/source/guides/1_quantization.rst
docs/source/guides/_pytorch_quantization.rst
docs/source/guides/_quant_cfg.rst
examples/diffusers/quantization/config.py
examples/llm_autodeploy/run_auto_quantize.py
examples/llm_eval/quantization_utils.py
examples/llm_ptq/example_utils.py
examples/llm_ptq/hf_ptq.py
examples/llm_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb
examples/llm_ptq/notebooks/3_PTQ_AutoQuantization.ipynb
examples/llm_qat/main.py
examples/vllm_serve/fakequant_worker.py
examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py
modelopt/onnx/llm_export_utils/quantization_utils.py
modelopt/torch/export/unified_export_hf.py
modelopt/torch/quantization/algorithms.py
modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py
modelopt/torch/quantization/backends/nvfp4_gemm.py
modelopt/torch/quantization/compress.py
modelopt/torch/quantization/config.py
modelopt/torch/quantization/conversion.py
modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/model_quant.py
modelopt/torch/quantization/nn/modules/tensor_quantizer.py
modelopt/torch/quantization/utils/core_utils.py
modelopt/torch/sparsity/attention_sparsity/conversion.py
modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
tests/_test_utils/torch/export/utils.py
tests/_test_utils/torch/quantization/onnx_export.py
tests/_test_utils/torch/quantization/quantize_common.py
tests/gpu/torch/quantization/test_hadamard.py
tests/gpu/torch/quantization/test_quant_rnn_cuda.py
tests/gpu/torch/quantization/test_quantize_cuda.py
tests/gpu_megatron/torch/peft/plugins/test_megatron_peft.py
tests/gpu_megatron/torch/quantization/plugins/test_apex.py
tests/gpu_megatron/torch/quantization/plugins/test_megatron.py
tests/unit/recipe/test_loader.py
tests/unit/torch/quantization/plugins/test_attention_quant.py
tests/unit/torch/quantization/plugins/test_huggingface.py
tests/unit/torch/quantization/plugins/test_peft.py
tests/unit/torch/quantization/test_autoquant.py
tests/unit/torch/quantization/test_compute_quantization_mse.py
tests/unit/torch/quantization/test_config_validation.py
tests/unit/torch/quantization/test_custom_backend.py
tests/unit/torch/quantization/test_quant_activations.py
tests/unit/torch/quantization/test_quant_batchnorm.py
tests/unit/torch/quantization/test_quant_rnn.py
tests/unit/torch/quantization/test_quantize_cpu.py
tests/unit/torch/quantization/test_quantize_replace.py
tests/unit/torch/quantization/test_tensor_quant_cpu.py
 "quant_cfg": [
     # Disable all quantizers by default, then enable selectively
     {"quantizer_path": "*", "enable": False},
     # Configure weight quantizers with 4-bit precision and 128-element blocks
-    "*weight_quantizer": {"num_bits": 4, "block_sizes": {-1: 128}, "enable": True},
+    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4, "block_sizes": {-1: 128}}, "enable": True},
     # Configure input quantizers with 8-bit dynamic quantization
-    "*input_quantizer": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}},
+    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}}},
     # Include default disabled quantizer configurations
-    **_default_disabled_quantizer_cfg,
-},
+    *_default_disabled_quantizer_cfg,
+],
Define or qualify _default_disabled_quantizer_cfg in this example.
This snippet now references _default_disabled_quantizer_cfg without importing or qualifying it, so it fails when copied verbatim. Please either add the import or qualify it as mtq.config._default_disabled_quantizer_cfg.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/source/guides/_pytorch_quantization.rst` around lines 260 - 272, The
example references _default_disabled_quantizer_cfg but doesn't define or import
it; fix by either adding an import for it (e.g., from mtq.config import
_default_disabled_quantizer_cfg) at the top of the snippet or qualify the
reference inline as mtq.config._default_disabled_quantizer_cfg so the example is
self-contained and runnable; update the code block containing "quant_cfg" to use
the qualified name or add the import near the example.
The recommended pattern used by all built-in configs is:

.. code-block:: python

    "quant_cfg": [
        # 1. Deny all quantizers by default
        {"quantizer_path": "*", "enable": False},

        # 2. Enable and configure the target quantizers
        {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
        {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},

        # 3. Apply standard exclusions last (BatchNorm, LM head, MoE routers, etc.)
        *mtq.config._default_disabled_quantizer_cfg,
    ]

.. note::

    The deny-all entry ``{"quantizer_path": "*", "enable": False}`` is available as
    :data:`modelopt.torch.quantization.config._base_disable_all` and is prepended to every
    built-in config. This ensures quantizers not explicitly targeted remain disabled.
Don’t describe _base_disable_all as universal across built-ins.
Several specialized recipes in modelopt/torch/quantization/config.py intentionally omit the deny-all prelude (for example the KV-only configs). Saying it is prepended to every built-in config will send users to the wrong pattern for those formats.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/source/guides/_quant_cfg.rst` around lines 123 - 143, The docs currently
state that _base_disable_all (a.k.a. the deny-all entry) is prepended to every
built-in config; update the text to accurately say that the deny-all entry
(referenced as _base_disable_all or mtq.config._default_disabled_quantizer_cfg)
is included in most built-in configs but not all, and call out that some
specialized recipes (e.g., the KV-only configs in the quantization config set)
intentionally omit the deny-all prelude; adjust the example wording and the note
to avoid the absolute "every built-in config" claim and add a short
parenthetical pointer to the specialized recipes that omit it (referencing their
config names such as the KV-only configs).
for entry in quant_config["quant_cfg"]:
    p = entry.get("cfg", {})
    if isinstance(p, dict) and "num_bits" in p and "trt_high_precision_dtype" not in p:
        p["trt_high_precision_dtype"] = trt_high_precision_dtype
Handle sequential cfg lists when injecting trt_high_precision_dtype.
The new quant_cfg format allows cfg to be a list of attribute dicts, but Line 107 only updates plain dicts. Any sequential entry will skip this dtype injection entirely.
🛠️ Proposed fix
- for entry in quant_config["quant_cfg"]:
- p = entry.get("cfg", {})
- if isinstance(p, dict) and "num_bits" in p and "trt_high_precision_dtype" not in p:
- p["trt_high_precision_dtype"] = trt_high_precision_dtype
+ for entry in quant_config["quant_cfg"]:
+ cfg = entry.get("cfg")
+ cfg_items = [cfg] if isinstance(cfg, dict) else (cfg or [])
+ for attrs in cfg_items:
+                attrs["trt_high_precision_dtype"] = trt_high_precision_dtype
🤖 Prompt for AI Agents
+ attrs["trt_high_precision_dtype"] = trt_high_precision_dtype🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/diffusers/quantization/config.py` around lines 106 - 109, The loop
that injects trt_high_precision_dtype into quant_config["quant_cfg"] only
handles when cfg is a dict (variable p) and skips the case where cfg is a list
of attribute dicts; update the logic in the loop that iterates
quant_config["quant_cfg"] to detect if p is a list and, when so, iterate its
elements and for each element that is a dict and contains "num_bits" but not
"trt_high_precision_dtype", set element["trt_high_precision_dtype"] =
trt_high_precision_dtype; keep the existing dict-handling branch for backward
compatibility so both dict and list cfg forms are supported.
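The fix proposed above can be exercised in isolation. This is a minimal, self-contained sketch under the assumption that list-form `cfg` entries hold plain attribute dicts (as with `SequentialQuantizer`); the helper name is hypothetical, not a modelopt API.

```python
def inject_high_precision_dtype(quant_cfg, dtype):
    """Set trt_high_precision_dtype on both dict-form and list-form cfgs."""
    for entry in quant_cfg:
        cfg = entry.get("cfg")
        # Treat a dict as a one-element list so both forms share one code path.
        cfg_items = [cfg] if isinstance(cfg, dict) else (cfg or [])
        for attrs in cfg_items:
            if "num_bits" in attrs and "trt_high_precision_dtype" not in attrs:
                attrs["trt_high_precision_dtype"] = dtype
    return quant_cfg


cfg = [
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8}},
    # Sequential-quantizer style entry: a list of attribute dicts
    {"quantizer_path": "*input_quantizer", "cfg": [{"num_bits": 4}, {"num_bits": 8}]},
]
inject_high_precision_dtype(cfg, "Half")
```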
| quant_config["quant_cfg"].append( | ||
| { | ||
| "quantizer_path": aq_name, | ||
| "cfg": { | ||
| "num_bits": 8, | ||
| "axis": None, | ||
| "percentile": percentile, | ||
| "total_step": n_steps, | ||
| "collect_method": collect_method, | ||
| "calibrator": ( | ||
| PercentileCalibrator, | ||
| (), | ||
| { | ||
| "num_bits": 8, | ||
| "axis": None, | ||
| "percentile": percentile, | ||
| "total_step": n_steps, | ||
| "collect_method": collect_method, | ||
| }, | ||
| ), | ||
| }, | ||
| ), | ||
| } | ||
| } | ||
| ) |
Make reset_set_int8_config() idempotent under list-based quant_cfg.
Line 128 switched this helper from a replace-by-key update to an unconditional append. Reusing the same quant_config object now accumulates duplicate or stale *{name}*input_quantizer* rules, which changes later-match behavior under the new schema.
🧼 Proposed fix
if isinstance(module, nn.Conv2d):
aq_name = f"*{name}*input_quantizer*"
+ quant_config["quant_cfg"] = [
+ entry
+ for entry in quant_config["quant_cfg"]
+ if not (isinstance(entry, dict) and entry.get("quantizer_path") == aq_name)
+ ]
quant_config["quant_cfg"].append(
{
"quantizer_path": aq_name,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/diffusers/quantization/config.py` around lines 128 - 147, The
current append in reset_set_int8_config causes duplicate entries in
quant_config["quant_cfg"]; instead, make the operation idempotent by searching
quant_config["quant_cfg"] for an existing dict whose "quantizer_path" equals
aq_name and replace that dict in-place (or update its "cfg") if found, otherwise
append the new entry; modify the code around the append that builds the payload
(the block referencing aq_name, quant_config["quant_cfg"], and the
PercentileCalibrator tuple) to perform a find-and-replace by quantizer_path
rather than always appending.
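The find-and-replace behavior suggested in the prompt can be factored into a small upsert helper. A minimal sketch; the function name is illustrative and not part of the example's real code:

```python
def upsert_quant_cfg_entry(quant_cfg, new_entry):
    """Replace an existing entry with the same quantizer_path, else append.

    Calling this repeatedly with the same path is idempotent, so reusing one
    quant_config object never accumulates duplicate rules.
    """
    for i, entry in enumerate(quant_cfg):
        if isinstance(entry, dict) and entry.get("quantizer_path") == new_entry["quantizer_path"]:
            quant_cfg[i] = new_entry
            return quant_cfg
    quant_cfg.append(new_entry)
    return quant_cfg


cfg = [{"quantizer_path": "*input_quantizer*", "cfg": {"num_bits": 8}}]
# Apply the same update twice: the list still holds exactly one entry.
upsert_quant_cfg_entry(cfg, {"quantizer_path": "*input_quantizer*", "cfg": {"num_bits": 4}})
upsert_quant_cfg_entry(cfg, {"quantizer_path": "*input_quantizer*", "cfg": {"num_bits": 4}})
```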
| "quant_cfg": [ | ||
| *mtq.config._base_disable_all, | ||
| { | ||
| "quantizer_path": "*weight_quantizer", | ||
| "cfg": {"num_bits": 4, "block_sizes": {-1: 128}}, | ||
| "enable": True, | ||
| }, | ||
| { | ||
| "quantizer_path": "*input_quantizer", | ||
| "cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}}, | ||
| }, | ||
| # Disable sensitive layers such as `lm_head`, gate layers in MoE etc. | ||
| **mtq.config._default_disabled_quantizer_cfg, | ||
| }, | ||
| *mtq.config._default_disabled_quantizer_cfg, | ||
| ], |
Update the BMM-attention detection for list-based quant_cfg.
After switching quant_cfg to a list of entries, _quantize_model_with_dataset()'s later for key in mtq_cfg["quant_cfg"] / "bmm_quantizer" in key check no longer works: iterating the list yields dicts, so that membership test only inspects dict keys. Configs that target *bmm_quantizer now skip register_hf_attentions_on_the_fly(net) and silently miss KV-attention quantization.
💡 Suggested update outside this hunk
- quantize_bmm_attention = False
- for key in mtq_cfg["quant_cfg"]:
- if "bmm_quantizer" in key:
- quantize_bmm_attention = True
+ quantize_bmm_attention = any(
+ "bmm_quantizer" in entry["quantizer_path"]
+ for entry in mtq.config.normalize_quant_cfg_list(mtq_cfg["quant_cfg"])
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llm_eval/quantization_utils.py` around lines 36 - 49, The loop that
detects BMM-attention entries fails because mtq_cfg["quant_cfg"] is now a list
of dicts; update _quantize_model_with_dataset to handle both list and dict
formats by inspecting each entry: if an entry is a dict, read its
"quantizer_path" (or keys for legacy string entries) and test whether the path
string contains "bmm_quantizer" or matches "*bmm_quantizer"; if
mtq_cfg["quant_cfg"] is still a dict or contains raw strings, preserve the old
membership check. Ensure this detection is used before calling
register_hf_attentions_on_the_fly(net) so KV-attention quantizers targeted via
"quantizer_path": "*bmm_quantizer" are not skipped.
| "quant_cfg": [ | ||
| # Disable all quantizers by default | ||
| {"quantizer_path": "*", "enable": False}, | ||
| # "num_bits" specifies the number of bits for quantization | ||
| # "axis" specifies the axis for quantization | ||
| "*weight_quantizer": {"num_bits": 8, "axis": 0}, | ||
| "*input_quantizer": {"num_bits": 8, "axis": -1}, | ||
|
|
||
| # Default quantization settings | ||
| "default": {"num_bits": 8, "axis": None}, | ||
| } | ||
| "algorithm": "max" | ||
| {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}}, | ||
| {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": -1}}, | ||
| ], | ||
| "algorithm": "max", |
Finish updating this quantize() doc block to the list-based schema.
Lines 183-191 show the new list-of-dicts format, but the prose immediately above still describes quant_cfg as a dictionary mapping wildcard keys to attributes. That leaves the public API docs contradictory in the same section.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/model_quant.py` around lines 183 - 191, The
docstring for quantize() is out of sync: it still describes quant_cfg as a dict
mapping wildcard keys, but the implementation and example use a list-of-dicts
schema. Update the prose in the quantize() doc block to describe quant_cfg as a
list where each item is a dict with keys like "quantizer_path" (wildcard
pattern), optional "enable" (bool), optional "cfg" (dict, e.g., {"num_bits":
int, "axis": int}), and that multiple entries are applied in order; also mention
the top-level "algorithm" field (e.g., "max") and how entries such as
{"quantizer_path":"*weight_quantizer","cfg":{"num_bits":8,"axis":0}} and
{"quantizer_path":"*input_quantizer","cfg":{"num_bits":8,"axis":-1}} are
interpreted by quantize()/quant_cfg.
for entry in config["quant_cfg"]:
    pat = (
        entry["quantizer_path"]
        if isinstance(entry, dict) and "quantizer_path" in entry
        else entry[0]
    )
    if pat == "*weight_quantizer":
        if isinstance(entry, dict) and "quantizer_path" in entry:
            entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
        else:
            entry[1]["block_sizes"] = {-1: block_size}
        break
Fail fast when *weight_quantizer is missing.
If no matching entry exists, this function currently returns without applying block_sizes, which can silently invalidate downstream test intent.
🔧 Proposed fix
def get_awq_config(algorithm="awq_lite", block_size=8):
config = copy.deepcopy(mtq.INT4_AWQ_CFG)
+ found_weight_entry = False
for entry in config["quant_cfg"]:
pat = (
entry["quantizer_path"]
if isinstance(entry, dict) and "quantizer_path" in entry
else entry[0]
)
if pat == "*weight_quantizer":
+ found_weight_entry = True
if isinstance(entry, dict) and "quantizer_path" in entry:
entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
else:
entry[1]["block_sizes"] = {-1: block_size}
break
+ if not found_weight_entry:
+ raise KeyError("No '*weight_quantizer' entry found in quant_cfg")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/_test_utils/torch/quantization/quantize_common.py` around lines 50 -
61, The loop over config["quant_cfg"] looks for a "*weight_quantizer" entry and
mutates its "block_sizes" but silently does nothing if no match is found; add a
fail-fast check by tracking whether a matching entry was found (e.g., set a
boolean flag when pat == "*weight_quantizer") and after the loop raise an
explicit error (ValueError or AssertionError) if no match was found so tests
fail early rather than silently proceeding without applying block_size to the
intended entry.
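The fail-fast shape proposed above can be condensed into one helper. A sketch under the assumption that entries are the new dict form; the function name is illustrative:

```python
def set_weight_block_size(quant_cfg, block_size):
    """Patch block_sizes on the *weight_quantizer entry, failing fast if absent."""
    for entry in quant_cfg:
        if entry.get("quantizer_path") == "*weight_quantizer":
            entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
            return quant_cfg
    # Raising here surfaces a misconfigured test instead of silently doing nothing.
    raise KeyError("No '*weight_quantizer' entry found in quant_cfg")


cfg = [{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4}}]
set_weight_block_size(cfg, 8)
```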
for entry in config["quant_cfg"]:
    if entry.get("quantizer_path") == "*weight_quantizer":
        entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}
Deep-copy the shared config before patching block_sizes.
config here is the imported mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG object. Mutating config["quant_cfg"] in place leaks the reduced block size into later parametrized cases and any other tests that reuse the same constant.
💡 Suggested fix
if config == mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG:
+ import copy
+
+ config = copy.deepcopy(config)
# reduce block sizes for simple testing models
for entry in config["quant_cfg"]:
if entry.get("quantizer_path") == "*weight_quantizer":
entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if config == mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG:
    import copy

    config = copy.deepcopy(config)
    # reduce block sizes for simple testing models
    for entry in config["quant_cfg"]:
        if entry.get("quantizer_path") == "*weight_quantizer":
            entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/gpu/torch/quantization/test_quantize_cuda.py` around lines 133 - 135,
The test is mutating the shared constant mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG by
editing config["quant_cfg"] in place; instead, create a deep copy of that
constant (e.g., via copy.deepcopy) into the local variable config before
patching so changes to entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2:
8} only affect this test; locate the code that assigns config from
mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG and replace it with a deep-copied version
so modifications to quant_cfg/ block_sizes/quantizer_path do not leak to other
parametrized cases.
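The hazard described above (in-place mutation leaking into a shared module-level constant) is easy to reproduce in isolation. A minimal sketch with a stand-in constant; the names are hypothetical, not the test's real identifiers:

```python
import copy

# Stand-in for a shared module-level config constant.
SHARED_CFG = {
    "quant_cfg": [
        {"quantizer_path": "*weight_quantizer", "cfg": {"block_sizes": {-1: 128}}}
    ]
}


def patched_config(shared_cfg, block_sizes):
    """Deep-copy before mutating so the shared constant stays pristine."""
    cfg = copy.deepcopy(shared_cfg)
    for entry in cfg["quant_cfg"]:
        if entry.get("quantizer_path") == "*weight_quantizer":
            entry.setdefault("cfg", {})["block_sizes"] = block_sizes
    return cfg


local = patched_config(SHARED_CFG, {-1: 8, -2: 8})
```

Without the `deepcopy`, the reduced block sizes would persist in `SHARED_CFG` for every later parametrized case.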
assert isinstance(config["quant_cfg"], list)
assert any(
    (
        entry["quantizer_path"]
        if isinstance(entry, dict) and "quantizer_path" in entry
        else entry[0]
    )
    == "*"
    and (
        entry.get("enable")
        if isinstance(entry, dict) and "quantizer_path" in entry
        else entry[1].get("enable")
    )
    is False
    for entry in config["quant_cfg"]
)
Keep this assertion strict to QuantizerCfgEntry dicts.
The fallback to entry[0] / entry[1] means this test now accepts tuple-style payloads, so it no longer protects the new list-of-dicts contract from regressions. Assert on dict entries directly.
🧪 Proposed fix
- assert any(
- (
- entry["quantizer_path"]
- if isinstance(entry, dict) and "quantizer_path" in entry
- else entry[0]
- )
- == "*"
- and (
- entry.get("enable")
- if isinstance(entry, dict) and "quantizer_path" in entry
- else entry[1].get("enable")
- )
- is False
- for entry in config["quant_cfg"]
- )
+ assert all(isinstance(entry, dict) for entry in config["quant_cfg"])
+ assert any(
+ entry.get("quantizer_path") == "*" and entry.get("enable") is False
+ for entry in config["quant_cfg"]
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/torch/quantization/test_autoquant.py` around lines 492 - 507, The
current assertion allows tuple-style payloads by falling back to
entry[0]/entry[1]; restrict the check to dict-style QuantizerCfgEntry entries
only by iterating over config["quant_cfg"] and filtering with isinstance(entry,
dict), then assert that some dict has entry.get("quantizer_path") == "*" and
entry.get("enable") is False; update the assertion around the quant_cfg check in
tests/unit/torch/quantization/test_autoquant.py accordingly (referencing
config["quant_cfg"] and the dict keys "quantizer_path" and "enable").
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
5971a7b to fe2d2f3
Actionable comments posted: 1
♻️ Duplicate comments (1)
modelopt/torch/quantization/config.py (1)
1622-1628: ⚠️ Potential issue | 🟠 Major: Legacy top-level dict format bypasses normalization.
The `before` validator returns non-list inputs unchanged (line 1626-1627). If a user passes a legacy dict-style `quant_cfg` like `{"*weight_quantizer": {"num_bits": 8}}` directly (not wrapped in a list), it will not be normalized. This can cause `need_calibration()` and downstream consumers to fail when iterating, as they expect list entries.
Consider converting top-level dict inputs to list form before calling `normalize_quant_cfg_list()`:
Suggested fix
 @field_validator("quant_cfg", mode="before")
 @classmethod
 def normalize_quant_cfg(cls, v):
     """Normalize quant_cfg entries: convert dict and tuple forms to QuantizerCfgEntry dicts."""
+    if isinstance(v, dict):
+        # Convert legacy top-level dict format to list of single-key dicts
+        v = [{k: val} for k, val in v.items()]
     if not isinstance(v, list):
         return v
     return normalize_quant_cfg_list(v)
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1622 - 1628, The before-validator normalize_quant_cfg currently returns non-list inputs unchanged, so legacy top-level dict forms for quant_cfg bypass normalization; update normalize_quant_cfg (the `@field_validator` for "quant_cfg") to detect a dict input (e.g., mapping of pattern->entry) and wrap/convert it into a one-item list (or otherwise translate into the list form expected by normalize_quant_cfg_list) before calling normalize_quant_cfg_list, then return the normalized list so downstream functions like need_calibration() always receive a list of QuantizerCfgEntry dicts.
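The conversion the prompt asks for can be demonstrated with a toy normalizer. This is a stand-in for the proposed validator behavior only; the real `normalize_quant_cfg_list()` handles more legacy shapes (tuples, `nn.*`-scoped entries) than this sketch:

```python
def normalize_legacy_quant_cfg(v):
    """Convert a legacy wildcard-keyed dict into the list-of-entries form."""
    if isinstance(v, dict):
        # Each key/value pair becomes one atomic entry; insertion order is kept.
        return [{"quantizer_path": key, "cfg": cfg} for key, cfg in v.items()]
    return v


normalized = normalize_legacy_quant_cfg({"*weight_quantizer": {"num_bits": 8}})
```

After this step, downstream code such as `need_calibration()` can iterate entries uniformly.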
🧹 Nitpick comments (2)
modelopt/torch/quantization/config.py (2)
1630-1639: Consider validating list-form `cfg` entries as well. The validator validates dict `cfg` values against `QuantizerAttributeConfig`, but skips list `cfg` (used for `SequentialQuantizer`). While downstream processing may catch errors, early validation would surface misconfigurations sooner.
Optional enhancement
 @field_validator("quant_cfg", mode="after")
 @classmethod
 def validate_quant_cfg_entries(cls, v):
     """Validate quantizer attribute configs to surface errors (e.g. invalid axis/block_sizes)."""
     qac_fields = set(QuantizerAttributeConfig.model_fields.keys())
     for entry in v:
         cfg = entry.get("cfg", {})
         if isinstance(cfg, dict) and qac_fields & set(cfg.keys()):
             QuantizerAttributeConfig.model_validate(cfg)
+        elif isinstance(cfg, list):
+            for sub_cfg in cfg:
+                if isinstance(sub_cfg, dict) and qac_fields & set(sub_cfg.keys()):
+                    QuantizerAttributeConfig.model_validate(sub_cfg)
     return v
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1630 - 1639, The validator validate_quant_cfg_entries currently only validates dict cfg values against QuantizerAttributeConfig but ignores list-form cfg (used by SequentialQuantizer); update validate_quant_cfg_entries to also handle when entry.get("cfg") is a list by iterating over its items and, for each item that is a dict and contains keys in QuantizerAttributeConfig.model_fields, call QuantizerAttributeConfig.model_validate(item) so each element is validated early; preserve existing behavior for non-dict/non-list cfgs and return v as before.
1554-1567: Replace assert with ValueError for clearer error messaging. The `assert` at line 1556 will produce an unclear `AssertionError` if the legacy `nn.*` format has unexpected structure. A `ValueError` with a descriptive message would be more helpful for users debugging config issues.
Suggested improvement
 def _dict_to_entry(key: str, value) -> QuantizerCfgEntry:
     if isinstance(key, str) and key.startswith("nn."):
-        assert isinstance(value, dict) and len(value) == 1
+        if not isinstance(value, dict) or len(value) != 1:
+            raise ValueError(
+                f"Legacy nn.*-scoped entry expects a single-key dict value, got: {value!r}"
+            )
         q_path, sub_cfg = next(iter(value.items()))
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1554 - 1567, In _dict_to_entry, replace the brittle assert inside the branch that handles legacy "nn." keys with a ValueError that gives a clear message; specifically in the function _dict_to_entry where you check if key.startswith("nn."), validate that value is a dict with exactly one item and if not raise ValueError("invalid legacy nn.* quantizer entry: expected a dict with a single mapping from quantizer path to config, got: {repr(value)}") (include the offending value in the message) so callers get a descriptive error instead of an AssertionError.
📒 Files selected for processing (2)
modelopt/torch/quantization/config.py
modelopt/torch/quantization/utils/core_utils.py
 quant_cfg: list = config.get("quant_cfg") or []
 for entry in quant_cfg:
     name = (
         entry["quantizer_path"]
         if isinstance(entry, dict) and "quantizer_path" in entry
         else entry[0]
     )
     if isinstance(entry, dict) and "quantizer_path" in entry:
         cfg = dict(entry.get("cfg") or {})
         if "enable" in entry:
             cfg["enable"] = entry["enable"]
     else:
         cfg = entry[1]
     if "weight_quantizer" in name:
         # We don't calibrate weight quantizer
         continue
     # quantization like W4A8 has a list of weight quantizers
     if isinstance(cfg, list):
         for _config in cfg:
             if _not_dynamic(_config):
                 print(f"{cfg}: True")
                 return True
-    elif _not_dynamic(cfg):
-        print(f"{cfg}: True")
+    elif isinstance(cfg, dict) and _not_dynamic(cfg):
         return True
Fallback logic for tuple format may mask errors from unnormalized dict inputs.
The fallback to entry[0]/entry[1] (lines 1682, 1689) is intended for legacy tuple format support. However, if a top-level dict bypasses the validator normalization, iterating over it yields string keys, and entry[0]/entry[1] would index into string characters (e.g., "*"[0] → "*"), leading to silent misbehavior rather than a clear error.
Consider adding explicit type checking or removing the tuple fallback if tuples are no longer a supported input format:
Suggested defensive check
quant_cfg: list = config.get("quant_cfg") or []
for entry in quant_cfg:
+ if not isinstance(entry, dict):
+ raise TypeError(f"Expected dict entry in quant_cfg, got {type(entry).__name__}: {entry!r}")
name = (
entry["quantizer_path"]
if isinstance(entry, dict) and "quantizer_path" in entry
else entry[0]
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/config.py` around lines 1677 - 1699, The loop
over quant_cfg can silently mis-index string keys when an unnormalized dict is
present; update the logic in the loop that determines name/cfg (the code using
entry["quantizer_path"] vs entry[0]/entry[1]) to explicitly check types: if
entry is a dict require the "quantizer_path" key and use entry.get("cfg"),
otherwise require entry to be a sequence/tuple/list of length >= 2 before using
entry[0] and entry[1]; if the input is neither a valid dict nor a valid 2+
element sequence, raise a clear ValueError (or log and skip) so _not_dynamic and
downstream logic never index into a string or malformed structure. Ensure you
reference quant_cfg, entry, name, cfg, and _not_dynamic when implementing the
checks.
Actionable comments posted: 1
♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)
1677-1699: ⚠️ Potential issue | 🟡 Minor: Fallback logic for tuple format may mask errors from unnormalized dict inputs.
The fallback to `entry[0]`/`entry[1]` (lines 1682, 1689) is intended for legacy tuple format. However, if a top-level dict bypasses the validator normalization, iterating over it yields string keys, and `entry[0]`/`entry[1]` would index into string characters (e.g., `"*weight_quantizer"[0]` → `"*"`), causing silent misbehavior.
Suggested defensive check
 quant_cfg: list = config.get("quant_cfg") or []
 for entry in quant_cfg:
+    if not isinstance(entry, (dict, tuple, list)):
+        raise TypeError(f"Expected dict or tuple entry in quant_cfg, got {type(entry).__name__}: {entry!r}")
     name = (
         entry["quantizer_path"]
         if isinstance(entry, dict) and "quantizer_path" in entry
         else entry[0]
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1677 - 1699, The loop over quant_cfg can mis-index string keys when a dict slips through the legacy tuple/path fallback; update the logic in the quant_cfg iteration (around variables entry, name, cfg) to first check types explicitly: if entry is a dict use entry["quantizer_path"] and entry.get("cfg"), else if entry is a sequence (tuple/list) with length >= 2 use entry[0]/entry[1]; otherwise raise a clear ValueError or skip with a logged warning. Ensure the branch that checks "weight_quantizer" still uses the computed name and that _not_dynamic is only called with validated cfg objects.
1622-1628: ⚠️ Potential issue | 🟠 Major: Legacy dict-style `quant_cfg` is not normalized by the before-validator.
The validator returns non-list inputs unchanged (line 1626-1627), so a legacy dict format like `{"*weight_quantizer": {"num_bits": 8}}` bypasses `normalize_quant_cfg_list()` entirely. If such a dict reaches downstream code expecting a list, it will fail.
Since `normalize_quant_cfg_list()` already supports converting legacy dict formats via `_dict_to_entry()`, the validator should route dicts to it:
Suggested fix
 @field_validator("quant_cfg", mode="before")
 @classmethod
 def normalize_quant_cfg(cls, v):
     """Normalize quant_cfg entries: convert dict and tuple forms to QuantizerCfgEntry dicts."""
-    if not isinstance(v, list):
-        return v
-    return normalize_quant_cfg_list(v)
+    if isinstance(v, dict):
+        # Legacy top-level dict format: convert to list of single-key dicts
+        v = [{k: val} for k, val in v.items()]
+    if isinstance(v, list):
+        return normalize_quant_cfg_list(v)
+    return v
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1622 - 1628, The before-validator normalize_quant_cfg currently returns non-list values unchanged, so legacy dict-style quant_cfg like {"*weight_quantizer": {"num_bits": 8}} bypasses normalization; update normalize_quant_cfg to detect dict inputs and pass them into normalize_quant_cfg_list (which uses _dict_to_entry) so all legacy dict formats are converted into the expected list of QuantizerCfgEntry objects—i.e., in normalize_quant_cfg (the `@field_validator` for "quant_cfg") call normalize_quant_cfg_list(v) for dicts as well as lists so downstream code always receives a normalized list.
🧹 Nitpick comments (1)
modelopt/torch/quantization/config.py (1)
162-168: TypedDict `total=False` conflicts with required `quantizer_path` documentation.
The docstring states `quantizer_path` is required, but `total=False` makes all fields optional at the type level. Consider using `Required` from `typing_extensions` to mark `quantizer_path` as required while keeping other fields optional:
-class QuantizerCfgEntry(TypedDict, total=False):
+from typing_extensions import Required
+
+class QuantizerCfgEntry(TypedDict, total=False):
     """A single entry in a ``quant_cfg`` list."""
-    quantizer_path: str  # required; matched against quantizer module names
+    quantizer_path: Required[str]  # required; matched against quantizer module names
     parent_class: str | None  # optional; filters by pytorch module class name (e.g. "nn.Linear")
This ensures static type checkers flag missing `quantizer_path` while allowing other fields to be omitted.
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 162 - 168, QuantizerCfgEntry currently uses TypedDict(total=False) which makes all keys optional but the docstring says quantizer_path is required; update QuantizerCfgEntry to mark quantizer_path as required using typing_extensions.Required (or typing.Required on newer Python): change the TypedDict definition so quantizer_path: Required[str] while keeping parent_class, cfg, and enable as optional, and import Required from typing_extensions if not available in typing to ensure static type-checkers enforce the required key.
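The `Required` marker can be verified at runtime via TypedDict introspection. A trimmed sketch (only a subset of the entry's fields is shown); it assumes Python 3.11+ or an installed `typing_extensions` for older interpreters:

```python
try:  # Python 3.11+ ships Required in typing
    from typing import Required, TypedDict
except ImportError:  # fall back for older interpreters
    from typing_extensions import Required, TypedDict


class QuantizerCfgEntry(TypedDict, total=False):
    """A single entry in a quant_cfg list (trimmed sketch)."""

    quantizer_path: Required[str]  # required despite total=False
    cfg: dict
    enable: bool


# TypedDict records requiredness for static checkers and runtime introspection
# via __required_keys__ / __optional_keys__.
```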
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 830-849: The function update_quant_cfg_with_kv_cache_quant can
raise a TypeError if quant_cfg.get("quant_cfg") returns a legacy dict instead of
a list; change it to defensively normalize inner by checking the type of
quant_cfg.get("quant_cfg") and converting a dict into a single-item list (or
treating None/Falsey as the existing default list), then set
quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg) and return the
deep-copied quant_cfg; update the logic around the inner variable in
update_quant_cfg_with_kv_cache_quant to handle list, dict, and None cases
explicitly.
---
Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1677-1699: The loop over quant_cfg can mis-index string keys when
a dict slips through the legacy tuple/path fallback; update the logic in the
quant_cfg iteration (around variables entry, name, cfg) to first check types
explicitly: if entry is a dict use entry["quantizer_path"] and entry.get("cfg"),
else if entry is a sequence (tuple/list) with length >= 2 use entry[0]/entry[1];
otherwise raise a clear ValueError or skip with a logged warning. Ensure the
branch that checks "weight_quantizer" still uses the computed name and that
_not_dynamic is only called with validated cfg objects.
- Around line 1622-1628: The before-validator normalize_quant_cfg currently
returns non-list values unchanged, so legacy dict-style quant_cfg like
{"*weight_quantizer": {"num_bits": 8}} bypasses normalization; update
normalize_quant_cfg to detect dict inputs and pass them into
normalize_quant_cfg_list (which uses _dict_to_entry) so all legacy dict formats
are converted into the expected list of QuantizerCfgEntry objects—i.e., in
normalize_quant_cfg (the `@field_validator` for "quant_cfg") call
normalize_quant_cfg_list(v) for dicts as well as lists so downstream code always
receives a normalized list.
---
Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 162-168: QuantizerCfgEntry currently uses TypedDict(total=False)
which makes all keys optional but the docstring says quantizer_path is required;
update QuantizerCfgEntry to mark quantizer_path as required using
typing_extensions.Required (or typing.Required on newer Python): change the
TypedDict definition so quantizer_path: Required[str] while keeping
parent_class, cfg, and enable as optional, and import Required from
typing_extensions if not available in typing to ensure static type-checkers
enforce the required key.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ea0ca383-fd43-415d-8bef-252ce66fbfc5
📒 Files selected for processing (2)
modelopt/torch/quantization/config.py
modelopt/torch/quantization/utils/core_utils.py
```diff
 def update_quant_cfg_with_kv_cache_quant(
-    quant_cfg: dict[str, Any], kv_cache_quant_cfg: dict[str, Any]
+    quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[QuantizerCfgEntry]
 ) -> dict[str, Any]:
-    """Update the quant_cfg with the kv cache quant_cfg."""
+    """Update the quant_cfg with the kv cache quant_cfg.
+
+    Args:
+        quant_cfg: The outer quantization config dict (with ``"quant_cfg"`` and ``"algorithm"`` keys).
+        kv_cache_quant_cfg: A list of :class:`QuantizerCfgEntry
+            <modelopt.torch.quantization.config.QuantizerCfgEntry>` dicts for KV cache quantization,
+            typically ``some_kv_cfg["quant_cfg"]``.
+
+    Returns:
+        A deep copy of ``quant_cfg`` with the KV cache entries appended to ``quant_cfg["quant_cfg"]``.
+    """
     # If quant_cfg["quant_cfg"] is None, it corresponds to only kv cache quantization case
     quant_cfg = copy.deepcopy(quant_cfg)
-    quant_cfg["quant_cfg"] = quant_cfg.get("quant_cfg") or {"default": {"enable": False}}
-    quant_cfg["quant_cfg"].update(kv_cache_quant_cfg)
+    inner: list[QuantizerCfgEntry] = quant_cfg.get("quant_cfg") or [
+        {"quantizer_path": "*", "enable": False}
+    ]
+    quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)
```
Potential issue if quant_cfg["quant_cfg"] is a legacy dict format.
The function assumes quant_cfg.get("quant_cfg") returns a list (or falsey value). If a caller passes an unnormalized config where quant_cfg["quant_cfg"] is still a legacy dict, line 849 would raise TypeError when attempting dict + list.
Consider normalizing the inner config or adding a type check:
Suggested defensive fix

```diff
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
+
 def update_quant_cfg_with_kv_cache_quant(
     quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[QuantizerCfgEntry]
 ) -> dict[str, Any]:
     ...
     quant_cfg = copy.deepcopy(quant_cfg)
-    inner: list[QuantizerCfgEntry] = quant_cfg.get("quant_cfg") or [
-        {"quantizer_path": "*", "enable": False}
-    ]
+    raw_inner = quant_cfg.get("quant_cfg")
+    if raw_inner is None:
+        inner = [{"quantizer_path": "*", "enable": False}]
+    elif isinstance(raw_inner, dict):
+        # Legacy dict format - convert to list
+        inner = normalize_quant_cfg_list([{k: v} for k, v in raw_inner.items()])
+    else:
+        inner = list(raw_inner)
     quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/utils/core_utils.py` around lines 830 - 849, The
function update_quant_cfg_with_kv_cache_quant can raise a TypeError if
quant_cfg.get("quant_cfg") returns a legacy dict instead of a list; change it to
defensively normalize inner by checking the type of quant_cfg.get("quant_cfg")
and converting a dict into a single-item list (or treating None/Falsey as the
existing default list), then set quant_cfg["quant_cfg"] = inner +
list(kv_cache_quant_cfg) and return the deep-copied quant_cfg; update the logic
around the inner variable in update_quant_cfg_with_kv_cache_quant to handle
list, dict, and None cases explicitly.
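The failure mode and the defensive branch can be reproduced with plain dicts and lists, with no ModelOpt imports. The `_dict_to_entry`-style normalization below is a hypothetical stand-in for the real helper, not the library code:

```python
import copy
from typing import Any


def _dict_to_entry(pattern: str, cfg: dict) -> dict[str, Any]:
    # Hypothetical stand-in for the real legacy-dict normalization.
    entry: dict[str, Any] = {"quantizer_path": pattern}
    if set(cfg) == {"enable"}:
        entry["enable"] = cfg["enable"]
    else:
        entry["cfg"] = cfg
    return entry


def merge_kv_cache(quant_cfg: dict, kv_entries: list[dict]) -> dict:
    quant_cfg = copy.deepcopy(quant_cfg)
    raw_inner = quant_cfg.get("quant_cfg")
    if raw_inner is None:
        inner = [{"quantizer_path": "*", "enable": False}]
    elif isinstance(raw_inner, dict):  # legacy dict format
        inner = [_dict_to_entry(k, v) for k, v in raw_inner.items()]
    else:
        inner = list(raw_inner)
    quant_cfg["quant_cfg"] = inner + list(kv_entries)
    return quant_cfg


legacy = {"quant_cfg": {"*weight_quantizer": {"num_bits": 8}}, "algorithm": "max"}
kv = [{"quantizer_path": "*output_quantizer", "cfg": {"num_bits": (4, 3)}}]

# Without the isinstance() branch, dict + list would raise TypeError here.
merged = merge_kv_cache(legacy, kv)
print([e["quantizer_path"] for e in merged["quant_cfg"]])
# ['*weight_quantizer', '*output_quantizer']
```

The deep copy also keeps the caller's `legacy` dict untouched, matching the function's documented return-a-copy contract.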
```python
# 1. Deny all quantizers by default
{"quantizer_path": "*", "enable": False},

# 2. Enable and configure the target quantizers
{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
{"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
```
Suggested change

```diff
-# 1. Deny all quantizers by default
-{"quantizer_path": "*", "enable": False},
-# 2. Enable and configure the target quantizers
-{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
-{"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
+# 1. Deny all quantizers by default
+{"*": {"enable": False}},
+# 2. Enable and configure the target quantizers
+{"*weight_quantizer": {"num_bits": 8, "axis": 0}},
+{"*input_quantizer": {"num_bits": 8, "axis": None}},
```
Why do we need the fields `quantizer_path` and `cfg`?
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/gpu/torch/quantization/test_real_quantize_cuda.py`:
- Around line 50-56: The loop over config["quant_cfg"] that looks for
entry.get("quantizer_path") == "*weight_quantizer" must fail fast if no such
entry is found and must validate the shape of entry["cfg"]["block_sizes"];
update the code that currently sets entry.setdefault("cfg", {})["block_sizes"] =
{...} to instead: scan config["quant_cfg"] and set a found flag when the
matching entry is located, then after the loop call pytest.fail or raise
AssertionError if not found; for the found entry, ensure entry.setdefault("cfg",
{}) contains a "block_sizes" dict and validate that it has the required keys
(e.g., -1 and "scale_bits") and that their values are integers (or otherwise of
expected type/shape), raising a test failure if the validation fails; apply the
same fail-fast + validation change to the other similar loop in the file that
handles the weight_quantizer (the block around the second occurrence).
- Around line 50-56: The test mutates shared quant config constants (e.g.,
mtq.INT4_AWQ_CFG) by updating entries in config["quant_cfg"], which can leak
state between tests; to fix, make a defensive deep copy of the config before
modifying it (e.g., copy.deepcopy(config) or deepcopy the specific
config["quant_cfg"] list) and perform the block_sizes modification on that copy
so the original mtq.INT4_AWQ_CFG remains unchanged; locate the code that
iterates over config["quant_cfg"] and change it to operate on the copied object
(references: config, config["quant_cfg"], entry, mtq.INT4_AWQ_CFG).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 14610ec4-1ae7-433b-b1df-66856f70d771
📒 Files selected for processing (1)
tests/gpu/torch/quantization/test_real_quantize_cuda.py
```python
for entry in config["quant_cfg"]:
    if entry.get("quantizer_path") == "*weight_quantizer":
        entry.setdefault("cfg", {})["block_sizes"] = {
            -1: 16,
            "scale_bits": 8,
        }
        break
```
Fail fast if *weight_quantizer entry is missing (and validate cfg shape).
The current loop silently does nothing when the entry is absent, which can hide config-schema regressions in tests.
Suggested fix

```diff
 for entry in config["quant_cfg"]:
     if entry.get("quantizer_path") == "*weight_quantizer":
-        entry.setdefault("cfg", {})["block_sizes"] = {
+        cfg = entry.setdefault("cfg", {})
+        if not isinstance(cfg, dict):
+            pytest.fail("Expected dict cfg for *weight_quantizer in INT4_AWQ_CFG")
+        cfg["block_sizes"] = {
+            -1: 16,
             "scale_bits": 8,
         }
         break
+else:
+    pytest.fail("INT4_AWQ_CFG missing *weight_quantizer entry")
```

Also applies to: 107-113
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/gpu/torch/quantization/test_real_quantize_cuda.py` around lines 50 -
56, The loop over config["quant_cfg"] that looks for entry.get("quantizer_path")
== "*weight_quantizer" must fail fast if no such entry is found and must
validate the shape of entry["cfg"]["block_sizes"]; update the code that
currently sets entry.setdefault("cfg", {})["block_sizes"] = {...} to instead:
scan config["quant_cfg"] and set a found flag when the matching entry is
located, then after the loop call pytest.fail or raise AssertionError if not
found; for the found entry, ensure entry.setdefault("cfg", {}) contains a
"block_sizes" dict and validate that it has the required keys (e.g., -1 and
"scale_bits") and that their values are integers (or otherwise of expected
type/shape), raising a test failure if the validation fails; apply the same
fail-fast + validation change to the other similar loop in the file that handles
the weight_quantizer (the block around the second occurrence).
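The `for`/`else` fail-fast idiom proposed above works standalone; here is a sketch with a plain `AssertionError` in place of `pytest.fail` (the config shape mimics the new list format, but this is illustrative, not the test file's code):

```python
def set_block_sizes(quant_cfg: list[dict], block_sizes: dict) -> None:
    """Patch the *weight_quantizer entry in place, failing loudly if it is absent."""
    for entry in quant_cfg:
        if entry.get("quantizer_path") == "*weight_quantizer":
            cfg = entry.setdefault("cfg", {})
            if not isinstance(cfg, dict):
                raise AssertionError("expected dict cfg for *weight_quantizer")
            cfg["block_sizes"] = block_sizes
            break
    else:  # loop finished without break: the entry is missing
        raise AssertionError("missing *weight_quantizer entry")


cfg = [
    {"quantizer_path": "*", "enable": False},
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4}},
]
set_block_sizes(cfg, {-1: 16, "scale_bits": 8})
print(cfg[1]["cfg"]["block_sizes"])  # {-1: 16, 'scale_bits': 8}

try:
    set_block_sizes([{"quantizer_path": "*input_quantizer"}], {-1: 16})
except AssertionError as exc:
    print(exc)  # missing *weight_quantizer entry
```

A silent no-match would hide a schema regression; the `else` branch turns it into an immediate test failure.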
Avoid mutating shared quant config constants in-place across tests.
These blocks mutate mtq.INT4_AWQ_CFG through config, which can leak state to later tests and make runs order-dependent.
Suggested fix

```diff
+import copy
@@
 def test_real_quantize(model_cls, config):
     """Test quantize function can run without problems."""
+    config = copy.deepcopy(config)
@@
 def test_save_restore(model_cls, config):
+    config = copy.deepcopy(config)
     # update config to fit test cases
```

Also applies to: 107-113
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/gpu/torch/quantization/test_real_quantize_cuda.py` around lines 50 -
56, The test mutates shared quant config constants (e.g., mtq.INT4_AWQ_CFG) by
updating entries in config["quant_cfg"], which can leak state between tests; to
fix, make a defensive deep copy of the config before modifying it (e.g.,
copy.deepcopy(config) or deepcopy the specific config["quant_cfg"] list) and
perform the block_sizes modification on that copy so the original
mtq.INT4_AWQ_CFG remains unchanged; locate the code that iterates over
config["quant_cfg"] and change it to operate on the copied object (references:
config, config["quant_cfg"], entry, mtq.INT4_AWQ_CFG).
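A small sketch of the leakage being flagged here, using a stand-in for `mtq.INT4_AWQ_CFG` (the constant name and shape are illustrative):

```python
import copy

# A module-level "constant" shared across tests (stand-in for mtq.INT4_AWQ_CFG).
INT4_AWQ_CFG = {
    "algorithm": "awq_lite",
    "quant_cfg": [{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4}}],
}


def run_test_without_copy(config):
    # Mutates the shared object: every later test sees block_sizes.
    config["quant_cfg"][0]["cfg"]["block_sizes"] = {-1: 16}


def run_test_with_copy(config):
    config = copy.deepcopy(config)  # defensive copy: the constant stays pristine
    config["quant_cfg"][0]["cfg"]["block_sizes"] = {-1: 16}


run_test_with_copy(INT4_AWQ_CFG)
print("block_sizes" in INT4_AWQ_CFG["quant_cfg"][0]["cfg"])  # False

run_test_without_copy(INT4_AWQ_CFG)
print("block_sizes" in INT4_AWQ_CFG["quant_cfg"][0]["cfg"])  # True - state leaked
```

Note that a shallow `dict(config)` would not help: the nested `quant_cfg` list and its entry dicts would still be shared, so `copy.deepcopy` is the right tool.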
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
examples/llm_ptq/example_utils.py (1)
206-247: ⚠️ Potential issue | 🟠 Major

Normalize before editing `quant_cfg`, and don't assume the first weight rule wins.

`build_quant_cfg()` now mutates `quant_cfg["quant_cfg"]` before the `QuantizeConfig` validator can normalize legacy dict-style configs, and the AWQ branch grabs the first `*weight_quantizer` entry even though list configs are ordered last-wins. That means old configs can fail here, and appended weight overrides can be silently ignored.

🛠️ Suggested fix

```diff
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
 ...
 def build_quant_cfg(
     qformat,
     quant_cfg,
     awq_block_size,
     model_type,
     moe_calib_experts_ratio: float | None = None,
 ) -> dict[str, Any]:
     quant_cfg = copy.deepcopy(quant_cfg)
+    quant_cfg["quant_cfg"] = normalize_quant_cfg_list(quant_cfg.get("quant_cfg", []))
+
     if "awq" in str(quant_cfg.get("algorithm")):
         weight_quantizer_entry = next(
             e
-            for e in quant_cfg["quant_cfg"]
-            if isinstance(e, dict) and e.get("quantizer_path") == "*weight_quantizer"
+            for e in reversed(quant_cfg["quant_cfg"])
+            if e.get("quantizer_path") == "*weight_quantizer" and e.get("cfg") is not None
         )
         weight_quantizer = weight_quantizer_entry.get("cfg", {})
```
Verify each finding against the current code and only fix it if needed. In `@examples/llm_ptq/example_utils.py` around lines 206 - 247, In build_quant_cfg, avoid mutating quant_cfg["quant_cfg"] before it is normalized and stop assuming the first "*weight_quantizer" rule wins: first run the normalization/validation step used elsewhere (i.e., the QuantizeConfig or whatever normalizer the codebase provides) on quant_cfg so legacy dict-style entries are canonicalized, then when handling the AWQ branch find the effective weight_quantizer by scanning quant_cfg["quant_cfg"] in reverse (or otherwise selecting the last matching entry) to respect last-wins semantics; update that entry's cfg (and apply awq_block_size to weight_quantizer["block_sizes"][-1]) and only then adjust quant_cfg["algorithm"] for AWQ or other model-specific tweaks.modelopt/torch/quantization/conversion.py (1)
423-454: ⚠️ Potential issue | 🟠 Major

Enforce the no-sequential-list contract in this context manager.

The docstring says `cfg: [...]` entries are unsupported here, but the function no longer rejects them. If one gets through, `set_quantizer_by_cfg()` can replace `foo.quantizer` with a `SequentialQuantizer` while the exit path only snapshots/restores `TensorQuantizer` names, so restoration can miss the swap or fail on new child names like `foo.quantizer.0`.

🛠️ Suggested fix

```diff
 def set_quantizer_by_cfg_context(quant_model: nn.Module, quant_cfg: QuantizeQuantCfgType):
     """Context manager that temporarily applies a quantization config and restores the
     original state on exit.
@@
     """
     quant_cfg = normalize_quant_cfg_list(quant_cfg)
+    if any(isinstance(entry["cfg"], list) for entry in quant_cfg if entry["cfg"] is not None):
+        raise ValueError(
+            "set_quantizer_by_cfg_context() does not support SequentialQuantizer cfg lists."
+        )
     original_attributes = {}
```
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/conversion.py` around lines 423 - 454, The context manager set_quantizer_by_cfg_context must reject sequential cfg lists before mutating the model: validate quant_cfg (after normalize_quant_cfg_list or on the original input) and if any entry is a list/sequence (i.e., a sequential cfg that would produce a SequentialQuantizer) raise a ValueError with a clear message; perform this check prior to calling set_quantizer_by_cfg so set_quantizer_by_cfg_context never allows a config that could swap in a SequentialQuantizer and break the restore logic which only looks for TensorQuantizer instances.
♻️ Duplicate comments (2)
docs/source/guides/_pytorch_quantization.rst (1)
258-272: ⚠️ Potential issue | 🟡 Minor

Make this example self-contained.

`_default_disabled_quantizer_cfg` is still undefined in this snippet, so the block fails when copied verbatim. Please qualify/import it or inline those exclusions instead of referencing the bare private helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/source/guides/_pytorch_quantization.rst` around lines 258 - 272, The example references an undefined symbol _default_disabled_quantizer_cfg causing copy-paste failures; fix by either importing or inlining its entries: update the snippet that defines MY_CUSTOM_CONFIG to remove the bare reference and instead import the helper (e.g., from the module that exports _default_disabled_quantizer_cfg) or expand its contents directly into the "quant_cfg" list so the configuration is self-contained; ensure the change is applied to the MY_CUSTOM_CONFIG definition and keep the existing entries for "*", "*weight_quantizer", and "*input_quantizer" intact.modelopt/torch/quantization/conversion.py (1)
340-356: ⚠️ Potential issue | 🔴 Critical

Full replacements still don't swap incompatible quantizer shapes.

The list branch only warns when `len(attributes) != len(module)`, and the scalar branch mutates an existing `SequentialQuantizer` in place. Both cases leave stale sub-quantizers around instead of honoring the advertised "cfg is a complete replacement" behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/conversion.py` around lines 340 - 356, The loop handling quantizer replacements currently mutates existing SequentialQuantizer instances or only warns on length mismatches, leaving stale sub-quantizers; change it to perform full replacements: when _match_quantizer(...) finds a target and attributes is a list, always construct a new SequentialQuantizer(*(TensorQuantizer() for _ in range(len(attributes)))) and replace the submodule via quant_model.get_submodule(...)/setattr before calling set_from_attribute_config on the new module (don’t just warn when len(attributes) != len(module)); when attributes is not a list and the existing module is a SequentialQuantizer, replace it with a fresh TensorQuantizer() in the parent and then call set_from_attribute_config on that new TensorQuantizer instead of mutating the old SequentialQuantizer in place; use the same name resolution logic already present (name.rpartition / name.split) to locate parent and attribute name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/conversion.py`:
- Around line 258-263: The current truthy check treats an explicit empty config
({}) the same as omitted config, so change the condition to detect only omitted
configs by replacing the "if not cfg" logic with an explicit None check (e.g.,
"if cfg is None") so that normalize_quant_cfg_list()'s semantics are respected;
this ensures that when cfg == {} the else branch runs and
set_quantizer_attributes_partial/other reset logic for quant_model,
quantizer_path and parent_class will execute to reset attributes rather than
only toggling "enable".
---
Outside diff comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 206-247: In build_quant_cfg, avoid mutating quant_cfg["quant_cfg"]
before it is normalized and stop assuming the first "*weight_quantizer" rule
wins: first run the normalization/validation step used elsewhere (i.e., the
QuantizeConfig or whatever normalizer the codebase provides) on quant_cfg so
legacy dict-style entries are canonicalized, then when handling the AWQ branch
find the effective weight_quantizer by scanning quant_cfg["quant_cfg"] in
reverse (or otherwise selecting the last matching entry) to respect last-wins
semantics; update that entry's cfg (and apply awq_block_size to
weight_quantizer["block_sizes"][-1]) and only then adjust quant_cfg["algorithm"]
for AWQ or other model-specific tweaks.
In `@modelopt/torch/quantization/conversion.py`:
- Around line 423-454: The context manager set_quantizer_by_cfg_context must
reject sequential cfg lists before mutating the model: validate quant_cfg (after
normalize_quant_cfg_list or on the original input) and if any entry is a
list/sequence (i.e., a sequential cfg that would produce a SequentialQuantizer)
raise a ValueError with a clear message; perform this check prior to calling
set_quantizer_by_cfg so set_quantizer_by_cfg_context never allows a config that
could swap in a SequentialQuantizer and break the restore logic which only looks
for TensorQuantizer instances.
---
Duplicate comments:
In `@docs/source/guides/_pytorch_quantization.rst`:
- Around line 258-272: The example references an undefined symbol
_default_disabled_quantizer_cfg causing copy-paste failures; fix by either
importing or inlining its entries: update the snippet that defines
MY_CUSTOM_CONFIG to remove the bare reference and instead import the helper
(e.g., from the module that exports _default_disabled_quantizer_cfg) or expand
its contents directly into the "quant_cfg" list so the configuration is
self-contained; ensure the change is applied to the MY_CUSTOM_CONFIG definition
and keep the existing entries for "*", "*weight_quantizer", and
"*input_quantizer" intact.
In `@modelopt/torch/quantization/conversion.py`:
- Around line 340-356: The loop handling quantizer replacements currently
mutates existing SequentialQuantizer instances or only warns on length
mismatches, leaving stale sub-quantizers; change it to perform full
replacements: when _match_quantizer(...) finds a target and attributes is a
list, always construct a new SequentialQuantizer(*(TensorQuantizer() for _ in
range(len(attributes)))) and replace the submodule via
quant_model.get_submodule(...)/setattr before calling set_from_attribute_config
on the new module (don’t just warn when len(attributes) != len(module)); when
attributes is not a list and the existing module is a SequentialQuantizer,
replace it with a fresh TensorQuantizer() in the parent and then call
set_from_attribute_config on that new TensorQuantizer instead of mutating the
old SequentialQuantizer in place; use the same name resolution logic already
present (name.rpartition / name.split) to locate parent and attribute name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 53589e40-16b3-46f6-8ff4-de3dd80a9420
📒 Files selected for processing (5)
docs/source/guides/_pytorch_quantization.rst
examples/llm_ptq/example_utils.py
modelopt/torch/export/unified_export_hf.py
modelopt/torch/quantization/conversion.py
tests/unit/torch/quantization/plugins/test_peft.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/unit/torch/quantization/plugins/test_peft.py
- modelopt/torch/export/unified_export_hf.py
```diff
+        if not cfg:
+            # No cfg: only toggle the enable state, leave all other attributes unchanged.
+            set_quantizer_attributes_partial(
+                quant_model, quantizer_path, {"enable": enable}, parent_class
+            )
-            for sub_pattern, sub_cfg in cfg.items():
-                set_quantizer_attribute(quant_model, sub_pattern, sub_cfg, parent_class)
-            continue
-        set_quantizer_attribute(quant_model, pattern, cfg)
+        else:
```
Treat `cfg={}` as a full reset, not as “no cfg”.
`normalize_quant_cfg_list()` already uses `None` to mean “cfg omitted”. The truthiness check here collapses an explicit empty config into the enable-only path, so `{"cfg": {}}` preserves old attributes instead of resetting them to defaults.
🛠️ Suggested fix

```diff
-        if not cfg:
+        if cfg is None:
```
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/conversion.py` around lines 258 - 263, The
current truthy check treats an explicit empty config ({}) the same as omitted
config, so change the condition to detect only omitted configs by replacing the
"if not cfg" logic with an explicit None check (e.g., "if cfg is None") so that
normalize_quant_cfg_list()'s semantics are respected; this ensures that when cfg
== {} the else branch runs and set_quantizer_attributes_partial/other reset
logic for quant_model, quantizer_path and parent_class will execute to reset
attributes rather than only toggling "enable".
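The difference between the two checks is easy to demonstrate in isolation (the labels below are shorthand for the two code paths discussed in the review):

```python
def dispatch(cfg):
    """Classify a cfg value under each of the two guard conditions."""
    truthy_path = "enable-only" if not cfg else "full-apply"
    none_path = "enable-only" if cfg is None else "full-apply"
    return truthy_path, none_path


print(dispatch(None))             # ('enable-only', 'enable-only') - both agree for omitted cfg
print(dispatch({}))               # ('enable-only', 'full-apply')  - `not cfg` swallows the reset
print(dispatch({"num_bits": 8}))  # ('full-apply', 'full-apply')
```

Only the `{}` case differs: under `not cfg` an explicit empty config never reaches the reset path, which is exactly the bug the review flags.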
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
What does this PR do?

Summary

Redesigns the `quant_cfg` configuration format in ModelOpt's PyTorch quantization stack, replacing the previous dict-based format with an ordered list of typed `QuantizerCfgEntry` dicts.

Motivation

The old `quant_cfg` dict had several pain points:

- `"default"` key: an implicit, undocumented catch-all that was easy to misuse

New format

`quant_cfg` is now an ordered list of `QuantizerCfgEntry` TypedDicts. Each entry has:

- `quantizer_path` (required): `fnmatch` wildcard matched against quantizer module names
- `cfg` (optional): dict (or list of dicts) of `QuantizerAttributeConfig` fields
- `enable` (optional): toggles the quantizer on/off independently of `cfg`
- `parent_class` (optional): restricts the match to quantizers inside the given PyTorch module class

Entries are applied in list order; later entries override earlier ones. The canonical pattern is deny-all first (`_base_disable_all`), then selectively re-enable, then apply standard exclusions (`_default_disabled_quantizer_cfg`).

Changes
- `config.py`: `QuantizerCfgEntry` TypedDict and `normalize_quant_cfg_list()` added; `_default_disabled_quantizer_cfg`/`_mamba_moe_disabled_quantizer_cfg` converted from dicts to lists; all built-in configs (`INT8_DEFAULT_CFG`, `FP8_DEFAULT_CFG`, `NVFP4_DEFAULT_CFG`, etc.) converted to list format; two `QuantizeConfig` field validators added for normalization and early validation
- `conversion.py`: `set_quantizer_by_cfg` rewritten to iterate the list directly; `set_quantizer_attributes_full` and `set_quantizer_attributes_partial` distinguished; `parent_class` resolution uses the new `_DMRegistryCls.get_nn_class()` against the original `nn.Module` class
- `dynamic.py`: `_DMRegistryCls.get_nn_class()` added to resolve string class names to their original `nn.Module` subclass for `isinstance` checks
- `tensor_quantizer.py`: `set_from_attribute_config` signature narrowed to `QuantizerAttributeConfig`; uses `model_dump()` (not `exclude_unset`) to enforce entry atomicity: unset fields reset to defaults rather than inheriting from prior entries
- `modelopt_recipes/general/ptq/*.yml` files updated to the new list format
- `docs/source/guides/_quant_cfg.rst` added, covering entry format, ordering, atomicity, and common patterns

Backward compatibility
`normalize_quant_cfg_list()` is called automatically by the `QuantizeConfig` Pydantic validator, so existing code using the old single-key dict format (`{"*weight_quantizer": {"num_bits": 8}}`) continues to work without modification.

Test coverage
Additional Information
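To illustrate the backward-compatibility normalization and the last-wins ordering this PR describes, here is a simplified, hypothetical re-implementation. `normalize` and `effective_entry` are illustrative stand-ins, not the ModelOpt functions:

```python
import fnmatch


def normalize(quant_cfg):
    """Accept legacy one-key dicts ({pattern: cfg}) alongside new-style entries,
    returning new-style entries (sketch of what normalize_quant_cfg_list does)."""
    out = []
    for item in quant_cfg:
        if "quantizer_path" in item:  # already new-style
            out.append(dict(item))
        else:  # legacy {pattern: cfg} with exactly one key
            (pattern, cfg), = item.items()
            entry = {"quantizer_path": pattern}
            if set(cfg) == {"enable"}:
                entry["enable"] = cfg["enable"]
            else:
                entry["cfg"] = cfg
            out.append(entry)
    return out


def effective_entry(entries, name):
    """Later entries override earlier ones for the same quantizer name."""
    winner = None
    for entry in entries:
        if fnmatch.fnmatch(name, entry["quantizer_path"]):
            winner = entry
    return winner


entries = normalize([
    {"*": {"enable": False}},  # legacy deny-all entry, normalized on the fly
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8}},
])
print(effective_entry(entries, "layers.0.weight_quantizer"))
# {'quantizer_path': '*weight_quantizer', 'cfg': {'num_bits': 8}}
print(effective_entry(entries, "layers.0.input_quantizer"))
# {'quantizer_path': '*', 'enable': False}
```

This shows why the list is ordered: the deny-all entry matches everything, and the weight entry wins only because it comes later.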
Summary by CodeRabbit
New Features
Documentation
Refactor
Tests