
[OMNIML-3689] PTQ quant_cfg semantic correction. Design in doc _quant_cfg.rst#1094

Open
shengliangxu wants to merge 34 commits into main from shengliangx/quant_cfg-list

Conversation


@shengliangxu shengliangxu commented Mar 22, 2026

What does this PR do?

Summary

Redesigns the quant_cfg configuration format in ModelOpt's PyTorch quantization stack, replacing the previous dict-based format with an ordered list of typed QuantizerCfgEntry dicts.

Motivation

The old quant_cfg dict had several pain points:

  • Ambiguous precedence: no explicit way to reason about which entry wins when multiple keys match a quantizer
  • Mixed key namespaces: wildcard paths and PyTorch class names lived in the same dict level, requiring ad-hoc dispatch
  • Magic "default" key: an implicit, undocumented catch-all that was easy to misuse
  • Poor composability: merging two configs required dict updates that silently discarded keys
  • No YAML round-trip fidelity: the nested structure couldn't be expressed cleanly in YAML
New format

quant_cfg is now an ordered list of QuantizerCfgEntry TypedDicts. Each entry has:

  • quantizer_path (required): fnmatch wildcard matched against quantizer module names
  • cfg (optional): dict (or list of dicts) of QuantizerAttributeConfig fields
  • enable (optional): toggles quantizer on/off independently of cfg
  • parent_class (optional): restricts match to quantizers inside the given PyTorch module class

Entries are applied in list order; later entries override earlier ones. The canonical pattern is deny-all first (_base_disable_all), then selectively re-enable, then apply standard exclusions (_default_disabled_quantizer_cfg).

Changes
  • config.py: QuantizerCfgEntry TypedDict and normalize_quant_cfg_list() added; _default_disabled_quantizer_cfg / _mamba_moe_disabled_quantizer_cfg converted from dicts to lists; all built-in configs (INT8_DEFAULT_CFG, FP8_DEFAULT_CFG, NVFP4_DEFAULT_CFG, etc.) converted to list format; two QuantizeConfig field validators added for normalization and early validation
  • conversion.py: set_quantizer_by_cfg rewritten to iterate the list directly; set_quantizer_attributes_full and set_quantizer_attributes_partial distinguished; parent_class resolution uses new _DMRegistryCls.get_nn_class() against the original nn.Module class
  • dynamic.py: _DMRegistryCls.get_nn_class() added to resolve string class names to their original nn.Module subclass for isinstance checks
  • tensor_quantizer.py: set_from_attribute_config signature narrowed to QuantizerAttributeConfig; uses model_dump() (not exclude_unset) to enforce entry atomicity — unset fields reset to defaults rather than inheriting from prior entries
  • YAML recipes: all modelopt_recipes/general/ptq/*.yml files updated to the new list format
  • Docs: new docs/source/guides/_quant_cfg.rst covering entry format, ordering, atomicity, and common patterns
Backward compatibility

normalize_quant_cfg_list() is called automatically by the QuantizeConfig Pydantic validator, so existing code using the old single-key dict format ({"*weight_quantizer": {"num_bits": 8}}) continues to work without modification.
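For intuition, the dict-to-list conversion could look roughly like the sketch below. This is a simplified stand-in, assuming the legacy format maps wildcard paths to attribute dicts (with enable-only dicts as a special case); the real `normalize_quant_cfg_list()` in config.py handles more cases.

```python
def normalize_quant_cfg_list(quant_cfg):
    """Accept a legacy dict-form quant_cfg and return the list-of-entries form.

    Illustrative sketch only, mirroring the backward-compat behavior described
    in this PR; not the actual ModelOpt implementation.
    """
    if isinstance(quant_cfg, dict):
        entries = []
        for path, cfg in quant_cfg.items():
            if isinstance(cfg, dict) and set(cfg) == {"enable"}:
                # Enable-only legacy value becomes an entry-level enable flag.
                entries.append({"quantizer_path": path, "enable": cfg["enable"]})
            else:
                entries.append({"quantizer_path": path, "cfg": cfg})
        return entries
    # Already list-form: return a shallow copy unchanged.
    return list(quant_cfg)
```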

Test coverage

  • Unit testing: new unit tests cover the semantic changes
  • System testing:
python examples/llm_ptq/hf_ptq.py \
      --model Qwen/Qwen3-8B \
      --recipe general/ptq/fp8_default-fp8_kv \
      --export_path=build/fp8_default-fp8_kv42 \
      --calib_size=16 \
      --batch_size=0 \
      --trust_remote_code \
      --export_fmt=hf

Additional Information

Summary by CodeRabbit

  • New Features

    • Introduced an ordered, list-based quantization config format with per-rule selectors, explicit enable/cfg fields, and last-match-wins precedence.
  • Documentation

    • Added a dedicated guide for the new quant_cfg schema and updated guides, examples, and notebooks to show the list-based format.
  • Refactor

    • Migrated configs, APIs, and internal flows to the new list-oriented quant_cfg shape for consistent precedence and validation.
  • Tests

    • Updated and added tests to validate normalization, validation rules, and list-based behavior.


copy-pr-bot bot commented Mar 22, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.



coderabbitai bot commented Mar 22, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

📝 Walkthrough

Refactors quantization configuration from a wildcard-keyed dict schema to an ordered list-of-entry schema (entries with quantizer_path, optional parent_class, optional cfg, optional enable) and updates runtime APIs, normalization/validation, examples, recipes, tests, backends, and docs to the new format.

Changes

Cohort / File(s) / Summary

  • Core Config & Types (modelopt/torch/quantization/config.py): Replaced dict-based quant_cfg with list[QuantizerCfgEntry]; added QuantizerCfgEntry TypedDict, normalize_quant_cfg_list(), Pydantic validators, canonical defaults, and stricter validation (entries must include cfg or enable).
  • Conversion API & Attribute Setters (modelopt/torch/quantization/conversion.py): set_quantizer_by_cfg/set_quantizer_by_cfg_context now accept normalized list configs; added set_quantizer_attributes_full and set_quantizer_attributes_partial; centralized matching (_match_quantizer) and split full-vs-partial application logic.
  • Quantization Runtime & Model Integration (modelopt/torch/quantization/model_quant.py, modelopt/torch/quantization/model_calib.py, modelopt/torch/quantization/algorithms.py, modelopt/torch/quantization/compress.py): Adapted codepaths to read/produce list-form quant_cfg; updated estimation/compression logic to extract nested cfg; replaced set_quantizer_attribute usage with partial/full attribute setters; KV-cache and disable/enable flows updated to append list entries.
  • Quantizer Implementations & Utilities (modelopt/torch/quantization/nn/modules/tensor_quantizer.py, modelopt/torch/quantization/nn/modules/..., modelopt/torch/quantization/utils/core_utils.py): Adjusted type annotations and setters (axis, block_sizes) to keep calibrator state in sync; updated helper APIs to accept/return list-form config and to append KV-cache entries; added/updated public typing imports.
  • Backends Availability Checks (modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py, .../nvfp4_gemm.py): Switched logic to scan the quant_cfg list for *input_quantizer/*weight_quantizer entries and assert extracted cfg types before validating module compatibility.
  • Conversion Entry Points & Export (modelopt/torch/quantization/conversion.py, modelopt/torch/export/unified_export_hf.py, modelopt/onnx/llm_export_utils/quantization_utils.py): Pass normalized list-form quant_cfg into conversion and export helpers; get_quant_config now rebuilds/appends lm_head overrides as list entries.
  • Examples & Notebooks (examples/diffusers/..., examples/llm_ptq/..., examples/llm_autodeploy/run_auto_quantize.py, examples/llm_eval/..., examples/vllm_serve/fakequant_worker.py, examples/windows/..., examples/llm_qat/main.py, notebooks): Converted example configs and helper logic from dict-based quant_cfg to list entries; updated lookups from direct dict indexing to list traversal/search/append; adjusted a few function signatures to accept list-form KV configs.
  • Recipes (YAML) (modelopt_recipes/general/ptq/*.yml): Refactored recipe quant_cfg sections from key-mapped forms to ordered lists of rule objects (quantizer_path, enable, cfg, optional parent_class); replaced implicit default semantics with explicit catch-all entries.
  • Tests & Test Utilities (tests/_test_utils/..., tests/.../torch/quantization/*, tests/unit/torch/quantization/*, tests/gpu/...): Migrated fixtures, helpers, and tests to list-based quant_cfg; updated call sites to use set_quantizer_attributes_partial/set_quantizer_attributes_full; added tests for normalize_quant_cfg_list() and new attribute-setter semantics; adapted many assertions and search logic to iterate over list entries.
  • Docs & Guides (docs/source/guides/1_quantization.rst, docs/source/guides/_pytorch_quantization.rst, docs/source/guides/_quant_cfg.rst): Added _quant_cfg.rst documenting the new list schema, per-entry fields, validation rules, and precedence/atomicity semantics; updated PyTorch quantization guide examples to deep-copy defaults and to use list-based quant_cfg; added the page to the toctree.
  • Misc Examples / Config Constants (examples/diffusers/quantization/config.py, tests/_test_utils/...): Reworked module-level default config constants to list form; adjusted helper setters to inject trt_high_precision_dtype or calibrator configs into nested cfg entries via list traversal/appends.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 60.40%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The PR title accurately describes the main change: converting quant_cfg from dict-based to list-based format, with the design documented in _quant_cfg.rst.
  • Security Anti-Patterns: ✅ Passed. No security anti-patterns found: torch.load, numpy.load, trust_remote_code, eval/exec, and #nosec comments are absent from all modified files.


Right now the quant_cfg is a dict, but we are using the quant_cfg as if
it were a list. When we apply the quant_cfg, we enumerate the items in the
dict and apply the configs one by one in
modelopt/torch/quantization/conversion.py. This implementation effectively
gives the latter configs higher precedence than the former configs.
However, dicts do not have reliable ordering.

Therefore, we make quant_cfg a list of patterns:

1. The latter config patterns have higher precedence. A latter config in
   the list overrides a former config if they target the same module.

2. The config for each module is atomic: each config provides the full
   information. We do not compose a quant module config from multiple
   config lines.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
…s_partial

set_quantizer_attributes_full updates the full set of quantizer attributes;
it has the atomic semantic.

set_quantizer_attributes_partial updates just a partial set of quantizer
attributes; it has the merge semantic.
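The two semantics can be illustrated with a toy model of quantizer state. `DEFAULTS`, `set_full`, and `set_partial` below are hypothetical stand-ins for demonstration, not the actual ModelOpt setters:

```python
# Hypothetical default attribute values for a quantizer.
DEFAULTS = {"num_bits": 8, "axis": None, "enable": True}

def set_full(state: dict, cfg: dict) -> dict:
    # Atomic semantic: unset fields reset to defaults; prior state is ignored.
    new = dict(DEFAULTS)
    new.update(cfg)
    return new

def set_partial(state: dict, cfg: dict) -> dict:
    # Merge semantic: only the given fields change; everything else is kept.
    new = dict(state)
    new.update(cfg)
    return new
```

Starting from a 4-bit per-channel state, a full update with `{"num_bits": 8}` resets `axis` to its default, while a partial update keeps the existing `axis`.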

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu force-pushed the shengliangx/quant_cfg-list branch from 192ea05 to fb3bb07 Compare March 23, 2026 00:19
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

codecov bot commented Mar 23, 2026

Codecov Report

❌ Patch coverage is 83.76963% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.18%. Comparing base (291498b) to head (cd65849).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/quantization/algorithms.py 75.00% 8 Missing ⚠️
modelopt/torch/quantization/utils/core_utils.py 14.28% 6 Missing ⚠️
modelopt/torch/quantization/backends/nvfp4_gemm.py 0.00% 5 Missing ⚠️
modelopt/torch/quantization/conversion.py 90.74% 5 Missing ⚠️
modelopt/torch/quantization/config.py 94.59% 4 Missing ⚠️
modelopt/torch/quantization/model_quant.py 50.00% 2 Missing ⚠️
modelopt/torch/quantization/compress.py 66.66% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1094      +/-   ##
==========================================
- Coverage   70.21%   70.18%   -0.03%     
==========================================
  Files         228      229       +1     
  Lines       25952    26113     +161     
==========================================
+ Hits        18221    18327     +106     
- Misses       7731     7786      +55     


Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu force-pushed the shengliangx/quant_cfg-list branch from 4f38294 to 5115452 Compare March 23, 2026 03:18

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 20

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
examples/llm_autodeploy/run_auto_quantize.py (1)

134-143: ⚠️ Potential issue | 🔴 Critical

Remove hardcoded trust_remote_code=True in HF model and tokenizer loading.

Both AutoModelForCausalLM.from_pretrained() (line 134) and AutoTokenizer.from_pretrained() (line 142) hardcode trust_remote_code=True, creating a critical RCE vector from untrusted model repositories. Expose this as a user-controlled parameter defaulting to False in the modelopt_ptq() function signature.

🔒 Proposed fix
 def modelopt_ptq(
     model_path: str,
     output_dir: str,
     qformat: str | None = None,
     num_samples: int = 512,
     auto_quantize_bits: float | None = None,
     calib_dataset: str = "cnn_dailymail",
     calib_batch_size: int = 8,
+    trust_remote_code: bool = False,
 ) -> torch.nn.Module:
     """Quantize the model with modelopt."""
     model = AutoModelForCausalLM.from_pretrained(
-        model_path, trust_remote_code=True, torch_dtype="auto", device_map="auto"
+        model_path,
+        trust_remote_code=trust_remote_code,
+        torch_dtype="auto",
+        device_map="auto",
     )
     model.eval()
 
     tokenizer = AutoTokenizer.from_pretrained(
         model_path,
         model_max_length=2048,
         padding_side="left",
-        trust_remote_code=True,
+        trust_remote_code=trust_remote_code,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_autodeploy/run_auto_quantize.py` around lines 134 - 143, The
code currently hardcodes trust_remote_code=True when calling
AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained, creating
an RCE risk; update the modelopt_ptq(...) function signature to accept a new
parameter (e.g., trust_remote_code: bool = False) and replace the hardcoded True
with that parameter for both AutoModelForCausalLM.from_pretrained and
AutoTokenizer.from_pretrained so callers can opt-in explicitly while defaulting
to False.
examples/llm_ptq/example_utils.py (1)

208-247: ⚠️ Potential issue | 🟠 Major

Normalize quant_cfg["quant_cfg"] before mutating it.

Lines 210 and 244-247 now assume the new list schema. If a caller still passes a legacy dict-style config, the AWQ branch will fail before mtq.quantize() reaches the backward-compat normalizer, and the phi4mm branch will hit .append() on a dict. Even in list form, walking from the front can rewrite a shadowed *weight_quantizer rule instead of the effective last match.

💡 Suggested fix
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
@@
 def build_quant_cfg(
@@
 ) -> dict[str, Any]:
     quant_cfg = copy.deepcopy(quant_cfg)
+    quant_cfg["quant_cfg"] = normalize_quant_cfg_list(quant_cfg["quant_cfg"])
     if "awq" in str(quant_cfg.get("algorithm")):
         weight_quantizer_entry = next(
             e
-            for e in quant_cfg["quant_cfg"]
-            if isinstance(e, dict) and e.get("quantizer_path") == "*weight_quantizer"
+            for e in reversed(quant_cfg["quant_cfg"])
+            if e.get("quantizer_path") == "*weight_quantizer"
         )
         weight_quantizer = weight_quantizer_entry.get("cfg", {})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/example_utils.py` around lines 208 - 247, The code mutates
quant_cfg["quant_cfg"] assuming it's a list; normalize quant_cfg["quant_cfg"] to
the list schema up front (convert legacy dict form into the new list form)
before any mutations so weight_quantizer lookup and .append() calls are safe;
when locating the weight quantizer use a reverse-scan (e.g., search from the
end) to find the effective "*weight_quantizer" entry referenced by
weight_quantizer_entry so awq_block_size handling (weight_quantizer_entry /
weight_quantizer) and the model_type == "phi4mm" .append() calls operate on the
correct, normalized list.
modelopt/onnx/llm_export_utils/quantization_utils.py (1)

57-111: ⚠️ Potential issue | 🟠 Major

Copy the preset before rewriting quant_cfg.

Line 111 mutates mtq.FP8_DEFAULT_CFG / mtq.NVFP4_DEFAULT_CFG / mtq.INT4_AWQ_CFG in place. With list-based rules, repeated calls can now retain or duplicate previously appended lm_head entries, so get_quant_config() becomes order-dependent across calls in the same process.

💡 Suggested fix
+import copy
 import time
@@
     if precision == "fp8":
-        quant_cfg = mtq.FP8_DEFAULT_CFG
+        quant_cfg = copy.deepcopy(mtq.FP8_DEFAULT_CFG)
 
     elif precision == "nvfp4":
-        quant_cfg = mtq.NVFP4_DEFAULT_CFG
+        quant_cfg = copy.deepcopy(mtq.NVFP4_DEFAULT_CFG)
 
     elif precision == "int4_awq":
-        quant_cfg = mtq.INT4_AWQ_CFG
+        quant_cfg = copy.deepcopy(mtq.INT4_AWQ_CFG)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/llm_export_utils/quantization_utils.py` around lines 57 - 111,
get_quant_config currently mutates the shared presets mtq.FP8_DEFAULT_CFG /
mtq.NVFP4_DEFAULT_CFG / mtq.INT4_AWQ_CFG in place which causes duplicated
lm_head entries on repeated calls; fix by making a deep copy of the chosen
preset (e.g., import copy and do quant_cfg = copy.deepcopy(mtq.FP8_DEFAULT_CFG)
/ copy.deepcopy(mtq.NVFP4_DEFAULT_CFG) / copy.deepcopy(mtq.INT4_AWQ_CFG) inside
get_quant_config) before building quant_cfg_list and appending lm_head entries
so the original mtq presets remain unchanged.
modelopt/torch/quantization/conversion.py (1)

423-454: ⚠️ Potential issue | 🟠 Major

Reject sequential cfgs here or restore full module topology.

This context manager snapshots only TensorQuantizer properties. If a temporary config contains a list cfg, set_quantizer_by_cfg() can replace a TensorQuantizer with a SequentialQuantizer, and the exit path never converts it back. The docstring already says sequential cfg lists are unsupported here, but that isn’t enforced.

🧯 Suggested fix
     quant_cfg = normalize_quant_cfg_list(quant_cfg)
+    if any(isinstance(entry["cfg"], list) for entry in quant_cfg):
+        raise ValueError("set_quantizer_by_cfg_context does not support sequential cfg lists.")
 
     original_attributes = {}
modelopt/torch/quantization/config.py (1)

541-545: ⚠️ Potential issue | 🔴 Critical

Remove enable from _nvfp4_cfg_bs32.

The nested "enable": True in _nvfp4_cfg_bs32 (line 545) causes a duplicate keyword argument error at runtime. When _nvfp4_selective_quant_cfg() passes this dict as the cfg field in QuantizerCfgEntry (line 559), the conversion code later unpacks it with QuantizerAttributeConfig(**cfg, enable=enable) (conversion.py line 266). This passes enable twice and raises TypeError before conversion can complete.

Suggested fix
 _nvfp4_cfg_bs32 = {
     "num_bits": (2, 1),
     "block_sizes": {-1: 32, "type": "dynamic", "scale_bits": (4, 3)},
-    "enable": True,
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 541 - 545, The dict
_nvfp4_cfg_bs32 contains an internal "enable" key which causes a duplicate
keyword when later unpacked into QuantizerAttributeConfig; remove the "enable"
entry from _nvfp4_cfg_bs32 so it only contains its attribute settings (num_bits,
block_sizes, scale_bits/type entries) and let the caller
(_nvfp4_selective_quant_cfg -> QuantizerCfgEntry) supply the enable flag
separately; specifically edit the _nvfp4_cfg_bs32 constant to drop the "enable":
True key so that conversion code that does QuantizerAttributeConfig(**cfg,
enable=enable) no longer passes enable twice.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/source/guides/_pytorch_quantization.rst`:
- Around line 260-272: The example references _default_disabled_quantizer_cfg
but doesn't define or import it; fix by either adding an import for it (e.g.,
from mtq.config import _default_disabled_quantizer_cfg) at the top of the
snippet or qualify the reference inline as
mtq.config._default_disabled_quantizer_cfg so the example is self-contained and
runnable; update the code block containing "quant_cfg" to use the qualified name
or add the import near the example.

In `@docs/source/guides/_quant_cfg.rst`:
- Around line 123-143: The docs currently state that _base_disable_all (a.k.a.
the deny-all entry) is prepended to every built-in config; update the text to
accurately say that the deny-all entry (referenced as _base_disable_all or
mtq.config._default_disabled_quantizer_cfg) is included in most built-in configs
but not all, and call out that some specialized recipes (e.g., the KV-only
configs in the quantization config set) intentionally omit the deny-all prelude;
adjust the example wording and the note to avoid the absolute "every built-in
config" claim and add a short parenthetical pointer to the specialized recipes
that omit it (referencing their config names such as the KV-only configs).

In `@examples/diffusers/quantization/config.py`:
- Around line 106-109: The loop that injects trt_high_precision_dtype into
quant_config["quant_cfg"] only handles when cfg is a dict (variable p) and skips
the case where cfg is a list of attribute dicts; update the logic in the loop
that iterates quant_config["quant_cfg"] to detect if p is a list and, when so,
iterate its elements and for each element that is a dict and contains "num_bits"
but not "trt_high_precision_dtype", set element["trt_high_precision_dtype"] =
trt_high_precision_dtype; keep the existing dict-handling branch for backward
compatibility so both dict and list cfg forms are supported.
- Around line 128-147: The current append in reset_set_int8_config causes
duplicate entries in quant_config["quant_cfg"]; instead, make the operation
idempotent by searching quant_config["quant_cfg"] for an existing dict whose
"quantizer_path" equals aq_name and replace that dict in-place (or update its
"cfg") if found, otherwise append the new entry; modify the code around the
append that builds the payload (the block referencing aq_name,
quant_config["quant_cfg"], and the PercentileCalibrator tuple) to perform a
find-and-replace by quantizer_path rather than always appending.

In `@examples/llm_eval/quantization_utils.py`:
- Around line 36-49: The loop that detects BMM-attention entries fails because
mtq_cfg["quant_cfg"] is now a list of dicts; update _quantize_model_with_dataset
to handle both list and dict formats by inspecting each entry: if an entry is a
dict, read its "quantizer_path" (or keys for legacy string entries) and test
whether the path string contains "bmm_quantizer" or matches "*bmm_quantizer"; if
mtq_cfg["quant_cfg"] is still a dict or contains raw strings, preserve the old
membership check. Ensure this detection is used before calling
register_hf_attentions_on_the_fly(net) so KV-attention quantizers targeted via
"quantizer_path": "*bmm_quantizer" are not skipped.

In `@examples/llm_ptq/hf_ptq.py`:
- Line 332: The code is using attribute access on plain dicts in
_default_disabled_quantizer_cfg which will raise AttributeError; update the
disabled_layers construction to access the dict key "quantizer_path" for each
entry (i.e., replace entry.quantizer_path with entry["quantizer_path"]) so that
disabled_layers is built from the correct dictionary key.

In `@examples/llm_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb`:
- Around line 193-198: The code fails because quant_cfg["quant_cfg"] contains
dict entries, not 2-tuples; change the extraction to find the dict whose
quantizer_path (or appropriate key) equals "*weight_quantizer" and then take its
"cfg" field (e.g. use next(entry["cfg"] for entry in quant_cfg["quant_cfg"] if
entry.get("quantizer_path") == "*weight_quantizer")), then handle the case where
weight_quantizer is a list as before and set weight_quantizer["block_sizes"][-1]
= 128; update references to quant_cfg, weight_quantizer, and
quant_cfg["quant_cfg"] accordingly.

In `@examples/vllm_serve/fakequant_worker.py`:
- Around line 173-184: The code currently finds only the first kv rule
(kv_entry) and appends minimal clones at the end losing order and full fields;
instead iterate over kv_quant_cfg with indexes, find every entry where
isinstance(e, dict) and e.get("quantizer_path") == "*[kv]_bmm_quantizer", and
for each matching entry insert (not append) two full deep-copied clones of that
entire entry right after the original entry with quantizer_path set to
"*kv_c_bmm_quantizer" and "*k_pe_bmm_quantizer" respectively so you preserve
list order and all other fields (enable, parent_class, etc.) — operate on
kv_quant_cfg and use copy.deepcopy to clone the whole entry before changing
quantizer_path.

In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`:
- Around line 260-271: The quant_cfg list currently uses tuple entries and
embeds "enable" inside the quantizer config dict (_nvfp4_cfg), which fails
validation in normalize_quant_cfg_list(); change quant_cfg to a list of
QuantizerCfgEntry dicts where each entry is e.g. {"quantizer_path": "<pattern>",
"cfg": <cfg_dict>} for enabled entries (move "enable" out of _nvfp4_cfg) and
{"quantizer_path": "<pattern>", "enable": False} for disabled patterns (use
SENSITIVE_LAYER_PATTERNS and the f"*transformer_blocks.{i}.*" patterns for
exclude_blocks), ensuring _nvfp4_cfg no longer contains a top-level "enable" key
and only used as the "cfg" value.

In `@modelopt/torch/quantization/algorithms.py`:
- Around line 1333-1344: The loop in get_auto_quantize_config is calling
_match_quantizer_cfg with only the short names
("input_quantizer"/"weight_quantizer"), so path- or parent_class-scoped
selectors never match and enable-only matches are dropped; change the call to
pass the fully qualified quantizer name (e.g.,
f"{module_name}.{quantizer_attr}") to _match_quantizer_cfg and ensure the caller
appends an entry when either matched_cfg is not None or matched_enable is True
(so enable-only matches are preserved); apply the same fix for the analogous
block that starts at the later occurrence (the other loop referencing
_match_quantizer_cfg, module_name, and quantizer_attr).
- Around line 96-98: The code currently drops the entry-level "enable" flag and
estimates compression from every entry's "cfg"; change the selection to only
include cfgs from QuantizerCfgEntry items where e.get("enable", True) is truthy
and the "cfg" is not None. Concretely, in the block using quant_cfg.quant_cfg
and calling estimate_quant_compression_for_quantizer, filter entries by
e.get("enable", True) before extracting e.get("cfg") (and still skip None cfgs)
so estimate_quant_compression_for_quantizer receives only enabled cfgs.

In `@modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py`:
- Around line 100-116: The availability probe currently uses next(...) without a
default and assertions on input_cfg/weight_cfg which can raise
StopIteration/AssertionError; change the two next(...) calls that produce
input_cfg and weight_cfg to include a default (e.g., next(..., None)) and
replace the assert isinstance(...) checks with explicit conditional checks that
return False from the probe when input_cfg or weight_cfg is missing or not a
dict; ensure you reference FP8_DEFAULT_CFG["quant_cfg"], quant_cfg_list,
input_cfg and weight_cfg in the probe logic so malformed or missing default
entries cause the function to return False instead of throwing.

In `@modelopt/torch/quantization/backends/nvfp4_gemm.py`:
- Around line 216-231: The availability check currently uses next(...) without
defaults and asserts which can raise instead of returning False; update the
_nvfp4_availability_check implementation to retrieve input_cfg and weight_cfg
from quant_cfg_list using next(..., None) (or equivalent), verify they are dicts
with proper "cfg" entries and that their "quantizer_path" matches
"*input_quantizer" / "*weight_quantizer", and if either is missing or not a dict
return False instead of asserting or allowing StopIteration; reference the
symbols quant_cfg_list, input_cfg, weight_cfg and the existing next(...)/assert
lines to locate and change the lookup and validation to defensive checks that
return False on malformed/missing configs.

In `@modelopt/torch/quantization/config.py`:
- Around line 1625-1631: The before-validator normalize_quant_cfg currently
returns non-list inputs unchanged causing legacy dict-style quant_cfg to bypass
normalization; update normalize_quant_cfg to detect dict inputs and pass them
through normalize_quant_cfg_list (which handles dict -> entry conversion via
_dict_to_entry) so legacy {"*weight_quantizer": {...}} shapes are converted to
the expected list form; ensure this change preserves existing behavior for lists
and other non-dict types so downstream callers like need_calibration (which
iterates entries) receive normalized entry lists.

In `@modelopt/torch/quantization/conversion.py`:
- Around line 341-356: The current logic in _match_quantizer branch mutates
existing SequentialQuantizer instances instead of fully replacing them when the
incoming attributes change shape; to fix, ensure full replacement instead of
partial assignment: when attributes is a list, if module is not a
SequentialQuantizer create a new SequentialQuantizer (as before) OR if module is
a SequentialQuantizer but len(attributes) != len(module) replace the existing
module with a new SequentialQuantizer of the correct length before calling
set_from_attribute_config; conversely, when attributes is not a list and module
is a SequentialQuantizer, replace it with a fresh TensorQuantizer instance and
then call set_from_attribute_config on that new instance (use
quant_model.get_submodule/name.rpartition to locate parent_module and setattr to
swap the module).

In `@modelopt/torch/quantization/model_quant.py`:
- Around line 183-191: The docstring for quantize() is out of sync: it still
describes quant_cfg as a dict mapping wildcard keys, but the implementation and
example use a list-of-dicts schema. Update the prose in the quantize() doc block
to describe quant_cfg as a list where each item is a dict with keys like
"quantizer_path" (wildcard pattern), optional "enable" (bool), optional "cfg"
(dict, e.g., {"num_bits": int, "axis": int}), and that multiple entries are
applied in order; also mention the top-level "algorithm" field (e.g., "max") and
how entries such as
{"quantizer_path":"*weight_quantizer","cfg":{"num_bits":8,"axis":0}} and
{"quantizer_path":"*input_quantizer","cfg":{"num_bits":8,"axis":-1}} are
interpreted by quantize()/quant_cfg.

In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 835-836: The code is appending dict.items() tuples into
quant_cfg["quant_cfg"], creating mixed tuples instead of the expected
list[QuantizerCfgEntry] and can error when quant_cfg["quant_cfg"] is a dict;
normalize both sources into proper QuantizerCfgEntry dicts: ensure inner is a
list (if quant_cfg.get("quant_cfg") is a dict, convert it to a list of entries),
and transform kv_cache_quant_cfg (the dict of pattern->cfg) into a list of
{"quantizer_path": pattern, ...cfg} or at least {"quantizer_path": pattern,
"enable": cfg.get("enable", False)} entries before concatenating; assign the
concatenated list back to quant_cfg["quant_cfg"] so downstream consumers only
see a homogeneous list[QuantizerCfgEntry].
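The normalization described above, sketched with the shapes the comment assumes (inner may still be a legacy dict; kv_cache_quant_cfg maps pattern to an attribute dict):

```python
def merge_kv_cache_cfg(quant_cfg, kv_cache_quant_cfg):
    """Concatenate both sources into one homogeneous list of entry dicts (sketch)."""
    inner = quant_cfg.get("quant_cfg", [])
    if isinstance(inner, dict):
        # legacy dict form -> list of entries
        inner = [{"quantizer_path": p, "cfg": c} for p, c in inner.items()]
    kv_entries = [
        {"quantizer_path": pattern, **cfg}  # i.e. {"quantizer_path": pattern, ...cfg}
        for pattern, cfg in kv_cache_quant_cfg.items()
    ]
    quant_cfg["quant_cfg"] = list(inner) + kv_entries
    return quant_cfg
```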

In `@tests/_test_utils/torch/quantization/quantize_common.py`:
- Around line 50-61: The loop over config["quant_cfg"] looks for a
"*weight_quantizer" entry and mutates its "block_sizes" but silently does
nothing if no match is found; add a fail-fast check by tracking whether a
matching entry was found (e.g., set a boolean flag when pat ==
"*weight_quantizer") and after the loop raise an explicit error (ValueError or
AssertionError) if no match was found so tests fail early rather than silently
proceeding without applying block_size to the intended entry.

In `@tests/gpu/torch/quantization/test_quantize_cuda.py`:
- Around line 133-135: The test is mutating the shared constant
mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG by editing config["quant_cfg"] in place;
instead, create a deep copy of that constant (e.g., via copy.deepcopy) into the
local variable config before patching so changes to entry.setdefault("cfg",
{})["block_sizes"] = {-1: 8, -2: 8} only affect this test; locate the code that
assigns config from mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG and replace it with a
deep-copied version so modifications to quant_cfg/ block_sizes/quantizer_path do
not leak to other parametrized cases.

In `@tests/unit/torch/quantization/test_autoquant.py`:
- Around line 492-507: The current assertion allows tuple-style payloads by
falling back to entry[0]/entry[1]; restrict the check to dict-style
QuantizerCfgEntry entries only by iterating over config["quant_cfg"] and
filtering with isinstance(entry, dict), then assert that some dict has
entry.get("quantizer_path") == "*" and entry.get("enable") is False; update the
assertion around the quant_cfg check in
tests/unit/torch/quantization/test_autoquant.py accordingly (referencing
config["quant_cfg"] and the dict keys "quantizer_path" and "enable").

---

Outside diff comments:
In `@examples/llm_autodeploy/run_auto_quantize.py`:
- Around line 134-143: The code currently hardcodes trust_remote_code=True when
calling AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained,
creating an RCE risk; update the modelopt_ptq(...) function signature to accept
a new parameter (e.g., trust_remote_code: bool = False) and replace the
hardcoded True with that parameter for both AutoModelForCausalLM.from_pretrained
and AutoTokenizer.from_pretrained so callers can opt-in explicitly while
defaulting to False.

In `@examples/llm_ptq/example_utils.py`:
- Around line 208-247: The code mutates quant_cfg["quant_cfg"] assuming it's a
list; normalize quant_cfg["quant_cfg"] to the list schema up front (convert
legacy dict form into the new list form) before any mutations so
weight_quantizer lookup and .append() calls are safe; when locating the weight
quantizer use a reverse-scan (e.g., search from the end) to find the effective
"*weight_quantizer" entry referenced by weight_quantizer_entry so awq_block_size
handling (weight_quantizer_entry / weight_quantizer) and the model_type ==
"phi4mm" .append() calls operate on the correct, normalized list.

In `@modelopt/onnx/llm_export_utils/quantization_utils.py`:
- Around line 57-111: get_quant_config currently mutates the shared presets
mtq.FP8_DEFAULT_CFG / mtq.NVFP4_DEFAULT_CFG / mtq.INT4_AWQ_CFG in place which
causes duplicated lm_head entries on repeated calls; fix by making a deep copy
of the chosen preset (e.g., import copy and do quant_cfg =
copy.deepcopy(mtq.FP8_DEFAULT_CFG) / copy.deepcopy(mtq.NVFP4_DEFAULT_CFG) /
copy.deepcopy(mtq.INT4_AWQ_CFG) inside get_quant_config) before building
quant_cfg_list and appending lm_head entries so the original mtq presets remain
unchanged.
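The shared-preset bug and its deep-copy fix, in miniature. PRESET here is a mock standing in for mtq.FP8_DEFAULT_CFG and friends:

```python
import copy

PRESET = {"quant_cfg": [{"quantizer_path": "*", "enable": False}]}  # mock mtq preset


def get_quant_config():
    quant_cfg = copy.deepcopy(PRESET)  # without the copy, the append below mutates PRESET
    quant_cfg["quant_cfg"].append({"quantizer_path": "*lm_head*", "enable": False})
    return quant_cfg
```

Repeated calls now each see a fresh copy, so lm_head entries no longer accumulate in the shared preset.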

In `@modelopt/torch/quantization/config.py`:
- Around line 541-545: The dict _nvfp4_cfg_bs32 contains an internal "enable"
key which causes a duplicate keyword when later unpacked into
QuantizerAttributeConfig; remove the "enable" entry from _nvfp4_cfg_bs32 so it
only contains its attribute settings (num_bits, block_sizes, scale_bits/type
entries) and let the caller (_nvfp4_selective_quant_cfg -> QuantizerCfgEntry)
supply the enable flag separately; specifically edit the _nvfp4_cfg_bs32
constant to drop the "enable": True key so that conversion code that does
QuantizerAttributeConfig(**cfg, enable=enable) no longer passes enable twice.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fcb69c76-5a46-4dd0-82f2-905613c6cc83

📥 Commits

Reviewing files that changed from the base of the PR and between 08e5f92 and 5115452.

📒 Files selected for processing (53)
  • docs/source/guides/1_quantization.rst
  • docs/source/guides/_pytorch_quantization.rst
  • docs/source/guides/_quant_cfg.rst
  • examples/diffusers/quantization/config.py
  • examples/llm_autodeploy/run_auto_quantize.py
  • examples/llm_eval/quantization_utils.py
  • examples/llm_ptq/example_utils.py
  • examples/llm_ptq/hf_ptq.py
  • examples/llm_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb
  • examples/llm_ptq/notebooks/3_PTQ_AutoQuantization.ipynb
  • examples/llm_qat/main.py
  • examples/vllm_serve/fakequant_worker.py
  • examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py
  • modelopt/onnx/llm_export_utils/quantization_utils.py
  • modelopt/torch/export/unified_export_hf.py
  • modelopt/torch/quantization/algorithms.py
  • modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py
  • modelopt/torch/quantization/backends/nvfp4_gemm.py
  • modelopt/torch/quantization/compress.py
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/conversion.py
  • modelopt/torch/quantization/model_calib.py
  • modelopt/torch/quantization/model_quant.py
  • modelopt/torch/quantization/nn/modules/tensor_quantizer.py
  • modelopt/torch/quantization/utils/core_utils.py
  • modelopt/torch/sparsity/attention_sparsity/conversion.py
  • modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
  • modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
  • tests/_test_utils/torch/export/utils.py
  • tests/_test_utils/torch/quantization/onnx_export.py
  • tests/_test_utils/torch/quantization/quantize_common.py
  • tests/gpu/torch/quantization/test_hadamard.py
  • tests/gpu/torch/quantization/test_quant_rnn_cuda.py
  • tests/gpu/torch/quantization/test_quantize_cuda.py
  • tests/gpu_megatron/torch/peft/plugins/test_megatron_peft.py
  • tests/gpu_megatron/torch/quantization/plugins/test_apex.py
  • tests/gpu_megatron/torch/quantization/plugins/test_megatron.py
  • tests/unit/recipe/test_loader.py
  • tests/unit/torch/quantization/plugins/test_attention_quant.py
  • tests/unit/torch/quantization/plugins/test_huggingface.py
  • tests/unit/torch/quantization/plugins/test_peft.py
  • tests/unit/torch/quantization/test_autoquant.py
  • tests/unit/torch/quantization/test_compute_quantization_mse.py
  • tests/unit/torch/quantization/test_config_validation.py
  • tests/unit/torch/quantization/test_custom_backend.py
  • tests/unit/torch/quantization/test_quant_activations.py
  • tests/unit/torch/quantization/test_quant_batchnorm.py
  • tests/unit/torch/quantization/test_quant_rnn.py
  • tests/unit/torch/quantization/test_quantize_cpu.py
  • tests/unit/torch/quantization/test_quantize_replace.py
  • tests/unit/torch/quantization/test_tensor_quant_cpu.py

Comment on lines +260 to +272

 "quant_cfg": [
     # Disable all quantizers by default, then enable selectively
     {"quantizer_path": "*", "enable": False},
     # Configure weight quantizers with 4-bit precision and 128-element blocks
-    "*weight_quantizer": {"num_bits": 4, "block_sizes": {-1: 128}, "enable": True},
+    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4, "block_sizes": {-1: 128}}, "enable": True},
     # Configure input quantizers with 8-bit dynamic quantization
-    "*input_quantizer": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}},
+    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}}},
     # Include default disabled quantizer configurations
-    **_default_disabled_quantizer_cfg,
-},
+    *_default_disabled_quantizer_cfg,
+],

⚠️ Potential issue | 🟡 Minor

Define or qualify _default_disabled_quantizer_cfg in this example.

This snippet now references _default_disabled_quantizer_cfg without importing or qualifying it, so it fails when copied verbatim. Please either add the import or qualify it as mtq.config._default_disabled_quantizer_cfg.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/guides/_pytorch_quantization.rst` around lines 260 - 272, The
example references _default_disabled_quantizer_cfg but doesn't define or import
it; fix by either adding an import for it (e.g., from mtq.config import
_default_disabled_quantizer_cfg) at the top of the snippet or qualify the
reference inline as mtq.config._default_disabled_quantizer_cfg so the example is
self-contained and runnable; update the code block containing "quant_cfg" to use
the qualified name or add the import near the example.

Comment on lines +123 to +143
The recommended pattern used by all built-in configs is:

.. code-block:: python

    "quant_cfg": [
        # 1. Deny all quantizers by default
        {"quantizer_path": "*", "enable": False},

        # 2. Enable and configure the target quantizers
        {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
        {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},

        # 3. Apply standard exclusions last (BatchNorm, LM head, MoE routers, etc.)
        *mtq.config._default_disabled_quantizer_cfg,
    ]

.. note::

    The deny-all entry ``{"quantizer_path": "*", "enable": False}`` is available as
    :data:`modelopt.torch.quantization.config._base_disable_all` and is prepended to every
    built-in config. This ensures quantizers not explicitly targeted remain disabled.

⚠️ Potential issue | 🟡 Minor

Don’t describe _base_disable_all as universal across built-ins.

Several specialized recipes in modelopt/torch/quantization/config.py intentionally omit the deny-all prelude (for example the KV-only configs). Saying it is prepended to every built-in config will send users to the wrong pattern for those formats.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/guides/_quant_cfg.rst` around lines 123 - 143, The docs currently
state that _base_disable_all (a.k.a. the deny-all entry) is prepended to every
built-in config; update the text to accurately say that the deny-all entry
(referenced as _base_disable_all or mtq.config._default_disabled_quantizer_cfg)
is included in most built-in configs but not all, and call out that some
specialized recipes (e.g., the KV-only configs in the quantization config set)
intentionally omit the deny-all prelude; adjust the example wording and the note
to avoid the absolute "every built-in config" claim and add a short
parenthetical pointer to the specialized recipes that omit it (referencing their
config names such as the KV-only configs).

Comment on lines +106 to 109
for entry in quant_config["quant_cfg"]:
    p = entry.get("cfg", {})
    if isinstance(p, dict) and "num_bits" in p and "trt_high_precision_dtype" not in p:
        p["trt_high_precision_dtype"] = trt_high_precision_dtype

⚠️ Potential issue | 🟡 Minor

Handle sequential cfg lists when injecting trt_high_precision_dtype.

The new quant_cfg format allows cfg to be a list of attribute dicts, but Line 107 only updates plain dicts. Any sequential entry will skip this dtype injection entirely.

🛠️ Proposed fix
-    for entry in quant_config["quant_cfg"]:
-        p = entry.get("cfg", {})
-        if isinstance(p, dict) and "num_bits" in p and "trt_high_precision_dtype" not in p:
-            p["trt_high_precision_dtype"] = trt_high_precision_dtype
+    for entry in quant_config["quant_cfg"]:
+        cfg = entry.get("cfg")
+        cfg_items = [cfg] if isinstance(cfg, dict) else (cfg or [])
+        for attrs in cfg_items:
+            if "num_bits" in attrs and "trt_high_precision_dtype" not in attrs:
+                attrs["trt_high_precision_dtype"] = trt_high_precision_dtype
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/diffusers/quantization/config.py` around lines 106 - 109, The loop
that injects trt_high_precision_dtype into quant_config["quant_cfg"] only
handles when cfg is a dict (variable p) and skips the case where cfg is a list
of attribute dicts; update the logic in the loop that iterates
quant_config["quant_cfg"] to detect if p is a list and, when so, iterate its
elements and for each element that is a dict and contains "num_bits" but not
"trt_high_precision_dtype", set element["trt_high_precision_dtype"] =
trt_high_precision_dtype; keep the existing dict-handling branch for backward
compatibility so both dict and list cfg forms are supported.

Comment on lines +128 to +147
quant_config["quant_cfg"].append(
    {
        "quantizer_path": aq_name,
        "cfg": {
            "num_bits": 8,
            "axis": None,
            "percentile": percentile,
            "total_step": n_steps,
            "collect_method": collect_method,
            "calibrator": (
                PercentileCalibrator,
                (),
                {
                    "num_bits": 8,
                    "axis": None,
                    "percentile": percentile,
                    "total_step": n_steps,
                    "collect_method": collect_method,
                },
            ),
        },
    }
)

⚠️ Potential issue | 🟡 Minor

Make reset_set_int8_config() idempotent under list-based quant_cfg.

Line 128 switched this helper from a replace-by-key update to an unconditional append. Reusing the same quant_config object now accumulates duplicate or stale *{name}*input_quantizer* rules, which changes later-match behavior under the new schema.

🧼 Proposed fix
         if isinstance(module, nn.Conv2d):
             aq_name = f"*{name}*input_quantizer*"
+            quant_config["quant_cfg"] = [
+                entry
+                for entry in quant_config["quant_cfg"]
+                if not (isinstance(entry, dict) and entry.get("quantizer_path") == aq_name)
+            ]
             quant_config["quant_cfg"].append(
                 {
                     "quantizer_path": aq_name,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/diffusers/quantization/config.py` around lines 128 - 147, The
current append in reset_set_int8_config causes duplicate entries in
quant_config["quant_cfg"]; instead, make the operation idempotent by searching
quant_config["quant_cfg"] for an existing dict whose "quantizer_path" equals
aq_name and replace that dict in-place (or update its "cfg") if found, otherwise
append the new entry; modify the code around the append that builds the payload
(the block referencing aq_name, quant_config["quant_cfg"], and the
PercentileCalibrator tuple) to perform a find-and-replace by quantizer_path
rather than always appending.

Comment on lines +36 to +49

 "quant_cfg": [
     *mtq.config._base_disable_all,
     {
         "quantizer_path": "*weight_quantizer",
         "cfg": {"num_bits": 4, "block_sizes": {-1: 128}},
         "enable": True,
     },
     {
         "quantizer_path": "*input_quantizer",
         "cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}},
     },
     # Disable sensitive layers such as `lm_head`, gate layers in MoE etc.
-    **mtq.config._default_disabled_quantizer_cfg,
-},
+    *mtq.config._default_disabled_quantizer_cfg,
+],

⚠️ Potential issue | 🟠 Major

Update the BMM-attention detection for list-based quant_cfg.

After switching quant_cfg to a list of entries, the `for key in mtq_cfg["quant_cfg"]` loop with its `"bmm_quantizer" in key` membership test in _quantize_model_with_dataset() no longer works: iterating the list yields entry dicts, so the test inspects dict keys rather than pattern strings. Configs that target *bmm_quantizer now skip register_hf_attentions_on_the_fly(net) and silently miss KV-attention quantization.

💡 Suggested update outside this hunk
-        quantize_bmm_attention = False
-        for key in mtq_cfg["quant_cfg"]:
-            if "bmm_quantizer" in key:
-                quantize_bmm_attention = True
+        quantize_bmm_attention = any(
+            "bmm_quantizer" in entry["quantizer_path"]
+            for entry in mtq.config.normalize_quant_cfg_list(mtq_cfg["quant_cfg"])
+        )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_eval/quantization_utils.py` around lines 36 - 49, The loop that
detects BMM-attention entries fails because mtq_cfg["quant_cfg"] is now a list
of dicts; update _quantize_model_with_dataset to handle both list and dict
formats by inspecting each entry: if an entry is a dict, read its
"quantizer_path" (or keys for legacy string entries) and test whether the path
string contains "bmm_quantizer" or matches "*bmm_quantizer"; if
mtq_cfg["quant_cfg"] is still a dict or contains raw strings, preserve the old
membership check. Ensure this detection is used before calling
register_hf_attentions_on_the_fly(net) so KV-attention quantizers targeted via
"quantizer_path": "*bmm_quantizer" are not skipped.

Comment on lines +183 to +191

 "quant_cfg": [
     # Disable all quantizers by default
     {"quantizer_path": "*", "enable": False},
     # "num_bits" specifies the number of bits for quantization
     # "axis" specifies the axis for quantization
-    "*weight_quantizer": {"num_bits": 8, "axis": 0},
-    "*input_quantizer": {"num_bits": 8, "axis": -1},
-
-    # Default quantization settings
-    "default": {"num_bits": 8, "axis": None},
-}
-"algorithm": "max"
+    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
+    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": -1}},
+],
+"algorithm": "max",

⚠️ Potential issue | 🟡 Minor

Finish updating this quantize() doc block to the list-based schema.

Lines 183-191 show the new list-of-dicts format, but the prose immediately above still describes quant_cfg as a dictionary mapping wildcard keys to attributes. That leaves the public API docs contradictory in the same section.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/model_quant.py` around lines 183 - 191, The
docstring for quantize() is out of sync: it still describes quant_cfg as a dict
mapping wildcard keys, but the implementation and example use a list-of-dicts
schema. Update the prose in the quantize() doc block to describe quant_cfg as a
list where each item is a dict with keys like "quantizer_path" (wildcard
pattern), optional "enable" (bool), optional "cfg" (dict, e.g., {"num_bits":
int, "axis": int}), and that multiple entries are applied in order; also mention
the top-level "algorithm" field (e.g., "max") and how entries such as
{"quantizer_path":"*weight_quantizer","cfg":{"num_bits":8,"axis":0}} and
{"quantizer_path":"*input_quantizer","cfg":{"num_bits":8,"axis":-1}} are
interpreted by quantize()/quant_cfg.

Comment on lines +50 to +61
for entry in config["quant_cfg"]:
    pat = (
        entry["quantizer_path"]
        if isinstance(entry, dict) and "quantizer_path" in entry
        else entry[0]
    )
    if pat == "*weight_quantizer":
        if isinstance(entry, dict) and "quantizer_path" in entry:
            entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
        else:
            entry[1]["block_sizes"] = {-1: block_size}
        break

⚠️ Potential issue | 🟡 Minor

Fail fast when *weight_quantizer is missing.

If no matching entry exists, this function currently returns without applying block_sizes, which can silently invalidate downstream test intent.

🔧 Proposed fix
 def get_awq_config(algorithm="awq_lite", block_size=8):
     config = copy.deepcopy(mtq.INT4_AWQ_CFG)
+    found_weight_entry = False
     for entry in config["quant_cfg"]:
         pat = (
             entry["quantizer_path"]
             if isinstance(entry, dict) and "quantizer_path" in entry
             else entry[0]
         )
         if pat == "*weight_quantizer":
+            found_weight_entry = True
             if isinstance(entry, dict) and "quantizer_path" in entry:
                 entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
             else:
                 entry[1]["block_sizes"] = {-1: block_size}
             break
+    if not found_weight_entry:
+        raise KeyError("No '*weight_quantizer' entry found in quant_cfg")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/_test_utils/torch/quantization/quantize_common.py` around lines 50 -
61, The loop over config["quant_cfg"] looks for a "*weight_quantizer" entry and
mutates its "block_sizes" but silently does nothing if no match is found; add a
fail-fast check by tracking whether a matching entry was found (e.g., set a
boolean flag when pat == "*weight_quantizer") and after the loop raise an
explicit error (ValueError or AssertionError) if no match was found so tests
fail early rather than silently proceeding without applying block_size to the
intended entry.

Comment on lines +133 to +135
for entry in config["quant_cfg"]:
    if entry.get("quantizer_path") == "*weight_quantizer":
        entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}

⚠️ Potential issue | 🟡 Minor

Deep-copy the shared config before patching block_sizes.

config here is the imported mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG object. Mutating config["quant_cfg"] in place leaks the reduced block size into later parametrized cases and any other tests that reuse the same constant.

💡 Suggested fix
     if config == mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG:
+        import copy
+
+        config = copy.deepcopy(config)
         # reduce block sizes for simple testing models
         for entry in config["quant_cfg"]:
             if entry.get("quantizer_path") == "*weight_quantizer":
                 entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-for entry in config["quant_cfg"]:
-    if entry.get("quantizer_path") == "*weight_quantizer":
-        entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}
+if config == mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG:
+    import copy
+
+    config = copy.deepcopy(config)
+    # reduce block sizes for simple testing models
+    for entry in config["quant_cfg"]:
+        if entry.get("quantizer_path") == "*weight_quantizer":
+            entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/gpu/torch/quantization/test_quantize_cuda.py` around lines 133 - 135,
The test is mutating the shared constant mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG by
editing config["quant_cfg"] in place; instead, create a deep copy of that
constant (e.g., via copy.deepcopy) into the local variable config before
patching so changes to entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2:
8} only affect this test; locate the code that assigns config from
mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG and replace it with a deep-copied version
so modifications to quant_cfg/ block_sizes/quantizer_path do not leak to other
parametrized cases.

Comment on lines +492 to +507
assert isinstance(config["quant_cfg"], list)
assert any(
    (
        entry["quantizer_path"]
        if isinstance(entry, dict) and "quantizer_path" in entry
        else entry[0]
    )
    == "*"
    and (
        entry.get("enable")
        if isinstance(entry, dict) and "quantizer_path" in entry
        else entry[1].get("enable")
    )
    is False
    for entry in config["quant_cfg"]
)

⚠️ Potential issue | 🟡 Minor

Keep this assertion strict to QuantizerCfgEntry dicts.

The fallback to entry[0] / entry[1] means this test now accepts tuple-style payloads, so it no longer protects the new list-of-dicts contract from regressions. Assert on dict entries directly.

🧪 Proposed fix
-    assert any(
-        (
-            entry["quantizer_path"]
-            if isinstance(entry, dict) and "quantizer_path" in entry
-            else entry[0]
-        )
-        == "*"
-        and (
-            entry.get("enable")
-            if isinstance(entry, dict) and "quantizer_path" in entry
-            else entry[1].get("enable")
-        )
-        is False
-        for entry in config["quant_cfg"]
-    )
+    assert all(isinstance(entry, dict) for entry in config["quant_cfg"])
+    assert any(
+        entry.get("quantizer_path") == "*" and entry.get("enable") is False
+        for entry in config["quant_cfg"]
+    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/torch/quantization/test_autoquant.py` around lines 492 - 507, The
current assertion allows tuple-style payloads by falling back to
entry[0]/entry[1]; restrict the check to dict-style QuantizerCfgEntry entries
only by iterating over config["quant_cfg"] and filtering with isinstance(entry,
dict), then assert that some dict has entry.get("quantizer_path") == "*" and
entry.get("enable") is False; update the assertion around the quant_cfg check in
tests/unit/torch/quantization/test_autoquant.py accordingly (referencing
config["quant_cfg"] and the dict keys "quantizer_path" and "enable").

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@shengliangxu shengliangxu force-pushed the shengliangx/quant_cfg-list branch from 5971a7b to fe2d2f3 Compare March 23, 2026 07:51
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
modelopt/torch/quantization/config.py (1)

1622-1628: ⚠️ Potential issue | 🟠 Major

Legacy top-level dict format bypasses normalization.

The before validator returns non-list inputs unchanged (line 1626-1627). If a user passes a legacy dict-style quant_cfg like {"*weight_quantizer": {"num_bits": 8}} directly (not wrapped in a list), it will not be normalized. This can cause need_calibration() and downstream consumers to fail when iterating, as they expect list entries.

Consider converting top-level dict inputs to list form before calling normalize_quant_cfg_list():

Suggested fix
 @field_validator("quant_cfg", mode="before")
 @classmethod
 def normalize_quant_cfg(cls, v):
     """Normalize quant_cfg entries: convert dict and tuple forms to QuantizerCfgEntry dicts."""
+    if isinstance(v, dict):
+        # Convert legacy top-level dict format to list of single-key dicts
+        v = [{k: val} for k, val in v.items()]
     if not isinstance(v, list):
         return v
     return normalize_quant_cfg_list(v)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1622 - 1628, The
before-validator normalize_quant_cfg currently returns non-list inputs
unchanged, so legacy top-level dict forms for quant_cfg bypass normalization;
update normalize_quant_cfg (the `@field_validator` for "quant_cfg") to detect a
dict input (e.g., mapping of pattern->entry) and wrap/convert it into a one-item
list (or otherwise translate into the list form expected by
normalize_quant_cfg_list) before calling normalize_quant_cfg_list, then return
the normalized list so downstream functions like need_calibration() always
receive a list of QuantizerCfgEntry dicts.
🧹 Nitpick comments (2)
modelopt/torch/quantization/config.py (2)

1630-1639: Consider validating list-form cfg entries as well.

The validator validates dict cfg values against QuantizerAttributeConfig, but skips list cfg (used for SequentialQuantizer). While downstream processing may catch errors, early validation would surface misconfigurations sooner.

Optional enhancement
 @field_validator("quant_cfg", mode="after")
 @classmethod
 def validate_quant_cfg_entries(cls, v):
     """Validate quantizer attribute configs to surface errors (e.g. invalid axis/block_sizes)."""
     qac_fields = set(QuantizerAttributeConfig.model_fields.keys())
     for entry in v:
         cfg = entry.get("cfg", {})
         if isinstance(cfg, dict) and qac_fields & set(cfg.keys()):
             QuantizerAttributeConfig.model_validate(cfg)
+        elif isinstance(cfg, list):
+            for sub_cfg in cfg:
+                if isinstance(sub_cfg, dict) and qac_fields & set(sub_cfg.keys()):
+                    QuantizerAttributeConfig.model_validate(sub_cfg)
     return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1630 - 1639, The
validator validate_quant_cfg_entries currently only validates dict cfg values
against QuantizerAttributeConfig but ignores list-form cfg (used by
SequentialQuantizer); update validate_quant_cfg_entries to also handle when
entry.get("cfg") is a list by iterating over its items and, for each item that
is a dict and contains keys in QuantizerAttributeConfig.model_fields, call
QuantizerAttributeConfig.model_validate(item) so each element is validated
early; preserve existing behavior for non-dict/non-list cfgs and return v as
before.
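The recursive walk the prompt describes is small; here is a standalone sketch, with `validate` standing in for `QuantizerAttributeConfig.model_validate`:

```python
def validate_cfg(cfg, validate):
    """Apply validate() to a dict cfg, or to each dict in a list cfg (SequentialQuantizer)."""
    if isinstance(cfg, dict):
        validate(cfg)
    elif isinstance(cfg, list):
        for sub_cfg in cfg:
            if isinstance(sub_cfg, dict):
                validate(sub_cfg)

seen = []
validate_cfg([{"num_bits": 4}, {"num_bits": 8}], seen.append)
print(len(seen))  # both list elements were visited
```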

1554-1567: Replace assert with ValueError for clearer error messaging.

The assert at line 1556 will produce an unclear AssertionError if the legacy nn.* format has unexpected structure. A ValueError with a descriptive message would be more helpful for users debugging config issues.

Suggested improvement
     def _dict_to_entry(key: str, value) -> QuantizerCfgEntry:
         if isinstance(key, str) and key.startswith("nn."):
-            assert isinstance(value, dict) and len(value) == 1
+            if not isinstance(value, dict) or len(value) != 1:
+                raise ValueError(
+                    f"Legacy nn.*-scoped entry expects a single-key dict value, got: {value!r}"
+                )
             q_path, sub_cfg = next(iter(value.items()))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1554 - 1567, In
_dict_to_entry, replace the brittle assert inside the branch that handles legacy
"nn." keys with a ValueError that gives a clear message; specifically in the
function _dict_to_entry where you check if key.startswith("nn."), validate that
value is a dict with exactly one item and if not raise ValueError("invalid
legacy nn.* quantizer entry: expected a dict with a single mapping from
quantizer path to config, got: {repr(value)}") (include the offending value in
the message) so callers get a descriptive error instead of an AssertionError.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1677-1699: The loop over quant_cfg can silently mis-index string
keys when an unnormalized dict is present; update the logic in the loop that
determines name/cfg (the code using entry["quantizer_path"] vs
entry[0]/entry[1]) to explicitly check types: if entry is a dict require the
"quantizer_path" key and use entry.get("cfg"), otherwise require entry to be a
sequence/tuple/list of length >= 2 before using entry[0] and entry[1]; if the
input is neither a valid dict nor a valid 2+ element sequence, raise a clear
ValueError (or log and skip) so _not_dynamic and downstream logic never index
into a string or malformed structure. Ensure you reference quant_cfg, entry,
name, cfg, and _not_dynamic when implementing the checks.

---

Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1622-1628: The before-validator normalize_quant_cfg currently
returns non-list inputs unchanged, so legacy top-level dict forms for quant_cfg
bypass normalization; update normalize_quant_cfg (the `@field_validator` for
"quant_cfg") to detect a dict input (e.g., mapping of pattern->entry) and
wrap/convert it into a one-item list (or otherwise translate into the list form
expected by normalize_quant_cfg_list) before calling normalize_quant_cfg_list,
then return the normalized list so downstream functions like need_calibration()
always receive a list of QuantizerCfgEntry dicts.

---

Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1630-1639: The validator validate_quant_cfg_entries currently only
validates dict cfg values against QuantizerAttributeConfig but ignores list-form
cfg (used by SequentialQuantizer); update validate_quant_cfg_entries to also
handle when entry.get("cfg") is a list by iterating over its items and, for each
item that is a dict and contains keys in QuantizerAttributeConfig.model_fields,
call QuantizerAttributeConfig.model_validate(item) so each element is validated
early; preserve existing behavior for non-dict/non-list cfgs and return v as
before.
- Around line 1554-1567: In _dict_to_entry, replace the brittle assert inside
the branch that handles legacy "nn." keys with a ValueError that gives a clear
message; specifically in the function _dict_to_entry where you check if
key.startswith("nn."), validate that value is a dict with exactly one item and
if not raise ValueError("invalid legacy nn.* quantizer entry: expected a dict
with a single mapping from quantizer path to config, got: {repr(value)}")
(include the offending value in the message) so callers get a descriptive error
instead of an AssertionError.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: a4a13435-75c6-4c01-93a8-09afd75e09ad

📥 Commits

Reviewing files that changed from the base of the PR and between 5115452 and 5971a7b.

📒 Files selected for processing (2)
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/utils/core_utils.py

Comment on lines +1677 to 1699
    quant_cfg: list = config.get("quant_cfg") or []
    for entry in quant_cfg:
        name = (
            entry["quantizer_path"]
            if isinstance(entry, dict) and "quantizer_path" in entry
            else entry[0]
        )
        if isinstance(entry, dict) and "quantizer_path" in entry:
            cfg = dict(entry.get("cfg") or {})
            if "enable" in entry:
                cfg["enable"] = entry["enable"]
        else:
            cfg = entry[1]
        if "weight_quantizer" in name:
            # We don't calibrate weight quantizer
            continue
        # quantization like W4A8 has a list of weight quantizers
        if isinstance(cfg, list):
            for _config in cfg:
                if _not_dynamic(_config):
                    return True
        elif isinstance(cfg, dict) and _not_dynamic(cfg):
            return True

⚠️ Potential issue | 🟡 Minor

Fallback logic for tuple format may mask errors from unnormalized dict inputs.

The fallback to entry[0]/entry[1] (lines 1682, 1689) is intended for legacy tuple format support. However, if a top-level dict bypasses the validator normalization, iterating over it yields string keys, and entry[0]/entry[1] would index into string characters (e.g., "*weight_quantizer"[0] == "*"), leading to silent misbehavior rather than a clear error.

Consider adding explicit type checking or removing the tuple fallback if tuples are no longer a supported input format:

Suggested defensive check
 quant_cfg: list = config.get("quant_cfg") or []
 for entry in quant_cfg:
+    if not isinstance(entry, dict):
+        raise TypeError(f"Expected dict entry in quant_cfg, got {type(entry).__name__}: {entry!r}")
     name = (
         entry["quantizer_path"]
         if isinstance(entry, dict) and "quantizer_path" in entry
         else entry[0]
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1677 - 1699, The loop
over quant_cfg can silently mis-index string keys when an unnormalized dict is
present; update the logic in the loop that determines name/cfg (the code using
entry["quantizer_path"] vs entry[0]/entry[1]) to explicitly check types: if
entry is a dict require the "quantizer_path" key and use entry.get("cfg"),
otherwise require entry to be a sequence/tuple/list of length >= 2 before using
entry[0] and entry[1]; if the input is neither a valid dict nor a valid 2+
element sequence, raise a clear ValueError (or log and skip) so _not_dynamic and
downstream logic never index into a string or malformed structure. Ensure you
reference quant_cfg, entry, name, cfg, and _not_dynamic when implementing the
checks.
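The failure mode is easy to reproduce in isolation: iterating a dict yields its string keys, and indexing a string returns single characters, so no exception is raised.

```python
legacy = {"*weight_quantizer": {"num_bits": 8}}
for entry in legacy:                 # entry is the key string, not an entry dict
    name, cfg = entry[0], entry[1]   # indexes into the string itself
    print(repr(name), repr(cfg))     # '*' 'w' -- plausible-looking, silently wrong
```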

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)

1677-1699: ⚠️ Potential issue | 🟡 Minor

Fallback logic for tuple format may mask errors from unnormalized dict inputs.

The fallback to entry[0]/entry[1] (lines 1682, 1689) is intended for legacy tuple format. However, if a top-level dict bypasses the validator normalization, iterating over it yields string keys, and entry[0]/entry[1] would index into string characters (e.g., "*weight_quantizer"[0] == "*"), causing silent misbehavior.

Suggested defensive check
 quant_cfg: list = config.get("quant_cfg") or []
 for entry in quant_cfg:
+    if not isinstance(entry, (dict, tuple, list)):
+        raise TypeError(f"Expected dict or tuple entry in quant_cfg, got {type(entry).__name__}: {entry!r}")
     name = (
         entry["quantizer_path"]
         if isinstance(entry, dict) and "quantizer_path" in entry
         else entry[0]
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1677 - 1699, The loop
over quant_cfg can mis-index string keys when a dict slips through the legacy
tuple/path fallback; update the logic in the quant_cfg iteration (around
variables entry, name, cfg) to first check types explicitly: if entry is a dict
use entry["quantizer_path"] and entry.get("cfg"), else if entry is a sequence
(tuple/list) with length >= 2 use entry[0]/entry[1]; otherwise raise a clear
ValueError or skip with a logged warning. Ensure the branch that checks
"weight_quantizer" still uses the computed name and that _not_dynamic is only
called with validated cfg objects.

1622-1628: ⚠️ Potential issue | 🟠 Major

Legacy dict-style quant_cfg is not normalized by the before-validator.

The validator returns non-list inputs unchanged (line 1626-1627), so a legacy dict format like {"*weight_quantizer": {"num_bits": 8}} bypasses normalize_quant_cfg_list() entirely. If such a dict reaches downstream code expecting a list, it will fail.

Since normalize_quant_cfg_list() already supports converting legacy dict formats via _dict_to_entry(), the validator should route dicts to it:

Suggested fix
 `@field_validator`("quant_cfg", mode="before")
 `@classmethod`
 def normalize_quant_cfg(cls, v):
     """Normalize quant_cfg entries: convert dict and tuple forms to QuantizerCfgEntry dicts."""
-    if not isinstance(v, list):
-        return v
-    return normalize_quant_cfg_list(v)
+    if isinstance(v, dict):
+        # Legacy top-level dict format: convert to list of single-key dicts
+        v = [{k: val} for k, val in v.items()]
+    if isinstance(v, list):
+        return normalize_quant_cfg_list(v)
+    return v
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 1622 - 1628, The
before-validator normalize_quant_cfg currently returns non-list values
unchanged, so legacy dict-style quant_cfg like {"*weight_quantizer":
{"num_bits": 8}} bypasses normalization; update normalize_quant_cfg to detect
dict inputs and pass them into normalize_quant_cfg_list (which uses
_dict_to_entry) so all legacy dict formats are converted into the expected list
of QuantizerCfgEntry objects—i.e., in normalize_quant_cfg (the `@field_validator`
for "quant_cfg") call normalize_quant_cfg_list(v) for dicts as well as lists so
downstream code always receives a normalized list.
🧹 Nitpick comments (1)
modelopt/torch/quantization/config.py (1)

162-168: TypedDict total=False conflicts with required quantizer_path documentation.

The docstring states quantizer_path is required, but total=False makes all fields optional at the type level. Consider using Required from typing_extensions (or typing.Required on Python 3.11+) to mark quantizer_path as required while keeping the other fields optional:

-class QuantizerCfgEntry(TypedDict, total=False):
+from typing_extensions import Required
+
+class QuantizerCfgEntry(TypedDict, total=False):
     """A single entry in a ``quant_cfg`` list."""
 
-    quantizer_path: str  # required; matched against quantizer module names
+    quantizer_path: Required[str]  # required; matched against quantizer module names
     parent_class: str | None  # optional; filters by pytorch module class name (e.g. "nn.Linear")

This ensures static type checkers flag missing quantizer_path while allowing other fields to be omitted.
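A minimal, self-contained sketch of that pattern (`EntrySketch` is an illustrative name, not the real class; note TypedDict constraints are enforced by static checkers only, not at runtime):

```python
try:  # Required moved into typing in Python 3.11
    from typing import Required, TypedDict
except ImportError:
    from typing_extensions import Required, TypedDict

class EntrySketch(TypedDict, total=False):
    quantizer_path: Required[str]  # must be present
    parent_class: str              # may be omitted
    enable: bool                   # may be omitted

ok: EntrySketch = {"quantizer_path": "*weight_quantizer"}
# bad: EntrySketch = {"enable": False}  # a static checker flags the missing key
print(ok["quantizer_path"])
```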

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/config.py` around lines 162 - 168,
QuantizerCfgEntry currently uses TypedDict(total=False) which makes all keys
optional but the docstring says quantizer_path is required; update
QuantizerCfgEntry to mark quantizer_path as required using
typing_extensions.Required (or typing.Required on newer Python): change the
TypedDict definition so quantizer_path: Required[str] while keeping
parent_class, cfg, and enable as optional, and import Required from
typing_extensions if not available in typing to ensure static type-checkers
enforce the required key.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 830-849: The function update_quant_cfg_with_kv_cache_quant can
raise a TypeError if quant_cfg.get("quant_cfg") returns a legacy dict instead of
a list; change it to defensively normalize inner by checking the type of
quant_cfg.get("quant_cfg") and converting a dict into a single-item list (or
treating None/Falsey as the existing default list), then set
quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg) and return the
deep-copied quant_cfg; update the logic around the inner variable in
update_quant_cfg_with_kv_cache_quant to handle list, dict, and None cases
explicitly.

---

Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1677-1699: The loop over quant_cfg can mis-index string keys when
a dict slips through the legacy tuple/path fallback; update the logic in the
quant_cfg iteration (around variables entry, name, cfg) to first check types
explicitly: if entry is a dict use entry["quantizer_path"] and entry.get("cfg"),
else if entry is a sequence (tuple/list) with length >= 2 use entry[0]/entry[1];
otherwise raise a clear ValueError or skip with a logged warning. Ensure the
branch that checks "weight_quantizer" still uses the computed name and that
_not_dynamic is only called with validated cfg objects.
- Around line 1622-1628: The before-validator normalize_quant_cfg currently
returns non-list values unchanged, so legacy dict-style quant_cfg like
{"*weight_quantizer": {"num_bits": 8}} bypasses normalization; update
normalize_quant_cfg to detect dict inputs and pass them into
normalize_quant_cfg_list (which uses _dict_to_entry) so all legacy dict formats
are converted into the expected list of QuantizerCfgEntry objects—i.e., in
normalize_quant_cfg (the `@field_validator` for "quant_cfg") call
normalize_quant_cfg_list(v) for dicts as well as lists so downstream code always
receives a normalized list.

---

Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 162-168: QuantizerCfgEntry currently uses TypedDict(total=False)
which makes all keys optional but the docstring says quantizer_path is required;
update QuantizerCfgEntry to mark quantizer_path as required using
typing_extensions.Required (or typing.Required on newer Python): change the
TypedDict definition so quantizer_path: Required[str] while keeping
parent_class, cfg, and enable as optional, and import Required from
typing_extensions if not available in typing to ensure static type-checkers
enforce the required key.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ea0ca383-fd43-415d-8bef-252ce66fbfc5

📥 Commits

Reviewing files that changed from the base of the PR and between 5971a7b and fe2d2f3.

📒 Files selected for processing (2)
  • modelopt/torch/quantization/config.py
  • modelopt/torch/quantization/utils/core_utils.py

Comment on lines 830 to +849
 def update_quant_cfg_with_kv_cache_quant(
-    quant_cfg: dict[str, Any], kv_cache_quant_cfg: dict[str, Any]
+    quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[QuantizerCfgEntry]
 ) -> dict[str, Any]:
-    """Update the quant_cfg with the kv cache quant_cfg."""
+    """Update the quant_cfg with the kv cache quant_cfg.
+
+    Args:
+        quant_cfg: The outer quantization config dict (with ``"quant_cfg"`` and ``"algorithm"`` keys).
+        kv_cache_quant_cfg: A list of :class:`QuantizerCfgEntry
+            <modelopt.torch.quantization.config.QuantizerCfgEntry>` dicts for KV cache quantization,
+            typically ``some_kv_cfg["quant_cfg"]``.
+
+    Returns:
+        A deep copy of ``quant_cfg`` with the KV cache entries appended to ``quant_cfg["quant_cfg"]``.
+    """
     # If quant_cfg["quant_cfg"] is None, it corresponds to only kv cache quantization case
     quant_cfg = copy.deepcopy(quant_cfg)
-    quant_cfg["quant_cfg"] = quant_cfg.get("quant_cfg") or {"default": {"enable": False}}
-    quant_cfg["quant_cfg"].update(kv_cache_quant_cfg)
+    inner: list[QuantizerCfgEntry] = quant_cfg.get("quant_cfg") or [
+        {"quantizer_path": "*", "enable": False}
+    ]
+    quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)

⚠️ Potential issue | 🟡 Minor

Potential issue if quant_cfg["quant_cfg"] is a legacy dict format.

The function assumes quant_cfg.get("quant_cfg") returns a list (or falsey value). If a caller passes an unnormalized config where quant_cfg["quant_cfg"] is still a legacy dict, line 849 would raise TypeError when attempting dict + list.

Consider normalizing the inner config or adding a type check:

Suggested defensive fix
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
+
 def update_quant_cfg_with_kv_cache_quant(
     quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[QuantizerCfgEntry]
 ) -> dict[str, Any]:
     ...
     quant_cfg = copy.deepcopy(quant_cfg)
-    inner: list[QuantizerCfgEntry] = quant_cfg.get("quant_cfg") or [
-        {"quantizer_path": "*", "enable": False}
-    ]
+    raw_inner = quant_cfg.get("quant_cfg")
+    if raw_inner is None:
+        inner = [{"quantizer_path": "*", "enable": False}]
+    elif isinstance(raw_inner, dict):
+        # Legacy dict format - convert to list
+        inner = normalize_quant_cfg_list([{k: v} for k, v in raw_inner.items()])
+    else:
+        inner = list(raw_inner)
     quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/utils/core_utils.py` around lines 830 - 849, The
function update_quant_cfg_with_kv_cache_quant can raise a TypeError if
quant_cfg.get("quant_cfg") returns a legacy dict instead of a list; change it to
defensively normalize inner by checking the type of quant_cfg.get("quant_cfg")
and converting a dict into a single-item list (or treating None/Falsey as the
existing default list), then set quant_cfg["quant_cfg"] = inner +
list(kv_cache_quant_cfg) and return the deep-copied quant_cfg; update the logic
around the inner variable in update_quant_cfg_with_kv_cache_quant to handle
list, dict, and None cases explicitly.
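The three input shapes can be handled in one small normalizer; `normalize_inner` below is an illustrative sketch, not the suggested library import:

```python
def normalize_inner(raw):
    """Return a list of entry dicts for None, legacy-dict, or list inputs."""
    if not raw:
        # None/empty corresponds to the kv-cache-only case: deny all by default
        return [{"quantizer_path": "*", "enable": False}]
    if isinstance(raw, dict):  # legacy dict form
        return [{"quantizer_path": path, "cfg": cfg} for path, cfg in raw.items()]
    return list(raw)

print(normalize_inner(None))
print(normalize_inner({"*output_quantizer": {"num_bits": 8}}))
```

After this, `normalize_inner(...) + list(kv_cache_quant_cfg)` is always list + list, so the TypeError cannot occur.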

@github-actions

github-actions bot commented Mar 23, 2026

PR Preview Action v1.8.1

QR code for preview link

🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1094/

Built to branch gh-pages at 2026-03-26 06:14 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Comment on lines +128 to +133
    # 1. Deny all quantizers by default
    {"quantizer_path": "*", "enable": False},

    # 2. Enable and configure the target quantizers
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},

Suggested change
    # 1. Deny all quantizers by default
    {"quantizer_path": "*", "enable": False},
    # 2. Enable and configure the target quantizers
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
    # 1. Deny all quantizers by default
    {"*": {"enable": False}},
    # 2. Enable and configure the target quantizers
    {"*weight_quantizer": {"num_bits": 8, "axis": 0}},
    {"*input_quantizer": {"num_bits": 8, "axis": None}},


Why do we need the "quantizer_path" and "cfg" fields?

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/gpu/torch/quantization/test_real_quantize_cuda.py`:
- Around line 50-56: The loop over config["quant_cfg"] that looks for
entry.get("quantizer_path") == "*weight_quantizer" must fail fast if no such
entry is found and must validate the shape of entry["cfg"]["block_sizes"];
update the code that currently sets entry.setdefault("cfg", {})["block_sizes"] =
{...} to instead: scan config["quant_cfg"] and set a found flag when the
matching entry is located, then after the loop call pytest.fail or raise
AssertionError if not found; for the found entry, ensure entry.setdefault("cfg",
{}) contains a "block_sizes" dict and validate that it has the required keys
(e.g., -1 and "scale_bits") and that their values are integers (or otherwise of
expected type/shape), raising a test failure if the validation fails; apply the
same fail-fast + validation change to the other similar loop in the file that
handles the weight_quantizer (the block around the second occurrence).
- Around line 50-56: The test mutates shared quant config constants (e.g.,
mtq.INT4_AWQ_CFG) by updating entries in config["quant_cfg"], which can leak
state between tests; to fix, make a defensive deep copy of the config before
modifying it (e.g., copy.deepcopy(config) or deepcopy the specific
config["quant_cfg"] list) and perform the block_sizes modification on that copy
so the original mtq.INT4_AWQ_CFG remains unchanged; locate the code that
iterates over config["quant_cfg"] and change it to operate on the copied object
(references: config, config["quant_cfg"], entry, mtq.INT4_AWQ_CFG).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 14610ec4-1ae7-433b-b1df-66856f70d771

📥 Commits

Reviewing files that changed from the base of the PR and between fe2d2f3 and b9d67d3.

📒 Files selected for processing (1)
  • tests/gpu/torch/quantization/test_real_quantize_cuda.py

Comment on lines +50 to +56
for entry in config["quant_cfg"]:
if entry.get("quantizer_path") == "*weight_quantizer":
entry.setdefault("cfg", {})["block_sizes"] = {
-1: 16,
"scale_bits": 8,
}
break

⚠️ Potential issue | 🟡 Minor

Fail fast if *weight_quantizer entry is missing (and validate cfg shape).

The current loop silently does nothing when the entry is absent, which can hide config-schema regressions in tests.

Suggested fix
         for entry in config["quant_cfg"]:
             if entry.get("quantizer_path") == "*weight_quantizer":
-                entry.setdefault("cfg", {})["block_sizes"] = {
+                cfg = entry.setdefault("cfg", {})
+                if not isinstance(cfg, dict):
+                    pytest.fail("Expected dict cfg for *weight_quantizer in INT4_AWQ_CFG")
+                cfg["block_sizes"] = {
                     "scale_bits": 8,
                 }
                 break
+        else:
+            pytest.fail("INT4_AWQ_CFG missing *weight_quantizer entry")

Also applies to: 107-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/gpu/torch/quantization/test_real_quantize_cuda.py` around lines 50 -
56, The loop over config["quant_cfg"] that looks for entry.get("quantizer_path")
== "*weight_quantizer" must fail fast if no such entry is found and must
validate the shape of entry["cfg"]["block_sizes"]; update the code that
currently sets entry.setdefault("cfg", {})["block_sizes"] = {...} to instead:
scan config["quant_cfg"] and set a found flag when the matching entry is
located, then after the loop call pytest.fail or raise AssertionError if not
found; for the found entry, ensure entry.setdefault("cfg", {}) contains a
"block_sizes" dict and validate that it has the required keys (e.g., -1 and
"scale_bits") and that their values are integers (or otherwise of expected
type/shape), raising a test failure if the validation fails; apply the same
fail-fast + validation change to the other similar loop in the file that handles
the weight_quantizer (the block around the second occurrence).

⚠️ Potential issue | 🟠 Major

Avoid mutating shared quant config constants in-place across tests.

These blocks mutate mtq.INT4_AWQ_CFG through config, which can leak state to later tests and make runs order-dependent.

Suggested fix
+import copy
@@
 def test_real_quantize(model_cls, config):
     """Test quantize function can run without problems."""
+    config = copy.deepcopy(config)
@@
 def test_save_restore(model_cls, config):
+    config = copy.deepcopy(config)
     # update config to fit test cases

Also applies to: 107-113

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/gpu/torch/quantization/test_real_quantize_cuda.py` around lines 50 -
56, The test mutates shared quant config constants (e.g., mtq.INT4_AWQ_CFG) by
updating entries in config["quant_cfg"], which can leak state between tests; to
fix, make a defensive deep copy of the config before modifying it (e.g.,
copy.deepcopy(config) or deepcopy the specific config["quant_cfg"] list) and
perform the block_sizes modification on that copy so the original
mtq.INT4_AWQ_CFG remains unchanged; locate the code that iterates over
config["quant_cfg"] and change it to operate on the copied object (references:
config, config["quant_cfg"], entry, mtq.INT4_AWQ_CFG).

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
examples/llm_ptq/example_utils.py (1)

206-247: ⚠️ Potential issue | 🟠 Major

Normalize before editing quant_cfg, and don't assume the first weight rule wins.

build_quant_cfg() now mutates quant_cfg["quant_cfg"] before the QuantizeConfig validator can normalize legacy dict-style configs, and the AWQ branch grabs the first *weight_quantizer entry even though list configs are ordered last-wins. That means old configs can fail here, and appended weight overrides can be silently ignored.

🛠️ Suggested fix
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
...
 def build_quant_cfg(
     qformat,
     quant_cfg,
     awq_block_size,
     model_type,
     moe_calib_experts_ratio: float | None = None,
 ) -> dict[str, Any]:
     quant_cfg = copy.deepcopy(quant_cfg)
+    quant_cfg["quant_cfg"] = normalize_quant_cfg_list(quant_cfg.get("quant_cfg", []))
+
     if "awq" in str(quant_cfg.get("algorithm")):
         weight_quantizer_entry = next(
             e
-            for e in quant_cfg["quant_cfg"]
-            if isinstance(e, dict) and e.get("quantizer_path") == "*weight_quantizer"
+            for e in reversed(quant_cfg["quant_cfg"])
+            if e.get("quantizer_path") == "*weight_quantizer" and e.get("cfg") is not None
         )
         weight_quantizer = weight_quantizer_entry.get("cfg", {})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@examples/llm_ptq/example_utils.py` around lines 206 - 247, In
build_quant_cfg, avoid mutating quant_cfg["quant_cfg"] before it is normalized
and stop assuming the first "*weight_quantizer" rule wins: first run the
normalization/validation step used elsewhere (i.e., the QuantizeConfig or
whatever normalizer the codebase provides) on quant_cfg so legacy dict-style
entries are canonicalized, then when handling the AWQ branch find the effective
weight_quantizer by scanning quant_cfg["quant_cfg"] in reverse (or otherwise
selecting the last matching entry) to respect last-wins semantics; update that
entry's cfg (and apply awq_block_size to weight_quantizer["block_sizes"][-1])
and only then adjust quant_cfg["algorithm"] for AWQ or other model-specific
tweaks.
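The reverse scan the prompt calls for is a one-liner with next() over reversed(); the entries below are illustrative:

```python
quant_cfg = [
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8}},
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4, "block_sizes": {-1: 128}}},
]
# Under the ordered-list semantics the last matching entry is the effective one,
# so scan in reverse and take the first hit.
effective = next(
    entry
    for entry in reversed(quant_cfg)
    if entry.get("quantizer_path") == "*weight_quantizer" and entry.get("cfg") is not None
)
print(effective["cfg"]["num_bits"])  # 4
```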
modelopt/torch/quantization/conversion.py (1)

423-454: ⚠️ Potential issue | 🟠 Major

Enforce the no-sequential-list contract in this context manager.

The docstring says cfg: [...] entries are unsupported here, but the function no longer rejects them. If one gets through, set_quantizer_by_cfg() can replace foo.quantizer with a SequentialQuantizer while the exit path only snapshots/restores TensorQuantizer names, so restoration can miss the swap or fail on new child names like foo.quantizer.0.

🛠️ Suggested fix
 def set_quantizer_by_cfg_context(quant_model: nn.Module, quant_cfg: QuantizeQuantCfgType):
     """Context manager that temporarily applies a quantization config and restores the original state on exit.
@@
     """
     quant_cfg = normalize_quant_cfg_list(quant_cfg)
+    if any(isinstance(entry["cfg"], list) for entry in quant_cfg if entry["cfg"] is not None):
+        raise ValueError(
+            "set_quantizer_by_cfg_context() does not support SequentialQuantizer cfg lists."
+        )
 
     original_attributes = {}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/conversion.py` around lines 423 - 454, The
context manager set_quantizer_by_cfg_context must reject sequential cfg lists
before mutating the model: validate quant_cfg (after normalize_quant_cfg_list or
on the original input) and if any entry is a list/sequence (i.e., a sequential
cfg that would produce a SequentialQuantizer) raise a ValueError with a clear
message; perform this check prior to calling set_quantizer_by_cfg so
set_quantizer_by_cfg_context never allows a config that could swap in a
SequentialQuantizer and break the restore logic which only looks for
TensorQuantizer instances.
♻️ Duplicate comments (2)
docs/source/guides/_pytorch_quantization.rst (1)

258-272: ⚠️ Potential issue | 🟡 Minor

Make this example self-contained.

_default_disabled_quantizer_cfg is still undefined in this snippet, so the block fails when copied verbatim. Please qualify/import it or inline those exclusions instead of referencing the bare private helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/source/guides/_pytorch_quantization.rst` around lines 258 - 272, The
example references an undefined symbol _default_disabled_quantizer_cfg causing
copy-paste failures; fix by either importing or inlining its entries: update the
snippet that defines MY_CUSTOM_CONFIG to remove the bare reference and instead
import the helper (e.g., from the module that exports
_default_disabled_quantizer_cfg) or expand its contents directly into the
"quant_cfg" list so the configuration is self-contained; ensure the change is
applied to the MY_CUSTOM_CONFIG definition and keep the existing entries for
"*", "*weight_quantizer", and "*input_quantizer" intact.
modelopt/torch/quantization/conversion.py (1)

340-356: ⚠️ Potential issue | 🔴 Critical

Full replacements still don't swap incompatible quantizer shapes.

The list branch only warns when len(attributes) != len(module), and the scalar branch mutates an existing SequentialQuantizer in place. Both cases leave stale sub-quantizers around instead of honoring the advertised “cfg is a complete replacement” behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/conversion.py` around lines 340 - 356, The loop
handling quantizer replacements currently mutates existing SequentialQuantizer
instances or only warns on length mismatches, leaving stale sub-quantizers;
change it to perform full replacements: when _match_quantizer(...) finds a
target and attributes is a list, always construct a new
SequentialQuantizer(*(TensorQuantizer() for _ in range(len(attributes)))) and
replace the submodule via quant_model.get_submodule(...)/setattr before calling
set_from_attribute_config on the new module (don’t just warn when
len(attributes) != len(module)); when attributes is not a list and the existing
module is a SequentialQuantizer, replace it with a fresh TensorQuantizer() in
the parent and then call set_from_attribute_config on that new TensorQuantizer
instead of mutating the old SequentialQuantizer in place; use the same name
resolution logic already present (name.rpartition / name.split) to locate parent
and attribute name.
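The replace-in-parent mechanics the prompt asks for can be sketched with stand-in classes. `TensorQuantizer` and `SequentialQuantizer` below are minimal stubs (the real ones are `nn.Module`s), and `replace_quantizer` is a hypothetical helper showing only the swap logic:

```python
from types import SimpleNamespace


class TensorQuantizer:
    """Stand-in for modelopt's TensorQuantizer (illustrative stub)."""


class SequentialQuantizer(list):
    """Stand-in that holds child quantizers (illustrative stub)."""


def replace_quantizer(root, name, attributes):
    """Swap the quantizer at dotted path `name` for a fresh, shape-compatible one,
    so no stale sub-quantizers survive a full replacement."""
    parent_name, _, attr = name.rpartition(".")
    parent = root
    if parent_name:
        for part in parent_name.split("."):
            parent = getattr(parent, part)
    new = (
        SequentialQuantizer(TensorQuantizer() for _ in attributes)
        if isinstance(attributes, list)
        else TensorQuantizer()
    )
    setattr(parent, attr, new)  # replace in the parent, never mutate in place
    return new


model = SimpleNamespace(layer=SimpleNamespace(weight_quantizer=TensorQuantizer()))
new = replace_quantizer(model, "layer.weight_quantizer", [{"num_bits": 8}, {"num_bits": 4}])
assert isinstance(model.layer.weight_quantizer, SequentialQuantizer)
assert len(new) == 2  # one fresh child per cfg entry
```

Rebuilding the quantizer whenever the shapes differ (list vs. scalar, or mismatched lengths) is what makes "cfg is a complete replacement" hold: the old children are dropped with the old container instead of lingering after a warning.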

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 53589e40-16b3-46f6-8ff4-de3dd80a9420

📥 Commits

Reviewing files that changed from the base of the PR and between b9d67d3 and 823d602.

📒 Files selected for processing (5)
  • docs/source/guides/_pytorch_quantization.rst
  • examples/llm_ptq/example_utils.py
  • modelopt/torch/export/unified_export_hf.py
  • modelopt/torch/quantization/conversion.py
  • tests/unit/torch/quantization/plugins/test_peft.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • tests/unit/torch/quantization/plugins/test_peft.py
  • modelopt/torch/export/unified_export_hf.py

Comment on lines +258 to +263
if not cfg:
    # No cfg: only toggle the enable state, leave all other attributes unchanged.
    set_quantizer_attributes_partial(
        quant_model, quantizer_path, {"enable": enable}, parent_class
    )
for sub_pattern, sub_cfg in cfg.items():
    set_quantizer_attribute(quant_model, sub_pattern, sub_cfg, parent_class)
continue
set_quantizer_attribute(quant_model, pattern, cfg)
else:

⚠️ Potential issue | 🟠 Major

Treat cfg={} as a full reset, not as “no cfg”.

normalize_quant_cfg_list() already uses None to mean “cfg omitted”. The truthiness check here collapses an explicit empty config into the enable-only path, so {"cfg": {}} preserves old attributes instead of resetting them to defaults.

🛠️ Suggested fix
-        if not cfg:
+        if cfg is None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/quantization/conversion.py` around lines 258 - 263, The
current truthy check treats an explicit empty config ({}) the same as omitted
config, so change the condition to detect only omitted configs by replacing the
"if not cfg" logic with an explicit None check (e.g., "if cfg is None") so that
normalize_quant_cfg_list()'s semantics are respected; this ensures that when cfg
== {} the else branch runs and set_quantizer_attributes_partial/other reset
logic for quant_model, quantizer_path and parent_class will execute to reset
attributes rather than only toggling "enable".
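The distinction the fix relies on, in miniature. `cfg_action` is an illustrative helper, not the real dispatch; `set_quantizer_attributes_partial` and the surrounding loop are not reproduced:

```python
def cfg_action(cfg):
    """None means 'cfg omitted': only toggle the enable state.
    Anything else, including an explicit {}, means reset to defaults and apply."""
    if cfg is None:  # the fix: explicit None check, not truthiness
        return "enable-only"
    return "reset-and-apply"


assert cfg_action(None) == "enable-only"
assert cfg_action({}) == "reset-and-apply"  # `if not cfg:` would wrongly say "enable-only"
assert cfg_action({"num_bits": 8}) == "reset-and-apply"
```

Truthiness collapses two distinct signals (`None` and `{}`) into one branch; an explicit `is None` check keeps the empty dict available as a "reset everything" request, matching `normalize_quant_cfg_list()`'s use of `None` for an omitted cfg.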

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>

2 participants