[OMNIML-3689] PTQ quant_cfg semantic correction. Design in doc _quant_cfg.rst#1094
shengliangxu wants to merge 34 commits into main
Conversation
📝 Walkthrough
Refactors quantization configuration from a wildcard-keyed dict schema to an ordered list-of-entry schema.

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 1 failed (1 warning)
Right now the quant_cfg is a dict, but we are using the quant_cfg as if it is a list. When we apply the quant_cfg, we enumerate the items in the dict and apply the configs one by one in modelopt/torch/quantization/conversion.py. This implementation actually has the semantic that later configs have higher precedence than earlier configs. However, dicts do not have reliable ordering. Therefore, we make quant_cfg a list of patterns:

1. Later config patterns have higher precedence. A later config in the list overrides a former config if they target the same module.
2. The config for each module is atomic; each config provides the full information. We do not compose a quantized module's config from multiple config lines.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
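The "later entry wins" precedence described above can be sketched as follows. This is an illustrative toy resolver (the name `resolve_cfg` is hypothetical, not modelopt's API), assuming fnmatch-style wildcard matching of quantizer paths:

```python
from fnmatch import fnmatch


def resolve_cfg(quant_cfg_list, quantizer_name):
    """Return the LAST entry whose pattern matches the quantizer name.

    Each entry is atomic: the winning entry is taken whole, never merged
    with earlier matches.
    """
    matched = None
    for entry in quant_cfg_list:  # later entries override earlier ones
        if fnmatch(quantizer_name, entry["quantizer_path"]):
            matched = entry
    return matched


quant_cfg = [
    {"quantizer_path": "*", "enable": False},
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_path": "lm_head.*", "enable": False},
]

# The weight rule wins for ordinary layers; the later lm_head rule wins there.
assert resolve_cfg(quant_cfg, "layers.0.weight_quantizer")["cfg"] == {"num_bits": 8, "axis": 0}
assert resolve_cfg(quant_cfg, "lm_head.weight_quantizer")["enable"] is False
```

With a dict-keyed schema the same behavior depended on insertion order; the list makes the precedence explicit.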
…s_partial

set_quantizer_attributes_full updates the full set of quantizer attributes; it has the atomic semantic. set_quantizer_attributes_partial updates just a partial set of quantizer attributes; it has the merge semantic.

Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
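A minimal sketch of the two semantics, using plain dicts as illustrative stand-ins for the real quantizer objects:

```python
def set_quantizer_attributes_full(quantizer: dict, attrs: dict) -> None:
    """Atomic semantic: the new attributes replace everything."""
    quantizer.clear()
    quantizer.update(attrs)


def set_quantizer_attributes_partial(quantizer: dict, attrs: dict) -> None:
    """Merge semantic: only the given attributes change."""
    quantizer.update(attrs)


q = {"num_bits": 8, "axis": 0}
set_quantizer_attributes_partial(q, {"num_bits": 4})
assert q == {"num_bits": 4, "axis": 0}  # axis survives the merge

set_quantizer_attributes_full(q, {"num_bits": 4})
assert q == {"num_bits": 4}  # axis is gone: full update is atomic
```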
Force-pushed from 192ea05 to fb3bb07.
Codecov Report
❌ Patch coverage check failed.

Additional details and impacted files:

@@ Coverage Diff @@
## main #1094 +/- ##
==========================================
- Coverage 70.21% 70.18% -0.03%
==========================================
Files 228 229 +1
Lines 25952 26113 +161
==========================================
+ Hits 18221 18327 +106
- Misses 7731 7786 +55

☔ View full report in Codecov by Sentry.
Force-pushed from 4f38294 to 5115452.
Actionable comments posted: 20
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
examples/llm_autodeploy/run_auto_quantize.py (1)
134-143: ⚠️ Potential issue | 🔴 Critical: Remove hardcoded `trust_remote_code=True` in HF model and tokenizer loading.
Both `AutoModelForCausalLM.from_pretrained()` (line 134) and `AutoTokenizer.from_pretrained()` (line 142) hardcode `trust_remote_code=True`, creating a critical RCE vector from untrusted model repositories. Expose this as a user-controlled parameter defaulting to `False` in the `modelopt_ptq()` function signature.
🔒 Proposed fix
 def modelopt_ptq(
     model_path: str,
     output_dir: str,
     qformat: str | None = None,
     num_samples: int = 512,
     auto_quantize_bits: float | None = None,
     calib_dataset: str = "cnn_dailymail",
     calib_batch_size: int = 8,
+    trust_remote_code: bool = False,
 ) -> torch.nn.Module:
     """Quantize the model with modelopt."""
     model = AutoModelForCausalLM.from_pretrained(
-        model_path, trust_remote_code=True, torch_dtype="auto", device_map="auto"
+        model_path,
+        trust_remote_code=trust_remote_code,
+        torch_dtype="auto",
+        device_map="auto",
     )
     model.eval()
     tokenizer = AutoTokenizer.from_pretrained(
         model_path,
         model_max_length=2048,
         padding_side="left",
-        trust_remote_code=True,
+        trust_remote_code=trust_remote_code,
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_autodeploy/run_auto_quantize.py` around lines 134-143: the code currently hardcodes trust_remote_code=True when calling AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained, creating an RCE risk; update the modelopt_ptq(...) function signature to accept a new parameter (e.g., trust_remote_code: bool = False) and replace the hardcoded True with that parameter for both AutoModelForCausalLM.from_pretrained and AutoTokenizer.from_pretrained so callers can opt in explicitly while defaulting to False.

examples/llm_ptq/example_utils.py (1)
208-247: ⚠️ Potential issue | 🟠 Major: Normalize `quant_cfg["quant_cfg"]` before mutating it.
Lines 210 and 244-247 now assume the new list schema. If a caller still passes a legacy dict-style config, the AWQ branch will fail before `mtq.quantize()` reaches the backward-compat normalizer, and the `phi4mm` branch will hit `.append()` on a dict. Even in list form, walking from the front can rewrite a shadowed `*weight_quantizer` rule instead of the effective last match.
💡 Suggested fix
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
 @@
 def build_quant_cfg(
 @@
 ) -> dict[str, Any]:
     quant_cfg = copy.deepcopy(quant_cfg)
+    quant_cfg["quant_cfg"] = normalize_quant_cfg_list(quant_cfg["quant_cfg"])
     if "awq" in str(quant_cfg.get("algorithm")):
         weight_quantizer_entry = next(
             e
-            for e in quant_cfg["quant_cfg"]
-            if isinstance(e, dict) and e.get("quantizer_path") == "*weight_quantizer"
+            for e in reversed(quant_cfg["quant_cfg"])
+            if e.get("quantizer_path") == "*weight_quantizer"
         )
         weight_quantizer = weight_quantizer_entry.get("cfg", {})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@examples/llm_ptq/example_utils.py` around lines 208-247: the code mutates quant_cfg["quant_cfg"] assuming it is a list; normalize quant_cfg["quant_cfg"] to the list schema up front (convert the legacy dict form into the new list form) before any mutations so the weight_quantizer lookup and .append() calls are safe; when locating the weight quantizer, use a reverse scan (i.e., search from the end) to find the effective "*weight_quantizer" entry referenced by weight_quantizer_entry so the awq_block_size handling (weight_quantizer_entry / weight_quantizer) and the model_type == "phi4mm" .append() calls operate on the correct, normalized list.

modelopt/onnx/llm_export_utils/quantization_utils.py (1)
57-111: ⚠️ Potential issue | 🟠 Major: Copy the preset before rewriting `quant_cfg`.
Line 111 mutates `mtq.FP8_DEFAULT_CFG`/`mtq.NVFP4_DEFAULT_CFG`/`mtq.INT4_AWQ_CFG` in place. With list-based rules, repeated calls can now retain or duplicate previously appended `lm_head` entries, so `get_quant_config()` becomes order-dependent across calls in the same process.
💡 Suggested fix
+import copy
 import time
 @@
     if precision == "fp8":
-        quant_cfg = mtq.FP8_DEFAULT_CFG
+        quant_cfg = copy.deepcopy(mtq.FP8_DEFAULT_CFG)
     elif precision == "nvfp4":
-        quant_cfg = mtq.NVFP4_DEFAULT_CFG
+        quant_cfg = copy.deepcopy(mtq.NVFP4_DEFAULT_CFG)
     elif precision == "int4_awq":
-        quant_cfg = mtq.INT4_AWQ_CFG
+        quant_cfg = copy.deepcopy(mtq.INT4_AWQ_CFG)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/onnx/llm_export_utils/quantization_utils.py` around lines 57-111: get_quant_config currently mutates the shared presets mtq.FP8_DEFAULT_CFG / mtq.NVFP4_DEFAULT_CFG / mtq.INT4_AWQ_CFG in place, which causes duplicated lm_head entries on repeated calls; fix by making a deep copy of the chosen preset (e.g., import copy and do quant_cfg = copy.deepcopy(mtq.FP8_DEFAULT_CFG) / copy.deepcopy(mtq.NVFP4_DEFAULT_CFG) / copy.deepcopy(mtq.INT4_AWQ_CFG) inside get_quant_config) before building quant_cfg_list and appending lm_head entries so the original mtq presets remain unchanged.

modelopt/torch/quantization/conversion.py (1)
423-454: ⚠️ Potential issue | 🟠 Major: Reject sequential cfgs here or restore full module topology.
This context manager snapshots only `TensorQuantizer` properties. If a temporary config contains a list `cfg`, `set_quantizer_by_cfg()` can replace a `TensorQuantizer` with a `SequentialQuantizer`, and the exit path never converts it back. The docstring already says sequential cfg lists are unsupported here, but that isn't enforced.
🧯 Suggested fix
     quant_cfg = normalize_quant_cfg_list(quant_cfg)
+    if any(isinstance(entry["cfg"], list) for entry in quant_cfg):
+        raise ValueError("set_quantizer_by_cfg_context does not support sequential cfg lists.")
     original_attributes = {}

modelopt/torch/quantization/config.py (1)
541-545: ⚠️ Potential issue | 🔴 Critical: Remove `enable` from `_nvfp4_cfg_bs32`.
The nested `"enable": True` in `_nvfp4_cfg_bs32` (line 545) causes a duplicate keyword argument error at runtime. When `_nvfp4_selective_quant_cfg()` passes this dict as the `cfg` field in `QuantizerCfgEntry` (line 559), the conversion code later unpacks it with `QuantizerAttributeConfig(**cfg, enable=enable)` (conversion.py line 266). This passes `enable` twice and raises TypeError before conversion can complete.
Suggested fix
 _nvfp4_cfg_bs32 = {
     "num_bits": (2, 1),
     "block_sizes": {-1: 32, "type": "dynamic", "scale_bits": (4, 3)},
-    "enable": True,
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 541 - 545, The dict _nvfp4_cfg_bs32 contains an internal "enable" key which causes a duplicate keyword when later unpacked into QuantizerAttributeConfig; remove the "enable" entry from _nvfp4_cfg_bs32 so it only contains its attribute settings (num_bits, block_sizes, scale_bits/type entries) and let the caller (_nvfp4_selective_quant_cfg -> QuantizerCfgEntry) supply the enable flag separately; specifically edit the _nvfp4_cfg_bs32 constant to drop the "enable": True key so that conversion code that does QuantizerAttributeConfig(**cfg, enable=enable) no longer passes enable twice.
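The failure mode above is plain Python keyword unpacking. A standalone demo, with a hypothetical `AttrConfig` standing in for `QuantizerAttributeConfig`:

```python
class AttrConfig:
    """Stand-in for QuantizerAttributeConfig: takes `enable` as a keyword."""

    def __init__(self, num_bits=8, enable=True):
        self.num_bits = num_bits
        self.enable = enable


cfg = {"num_bits": 4, "enable": True}  # "enable" nested inside the cfg dict
try:
    AttrConfig(**cfg, enable=True)  # `enable` is passed twice
    raised = False
except TypeError:
    raised = True
assert raised  # duplicate keyword argument raises TypeError

cfg.pop("enable")  # the suggested fix: drop the nested key
obj = AttrConfig(**cfg, enable=True)
assert obj.num_bits == 4 and obj.enable is True
```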
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/source/guides/_pytorch_quantization.rst`:
- Around line 260-272: The example references _default_disabled_quantizer_cfg
but doesn't define or import it; fix by either adding an import for it (e.g.,
from mtq.config import _default_disabled_quantizer_cfg) at the top of the
snippet or qualify the reference inline as
mtq.config._default_disabled_quantizer_cfg so the example is self-contained and
runnable; update the code block containing "quant_cfg" to use the qualified name
or add the import near the example.
In `@docs/source/guides/_quant_cfg.rst`:
- Around line 123-143: The docs currently state that _base_disable_all (a.k.a.
the deny-all entry) is prepended to every built-in config; update the text to
accurately say that the deny-all entry (referenced as _base_disable_all or
mtq.config._default_disabled_quantizer_cfg) is included in most built-in configs
but not all, and call out that some specialized recipes (e.g., the KV-only
configs in the quantization config set) intentionally omit the deny-all prelude;
adjust the example wording and the note to avoid the absolute "every built-in
config" claim and add a short parenthetical pointer to the specialized recipes
that omit it (referencing their config names such as the KV-only configs).
In `@examples/diffusers/quantization/config.py`:
- Around line 106-109: The loop that injects trt_high_precision_dtype into
quant_config["quant_cfg"] only handles when cfg is a dict (variable p) and skips
the case where cfg is a list of attribute dicts; update the logic in the loop
that iterates quant_config["quant_cfg"] to detect if p is a list and, when so,
iterate its elements and for each element that is a dict and contains "num_bits"
but not "trt_high_precision_dtype", set element["trt_high_precision_dtype"] =
trt_high_precision_dtype; keep the existing dict-handling branch for backward
compatibility so both dict and list cfg forms are supported.
- Around line 128-147: The current append in reset_set_int8_config causes
duplicate entries in quant_config["quant_cfg"]; instead, make the operation
idempotent by searching quant_config["quant_cfg"] for an existing dict whose
"quantizer_path" equals aq_name and replace that dict in-place (or update its
"cfg") if found, otherwise append the new entry; modify the code around the
append that builds the payload (the block referencing aq_name,
quant_config["quant_cfg"], and the PercentileCalibrator tuple) to perform a
find-and-replace by quantizer_path rather than always appending.
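An idempotent upsert along those lines, sketched (`upsert_entry` is a hypothetical helper, not an existing modelopt function):

```python
def upsert_entry(quant_cfg_list, quantizer_path, cfg):
    """Replace the entry with this quantizer_path in place, or append it."""
    new_entry = {"quantizer_path": quantizer_path, "cfg": cfg}
    for i, entry in enumerate(quant_cfg_list):
        if isinstance(entry, dict) and entry.get("quantizer_path") == quantizer_path:
            quant_cfg_list[i] = new_entry  # found: replace, do not duplicate
            return
    quant_cfg_list.append(new_entry)  # not found: first-time append


rules = [{"quantizer_path": "*", "enable": False}]
upsert_entry(rules, "*input_quantizer", {"num_bits": 8})
upsert_entry(rules, "*input_quantizer", {"num_bits": 4})  # repeated call
assert len(rules) == 2  # no duplicate entry
assert rules[1]["cfg"] == {"num_bits": 4}  # latest cfg won
```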
In `@examples/llm_eval/quantization_utils.py`:
- Around line 36-49: The loop that detects BMM-attention entries fails because
mtq_cfg["quant_cfg"] is now a list of dicts; update _quantize_model_with_dataset
to handle both list and dict formats by inspecting each entry: if an entry is a
dict, read its "quantizer_path" (or keys for legacy string entries) and test
whether the path string contains "bmm_quantizer" or matches "*bmm_quantizer"; if
mtq_cfg["quant_cfg"] is still a dict or contains raw strings, preserve the old
membership check. Ensure this detection is used before calling
register_hf_attentions_on_the_fly(net) so KV-attention quantizers targeted via
"quantizer_path": "*bmm_quantizer" are not skipped.
In `@examples/llm_ptq/hf_ptq.py`:
- Line 332: The code is using attribute access on plain dicts in
_default_disabled_quantizer_cfg which will raise AttributeError; update the
disabled_layers construction to access the dict key "quantizer_path" for each
entry (i.e., replace entry.quantizer_path with entry["quantizer_path"]) so that
disabled_layers is built from the correct dictionary key.
In `@examples/llm_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb`:
- Around line 193-198: The code fails because quant_cfg["quant_cfg"] contains
dict entries, not 2-tuples; change the extraction to find the dict whose
quantizer_path (or appropriate key) equals "*weight_quantizer" and then take its
"cfg" field (e.g. use next(entry["cfg"] for entry in quant_cfg["quant_cfg"] if
entry.get("quantizer_path") == "*weight_quantizer")), then handle the case where
weight_quantizer is a list as before and set weight_quantizer["block_sizes"][-1]
= 128; update references to quant_cfg, weight_quantizer, and
quant_cfg["quant_cfg"] accordingly.
In `@examples/vllm_serve/fakequant_worker.py`:
- Around line 173-184: The code currently finds only the first kv rule
(kv_entry) and appends minimal clones at the end losing order and full fields;
instead iterate over kv_quant_cfg with indexes, find every entry where
isinstance(e, dict) and e.get("quantizer_path") == "*[kv]_bmm_quantizer", and
for each matching entry insert (not append) two full deep-copied clones of that
entire entry right after the original entry with quantizer_path set to
"*kv_c_bmm_quantizer" and "*k_pe_bmm_quantizer" respectively so you preserve
list order and all other fields (enable, parent_class, etc.) — operate on
kv_quant_cfg and use copy.deepcopy to clone the whole entry before changing
quantizer_path.
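A sketch of that insert-with-deepcopy expansion; the entries are illustrative, and only the quantizer_path values come from the comment above:

```python
import copy


def expand_kv_rules(kv_quant_cfg):
    """After each '*[kv]_bmm_quantizer' rule, insert two full clones of it."""
    out = []
    for entry in kv_quant_cfg:
        out.append(entry)
        if isinstance(entry, dict) and entry.get("quantizer_path") == "*[kv]_bmm_quantizer":
            for path in ("*kv_c_bmm_quantizer", "*k_pe_bmm_quantizer"):
                clone = copy.deepcopy(entry)  # keeps enable, parent_class, etc.
                clone["quantizer_path"] = path
                out.append(clone)  # lands right after the original entry
    return out


cfg = [
    {"quantizer_path": "*[kv]_bmm_quantizer", "enable": True, "cfg": {"num_bits": (4, 3)}},
    {"quantizer_path": "*", "enable": False},
]
expanded = expand_kv_rules(cfg)
assert [e["quantizer_path"] for e in expanded] == [
    "*[kv]_bmm_quantizer",
    "*kv_c_bmm_quantizer",
    "*k_pe_bmm_quantizer",
    "*",
]
assert expanded[1]["cfg"] == {"num_bits": (4, 3)}  # full fields preserved
```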
In
`@examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py`:
- Around line 260-271: The quant_cfg list currently uses tuple entries and
embeds "enable" inside the quantizer config dict (_nvfp4_cfg), which fails
validation in normalize_quant_cfg_list(); change quant_cfg to a list of
QuantizerCfgEntry dicts where each entry is e.g. {"quantizer_path": "<pattern>",
"cfg": <cfg_dict>} for enabled entries (move "enable" out of _nvfp4_cfg) and
{"quantizer_path": "<pattern>", "enable": False} for disabled patterns (use
SENSITIVE_LAYER_PATTERNS and the f"*transformer_blocks.{i}.*" patterns for
exclude_blocks), ensuring _nvfp4_cfg no longer contains a top-level "enable" key
and only used as the "cfg" value.
In `@modelopt/torch/quantization/algorithms.py`:
- Around line 1333-1344: The loop in get_auto_quantize_config is calling
_match_quantizer_cfg with only the short names
("input_quantizer"/"weight_quantizer"), so path- or parent_class-scoped
selectors never match and enable-only matches are dropped; change the call to
pass the fully qualified quantizer name (e.g.,
f"{module_name}.{quantizer_attr}") to _match_quantizer_cfg and ensure the caller
appends an entry when either matched_cfg is not None or matched_enable is True
(so enable-only matches are preserved); apply the same fix for the analogous
block that starts at the later occurrence (the other loop referencing
_match_quantizer_cfg, module_name, and quantizer_attr).
- Around line 96-98: The code currently drops the entry-level "enable" flag and
estimates compression from every entry's "cfg"; change the selection to only
include cfgs from QuantizerCfgEntry items where e.get("enable", True) is truthy
and the "cfg" is not None. Concretely, in the block using quant_cfg.quant_cfg
and calling estimate_quant_compression_for_quantizer, filter entries by
e.get("enable", True) before extracting e.get("cfg") (and still skip None cfgs)
so estimate_quant_compression_for_quantizer receives only enabled cfgs.
In `@modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py`:
- Around line 100-116: The availability probe currently uses next(...) without a
default and assertions on input_cfg/weight_cfg which can raise
StopIteration/AssertionError; change the two next(...) calls that produce
input_cfg and weight_cfg to include a default (e.g., next(..., None)) and
replace the assert isinstance(...) checks with explicit conditional checks that
return False from the probe when input_cfg or weight_cfg is missing or not a
dict; ensure you reference FP8_DEFAULT_CFG["quant_cfg"], quant_cfg_list,
input_cfg and weight_cfg in the probe logic so malformed or missing default
entries cause the function to return False instead of throwing.
In `@modelopt/torch/quantization/backends/nvfp4_gemm.py`:
- Around line 216-231: The availability check currently uses next(...) without
defaults and asserts which can raise instead of returning False; update the
_nvfp4_availability_check implementation to retrieve input_cfg and weight_cfg
from quant_cfg_list using next(..., None) (or equivalent), verify they are dicts
with proper "cfg" entries and that their "quantizer_path" matches
"*input_quantizer" / "*weight_quantizer", and if either is missing or not a dict
return False instead of asserting or allowing StopIteration; reference the
symbols quant_cfg_list, input_cfg, weight_cfg and the existing next(...)/assert
lines to locate and change the lookup and validation to defensive checks that
return False on malformed/missing configs.
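The defensive shape of such a probe, sketched (the real availability check also involves hardware and kernel probes not shown here):

```python
def availability_check(quant_cfg_list) -> bool:
    """Return False, rather than raising, on missing or malformed entries."""

    def find(path):
        return next(
            (
                e
                for e in quant_cfg_list
                if isinstance(e, dict) and e.get("quantizer_path") == path
            ),
            None,  # default avoids StopIteration on missing entries
        )

    input_cfg = find("*input_quantizer")
    weight_cfg = find("*weight_quantizer")
    if input_cfg is None or weight_cfg is None:
        return False  # missing entry: report unavailable
    if not isinstance(input_cfg.get("cfg"), dict) or not isinstance(weight_cfg.get("cfg"), dict):
        return False  # malformed entry: report unavailable
    return True


assert availability_check([]) is False  # previously this raised StopIteration
assert availability_check(
    [
        {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": (2, 1)}},
        {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": (2, 1)}},
    ]
)
```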
In `@modelopt/torch/quantization/config.py`:
- Around line 1625-1631: The before-validator normalize_quant_cfg currently
returns non-list inputs unchanged causing legacy dict-style quant_cfg to bypass
normalization; update normalize_quant_cfg to detect dict inputs and pass them
through normalize_quant_cfg_list (which handles dict -> entry conversion via
_dict_to_entry) so legacy {"*weight_quantizer": {...}} shapes are converted to
the expected list form; ensure this change preserves existing behavior for lists
and other non-dict types so downstream callers like need_calibration (which
iterates entries) receive normalized entry lists.
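One possible shape of that passthrough, sketched; `_dict_to_entry` here is a simplified hypothetical, not the real converter:

```python
def _dict_to_entry(pattern, value):
    """Convert one legacy wildcard-keyed item to the entry schema."""
    if isinstance(value, dict) and set(value) == {"enable"}:
        return {"quantizer_path": pattern, "enable": value["enable"]}
    return {"quantizer_path": pattern, "cfg": value}


def normalize_quant_cfg(quant_cfg):
    """Before-validator: route legacy dicts through entry conversion."""
    if isinstance(quant_cfg, dict):  # legacy {"*pattern": {...}} schema
        return [_dict_to_entry(k, v) for k, v in quant_cfg.items()]
    return quant_cfg  # lists (and anything else) pass through unchanged


legacy = {"*weight_quantizer": {"num_bits": 8, "axis": 0}, "default": {"enable": False}}
normalized = normalize_quant_cfg(legacy)
assert normalized == [
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
    {"quantizer_path": "default", "enable": False},
]
assert normalize_quant_cfg(normalized) == normalized  # lists unchanged
```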
In `@modelopt/torch/quantization/conversion.py`:
- Around line 341-356: The current logic in _match_quantizer branch mutates
existing SequentialQuantizer instances instead of fully replacing them when the
incoming attributes change shape; to fix, ensure full replacement instead of
partial assignment: when attributes is a list, if module is not a
SequentialQuantizer create a new SequentialQuantizer (as before) OR if module is
a SequentialQuantizer but len(attributes) != len(module) replace the existing
module with a new SequentialQuantizer of the correct length before calling
set_from_attribute_config; conversely, when attributes is not a list and module
is a SequentialQuantizer, replace it with a fresh TensorQuantizer instance and
then call set_from_attribute_config on that new instance (use
quant_model.get_submodule/name.rpartition to locate parent_module and setattr to
swap the module).
In `@modelopt/torch/quantization/model_quant.py`:
- Around line 183-191: The docstring for quantize() is out of sync: it still
describes quant_cfg as a dict mapping wildcard keys, but the implementation and
example use a list-of-dicts schema. Update the prose in the quantize() doc block
to describe quant_cfg as a list where each item is a dict with keys like
"quantizer_path" (wildcard pattern), optional "enable" (bool), optional "cfg"
(dict, e.g., {"num_bits": int, "axis": int}), and that multiple entries are
applied in order; also mention the top-level "algorithm" field (e.g., "max") and
how entries such as
{"quantizer_path":"*weight_quantizer","cfg":{"num_bits":8,"axis":0}} and
{"quantizer_path":"*input_quantizer","cfg":{"num_bits":8,"axis":-1}} are
interpreted by quantize()/quant_cfg.
In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 835-836: The code is appending dict.items() tuples into
quant_cfg["quant_cfg"], creating mixed tuples instead of the expected
list[QuantizerCfgEntry] and can error when quant_cfg["quant_cfg"] is a dict;
normalize both sources into proper QuantizerCfgEntry dicts: ensure inner is a
list (if quant_cfg.get("quant_cfg") is a dict, convert it to a list of entries),
and transform kv_cache_quant_cfg (the dict of pattern->cfg) into a list of
{"quantizer_path": pattern, ...cfg} or at least {"quantizer_path": pattern,
"enable": cfg.get("enable", False)} entries before concatenating; assign the
concatenated list back to quant_cfg["quant_cfg"] so downstream consumers only
see a homogeneous list[QuantizerCfgEntry].
In `@tests/_test_utils/torch/quantization/quantize_common.py`:
- Around line 50-61: The loop over config["quant_cfg"] looks for a
"*weight_quantizer" entry and mutates its "block_sizes" but silently does
nothing if no match is found; add a fail-fast check by tracking whether a
matching entry was found (e.g., set a boolean flag when pat ==
"*weight_quantizer") and after the loop raise an explicit error (ValueError or
AssertionError) if no match was found so tests fail early rather than silently
proceeding without applying block_size to the intended entry.
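The fail-fast version of that loop, sketched (the helper name and error text are illustrative):

```python
def set_weight_block_size(config, block_size):
    """Patch every '*weight_quantizer' entry; fail fast if none exists."""
    found = False
    for entry in config["quant_cfg"]:
        if entry.get("quantizer_path") == "*weight_quantizer":
            entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
            found = True
    if not found:
        raise ValueError("no '*weight_quantizer' entry in quant_cfg")


cfg = {"quant_cfg": [{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4}}]}
set_weight_block_size(cfg, 128)
assert cfg["quant_cfg"][0]["cfg"]["block_sizes"] == {-1: 128}

try:
    set_weight_block_size({"quant_cfg": []}, 128)
    silently_passed = True
except ValueError:
    silently_passed = False
assert not silently_passed  # the test now fails early instead of silently
```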
In `@tests/gpu/torch/quantization/test_quantize_cuda.py`:
- Around line 133-135: The test is mutating the shared constant
mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG by editing config["quant_cfg"] in place;
instead, create a deep copy of that constant (e.g., via copy.deepcopy) into the
local variable config before patching so changes to entry.setdefault("cfg",
{})["block_sizes"] = {-1: 8, -2: 8} only affect this test; locate the code that
assigns config from mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG and replace it with a
deep-copied version so modifications to quant_cfg/ block_sizes/quantizer_path do
not leak to other parametrized cases.
In `@tests/unit/torch/quantization/test_autoquant.py`:
- Around line 492-507: The current assertion allows tuple-style payloads by
falling back to entry[0]/entry[1]; restrict the check to dict-style
QuantizerCfgEntry entries only by iterating over config["quant_cfg"] and
filtering with isinstance(entry, dict), then assert that some dict has
entry.get("quantizer_path") == "*" and entry.get("enable") is False; update the
assertion around the quant_cfg check in
tests/unit/torch/quantization/test_autoquant.py accordingly (referencing
config["quant_cfg"] and the dict keys "quantizer_path" and "enable").
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fcb69c76-5a46-4dd0-82f2-905613c6cc83
📒 Files selected for processing (53)
docs/source/guides/1_quantization.rst
docs/source/guides/_pytorch_quantization.rst
docs/source/guides/_quant_cfg.rst
examples/diffusers/quantization/config.py
examples/llm_autodeploy/run_auto_quantize.py
examples/llm_eval/quantization_utils.py
examples/llm_ptq/example_utils.py
examples/llm_ptq/hf_ptq.py
examples/llm_ptq/notebooks/2_PTQ_AWQ_Calibration.ipynb
examples/llm_ptq/notebooks/3_PTQ_AutoQuantization.ipynb
examples/llm_qat/main.py
examples/vllm_serve/fakequant_worker.py
examples/windows/torch_onnx/diffusers/qad_example/sample_example_qad_diffusers.py
modelopt/onnx/llm_export_utils/quantization_utils.py
modelopt/torch/export/unified_export_hf.py
modelopt/torch/quantization/algorithms.py
modelopt/torch/quantization/backends/fp8_per_tensor_gemm.py
modelopt/torch/quantization/backends/nvfp4_gemm.py
modelopt/torch/quantization/compress.py
modelopt/torch/quantization/config.py
modelopt/torch/quantization/conversion.py
modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/model_quant.py
modelopt/torch/quantization/nn/modules/tensor_quantizer.py
modelopt/torch/quantization/utils/core_utils.py
modelopt/torch/sparsity/attention_sparsity/conversion.py
modelopt_recipes/general/ptq/fp8_default-fp8_kv.yml
modelopt_recipes/general/ptq/nvfp4_default-fp8_kv.yml
modelopt_recipes/general/ptq/nvfp4_mlp_only-fp8_kv.yml
modelopt_recipes/general/ptq/nvfp4_omlp_only-fp8_kv.yml
tests/_test_utils/torch/export/utils.py
tests/_test_utils/torch/quantization/onnx_export.py
tests/_test_utils/torch/quantization/quantize_common.py
tests/gpu/torch/quantization/test_hadamard.py
tests/gpu/torch/quantization/test_quant_rnn_cuda.py
tests/gpu/torch/quantization/test_quantize_cuda.py
tests/gpu_megatron/torch/peft/plugins/test_megatron_peft.py
tests/gpu_megatron/torch/quantization/plugins/test_apex.py
tests/gpu_megatron/torch/quantization/plugins/test_megatron.py
tests/unit/recipe/test_loader.py
tests/unit/torch/quantization/plugins/test_attention_quant.py
tests/unit/torch/quantization/plugins/test_huggingface.py
tests/unit/torch/quantization/plugins/test_peft.py
tests/unit/torch/quantization/test_autoquant.py
tests/unit/torch/quantization/test_compute_quantization_mse.py
tests/unit/torch/quantization/test_config_validation.py
tests/unit/torch/quantization/test_custom_backend.py
tests/unit/torch/quantization/test_quant_activations.py
tests/unit/torch/quantization/test_quant_batchnorm.py
tests/unit/torch/quantization/test_quant_rnn.py
tests/unit/torch/quantization/test_quantize_cpu.py
tests/unit/torch/quantization/test_quantize_replace.py
tests/unit/torch/quantization/test_tensor_quant_cpu.py
 "quant_cfg": [
     # Disable all quantizers by default, then enable selectively
     {"quantizer_path": "*", "enable": False},
     # Configure weight quantizers with 4-bit precision and 128-element blocks
-    "*weight_quantizer": {"num_bits": 4, "block_sizes": {-1: 128}, "enable": True},
+    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4, "block_sizes": {-1: 128}}, "enable": True},
     # Configure input quantizers with 8-bit dynamic quantization
-    "*input_quantizer": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}},
+    {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}}},
     # Include default disabled quantizer configurations
-    **_default_disabled_quantizer_cfg,
-},
+    *_default_disabled_quantizer_cfg,
+],
Define or qualify _default_disabled_quantizer_cfg in this example.
This snippet now references _default_disabled_quantizer_cfg without importing or qualifying it, so it fails when copied verbatim. Please either add the import or qualify it as mtq.config._default_disabled_quantizer_cfg.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/source/guides/_pytorch_quantization.rst` around lines 260 - 272, The
example references _default_disabled_quantizer_cfg but doesn't define or import
it; fix by either adding an import for it (e.g., from mtq.config import
_default_disabled_quantizer_cfg) at the top of the snippet or qualify the
reference inline as mtq.config._default_disabled_quantizer_cfg so the example is
self-contained and runnable; update the code block containing "quant_cfg" to use
the qualified name or add the import near the example.
The recommended pattern used by all built-in configs is:

.. code-block:: python

    "quant_cfg": [
        # 1. Deny all quantizers by default
        {"quantizer_path": "*", "enable": False},

        # 2. Enable and configure the target quantizers
        {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
        {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},

        # 3. Apply standard exclusions last (BatchNorm, LM head, MoE routers, etc.)
        *mtq.config._default_disabled_quantizer_cfg,
    ]

.. note::

    The deny-all entry ``{"quantizer_path": "*", "enable": False}`` is available as
    :data:`modelopt.torch.quantization.config._base_disable_all` and is prepended to every
    built-in config. This ensures quantizers not explicitly targeted remain disabled.
Don’t describe _base_disable_all as universal across built-ins.
Several specialized recipes in modelopt/torch/quantization/config.py intentionally omit the deny-all prelude (for example the KV-only configs). Saying it is prepended to every built-in config will send users to the wrong pattern for those formats.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/source/guides/_quant_cfg.rst` around lines 123 - 143, The docs currently
state that _base_disable_all (a.k.a. the deny-all entry) is prepended to every
built-in config; update the text to accurately say that the deny-all entry
(referenced as _base_disable_all or mtq.config._default_disabled_quantizer_cfg)
is included in most built-in configs but not all, and call out that some
specialized recipes (e.g., the KV-only configs in the quantization config set)
intentionally omit the deny-all prelude; adjust the example wording and the note
to avoid the absolute "every built-in config" claim and add a short
parenthetical pointer to the specialized recipes that omit it (referencing their
config names such as the KV-only configs).
for entry in quant_config["quant_cfg"]:
    p = entry.get("cfg", {})
    if isinstance(p, dict) and "num_bits" in p and "trt_high_precision_dtype" not in p:
        p["trt_high_precision_dtype"] = trt_high_precision_dtype
Handle sequential cfg lists when injecting trt_high_precision_dtype.
The new quant_cfg format allows cfg to be a list of attribute dicts, but Line 107 only updates plain dicts. Any sequential entry will skip this dtype injection entirely.
🛠️ Proposed fix
- for entry in quant_config["quant_cfg"]:
- p = entry.get("cfg", {})
- if isinstance(p, dict) and "num_bits" in p and "trt_high_precision_dtype" not in p:
- p["trt_high_precision_dtype"] = trt_high_precision_dtype
+ for entry in quant_config["quant_cfg"]:
+ cfg = entry.get("cfg")
+ cfg_items = [cfg] if isinstance(cfg, dict) else (cfg or [])
+ for attrs in cfg_items:
+                attrs["trt_high_precision_dtype"] = trt_high_precision_dtype
🤖 Prompt for AI Agents
+ attrs["trt_high_precision_dtype"] = trt_high_precision_dtype🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/diffusers/quantization/config.py` around lines 106 - 109, The loop
that injects trt_high_precision_dtype into quant_config["quant_cfg"] only
handles when cfg is a dict (variable p) and skips the case where cfg is a list
of attribute dicts; update the logic in the loop that iterates
quant_config["quant_cfg"] to detect if p is a list and, when so, iterate its
elements and for each element that is a dict and contains "num_bits" but not
"trt_high_precision_dtype", set element["trt_high_precision_dtype"] =
trt_high_precision_dtype; keep the existing dict-handling branch for backward
compatibility so both dict and list cfg forms are supported.
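The fix proposed above can be exercised in isolation. This is a minimal, self-contained sketch under the assumption that list-form `cfg` entries hold plain attribute dicts (as with `SequentialQuantizer`); the helper name is hypothetical, not a modelopt API.

```python
def inject_high_precision_dtype(quant_cfg, dtype):
    """Set trt_high_precision_dtype on both dict-form and list-form cfgs."""
    for entry in quant_cfg:
        cfg = entry.get("cfg")
        # Treat a dict as a one-element list so both forms share one code path.
        cfg_items = [cfg] if isinstance(cfg, dict) else (cfg or [])
        for attrs in cfg_items:
            if "num_bits" in attrs and "trt_high_precision_dtype" not in attrs:
                attrs["trt_high_precision_dtype"] = dtype
    return quant_cfg


cfg = [
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8}},
    # Sequential-quantizer style entry: a list of attribute dicts
    {"quantizer_path": "*input_quantizer", "cfg": [{"num_bits": 4}, {"num_bits": 8}]},
]
inject_high_precision_dtype(cfg, "Half")
```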
| quant_config["quant_cfg"].append( | ||
| { | ||
| "quantizer_path": aq_name, | ||
| "cfg": { | ||
| "num_bits": 8, | ||
| "axis": None, | ||
| "percentile": percentile, | ||
| "total_step": n_steps, | ||
| "collect_method": collect_method, | ||
| "calibrator": ( | ||
| PercentileCalibrator, | ||
| (), | ||
| { | ||
| "num_bits": 8, | ||
| "axis": None, | ||
| "percentile": percentile, | ||
| "total_step": n_steps, | ||
| "collect_method": collect_method, | ||
| }, | ||
| ), | ||
| }, | ||
| ), | ||
| } | ||
| } | ||
| ) |
Make reset_set_int8_config() idempotent under list-based quant_cfg.
Line 128 switched this helper from a replace-by-key update to an unconditional append. Reusing the same quant_config object now accumulates duplicate or stale *{name}*input_quantizer* rules, which changes later-match behavior under the new schema.
🧼 Proposed fix
if isinstance(module, nn.Conv2d):
aq_name = f"*{name}*input_quantizer*"
+ quant_config["quant_cfg"] = [
+ entry
+ for entry in quant_config["quant_cfg"]
+ if not (isinstance(entry, dict) and entry.get("quantizer_path") == aq_name)
+ ]
quant_config["quant_cfg"].append(
{
"quantizer_path": aq_name,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/diffusers/quantization/config.py` around lines 128 - 147, The
current append in reset_set_int8_config causes duplicate entries in
quant_config["quant_cfg"]; instead, make the operation idempotent by searching
quant_config["quant_cfg"] for an existing dict whose "quantizer_path" equals
aq_name and replace that dict in-place (or update its "cfg") if found, otherwise
append the new entry; modify the code around the append that builds the payload
(the block referencing aq_name, quant_config["quant_cfg"], and the
PercentileCalibrator tuple) to perform a find-and-replace by quantizer_path
rather than always appending.
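The find-and-replace behavior suggested in the prompt can be factored into a small upsert helper. A minimal sketch; the function name is illustrative and not part of the example's real code:

```python
def upsert_quant_cfg_entry(quant_cfg, new_entry):
    """Replace an existing entry with the same quantizer_path, else append.

    Calling this repeatedly with the same path is idempotent, so reusing one
    quant_config object never accumulates duplicate rules.
    """
    for i, entry in enumerate(quant_cfg):
        if isinstance(entry, dict) and entry.get("quantizer_path") == new_entry["quantizer_path"]:
            quant_cfg[i] = new_entry
            return quant_cfg
    quant_cfg.append(new_entry)
    return quant_cfg


cfg = [{"quantizer_path": "*input_quantizer*", "cfg": {"num_bits": 8}}]
# Apply the same update twice: the list still holds exactly one entry.
upsert_quant_cfg_entry(cfg, {"quantizer_path": "*input_quantizer*", "cfg": {"num_bits": 4}})
upsert_quant_cfg_entry(cfg, {"quantizer_path": "*input_quantizer*", "cfg": {"num_bits": 4}})
```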
| "quant_cfg": [ | ||
| *mtq.config._base_disable_all, | ||
| { | ||
| "quantizer_path": "*weight_quantizer", | ||
| "cfg": {"num_bits": 4, "block_sizes": {-1: 128}}, | ||
| "enable": True, | ||
| }, | ||
| { | ||
| "quantizer_path": "*input_quantizer", | ||
| "cfg": {"num_bits": 8, "type": "dynamic", "block_sizes": {-1: None}}, | ||
| }, | ||
| # Disable sensitive layers such as `lm_head`, gate layers in MoE etc. | ||
| **mtq.config._default_disabled_quantizer_cfg, | ||
| }, | ||
| *mtq.config._default_disabled_quantizer_cfg, | ||
| ], |
Update the BMM-attention detection for list-based quant_cfg.
After switching quant_cfg to a list of entries, _quantize_model_with_dataset()'s later for key in mtq_cfg["quant_cfg"] / "bmm_quantizer" in key check no longer works: iterating the list yields dicts, so that membership test only inspects dict keys. Configs that target *bmm_quantizer now skip register_hf_attentions_on_the_fly(net) and silently miss KV-attention quantization.
💡 Suggested update outside this hunk
- quantize_bmm_attention = False
- for key in mtq_cfg["quant_cfg"]:
- if "bmm_quantizer" in key:
- quantize_bmm_attention = True
+ quantize_bmm_attention = any(
+ "bmm_quantizer" in entry["quantizer_path"]
+ for entry in mtq.config.normalize_quant_cfg_list(mtq_cfg["quant_cfg"])
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@examples/llm_eval/quantization_utils.py` around lines 36 - 49, The loop that
detects BMM-attention entries fails because mtq_cfg["quant_cfg"] is now a list
of dicts; update _quantize_model_with_dataset to handle both list and dict
formats by inspecting each entry: if an entry is a dict, read its
"quantizer_path" (or keys for legacy string entries) and test whether the path
string contains "bmm_quantizer" or matches "*bmm_quantizer"; if
mtq_cfg["quant_cfg"] is still a dict or contains raw strings, preserve the old
membership check. Ensure this detection is used before calling
register_hf_attentions_on_the_fly(net) so KV-attention quantizers targeted via
"quantizer_path": "*bmm_quantizer" are not skipped.
| "quant_cfg": [ | ||
| # Disable all quantizers by default | ||
| {"quantizer_path": "*", "enable": False}, | ||
| # "num_bits" specifies the number of bits for quantization | ||
| # "axis" specifies the axis for quantization | ||
| "*weight_quantizer": {"num_bits": 8, "axis": 0}, | ||
| "*input_quantizer": {"num_bits": 8, "axis": -1}, | ||
|
|
||
| # Default quantization settings | ||
| "default": {"num_bits": 8, "axis": None}, | ||
| } | ||
| "algorithm": "max" | ||
| {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}}, | ||
| {"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": -1}}, | ||
| ], | ||
| "algorithm": "max", |
Finish updating this quantize() doc block to the list-based schema.
Lines 183-191 show the new list-of-dicts format, but the prose immediately above still describes quant_cfg as a dictionary mapping wildcard keys to attributes. That leaves the public API docs contradictory in the same section.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/model_quant.py` around lines 183 - 191, The
docstring for quantize() is out of sync: it still describes quant_cfg as a dict
mapping wildcard keys, but the implementation and example use a list-of-dicts
schema. Update the prose in the quantize() doc block to describe quant_cfg as a
list where each item is a dict with keys like "quantizer_path" (wildcard
pattern), optional "enable" (bool), optional "cfg" (dict, e.g., {"num_bits":
int, "axis": int}), and that multiple entries are applied in order; also mention
the top-level "algorithm" field (e.g., "max") and how entries such as
{"quantizer_path":"*weight_quantizer","cfg":{"num_bits":8,"axis":0}} and
{"quantizer_path":"*input_quantizer","cfg":{"num_bits":8,"axis":-1}} are
interpreted by quantize()/quant_cfg.
for entry in config["quant_cfg"]:
    pat = (
        entry["quantizer_path"]
        if isinstance(entry, dict) and "quantizer_path" in entry
        else entry[0]
    )
    if pat == "*weight_quantizer":
        if isinstance(entry, dict) and "quantizer_path" in entry:
            entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
        else:
            entry[1]["block_sizes"] = {-1: block_size}
        break
Fail fast when *weight_quantizer is missing.
If no matching entry exists, this function currently returns without applying block_sizes, which can silently invalidate downstream test intent.
🔧 Proposed fix
def get_awq_config(algorithm="awq_lite", block_size=8):
config = copy.deepcopy(mtq.INT4_AWQ_CFG)
+ found_weight_entry = False
for entry in config["quant_cfg"]:
pat = (
entry["quantizer_path"]
if isinstance(entry, dict) and "quantizer_path" in entry
else entry[0]
)
if pat == "*weight_quantizer":
+ found_weight_entry = True
if isinstance(entry, dict) and "quantizer_path" in entry:
entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
else:
entry[1]["block_sizes"] = {-1: block_size}
break
+ if not found_weight_entry:
+ raise KeyError("No '*weight_quantizer' entry found in quant_cfg")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/_test_utils/torch/quantization/quantize_common.py` around lines 50 -
61, The loop over config["quant_cfg"] looks for a "*weight_quantizer" entry and
mutates its "block_sizes" but silently does nothing if no match is found; add a
fail-fast check by tracking whether a matching entry was found (e.g., set a
boolean flag when pat == "*weight_quantizer") and after the loop raise an
explicit error (ValueError or AssertionError) if no match was found so tests
fail early rather than silently proceeding without applying block_size to the
intended entry.
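The fail-fast shape proposed above can be condensed into one helper. A sketch under the assumption that entries are the new dict form; the function name is illustrative:

```python
def set_weight_block_size(quant_cfg, block_size):
    """Patch block_sizes on the *weight_quantizer entry, failing fast if absent."""
    for entry in quant_cfg:
        if entry.get("quantizer_path") == "*weight_quantizer":
            entry.setdefault("cfg", {})["block_sizes"] = {-1: block_size}
            return quant_cfg
    # Raising here surfaces a misconfigured test instead of silently doing nothing.
    raise KeyError("No '*weight_quantizer' entry found in quant_cfg")


cfg = [{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4}}]
set_weight_block_size(cfg, 8)
```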
for entry in config["quant_cfg"]:
    if entry.get("quantizer_path") == "*weight_quantizer":
        entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}
Deep-copy the shared config before patching block_sizes.
config here is the imported mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG object. Mutating config["quant_cfg"] in place leaks the reduced block size into later parametrized cases and any other tests that reuse the same constant.
💡 Suggested fix
if config == mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG:
+ import copy
+
+ config = copy.deepcopy(config)
# reduce block sizes for simple testing models
for entry in config["quant_cfg"]:
if entry.get("quantizer_path") == "*weight_quantizer":
entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if config == mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG:
    import copy

    config = copy.deepcopy(config)
    # reduce block sizes for simple testing models
    for entry in config["quant_cfg"]:
        if entry.get("quantizer_path") == "*weight_quantizer":
            entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2: 8}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/gpu/torch/quantization/test_quantize_cuda.py` around lines 133 - 135,
The test is mutating the shared constant mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG by
editing config["quant_cfg"] in place; instead, create a deep copy of that
constant (e.g., via copy.deepcopy) into the local variable config before
patching so changes to entry.setdefault("cfg", {})["block_sizes"] = {-1: 8, -2:
8} only affect this test; locate the code that assigns config from
mtq.FP8_2D_BLOCKWISE_WEIGHT_ONLY_CFG and replace it with a deep-copied version
so modifications to quant_cfg/ block_sizes/quantizer_path do not leak to other
parametrized cases.
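The hazard described above (in-place mutation leaking into a shared module-level constant) is easy to reproduce in isolation. A minimal sketch with a stand-in constant; the names are hypothetical, not the test's real identifiers:

```python
import copy

# Stand-in for a shared module-level config constant.
SHARED_CFG = {
    "quant_cfg": [
        {"quantizer_path": "*weight_quantizer", "cfg": {"block_sizes": {-1: 128}}}
    ]
}


def patched_config(shared_cfg, block_sizes):
    """Deep-copy before mutating so the shared constant stays pristine."""
    cfg = copy.deepcopy(shared_cfg)
    for entry in cfg["quant_cfg"]:
        if entry.get("quantizer_path") == "*weight_quantizer":
            entry.setdefault("cfg", {})["block_sizes"] = block_sizes
    return cfg


local = patched_config(SHARED_CFG, {-1: 8, -2: 8})
```

Without the `deepcopy`, the reduced block sizes would persist in `SHARED_CFG` for every later parametrized case.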
assert isinstance(config["quant_cfg"], list)
assert any(
    (
        entry["quantizer_path"]
        if isinstance(entry, dict) and "quantizer_path" in entry
        else entry[0]
    )
    == "*"
    and (
        entry.get("enable")
        if isinstance(entry, dict) and "quantizer_path" in entry
        else entry[1].get("enable")
    )
    is False
    for entry in config["quant_cfg"]
)
Keep this assertion strict to QuantizerCfgEntry dicts.
The fallback to entry[0] / entry[1] means this test now accepts tuple-style payloads, so it no longer protects the new list-of-dicts contract from regressions. Assert on dict entries directly.
🧪 Proposed fix
- assert any(
- (
- entry["quantizer_path"]
- if isinstance(entry, dict) and "quantizer_path" in entry
- else entry[0]
- )
- == "*"
- and (
- entry.get("enable")
- if isinstance(entry, dict) and "quantizer_path" in entry
- else entry[1].get("enable")
- )
- is False
- for entry in config["quant_cfg"]
- )
+ assert all(isinstance(entry, dict) for entry in config["quant_cfg"])
+ assert any(
+ entry.get("quantizer_path") == "*" and entry.get("enable") is False
+ for entry in config["quant_cfg"]
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/torch/quantization/test_autoquant.py` around lines 492 - 507, The
current assertion allows tuple-style payloads by falling back to
entry[0]/entry[1]; restrict the check to dict-style QuantizerCfgEntry entries
only by iterating over config["quant_cfg"] and filtering with isinstance(entry,
dict), then assert that some dict has entry.get("quantizer_path") == "*" and
entry.get("enable") is False; update the assertion around the quant_cfg check in
tests/unit/torch/quantization/test_autoquant.py accordingly (referencing
config["quant_cfg"] and the dict keys "quantizer_path" and "enable").
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
5971a7b to fe2d2f3
Actionable comments posted: 1
♻️ Duplicate comments (1)
modelopt/torch/quantization/config.py (1)
1622-1628: ⚠️ Potential issue | 🟠 Major: Legacy top-level dict format bypasses normalization.
The `before` validator returns non-list inputs unchanged (line 1626-1627). If a user passes a legacy dict-style `quant_cfg` like `{"*weight_quantizer": {"num_bits": 8}}` directly (not wrapped in a list), it will not be normalized. This can cause `need_calibration()` and downstream consumers to fail when iterating, as they expect list entries.
Consider converting top-level dict inputs to list form before calling `normalize_quant_cfg_list()`:
Suggested fix
 @field_validator("quant_cfg", mode="before")
 @classmethod
 def normalize_quant_cfg(cls, v):
     """Normalize quant_cfg entries: convert dict and tuple forms to QuantizerCfgEntry dicts."""
+    if isinstance(v, dict):
+        # Convert legacy top-level dict format to list of single-key dicts
+        v = [{k: val} for k, val in v.items()]
     if not isinstance(v, list):
         return v
     return normalize_quant_cfg_list(v)
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1622 - 1628, The before-validator normalize_quant_cfg currently returns non-list inputs unchanged, so legacy top-level dict forms for quant_cfg bypass normalization; update normalize_quant_cfg (the `@field_validator` for "quant_cfg") to detect a dict input (e.g., mapping of pattern->entry) and wrap/convert it into a one-item list (or otherwise translate into the list form expected by normalize_quant_cfg_list) before calling normalize_quant_cfg_list, then return the normalized list so downstream functions like need_calibration() always receive a list of QuantizerCfgEntry dicts.
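The conversion the prompt asks for can be demonstrated with a toy normalizer. This is a stand-in for the proposed validator behavior only; the real `normalize_quant_cfg_list()` handles more legacy shapes (tuples, `nn.*`-scoped entries) than this sketch:

```python
def normalize_legacy_quant_cfg(v):
    """Convert a legacy wildcard-keyed dict into the list-of-entries form."""
    if isinstance(v, dict):
        # Each key/value pair becomes one atomic entry; insertion order is kept.
        return [{"quantizer_path": key, "cfg": cfg} for key, cfg in v.items()]
    return v


normalized = normalize_legacy_quant_cfg({"*weight_quantizer": {"num_bits": 8}})
```

After this step, downstream code such as `need_calibration()` can iterate entries uniformly.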
🧹 Nitpick comments (2)
modelopt/torch/quantization/config.py (2)
1630-1639: Consider validating list-form `cfg` entries as well. The validator validates dict `cfg` values against `QuantizerAttributeConfig`, but skips list `cfg` (used for `SequentialQuantizer`). While downstream processing may catch errors, early validation would surface misconfigurations sooner.
Optional enhancement
 @field_validator("quant_cfg", mode="after")
 @classmethod
 def validate_quant_cfg_entries(cls, v):
     """Validate quantizer attribute configs to surface errors (e.g. invalid axis/block_sizes)."""
     qac_fields = set(QuantizerAttributeConfig.model_fields.keys())
     for entry in v:
         cfg = entry.get("cfg", {})
         if isinstance(cfg, dict) and qac_fields & set(cfg.keys()):
             QuantizerAttributeConfig.model_validate(cfg)
+        elif isinstance(cfg, list):
+            for sub_cfg in cfg:
+                if isinstance(sub_cfg, dict) and qac_fields & set(sub_cfg.keys()):
+                    QuantizerAttributeConfig.model_validate(sub_cfg)
     return v
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1630 - 1639, The validator validate_quant_cfg_entries currently only validates dict cfg values against QuantizerAttributeConfig but ignores list-form cfg (used by SequentialQuantizer); update validate_quant_cfg_entries to also handle when entry.get("cfg") is a list by iterating over its items and, for each item that is a dict and contains keys in QuantizerAttributeConfig.model_fields, call QuantizerAttributeConfig.model_validate(item) so each element is validated early; preserve existing behavior for non-dict/non-list cfgs and return v as before.
1554-1567: Replace assert with ValueError for clearer error messaging. The `assert` at line 1556 will produce an unclear `AssertionError` if the legacy `nn.*` format has unexpected structure. A `ValueError` with a descriptive message would be more helpful for users debugging config issues.
Suggested improvement
 def _dict_to_entry(key: str, value) -> QuantizerCfgEntry:
     if isinstance(key, str) and key.startswith("nn."):
-        assert isinstance(value, dict) and len(value) == 1
+        if not isinstance(value, dict) or len(value) != 1:
+            raise ValueError(
+                f"Legacy nn.*-scoped entry expects a single-key dict value, got: {value!r}"
+            )
         q_path, sub_cfg = next(iter(value.items()))
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1554 - 1567, In _dict_to_entry, replace the brittle assert inside the branch that handles legacy "nn." keys with a ValueError that gives a clear message; specifically in the function _dict_to_entry where you check if key.startswith("nn."), validate that value is a dict with exactly one item and if not raise ValueError("invalid legacy nn.* quantizer entry: expected a dict with a single mapping from quantizer path to config, got: {repr(value)}") (include the offending value in the message) so callers get a descriptive error instead of an AssertionError.
📒 Files selected for processing (2)
modelopt/torch/quantization/config.py
modelopt/torch/quantization/utils/core_utils.py
 quant_cfg: list = config.get("quant_cfg") or []
 for entry in quant_cfg:
     name = (
         entry["quantizer_path"]
         if isinstance(entry, dict) and "quantizer_path" in entry
         else entry[0]
     )
     if isinstance(entry, dict) and "quantizer_path" in entry:
         cfg = dict(entry.get("cfg") or {})
         if "enable" in entry:
             cfg["enable"] = entry["enable"]
     else:
         cfg = entry[1]
     if "weight_quantizer" in name:
         # We don't calibrate weight quantizer
         continue
     # quantization like W4A8 has a list of weight quantizers
     if isinstance(cfg, list):
         for _config in cfg:
             if _not_dynamic(_config):
                 print(f"{cfg}: True")
                 return True
-    elif _not_dynamic(cfg):
-        print(f"{cfg}: True")
+    elif isinstance(cfg, dict) and _not_dynamic(cfg):
         return True
Fallback logic for tuple format may mask errors from unnormalized dict inputs.
The fallback to entry[0]/entry[1] (lines 1682, 1689) is intended for legacy tuple format support. However, if a top-level dict bypasses the validator normalization, iterating over it yields string keys, and entry[0]/entry[1] would index into string characters (e.g., "*"[0] → "*"), leading to silent misbehavior rather than a clear error.
Consider adding explicit type checking or removing the tuple fallback if tuples are no longer a supported input format:
Suggested defensive check
quant_cfg: list = config.get("quant_cfg") or []
for entry in quant_cfg:
+ if not isinstance(entry, dict):
+ raise TypeError(f"Expected dict entry in quant_cfg, got {type(entry).__name__}: {entry!r}")
name = (
entry["quantizer_path"]
if isinstance(entry, dict) and "quantizer_path" in entry
else entry[0]
)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/config.py` around lines 1677 - 1699, The loop
over quant_cfg can silently mis-index string keys when an unnormalized dict is
present; update the logic in the loop that determines name/cfg (the code using
entry["quantizer_path"] vs entry[0]/entry[1]) to explicitly check types: if
entry is a dict require the "quantizer_path" key and use entry.get("cfg"),
otherwise require entry to be a sequence/tuple/list of length >= 2 before using
entry[0] and entry[1]; if the input is neither a valid dict nor a valid 2+
element sequence, raise a clear ValueError (or log and skip) so _not_dynamic and
downstream logic never index into a string or malformed structure. Ensure you
reference quant_cfg, entry, name, cfg, and _not_dynamic when implementing the
checks.
Actionable comments posted: 1
♻️ Duplicate comments (2)
modelopt/torch/quantization/config.py (2)
1677-1699: ⚠️ Potential issue | 🟡 Minor: Fallback logic for tuple format may mask errors from unnormalized dict inputs.
The fallback to `entry[0]`/`entry[1]` (lines 1682, 1689) is intended for legacy tuple format. However, if a top-level dict bypasses the validator normalization, iterating over it yields string keys, and `entry[0]`/`entry[1]` would index into string characters (e.g., `"*weight_quantizer"[0]` → `"*"`), causing silent misbehavior.
Suggested defensive check
 quant_cfg: list = config.get("quant_cfg") or []
 for entry in quant_cfg:
+    if not isinstance(entry, (dict, tuple, list)):
+        raise TypeError(f"Expected dict or tuple entry in quant_cfg, got {type(entry).__name__}: {entry!r}")
     name = (
         entry["quantizer_path"]
         if isinstance(entry, dict) and "quantizer_path" in entry
         else entry[0]
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1677 - 1699, The loop over quant_cfg can mis-index string keys when a dict slips through the legacy tuple/path fallback; update the logic in the quant_cfg iteration (around variables entry, name, cfg) to first check types explicitly: if entry is a dict use entry["quantizer_path"] and entry.get("cfg"), else if entry is a sequence (tuple/list) with length >= 2 use entry[0]/entry[1]; otherwise raise a clear ValueError or skip with a logged warning. Ensure the branch that checks "weight_quantizer" still uses the computed name and that _not_dynamic is only called with validated cfg objects.
1622-1628: ⚠️ Potential issue | 🟠 Major: Legacy dict-style `quant_cfg` is not normalized by the before-validator.
The validator returns non-list inputs unchanged (line 1626-1627), so a legacy dict format like `{"*weight_quantizer": {"num_bits": 8}}` bypasses `normalize_quant_cfg_list()` entirely. If such a dict reaches downstream code expecting a list, it will fail.
Since `normalize_quant_cfg_list()` already supports converting legacy dict formats via `_dict_to_entry()`, the validator should route dicts to it:
Suggested fix
 @field_validator("quant_cfg", mode="before")
 @classmethod
 def normalize_quant_cfg(cls, v):
     """Normalize quant_cfg entries: convert dict and tuple forms to QuantizerCfgEntry dicts."""
-    if not isinstance(v, list):
-        return v
-    return normalize_quant_cfg_list(v)
+    if isinstance(v, dict):
+        # Legacy top-level dict format: convert to list of single-key dicts
+        v = [{k: val} for k, val in v.items()]
+    if isinstance(v, list):
+        return normalize_quant_cfg_list(v)
+    return v
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 1622 - 1628, The before-validator normalize_quant_cfg currently returns non-list values unchanged, so legacy dict-style quant_cfg like {"*weight_quantizer": {"num_bits": 8}} bypasses normalization; update normalize_quant_cfg to detect dict inputs and pass them into normalize_quant_cfg_list (which uses _dict_to_entry) so all legacy dict formats are converted into the expected list of QuantizerCfgEntry objects—i.e., in normalize_quant_cfg (the `@field_validator` for "quant_cfg") call normalize_quant_cfg_list(v) for dicts as well as lists so downstream code always receives a normalized list.
🧹 Nitpick comments (1)
modelopt/torch/quantization/config.py (1)
162-168: TypedDict `total=False` conflicts with required `quantizer_path` documentation.
The docstring states `quantizer_path` is required, but `total=False` makes all fields optional at the type level. Consider using `Required` from `typing_extensions` to mark `quantizer_path` as required while keeping other fields optional:
-class QuantizerCfgEntry(TypedDict, total=False):
+from typing_extensions import Required
+
+class QuantizerCfgEntry(TypedDict, total=False):
     """A single entry in a ``quant_cfg`` list."""
-    quantizer_path: str  # required; matched against quantizer module names
+    quantizer_path: Required[str]  # required; matched against quantizer module names
     parent_class: str | None  # optional; filters by pytorch module class name (e.g. "nn.Linear")
This ensures static type checkers flag missing `quantizer_path` while allowing other fields to be omitted.
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/config.py` around lines 162 - 168, QuantizerCfgEntry currently uses TypedDict(total=False) which makes all keys optional but the docstring says quantizer_path is required; update QuantizerCfgEntry to mark quantizer_path as required using typing_extensions.Required (or typing.Required on newer Python): change the TypedDict definition so quantizer_path: Required[str] while keeping parent_class, cfg, and enable as optional, and import Required from typing_extensions if not available in typing to ensure static type-checkers enforce the required key.
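The `Required` marker can be verified at runtime via TypedDict introspection. A trimmed sketch (only a subset of the entry's fields is shown); it assumes Python 3.11+ or an installed `typing_extensions` for older interpreters:

```python
try:  # Python 3.11+ ships Required in typing
    from typing import Required, TypedDict
except ImportError:  # fall back for older interpreters
    from typing_extensions import Required, TypedDict


class QuantizerCfgEntry(TypedDict, total=False):
    """A single entry in a quant_cfg list (trimmed sketch)."""

    quantizer_path: Required[str]  # required despite total=False
    cfg: dict
    enable: bool


# TypedDict records requiredness for static checkers and runtime introspection
# via __required_keys__ / __optional_keys__.
```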
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/utils/core_utils.py`:
- Around line 830-849: The function update_quant_cfg_with_kv_cache_quant can
raise a TypeError if quant_cfg.get("quant_cfg") returns a legacy dict instead of
a list; change it to defensively normalize inner by checking the type of
quant_cfg.get("quant_cfg") and converting a dict into a single-item list (or
treating None/Falsey as the existing default list), then set
quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg) and return the
deep-copied quant_cfg; update the logic around the inner variable in
update_quant_cfg_with_kv_cache_quant to handle list, dict, and None cases
explicitly.
---
Duplicate comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 1677-1699: The loop over quant_cfg can mis-index string keys when
a dict slips through the legacy tuple/path fallback; update the logic in the
quant_cfg iteration (around variables entry, name, cfg) to first check types
explicitly: if entry is a dict use entry["quantizer_path"] and entry.get("cfg"),
else if entry is a sequence (tuple/list) with length >= 2 use entry[0]/entry[1];
otherwise raise a clear ValueError or skip with a logged warning. Ensure the
branch that checks "weight_quantizer" still uses the computed name and that
_not_dynamic is only called with validated cfg objects.
- Around line 1622-1628: The before-validator normalize_quant_cfg currently
returns non-list values unchanged, so legacy dict-style quant_cfg like
{"*weight_quantizer": {"num_bits": 8}} bypasses normalization; update
normalize_quant_cfg to detect dict inputs and pass them into
normalize_quant_cfg_list (which uses _dict_to_entry) so all legacy dict formats
are converted into the expected list of QuantizerCfgEntry objects—i.e., in
normalize_quant_cfg (the `@field_validator` for "quant_cfg") call
normalize_quant_cfg_list(v) for dicts as well as lists so downstream code always
receives a normalized list.
---
Nitpick comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 162-168: QuantizerCfgEntry currently uses TypedDict(total=False)
which makes all keys optional but the docstring says quantizer_path is required;
update QuantizerCfgEntry to mark quantizer_path as required using
typing_extensions.Required (or typing.Required on newer Python): change the
TypedDict definition so quantizer_path: Required[str] while keeping
parent_class, cfg, and enable as optional, and import Required from
typing_extensions if not available in typing to ensure static type-checkers
enforce the required key.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: ea0ca383-fd43-415d-8bef-252ce66fbfc5
📒 Files selected for processing (2)
modelopt/torch/quantization/config.py
modelopt/torch/quantization/utils/core_utils.py
```diff
 def update_quant_cfg_with_kv_cache_quant(
-    quant_cfg: dict[str, Any], kv_cache_quant_cfg: dict[str, Any]
+    quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[QuantizerCfgEntry]
 ) -> dict[str, Any]:
-    """Update the quant_cfg with the kv cache quant_cfg."""
+    """Update the quant_cfg with the kv cache quant_cfg.
+
+    Args:
+        quant_cfg: The outer quantization config dict (with ``"quant_cfg"`` and ``"algorithm"`` keys).
+        kv_cache_quant_cfg: A list of :class:`QuantizerCfgEntry
+            <modelopt.torch.quantization.config.QuantizerCfgEntry>` dicts for KV cache quantization,
+            typically ``some_kv_cfg["quant_cfg"]``.
+
+    Returns:
+        A deep copy of ``quant_cfg`` with the KV cache entries appended to ``quant_cfg["quant_cfg"]``.
+    """
     # If quant_cfg["quant_cfg"] is None, it corresponds to only kv cache quantization case
     quant_cfg = copy.deepcopy(quant_cfg)
-    quant_cfg["quant_cfg"] = quant_cfg.get("quant_cfg") or {"default": {"enable": False}}
-    quant_cfg["quant_cfg"].update(kv_cache_quant_cfg)
+    inner: list[QuantizerCfgEntry] = quant_cfg.get("quant_cfg") or [
+        {"quantizer_path": "*", "enable": False}
+    ]
+    quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)
```
Potential issue if quant_cfg["quant_cfg"] is a legacy dict format.
The function assumes quant_cfg.get("quant_cfg") returns a list (or falsey value). If a caller passes an unnormalized config where quant_cfg["quant_cfg"] is still a legacy dict, line 849 would raise TypeError when attempting dict + list.
Consider normalizing the inner config or adding a type check:
Suggested defensive fix

```diff
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
+
 def update_quant_cfg_with_kv_cache_quant(
     quant_cfg: dict[str, Any], kv_cache_quant_cfg: list[QuantizerCfgEntry]
 ) -> dict[str, Any]:
     ...
     quant_cfg = copy.deepcopy(quant_cfg)
-    inner: list[QuantizerCfgEntry] = quant_cfg.get("quant_cfg") or [
-        {"quantizer_path": "*", "enable": False}
-    ]
+    raw_inner = quant_cfg.get("quant_cfg")
+    if raw_inner is None:
+        inner = [{"quantizer_path": "*", "enable": False}]
+    elif isinstance(raw_inner, dict):
+        # Legacy dict format - convert to list
+        inner = normalize_quant_cfg_list([{k: v} for k, v in raw_inner.items()])
+    else:
+        inner = list(raw_inner)
     quant_cfg["quant_cfg"] = inner + list(kv_cache_quant_cfg)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/utils/core_utils.py` around lines 830 - 849, The
function update_quant_cfg_with_kv_cache_quant can raise a TypeError if
quant_cfg.get("quant_cfg") returns a legacy dict instead of a list; change it to
defensively normalize inner by checking the type of quant_cfg.get("quant_cfg")
and converting a dict into a single-item list (or treating None/Falsey as the
existing default list), then set quant_cfg["quant_cfg"] = inner +
list(kv_cache_quant_cfg) and return the deep-copied quant_cfg; update the logic
around the inner variable in update_quant_cfg_with_kv_cache_quant to handle
list, dict, and None cases explicitly.
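The failure mode and the defensive branch can be reproduced with plain dicts and lists, with no ModelOpt imports. The `_dict_to_entry`-style normalization below is a hypothetical stand-in for the real helper, not the library code:

```python
import copy
from typing import Any


def _dict_to_entry(pattern: str, cfg: dict) -> dict[str, Any]:
    # Hypothetical stand-in for the real legacy-dict normalization.
    entry: dict[str, Any] = {"quantizer_path": pattern}
    if set(cfg) == {"enable"}:
        entry["enable"] = cfg["enable"]
    else:
        entry["cfg"] = cfg
    return entry


def merge_kv_cache(quant_cfg: dict, kv_entries: list[dict]) -> dict:
    quant_cfg = copy.deepcopy(quant_cfg)
    raw_inner = quant_cfg.get("quant_cfg")
    if raw_inner is None:
        inner = [{"quantizer_path": "*", "enable": False}]
    elif isinstance(raw_inner, dict):  # legacy dict format
        inner = [_dict_to_entry(k, v) for k, v in raw_inner.items()]
    else:
        inner = list(raw_inner)
    quant_cfg["quant_cfg"] = inner + list(kv_entries)
    return quant_cfg


legacy = {"quant_cfg": {"*weight_quantizer": {"num_bits": 8}}, "algorithm": "max"}
kv = [{"quantizer_path": "*output_quantizer", "cfg": {"num_bits": (4, 3)}}]

# Without the isinstance() branch, dict + list would raise TypeError here.
merged = merge_kv_cache(legacy, kv)
print([e["quantizer_path"] for e in merged["quant_cfg"]])
# ['*weight_quantizer', '*output_quantizer']
```

The deep copy also keeps the caller's `legacy` dict untouched, matching the function's documented return-a-copy contract.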
```python
# 1. Deny all quantizers by default
{"quantizer_path": "*", "enable": False},

# 2. Enable and configure the target quantizers
{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
{"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
```
Suggested change

```diff
-# 1. Deny all quantizers by default
-{"quantizer_path": "*", "enable": False},
-# 2. Enable and configure the target quantizers
-{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8, "axis": 0}},
-{"quantizer_path": "*input_quantizer", "cfg": {"num_bits": 8, "axis": None}},
+# 1. Deny all quantizers by default
+{"*": {"enable": False}},
+# 2. Enable and configure the target quantizers
+{"*weight_quantizer": {"num_bits": 8, "axis": 0}},
+{"*input_quantizer": {"num_bits": 8, "axis": None}},
```
Why do we need the fields `quantizer_path` and `cfg`?
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/gpu/torch/quantization/test_real_quantize_cuda.py`:
- Around line 50-56: The loop over config["quant_cfg"] that looks for
entry.get("quantizer_path") == "*weight_quantizer" must fail fast if no such
entry is found and must validate the shape of entry["cfg"]["block_sizes"];
update the code that currently sets entry.setdefault("cfg", {})["block_sizes"] =
{...} to instead: scan config["quant_cfg"] and set a found flag when the
matching entry is located, then after the loop call pytest.fail or raise
AssertionError if not found; for the found entry, ensure entry.setdefault("cfg",
{}) contains a "block_sizes" dict and validate that it has the required keys
(e.g., -1 and "scale_bits") and that their values are integers (or otherwise of
expected type/shape), raising a test failure if the validation fails; apply the
same fail-fast + validation change to the other similar loop in the file that
handles the weight_quantizer (the block around the second occurrence).
- Around line 50-56: The test mutates shared quant config constants (e.g.,
mtq.INT4_AWQ_CFG) by updating entries in config["quant_cfg"], which can leak
state between tests; to fix, make a defensive deep copy of the config before
modifying it (e.g., copy.deepcopy(config) or deepcopy the specific
config["quant_cfg"] list) and perform the block_sizes modification on that copy
so the original mtq.INT4_AWQ_CFG remains unchanged; locate the code that
iterates over config["quant_cfg"] and change it to operate on the copied object
(references: config, config["quant_cfg"], entry, mtq.INT4_AWQ_CFG).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 14610ec4-1ae7-433b-b1df-66856f70d771
📒 Files selected for processing (1)
tests/gpu/torch/quantization/test_real_quantize_cuda.py
```python
for entry in config["quant_cfg"]:
    if entry.get("quantizer_path") == "*weight_quantizer":
        entry.setdefault("cfg", {})["block_sizes"] = {
            -1: 16,
            "scale_bits": 8,
        }
        break
```
Fail fast if *weight_quantizer entry is missing (and validate cfg shape).
The current loop silently does nothing when the entry is absent, which can hide config-schema regressions in tests.
Suggested fix

```diff
 for entry in config["quant_cfg"]:
     if entry.get("quantizer_path") == "*weight_quantizer":
-        entry.setdefault("cfg", {})["block_sizes"] = {
+        cfg = entry.setdefault("cfg", {})
+        if not isinstance(cfg, dict):
+            pytest.fail("Expected dict cfg for *weight_quantizer in INT4_AWQ_CFG")
+        cfg["block_sizes"] = {
+            -1: 16,
             "scale_bits": 8,
         }
         break
+else:
+    pytest.fail("INT4_AWQ_CFG missing *weight_quantizer entry")
```

Also applies to: 107-113
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/gpu/torch/quantization/test_real_quantize_cuda.py` around lines 50 -
56, The loop over config["quant_cfg"] that looks for entry.get("quantizer_path")
== "*weight_quantizer" must fail fast if no such entry is found and must
validate the shape of entry["cfg"]["block_sizes"]; update the code that
currently sets entry.setdefault("cfg", {})["block_sizes"] = {...} to instead:
scan config["quant_cfg"] and set a found flag when the matching entry is
located, then after the loop call pytest.fail or raise AssertionError if not
found; for the found entry, ensure entry.setdefault("cfg", {}) contains a
"block_sizes" dict and validate that it has the required keys (e.g., -1 and
"scale_bits") and that their values are integers (or otherwise of expected
type/shape), raising a test failure if the validation fails; apply the same
fail-fast + validation change to the other similar loop in the file that handles
the weight_quantizer (the block around the second occurrence).
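The `for`/`else` fail-fast idiom proposed above works standalone; here is a sketch with a plain `AssertionError` in place of `pytest.fail` (the config shape mimics the new list format, but this is illustrative, not the test file's code):

```python
def set_block_sizes(quant_cfg: list[dict], block_sizes: dict) -> None:
    """Patch the *weight_quantizer entry in place, failing loudly if it is absent."""
    for entry in quant_cfg:
        if entry.get("quantizer_path") == "*weight_quantizer":
            cfg = entry.setdefault("cfg", {})
            if not isinstance(cfg, dict):
                raise AssertionError("expected dict cfg for *weight_quantizer")
            cfg["block_sizes"] = block_sizes
            break
    else:  # loop finished without break: the entry is missing
        raise AssertionError("missing *weight_quantizer entry")


cfg = [
    {"quantizer_path": "*", "enable": False},
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4}},
]
set_block_sizes(cfg, {-1: 16, "scale_bits": 8})
print(cfg[1]["cfg"]["block_sizes"])  # {-1: 16, 'scale_bits': 8}

try:
    set_block_sizes([{"quantizer_path": "*input_quantizer"}], {-1: 16})
except AssertionError as exc:
    print(exc)  # missing *weight_quantizer entry
```

A silent no-match would hide a schema regression; the `else` branch turns it into an immediate test failure.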
Avoid mutating shared quant config constants in-place across tests.
These blocks mutate mtq.INT4_AWQ_CFG through config, which can leak state to later tests and make runs order-dependent.
Suggested fix

```diff
+import copy
@@
 def test_real_quantize(model_cls, config):
     """Test quantize function can run without problems."""
+    config = copy.deepcopy(config)
@@
 def test_save_restore(model_cls, config):
+    config = copy.deepcopy(config)
     # update config to fit test cases
```

Also applies to: 107-113
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/gpu/torch/quantization/test_real_quantize_cuda.py` around lines 50 -
56, The test mutates shared quant config constants (e.g., mtq.INT4_AWQ_CFG) by
updating entries in config["quant_cfg"], which can leak state between tests; to
fix, make a defensive deep copy of the config before modifying it (e.g.,
copy.deepcopy(config) or deepcopy the specific config["quant_cfg"] list) and
perform the block_sizes modification on that copy so the original
mtq.INT4_AWQ_CFG remains unchanged; locate the code that iterates over
config["quant_cfg"] and change it to operate on the copied object (references:
config, config["quant_cfg"], entry, mtq.INT4_AWQ_CFG).
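A small sketch of the leakage being flagged here, using a stand-in for `mtq.INT4_AWQ_CFG` (the constant name and shape are illustrative):

```python
import copy

# A module-level "constant" shared across tests (stand-in for mtq.INT4_AWQ_CFG).
INT4_AWQ_CFG = {
    "algorithm": "awq_lite",
    "quant_cfg": [{"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 4}}],
}


def run_test_without_copy(config):
    # Mutates the shared object: every later test sees block_sizes.
    config["quant_cfg"][0]["cfg"]["block_sizes"] = {-1: 16}


def run_test_with_copy(config):
    config = copy.deepcopy(config)  # defensive copy: the constant stays pristine
    config["quant_cfg"][0]["cfg"]["block_sizes"] = {-1: 16}


run_test_with_copy(INT4_AWQ_CFG)
print("block_sizes" in INT4_AWQ_CFG["quant_cfg"][0]["cfg"])  # False

run_test_without_copy(INT4_AWQ_CFG)
print("block_sizes" in INT4_AWQ_CFG["quant_cfg"][0]["cfg"])  # True - state leaked
```

Note that a shallow `dict(config)` would not help: the nested `quant_cfg` list and its entry dicts would still be shared, so `copy.deepcopy` is the right tool.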
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
examples/llm_ptq/example_utils.py (1)
206-247: ⚠️ Potential issue | 🟠 Major

Normalize before editing `quant_cfg`, and don't assume the first weight rule wins.

`build_quant_cfg()` now mutates `quant_cfg["quant_cfg"]` before the `QuantizeConfig` validator can normalize legacy dict-style configs, and the AWQ branch grabs the first `*weight_quantizer` entry even though list configs are ordered last-wins. That means old configs can fail here, and appended weight overrides can be silently ignored.

🛠️ Suggested fix

```diff
+from modelopt.torch.quantization.config import normalize_quant_cfg_list
 ...
 def build_quant_cfg(
     qformat,
     quant_cfg,
     awq_block_size,
     model_type,
     moe_calib_experts_ratio: float | None = None,
 ) -> dict[str, Any]:
     quant_cfg = copy.deepcopy(quant_cfg)
+    quant_cfg["quant_cfg"] = normalize_quant_cfg_list(quant_cfg.get("quant_cfg", []))
+
     if "awq" in str(quant_cfg.get("algorithm")):
         weight_quantizer_entry = next(
             e
-            for e in quant_cfg["quant_cfg"]
-            if isinstance(e, dict) and e.get("quantizer_path") == "*weight_quantizer"
+            for e in reversed(quant_cfg["quant_cfg"])
+            if e.get("quantizer_path") == "*weight_quantizer" and e.get("cfg") is not None
         )
         weight_quantizer = weight_quantizer_entry.get("cfg", {})
```
Verify each finding against the current code and only fix it if needed. In `@examples/llm_ptq/example_utils.py` around lines 206 - 247, In build_quant_cfg, avoid mutating quant_cfg["quant_cfg"] before it is normalized and stop assuming the first "*weight_quantizer" rule wins: first run the normalization/validation step used elsewhere (i.e., the QuantizeConfig or whatever normalizer the codebase provides) on quant_cfg so legacy dict-style entries are canonicalized, then when handling the AWQ branch find the effective weight_quantizer by scanning quant_cfg["quant_cfg"] in reverse (or otherwise selecting the last matching entry) to respect last-wins semantics; update that entry's cfg (and apply awq_block_size to weight_quantizer["block_sizes"][-1]) and only then adjust quant_cfg["algorithm"] for AWQ or other model-specific tweaks.modelopt/torch/quantization/conversion.py (1)
423-454: ⚠️ Potential issue | 🟠 Major

Enforce the no-sequential-list contract in this context manager.

The docstring says `cfg: [...]` entries are unsupported here, but the function no longer rejects them. If one gets through, `set_quantizer_by_cfg()` can replace `foo.quantizer` with a `SequentialQuantizer` while the exit path only snapshots/restores `TensorQuantizer` names, so restoration can miss the swap or fail on new child names like `foo.quantizer.0`.

🛠️ Suggested fix

```diff
 def set_quantizer_by_cfg_context(quant_model: nn.Module, quant_cfg: QuantizeQuantCfgType):
     """Context manager that temporarily applies a quantization config and restores the
     original state on exit.
@@
     """
     quant_cfg = normalize_quant_cfg_list(quant_cfg)
+    if any(isinstance(entry["cfg"], list) for entry in quant_cfg if entry["cfg"] is not None):
+        raise ValueError(
+            "set_quantizer_by_cfg_context() does not support SequentialQuantizer cfg lists."
+        )
     original_attributes = {}
```
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/conversion.py` around lines 423 - 454, The context manager set_quantizer_by_cfg_context must reject sequential cfg lists before mutating the model: validate quant_cfg (after normalize_quant_cfg_list or on the original input) and if any entry is a list/sequence (i.e., a sequential cfg that would produce a SequentialQuantizer) raise a ValueError with a clear message; perform this check prior to calling set_quantizer_by_cfg so set_quantizer_by_cfg_context never allows a config that could swap in a SequentialQuantizer and break the restore logic which only looks for TensorQuantizer instances.
♻️ Duplicate comments (2)
docs/source/guides/_pytorch_quantization.rst (1)
258-272: ⚠️ Potential issue | 🟡 Minor

Make this example self-contained.

`_default_disabled_quantizer_cfg` is still undefined in this snippet, so the block fails when copied verbatim. Please qualify/import it or inline those exclusions instead of referencing the bare private helper.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/source/guides/_pytorch_quantization.rst` around lines 258 - 272, The example references an undefined symbol _default_disabled_quantizer_cfg causing copy-paste failures; fix by either importing or inlining its entries: update the snippet that defines MY_CUSTOM_CONFIG to remove the bare reference and instead import the helper (e.g., from the module that exports _default_disabled_quantizer_cfg) or expand its contents directly into the "quant_cfg" list so the configuration is self-contained; ensure the change is applied to the MY_CUSTOM_CONFIG definition and keep the existing entries for "*", "*weight_quantizer", and "*input_quantizer" intact.modelopt/torch/quantization/conversion.py (1)
340-356: ⚠️ Potential issue | 🔴 Critical

Full replacements still don't swap incompatible quantizer shapes.

The list branch only warns when `len(attributes) != len(module)`, and the scalar branch mutates an existing `SequentialQuantizer` in place. Both cases leave stale sub-quantizers around instead of honoring the advertised "cfg is a complete replacement" behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@modelopt/torch/quantization/conversion.py` around lines 340 - 356, The loop handling quantizer replacements currently mutates existing SequentialQuantizer instances or only warns on length mismatches, leaving stale sub-quantizers; change it to perform full replacements: when _match_quantizer(...) finds a target and attributes is a list, always construct a new SequentialQuantizer(*(TensorQuantizer() for _ in range(len(attributes)))) and replace the submodule via quant_model.get_submodule(...)/setattr before calling set_from_attribute_config on the new module (don’t just warn when len(attributes) != len(module)); when attributes is not a list and the existing module is a SequentialQuantizer, replace it with a fresh TensorQuantizer() in the parent and then call set_from_attribute_config on that new TensorQuantizer instead of mutating the old SequentialQuantizer in place; use the same name resolution logic already present (name.rpartition / name.split) to locate parent and attribute name.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/torch/quantization/conversion.py`:
- Around line 258-263: The current truthy check treats an explicit empty config
({}) the same as omitted config, so change the condition to detect only omitted
configs by replacing the "if not cfg" logic with an explicit None check (e.g.,
"if cfg is None") so that normalize_quant_cfg_list()'s semantics are respected;
this ensures that when cfg == {} the else branch runs and
set_quantizer_attributes_partial/other reset logic for quant_model,
quantizer_path and parent_class will execute to reset attributes rather than
only toggling "enable".
---
Outside diff comments:
In `@examples/llm_ptq/example_utils.py`:
- Around line 206-247: In build_quant_cfg, avoid mutating quant_cfg["quant_cfg"]
before it is normalized and stop assuming the first "*weight_quantizer" rule
wins: first run the normalization/validation step used elsewhere (i.e., the
QuantizeConfig or whatever normalizer the codebase provides) on quant_cfg so
legacy dict-style entries are canonicalized, then when handling the AWQ branch
find the effective weight_quantizer by scanning quant_cfg["quant_cfg"] in
reverse (or otherwise selecting the last matching entry) to respect last-wins
semantics; update that entry's cfg (and apply awq_block_size to
weight_quantizer["block_sizes"][-1]) and only then adjust quant_cfg["algorithm"]
for AWQ or other model-specific tweaks.
In `@modelopt/torch/quantization/conversion.py`:
- Around line 423-454: The context manager set_quantizer_by_cfg_context must
reject sequential cfg lists before mutating the model: validate quant_cfg (after
normalize_quant_cfg_list or on the original input) and if any entry is a
list/sequence (i.e., a sequential cfg that would produce a SequentialQuantizer)
raise a ValueError with a clear message; perform this check prior to calling
set_quantizer_by_cfg so set_quantizer_by_cfg_context never allows a config that
could swap in a SequentialQuantizer and break the restore logic which only looks
for TensorQuantizer instances.
---
Duplicate comments:
In `@docs/source/guides/_pytorch_quantization.rst`:
- Around line 258-272: The example references an undefined symbol
_default_disabled_quantizer_cfg causing copy-paste failures; fix by either
importing or inlining its entries: update the snippet that defines
MY_CUSTOM_CONFIG to remove the bare reference and instead import the helper
(e.g., from the module that exports _default_disabled_quantizer_cfg) or expand
its contents directly into the "quant_cfg" list so the configuration is
self-contained; ensure the change is applied to the MY_CUSTOM_CONFIG definition
and keep the existing entries for "*", "*weight_quantizer", and
"*input_quantizer" intact.
In `@modelopt/torch/quantization/conversion.py`:
- Around line 340-356: The loop handling quantizer replacements currently
mutates existing SequentialQuantizer instances or only warns on length
mismatches, leaving stale sub-quantizers; change it to perform full
replacements: when _match_quantizer(...) finds a target and attributes is a
list, always construct a new SequentialQuantizer(*(TensorQuantizer() for _ in
range(len(attributes)))) and replace the submodule via
quant_model.get_submodule(...)/setattr before calling set_from_attribute_config
on the new module (don’t just warn when len(attributes) != len(module)); when
attributes is not a list and the existing module is a SequentialQuantizer,
replace it with a fresh TensorQuantizer() in the parent and then call
set_from_attribute_config on that new TensorQuantizer instead of mutating the
old SequentialQuantizer in place; use the same name resolution logic already
present (name.rpartition / name.split) to locate parent and attribute name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 53589e40-16b3-46f6-8ff4-de3dd80a9420
📒 Files selected for processing (5)
docs/source/guides/_pytorch_quantization.rst
examples/llm_ptq/example_utils.py
modelopt/torch/export/unified_export_hf.py
modelopt/torch/quantization/conversion.py
tests/unit/torch/quantization/plugins/test_peft.py
🚧 Files skipped from review as they are similar to previous changes (2)
- tests/unit/torch/quantization/plugins/test_peft.py
- modelopt/torch/export/unified_export_hf.py
```diff
+        if not cfg:
+            # No cfg: only toggle the enable state, leave all other attributes unchanged.
+            set_quantizer_attributes_partial(
+                quant_model, quantizer_path, {"enable": enable}, parent_class
+            )
-            for sub_pattern, sub_cfg in cfg.items():
-                set_quantizer_attribute(quant_model, sub_pattern, sub_cfg, parent_class)
-            continue
-        set_quantizer_attribute(quant_model, pattern, cfg)
+        else:
```
Treat `cfg={}` as a full reset, not as “no cfg”.
`normalize_quant_cfg_list()` already uses `None` to mean “cfg omitted”. The truthiness check here collapses an explicit empty config into the enable-only path, so `{"cfg": {}}` preserves old attributes instead of resetting them to defaults.
🛠️ Suggested fix

```diff
-        if not cfg:
+        if cfg is None:
```
Verify each finding against the current code and only fix it if needed.
In `@modelopt/torch/quantization/conversion.py` around lines 258 - 263, The
current truthy check treats an explicit empty config ({}) the same as omitted
config, so change the condition to detect only omitted configs by replacing the
"if not cfg" logic with an explicit None check (e.g., "if cfg is None") so that
normalize_quant_cfg_list()'s semantics are respected; this ensures that when cfg
== {} the else branch runs and set_quantizer_attributes_partial/other reset
logic for quant_model, quantizer_path and parent_class will execute to reset
attributes rather than only toggling "enable".
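The difference between the two checks is easy to demonstrate in isolation (the labels below are shorthand for the two code paths discussed in the review):

```python
def dispatch(cfg):
    """Classify a cfg value under each of the two guard conditions."""
    truthy_path = "enable-only" if not cfg else "full-apply"
    none_path = "enable-only" if cfg is None else "full-apply"
    return truthy_path, none_path


print(dispatch(None))             # ('enable-only', 'enable-only') - both agree for omitted cfg
print(dispatch({}))               # ('enable-only', 'full-apply')  - `not cfg` swallows the reset
print(dispatch({"num_bits": 8}))  # ('full-apply', 'full-apply')
```

Only the `{}` case differs: under `not cfg` an explicit empty config never reaches the reset path, which is exactly the bug the review flags.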
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
Signed-off-by: Shengliang Xu <shengliangx@nvidia.com>
What does this PR do?

Summary

Redesigns the `quant_cfg` configuration format in ModelOpt's PyTorch quantization stack, replacing the previous dict-based format with an ordered list of typed `QuantizerCfgEntry` dicts.

Motivation

The old `quant_cfg` dict had several pain points:

- `"default"` key: an implicit, undocumented catch-all that was easy to misuse

New format

`quant_cfg` is now an ordered list of `QuantizerCfgEntry` TypedDicts. Each entry has:

- `quantizer_path` (required): `fnmatch` wildcard matched against quantizer module names
- `cfg` (optional): dict (or list of dicts) of `QuantizerAttributeConfig` fields
- `enable` (optional): toggles the quantizer on/off independently of `cfg`
- `parent_class` (optional): restricts the match to quantizers inside the given PyTorch module class

Entries are applied in list order; later entries override earlier ones. The canonical pattern is deny-all first (`_base_disable_all`), then selectively re-enable, then apply standard exclusions (`_default_disabled_quantizer_cfg`).

Changes
- `config.py`: `QuantizerCfgEntry` TypedDict and `normalize_quant_cfg_list()` added; `_default_disabled_quantizer_cfg`/`_mamba_moe_disabled_quantizer_cfg` converted from dicts to lists; all built-in configs (`INT8_DEFAULT_CFG`, `FP8_DEFAULT_CFG`, `NVFP4_DEFAULT_CFG`, etc.) converted to list format; two `QuantizeConfig` field validators added for normalization and early validation
- `conversion.py`: `set_quantizer_by_cfg` rewritten to iterate the list directly; `set_quantizer_attributes_full` and `set_quantizer_attributes_partial` distinguished; `parent_class` resolution uses the new `_DMRegistryCls.get_nn_class()` against the original `nn.Module` class
- `dynamic.py`: `_DMRegistryCls.get_nn_class()` added to resolve string class names to their original `nn.Module` subclass for `isinstance` checks
- `tensor_quantizer.py`: `set_from_attribute_config` signature narrowed to `QuantizerAttributeConfig`; uses `model_dump()` (not `exclude_unset`) to enforce entry atomicity: unset fields reset to defaults rather than inheriting from prior entries
- `modelopt_recipes/general/ptq/*.yml` files updated to the new list format
- `docs/source/guides/_quant_cfg.rst` added, covering entry format, ordering, atomicity, and common patterns

Backward compatibility
`normalize_quant_cfg_list()` is called automatically by the `QuantizeConfig` Pydantic validator, so existing code using the old single-key dict format (`{"*weight_quantizer": {"num_bits": 8}}`) continues to work without modification.

Test coverage
Additional Information
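To illustrate the backward-compatibility normalization and the last-wins ordering this PR describes, here is a simplified, hypothetical re-implementation. `normalize` and `effective_entry` are illustrative stand-ins, not the ModelOpt functions:

```python
import fnmatch


def normalize(quant_cfg):
    """Accept legacy one-key dicts ({pattern: cfg}) alongside new-style entries,
    returning new-style entries (sketch of what normalize_quant_cfg_list does)."""
    out = []
    for item in quant_cfg:
        if "quantizer_path" in item:  # already new-style
            out.append(dict(item))
        else:  # legacy {pattern: cfg} with exactly one key
            (pattern, cfg), = item.items()
            entry = {"quantizer_path": pattern}
            if set(cfg) == {"enable"}:
                entry["enable"] = cfg["enable"]
            else:
                entry["cfg"] = cfg
            out.append(entry)
    return out


def effective_entry(entries, name):
    """Later entries override earlier ones for the same quantizer name."""
    winner = None
    for entry in entries:
        if fnmatch.fnmatch(name, entry["quantizer_path"]):
            winner = entry
    return winner


entries = normalize([
    {"*": {"enable": False}},  # legacy deny-all entry, normalized on the fly
    {"quantizer_path": "*weight_quantizer", "cfg": {"num_bits": 8}},
])
print(effective_entry(entries, "layers.0.weight_quantizer"))
# {'quantizer_path': '*weight_quantizer', 'cfg': {'num_bits': 8}}
print(effective_entry(entries, "layers.0.input_quantizer"))
# {'quantizer_path': '*', 'enable': False}
```

This shows why the list is ordered: the deny-all entry matches everything, and the weight entry wins only because it comes later.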
Summary by CodeRabbit
New Features
Documentation
Refactor
Tests