[BugFix][5271237][ONNX] Add Q/DQ placement for Conv->LayerNorm patterns #1117
Conversation
📝 Walkthrough
Identify LayerNormalization nodes whose inputs originate from Conv partitions (possibly via Transpose/Reshape/Copy), include them among quantization targets, register LayerNormalization in the QDQ registry, and add a test model plus a unit test to validate Conv→LayerNorm Q/DQ placement.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Graph as ONNX Graph
    participant Scanner as Conv→LayerNorm Scanner
    participant Quantizer as Quantization Engine
    participant QDQReg as QDQ Registry
    Graph->>Scanner: enumerate LayerNormalization nodes
    Scanner->>Graph: trace input producers backwards (through Transpose/Reshape/Copy)
    Graph-->>Scanner: report whether producer originates from CASK partition
    Scanner->>Quantizer: return matched LayerNormalization nodes
    Quantizer->>QDQReg: query handler for LayerNormalization
    QDQReg-->>Quantizer: mapped to QDQNormalization
    Quantizer->>Graph: mark/instrument Conv and LayerNorm activation paths with Q/DQ
    Graph-->>Quantizer: return quantized model
```
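The scan-and-trace flow in the diagram can be sketched in plain Python. The `Node` class and `originates_from_conv_partition`/`find_conv_to_layernorm` names below are hypothetical stand-ins for the GraphSurgeon types and helpers the real pass uses; only the op names (`Transpose`, `Reshape`, `Copy`, `LayerNormalization`) come from the walkthrough.

```python
from dataclasses import dataclass, field

# Pass-through ops the scanner may traverse, per the walkthrough.
COPY_OPS = {"Transpose", "Reshape", "Copy"}

@dataclass
class Node:
    op: str
    parents: list = field(default_factory=list)  # producers of this node's inputs
    in_cask_partition: bool = False              # set for nodes inside a Conv (CASK) partition

def originates_from_conv_partition(node: Node, max_depth: int = 10) -> bool:
    """Trace the activation (first) input backwards through copy ops."""
    if max_depth <= 0 or not node.parents:
        return False
    producer = node.parents[0]
    if producer.in_cask_partition:
        return True
    if producer.op in COPY_OPS:
        return originates_from_conv_partition(producer, max_depth - 1)
    return False

def find_conv_to_layernorm(nodes):
    """Return the LayerNormalization nodes fed (transitively) by a Conv partition."""
    return [n for n in nodes
            if n.op == "LayerNormalization" and originates_from_conv_partition(n)]
```

For a `Conv -> Reshape -> LayerNormalization` chain this returns the LayerNormalization node, while a LayerNormalization fed by an unfused op (e.g. a standalone Relu) is skipped.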
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ❌ 1 failed (1 warning) | ✅ 3 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/onnx/quantization/graph_utils.py`:
- Around line 345-350: The code currently only adds outputs from the last_node
of each partition to cask_output_tensor_names which misses outputs produced by
earlier or branching nodes; update the loop over cask_fusible_partitions (and
the inner loop referencing last_node) to iterate over every node in each
partition (e.g., for node in partition) and collect node.outputs' names into
cask_output_tensor_names so all partition outputs are captured for
Conv→LayerNorm matching.
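The suggested fix can be sketched as follows. The function name and the namedtuple stand-ins are hypothetical; the real code works on GraphSurgeon partitions, but the loop shape is the one the comment asks for: iterate every node of each partition instead of only `partition[-1]`.

```python
from collections import namedtuple

# Minimal stand-ins for GraphSurgeon tensors/nodes (assumption for illustration).
Tensor = namedtuple("Tensor", ["name"])
PNode = namedtuple("PNode", ["outputs"])

def collect_cask_output_names(cask_fusible_partitions):
    """Collect output tensor names from every node of every partition,
    so outputs of earlier or branching nodes are captured too."""
    cask_output_tensor_names = set()
    for partition in cask_fusible_partitions:
        for node in partition:  # every node, not only partition[-1]
            for output_tensor in node.outputs:
                cask_output_tensor_names.add(output_tensor.name)
    return cask_output_tensor_names
```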
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 763a51bf-433e-4e78-bdb4-d0e2466fae44
📒 Files selected for processing (5)
- modelopt/onnx/quantization/graph_utils.py
- modelopt/onnx/quantization/int8.py
- modelopt/onnx/quantization/ort_utils.py
- tests/_test_utils/onnx/lib_test_models.py
- tests/unit/onnx/quantization/test_qdq_rules_int8.py
Codecov Report
❌ Patch coverage is
Additional details and impacted files

```
@@            Coverage Diff             @@
##             main    #1117      +/-   ##
==========================================
+ Coverage   70.21%   70.26%   +0.05%
==========================================
  Files         228      228
  Lines       25952    25970      +18
==========================================
+ Hits        18221    18247      +26
+ Misses       7731     7723       -8
==========================================
```

☔ View full report in Codecov by Sentry.
cjluo-nv
left a comment
Review: Correctness & Simplicity
Overall the PR is well-motivated and the approach is sound. A few suggestions:
1. Deduplicate backward traversal logic
_is_input_from_cask_partition (new) does essentially the same backward-through-copy-ops traversal as the existing _is_following_cask_partition (line ~340), but with a different interface (tensors + output tensor name set vs. nodes + node name set).
Consider reusing _is_following_cask_partition directly. For each LayerNormalization node, get the producer of inputs[0] and call _is_following_cask_partition(producer). This would:
- Eliminate `_is_input_from_cask_partition` entirely
- Remove the `cask_output_tensor_names` set construction
- Avoid the assumption that `partition[-1]` is the output node (see point 2)
```python
for node in graph.nodes:
    if node.op != "LayerNormalization":
        continue
    inp_tensor = node.inputs[0]
    if inp_tensor.inputs:
        producer = inp_tensor.inputs[0]
        if _is_following_cask_partition(producer):
            conv_to_ln_nodes.append(node)
```

2. partition[-1] assumption
```python
last_node = partition[-1]
for output_tensor in last_node.outputs:
    cask_output_tensor_names.add(output_tensor.name)
```

This assumes partitions are ordered with the output node last. If a CASK partition ever has different ordering or multiple output nodes not at the end, this would miss patterns. Reusing _is_following_cask_partition (which checks all nodes via cask_partition_nodes) avoids this assumption entirely.
3. QDQNormalization for LayerNorm — input count difference
BatchNormalization has 5 inputs while LayerNormalization has 3. Worth confirming that QDQNormalization only quantizes input[0] (activation) and doesn't attempt to access inputs that don't exist for LayerNorm.
4. Minor: recursive traversal without depth limit
_is_input_from_cask_partition recurses through all copy ops without a bound. The existing _is_following_cask_partition has the same issue, but a max-depth guard (e.g., 10) would be cheap insurance against pathological graphs for both.
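The max-depth guard suggested above can be sketched like this. `Node`, `COPY_OPS`, and the function name are simplified stand-ins for the real graph_utils types, not the actual implementation; the point is only that a depth budget makes the recursion terminate even on pathological (e.g. cyclic) graphs.

```python
# Copy/shape-only ops the traversal is allowed to pass through (assumed set).
COPY_OPS = {"Transpose", "Reshape", "Squeeze", "Unsqueeze"}

class Node:
    def __init__(self, name, op, parents=()):
        self.name, self.op, self.parents = name, op, list(parents)

def is_following_cask_partition(node, cask_partition_nodes, max_depth=10):
    """Backward traversal with a depth budget as cheap insurance."""
    if max_depth <= 0:          # budget spent: give up instead of recursing forever
        return False
    if node.name in cask_partition_nodes:
        return True
    if node.op not in COPY_OPS:
        return False
    return any(is_following_cask_partition(p, cask_partition_nodes, max_depth - 1)
               for p in node.parents)
```

A two-node Reshape cycle, which would overflow the stack without the guard, simply exhausts the budget and returns False.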
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Force-pushed from 45cf99f to 523da9d (Compare)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@modelopt/onnx/quantization/graph_utils.py`:
- Around line 346-353: The helper _is_following_cask_partition incorrectly
requires every parent of a copy op to lead to a CASK partition; instead, when
encountering copy ops (e.g., Reshape, Squeeze, Unsqueeze — the copy/shape-only
ops used in activation path), only follow the data input edge (the activation
input, typically input index 0) rather than all parents. Update the logic where
get_parent_nodes(node) is used in _is_following_cask_partition so that for
parents whose op_type is a copy op you recurse only on the parent node that
supplies the data input (e.g., obtain the input node at index 0 and recurse on
that), while leaving non-copy parents unchanged; this will also fix callers such
as filter_quantizable_kgen_heads that rely on this helper.
📒 Files selected for processing (5)
- modelopt/onnx/quantization/graph_utils.py
- modelopt/onnx/quantization/int8.py
- modelopt/onnx/quantization/ort_utils.py
- tests/_test_utils/onnx/lib_test_models.py
- tests/unit/onnx/quantization/test_qdq_rules_int8.py
🚧 Files skipped from review as they are similar to previous changes (3)
- modelopt/onnx/quantization/int8.py
- tests/unit/onnx/quantization/test_qdq_rules_int8.py
- tests/_test_utils/onnx/lib_test_models.py
```python
parent_nodes = get_parent_nodes(node)
if len(parent_nodes) == 0:
    return False

return all(
    _is_following_cask_partition(parent, cask_partition_nodes, max_depth - 1)
    for parent in parent_nodes
)
```
Follow only the data edge through copy ops.
Line 350 currently requires every parent of a copy op to trace back to a CASK partition. That rejects valid Conv -> Reshape/Squeeze/Unsqueeze -> LayerNormalization chains when the shape/axes input is dynamic, because those auxiliary inputs are not part of the activation path. The same false negative now affects filter_quantizable_kgen_heads() too, since it shares this helper.
💡 Proposed fix

```diff
-    parent_nodes = get_parent_nodes(node)
-    if len(parent_nodes) == 0:
+    if not node.inputs or not node.inputs[0].inputs:
         return False
-    return all(
-        _is_following_cask_partition(parent, cask_partition_nodes, max_depth - 1)
-        for parent in parent_nodes
-    )
+    data_parent = node.inputs[0].inputs[0]
+    return _is_following_cask_partition(data_parent, cask_partition_nodes, max_depth - 1)
```
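The data-edge rule described in this comment can be demonstrated with a small runnable sketch. The `Tensor`/`Node` classes and `follows_cask_partition` are simplified, hypothetical stand-ins for the GraphSurgeon types in graph_utils.py; the example shows why recursing only on the producer of input 0 avoids the false negative.

```python
COPY_OPS = {"Transpose", "Reshape", "Squeeze", "Unsqueeze"}  # assumed copy-op set

class Tensor:
    def __init__(self, producer=None):
        # GraphSurgeon-style: tensor.inputs holds the producing nodes.
        self.inputs = [producer] if producer is not None else []

class Node:
    def __init__(self, op, inputs=(), in_partition=False):
        self.op, self.inputs, self.in_partition = op, list(inputs), in_partition

def follows_cask_partition(node, max_depth=10):
    """Through copy ops, follow only the data edge (input index 0);
    auxiliary inputs such as a dynamic Reshape shape are ignored."""
    if max_depth <= 0:
        return False
    if node.in_partition:
        return True
    if node.op not in COPY_OPS or not node.inputs or not node.inputs[0].inputs:
        return False
    return follows_cask_partition(node.inputs[0].inputs[0], max_depth - 1)
```

For `Conv -> Reshape(data, dynamic_shape) -> LayerNormalization`, this matches even though the shape input's producer never reaches a partition; an all-parents `all(...)` check would reject the same chain.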
cjluo-nv
left a comment
All review feedback has been addressed:
- Backward traversal logic deduplicated into a single `_is_following_cask_partition` at module level
- `partition[-1]` assumption removed; now checks all partition nodes
- Recursion depth limit added (`max_depth=10`)
- Confirmed `QDQNormalization` is safe for LayerNorm (`LayerNormalization` is in `is_normalization_op`, and only `input[0]`/`output[0]` are quantized)

LGTM.
What does this PR do?
Type of change: BugFix
Changes:
Usage
Testing
Before your PR is "Ready for review"
- Make sure you read and follow Contributor guidelines and your commits are signed (`git commit -s -S`).
- Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded `trust_remote_code=True`, `torch.load(..., weights_only=False)`, `pickle`, etc.).

Additional Information
Summary by CodeRabbit
New Features
Refactor
Tests