
[BugFix][5271237][ONNX] Add Q/DQ placement for Conv->LayerNorm patterns #1117

Open

ajrasane wants to merge 2 commits into main from ajrasane/conv_layernorm

Conversation

@ajrasane
Contributor

ajrasane commented Mar 24, 2026

What does this PR do?

Type of change: BugFix

  • ModelOpt was not placing Q/DQ nodes between Conv and LayerNormalization, causing TensorRT to select slower i8f16 kernels instead of faster i8i8 kernels for Conv layers whose output feeds into LayerNorm (e.g., ConvNext models).

Changes:

  • Register LayerNormalization in ORT's QDQ registry (ort_utils.py)
  • Add find_conv_to_layernorm_nodes() to detect Conv→(copy ops)→LayerNorm patterns (graph_utils.py)
  • Add detected LayerNorm nodes to quantizable nodes list so Q/DQ pairs are inserted on Conv output (int8.py)
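
The detection pass described above can be sketched with plain-Python stand-ins. This is illustrative only: the real helpers in graph_utils.py operate on ONNX graph-surgeon nodes, and the names COPY_OPS, Node, and feeds_from_conv below are assumptions, not ModelOpt's API.

```python
# Hypothetical sketch of the Conv->(copy ops)->LayerNorm detection pass.
COPY_OPS = {"Transpose", "Reshape", "Squeeze", "Unsqueeze", "Flatten"}  # assumed set


class Node:
    """Minimal stand-in for a graph node: an op type and its data-input producer."""

    def __init__(self, op, producer=None):
        self.op = op
        self.producer = producer  # node feeding input[0] (the activation path)


def feeds_from_conv(node, max_depth=10):
    """Walk backwards along the activation edge, through copy-like ops only."""
    if node is None or max_depth <= 0:
        return False
    if node.op == "Conv":
        return True
    if node.op in COPY_OPS:
        return feeds_from_conv(node.producer, max_depth - 1)
    return False


def find_conv_to_layernorm_nodes(nodes):
    """Return LayerNormalization nodes whose activation traces back to a Conv."""
    return [
        n
        for n in nodes
        if n.op == "LayerNormalization" and feeds_from_conv(n.producer)
    ]
```

A LayerNorm fed by Conv -> Transpose matches; one fed by Relu -> Transpose does not, since the walk only passes through copy-like ops.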

Usage

  # No API change — existing quantize() call now automatically handles Conv->LayerNorm patterns
  import modelopt.onnx.quantization as moq

  moq.quantize("convnext.onnx", quantize_mode="int8")
  # Output model will now have: Conv -> Transpose -> Q -> DQ -> LayerNorm

Testing

  • Added unit test test_conv_layernorm_quantization with a ConvNext-like test model (Conv→Transpose→LayerNorm→Transpose→Conv)
  • All 237 unit tests pass (236 existing + 1 new)
  • Validated on ConvNext-tiny (opset 17): 23/23 LayerNorm nodes get Q/DQ on activation input
  • Built TRT engine in nvcr.io/nvidia/tensorrt:26.02-py3 — Conv layers after LayerNorm select i8i8 kernels

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

  • Is this change backward compatible?: ✅
  • If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
  • Did you write any new necessary tests?: ✅
  • Did you update Changelog?: N/A

Additional Information

Summary by CodeRabbit

  • New Features

    • Expanded quantization to include LayerNormalization in Q/DQ flows and detect Conv→LayerNormalization patterns for quantization.
  • Refactor

    • Improved internal graph-analysis helpers to make detection logic reusable and more reliable.
  • Tests

    • Added a model builder for Conv→LayerNorm graphs and a unit test validating end-to-end Conv→LayerNormalization quantization.

ajrasane self-assigned this Mar 24, 2026
ajrasane requested a review from a team as a code owner March 24, 2026 23:55
ajrasane requested a review from cjluo-nv March 24, 2026 23:55
@coderabbitai
Contributor

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Identify LayerNormalization nodes whose inputs originate from Conv partitions (possibly via Transpose/Reshape/Copy), include them among quantization targets, register LayerNormalization in the QDQ registry, and add a test model + unit test to validate Conv→LayerNorm Q/DQ placement.

Changes

  • Core Quantization Logic (modelopt/onnx/quantization/graph_utils.py, modelopt/onnx/quantization/int8.py, modelopt/onnx/quantization/ort_utils.py): Added a module-level _is_following_cask_partition(node, cask_partition_nodes, max_depth=10) and find_conv_to_layernorm_nodes(...) to detect LayerNormalization nodes fed (possibly through copy-like ops) from CASK partitions; updated _find_nodes_to_quantize() to include these nodes in quantizable_nodes; registered LayerNormalization in the QDQRegistry, mapped to QDQNormalization.
  • Test Infrastructure (tests/_test_utils/onnx/lib_test_models.py): Added build_conv_layernorm_model(), which constructs an ONNX model (Conv → Transpose → LayerNormalization → Transpose → Conv), sets opset/IR versions, runs shape inference and model checking, and returns a ready onnx.ModelProto.
  • Unit Tests (tests/unit/onnx/quantization/test_qdq_rules_int8.py): Added test_conv_layernorm_quantization(), which uses the new model builder, runs quantization, asserts that the .quant.onnx output exists, and verifies that the Conv inputs and the LayerNormalization activation input are produced by DequantizeLinear nodes.

Sequence Diagram(s)

sequenceDiagram
    participant Graph as ONNX Graph
    participant Scanner as Conv→LayerNorm Scanner
    participant Quantizer as Quantization Engine
    participant QDQReg as QDQ Registry

    Graph->>Scanner: enumerate LayerNormalization nodes
    Scanner->>Graph: trace input producers backwards (through Transpose/Reshape/Copy)
    Graph-->>Scanner: report whether producer originates from CASK partition
    Scanner->>Quantizer: return matched LayerNormalization nodes
    Quantizer->>QDQReg: query handler for LayerNormalization
    QDQReg-->>Quantizer: mapped to QDQNormalization
    Quantizer->>Graph: mark/instrument Conv and LayerNorm activation paths with Q/DQ
    Graph-->>Quantizer: return quantized model

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: Docstring coverage is 63.64%, which is below the required threshold of 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (3 passed)

  • Description Check ✅ Passed: Check skipped; CodeRabbit's high-level summary is enabled.
  • Title Check ✅ Passed: The title clearly and specifically describes the main change: adding Q/DQ placement for Conv->LayerNorm patterns, the core functionality added across multiple files.
  • Security Anti-Patterns ✅ Passed: The PR introduces Conv→LayerNorm quantization handling with no security anti-patterns, unsafe function calls, hardcoded credentials, or problematic dependencies.


@github-actions
Contributor

github-actions bot commented Mar 24, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1117/

Built to branch gh-pages at 2026-03-25 19:28 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/graph_utils.py`:
- Around line 345-350: The code currently only adds outputs from the last_node
of each partition to cask_output_tensor_names which misses outputs produced by
earlier or branching nodes; update the loop over cask_fusible_partitions (and
the inner loop referencing last_node) to iterate over every node in each
partition (e.g., for node in partition) and collect node.outputs' names into
cask_output_tensor_names so all partition outputs are captured for
Conv→LayerNorm matching.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 763a51bf-433e-4e78-bdb4-d0e2466fae44

📥 Commits

Reviewing files that changed from the base of the PR and between 3431f8b and 88f2eca.

📒 Files selected for processing (5)
  • modelopt/onnx/quantization/graph_utils.py
  • modelopt/onnx/quantization/int8.py
  • modelopt/onnx/quantization/ort_utils.py
  • tests/_test_utils/onnx/lib_test_models.py
  • tests/unit/onnx/quantization/test_qdq_rules_int8.py

@codecov

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 80.64516% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.26%. Comparing base (291498b) to head (523da9d).

Files with missing lines: modelopt/onnx/quantization/graph_utils.py (patch 78.57%, 6 lines missing ⚠️)
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1117      +/-   ##
==========================================
+ Coverage   70.21%   70.26%   +0.05%     
==========================================
  Files         228      228              
  Lines       25952    25970      +18     
==========================================
+ Hits        18221    18247      +26     
+ Misses       7731     7723       -8     

☔ View full report in Codecov by Sentry.

Collaborator

@cjluo-nv left a comment


Review: Correctness & Simplicity

Overall the PR is well-motivated and the approach is sound. A few suggestions:

1. Deduplicate backward traversal logic

_is_input_from_cask_partition (new) does essentially the same backward-through-copy-ops traversal as the existing _is_following_cask_partition (line ~340), but with a different interface (tensors + output tensor name set vs. nodes + node name set).

Consider reusing _is_following_cask_partition directly. For each LayerNormalization node, get the producer of inputs[0] and call _is_following_cask_partition(producer). This would:

  • Eliminate _is_input_from_cask_partition entirely
  • Remove the cask_output_tensor_names set construction
  • Avoid the assumption that partition[-1] is the output node (see point 2)
for node in graph.nodes:
    if node.op != "LayerNormalization":
        continue
    inp_tensor = node.inputs[0]
    if inp_tensor.inputs:
        producer = inp_tensor.inputs[0]
        if _is_following_cask_partition(producer):
            conv_to_ln_nodes.append(node)

2. partition[-1] assumption

last_node = partition[-1]
for output_tensor in last_node.outputs:
    cask_output_tensor_names.add(output_tensor.name)

This assumes partitions are ordered with the output node last. If a CASK partition ever has different ordering or multiple output nodes not at the end, this would miss patterns. Reusing _is_following_cask_partition (which checks all nodes via cask_partition_nodes) avoids this assumption entirely.

3. QDQNormalization for LayerNorm — input count difference

BatchNormalization has 5 inputs while LayerNormalization has 3. Worth confirming that QDQNormalization only quantizes input[0] (activation) and doesn't attempt to access inputs that don't exist for LayerNorm.

4. Minor: recursive traversal without depth limit

_is_input_from_cask_partition recurses through all copy ops without a bound. The existing _is_following_cask_partition has the same issue, but a max-depth guard (e.g., 10) would be cheap insurance against pathological graphs for both.
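
The suggested max-depth guard can be illustrated with a minimal stand-alone sketch (dict-based stand-in nodes and the name traces_back_to_conv are illustrative, not the actual ModelOpt helpers):

```python
# Minimal illustration of a bounded backward traversal with a depth budget.
def traces_back_to_conv(node, max_depth=10):
    """Bounded backward walk: give up once the budget is exhausted."""
    if max_depth <= 0 or node is None:
        return False  # pathological or very deep chains terminate here
    if node["op"] == "Conv":
        return True
    return traces_back_to_conv(node.get("parent"), max_depth - 1)


# A chain of 50 copy ops between a Conv and the query node:
chain = {"op": "Conv", "parent": None}
for _ in range(50):
    chain = {"op": "Transpose", "parent": chain}

print(traces_back_to_conv(chain, max_depth=10))   # budget exhausted before Conv
print(traces_back_to_conv(chain, max_depth=100))  # Conv reached within budget
```

The guard trades completeness for a hard bound on work: realistic copy-op chains are short, so max_depth=10 should be ample while preventing unbounded recursion on unusual graphs.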

@ajrasane
Contributor Author

LayerNormalization is already present in is_normalization_op

Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
Signed-off-by: ajrasane <131806219+ajrasane@users.noreply.github.com>
ajrasane force-pushed the ajrasane/conv_layernorm branch from 45cf99f to 523da9d on March 25, 2026 19:24
Contributor

@coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/graph_utils.py`:
- Around line 346-353: The helper _is_following_cask_partition incorrectly
requires every parent of a copy op to lead to a CASK partition; instead, when
encountering copy ops (e.g., Reshape, Squeeze, Unsqueeze — the copy/shape-only
ops used in activation path), only follow the data input edge (the activation
input, typically input index 0) rather than all parents. Update the logic where
get_parent_nodes(node) is used in _is_following_cask_partition so that for
parents whose op_type is a copy op you recurse only on the parent node that
supplies the data input (e.g., obtain the input node at index 0 and recurse on
that), while leaving non-copy parents unchanged; this will also fix callers such
as filter_quantizable_kgen_heads that rely on this helper.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 486fadf3-a592-4f69-9c82-9ea690cc2d17

📥 Commits

Reviewing files that changed from the base of the PR and between 45cf99f and 523da9d.

📒 Files selected for processing (5)
  • modelopt/onnx/quantization/graph_utils.py
  • modelopt/onnx/quantization/int8.py
  • modelopt/onnx/quantization/ort_utils.py
  • tests/_test_utils/onnx/lib_test_models.py
  • tests/unit/onnx/quantization/test_qdq_rules_int8.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • modelopt/onnx/quantization/int8.py
  • tests/unit/onnx/quantization/test_qdq_rules_int8.py
  • tests/_test_utils/onnx/lib_test_models.py

Comment on lines +346 to +353
parent_nodes = get_parent_nodes(node)
if len(parent_nodes) == 0:
    return False

return all(
    _is_following_cask_partition(parent, cask_partition_nodes, max_depth - 1)
    for parent in parent_nodes
)


⚠️ Potential issue | 🟠 Major

Follow only the data edge through copy ops.

Line 350 currently requires every parent of a copy op to trace back to a CASK partition. That rejects valid Conv -> Reshape/Squeeze/Unsqueeze -> LayerNormalization chains when the shape/axes input is dynamic, because those auxiliary inputs are not part of the activation path. The same false negative now affects filter_quantizable_kgen_heads() too, since it shares this helper.

💡 Proposed fix
-    parent_nodes = get_parent_nodes(node)
-    if len(parent_nodes) == 0:
+    if not node.inputs or not node.inputs[0].inputs:
         return False
 
-    return all(
-        _is_following_cask_partition(parent, cask_partition_nodes, max_depth - 1)
-        for parent in parent_nodes
-    )
+    data_parent = node.inputs[0].inputs[0]
+    return _is_following_cask_partition(data_parent, cask_partition_nodes, max_depth - 1)
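
The behavioral difference can be demonstrated with stand-in nodes (names here are illustrative, not the actual modelopt helpers): a Reshape has two parents, the Conv activation and a dynamically computed shape, so requiring all parents to trace back to a Conv rejects the pattern, while following only the data edge (input 0) accepts it.

```python
# Stand-in demo of the false negative: all-parents traversal vs data-edge-only.
class N:
    def __init__(self, op, parents=()):
        self.op = op
        self.parents = list(parents)  # parents[0] feeds the data input


def via_all_parents(node, depth=10):
    """Require every parent to trace back to a Conv (the buggy behavior)."""
    if depth <= 0:
        return False
    if node.op == "Conv":
        return True
    if not node.parents:
        return False
    return all(via_all_parents(p, depth - 1) for p in node.parents)


def via_data_edge(node, depth=10):
    """Follow only the activation edge (input 0), as the fix proposes."""
    if depth <= 0:
        return False
    if node.op == "Conv":
        return True
    if not node.parents:
        return False
    return via_data_edge(node.parents[0], depth - 1)


conv = N("Conv")
shape_src = N("Shape")              # dynamic shape input, not from a Conv
reshape = N("Reshape", [conv, shape_src])

print(via_all_parents(reshape))  # the shape edge fails the check
print(via_data_edge(reshape))    # the activation path reaches the Conv
```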

Collaborator

@cjluo-nv left a comment


All review feedback has been addressed:

  • Backward traversal logic deduplicated into a single _is_following_cask_partition at module level
  • partition[-1] assumption removed — now checks all partition nodes
  • Recursion depth limit added (max_depth=10)
  • Confirmed QDQNormalization is safe for LayerNorm (LayerNormalization is in is_normalization_op, and only input[0]/output[0] are quantized)

LGTM.
