Conversation

@quic-tirupath
Contributor

Description

  • ONNX models exported with older opset versions contain the Gelu operator decomposed into multiple operators (Div, Erf, Add, Mul).
  • QNN doesn't support the Erf operator, but it does support the Gelu operator.
  • Because QNN doesn't support Erf, graphs containing the Gelu pattern are partitioned between the QNN and CPU EPs, which degrades inference time.
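
For readers unfamiliar with the decomposition, the sketch below shows how the erf-based Gelu can be written as the primitive ops listed above. It is only an illustration in plain Python: the function names are made up for this example, and the exact order of the two Mul ops can differ between exporters.

```python
import math


def gelu_reference(x: float) -> float:
    """Erf-based GELU: x * Phi(x), where Phi is the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_decomposed(x: float) -> float:
    """The same function spelled out as the primitive ops (one possible ordering)."""
    div = x / math.sqrt(2.0)  # Div
    erf = math.erf(div)       # Erf (the op QNN does not support)
    add = erf + 1.0           # Add
    mul = x * add             # Mul
    return mul * 0.5          # Mul


# Both forms agree up to floating-point rounding.
for v in (-3.0, -1.0, 0.0, 0.5, 2.0):
    assert math.isclose(gelu_reference(v), gelu_decomposed(v),
                        rel_tol=1e-12, abs_tol=1e-12)
```

Because the chain computes exactly the Gelu function, replacing it with a single Gelu node should not change model outputs beyond rounding.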

Motivation and Context

  • Identifying the Gelu pattern and fusing it into a single QNN Gelu node improves inference time.
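
As a rough illustration of what such a fusion involves, here is a toy pattern matcher over a simplified graph. This is a sketch under stated assumptions, not the code in this PR: `Node`, `fuse_gelu`, and the `consts` mapping are hypothetical, only the Div -> Erf -> Add -> Mul -> Mul ordering is matched, and QDQ handling and graph outputs are ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical single-output graph node (not an ONNX Runtime type)."""
    op: str
    inputs: list[str]
    output: str


def fuse_gelu(nodes: list[Node], consts: dict[str, float]) -> list[Node]:
    """Replace Div(x, sqrt2) -> Erf -> Add(1) -> Mul(x) -> Mul(0.5) with one Gelu node."""
    consumers: dict[str, list[Node]] = {}
    for n in nodes:
        for name in n.inputs:
            consumers.setdefault(name, []).append(n)

    def only_user(node: Node, op: str):
        # The intermediate value must feed exactly one node of the expected type.
        users = consumers.get(node.output, [])
        return users[0] if len(users) == 1 and users[0].op == op else None

    def has_const(node: Node, value: float) -> bool:
        return any(name in consts and math.isclose(consts[name], value)
                   for name in node.inputs)

    replaced: dict[str, Node] = {}  # Div output name -> fused Gelu node
    dropped: set[str] = set()       # outputs of nodes absorbed by a fusion
    for div in nodes:
        if div.op != "Div" or not has_const(div, math.sqrt(2.0)):
            continue
        erf = only_user(div, "Erf")
        add = erf and only_user(erf, "Add")
        mul_x = add and only_user(add, "Mul")
        mul_half = mul_x and only_user(mul_x, "Mul")
        x = div.inputs[0]
        if (not mul_half or not has_const(add, 1.0)
                or x not in mul_x.inputs or not has_const(mul_half, 0.5)):
            continue
        dropped.update(n.output for n in (div, erf, add, mul_x, mul_half))
        replaced[div.output] = Node("Gelu", [x], mul_half.output)

    # Emit the Gelu node where the Div used to be; skip the absorbed nodes.
    result = []
    for n in nodes:
        if n.output in replaced:
            result.append(replaced[n.output])
        elif n.output not in dropped:
            result.append(n)
    return result


graph = [
    Node("Div", ["x", "sqrt2"], "d"),
    Node("Erf", ["d"], "e"),
    Node("Add", ["e", "one"], "a"),
    Node("Mul", ["x", "a"], "m"),
    Node("Mul", ["m", "half"], "y"),
    Node("Relu", ["y"], "out"),
]
consts = {"sqrt2": math.sqrt(2.0), "one": 1.0, "half": 0.5}
fused = fuse_gelu(graph, consts)
print([n.op for n in fused])  # ['Gelu', 'Relu']
```

With the five-node chain collapsed into one supported node, nothing in this subgraph would need to fall back to the CPU EP.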

@quic-tirupath
Contributor Author

@chilo-ms
As I mentioned in #26332, I would like to use this PR to merge this fusion.
Could you please help trigger the CI jobs?

@quic-tirupath
Contributor Author

@chilo-ms, @devang-ml
Could you please refer to my comment in #26332 (comment)?

Could you please help trigger the CI jobs on this PR?

Thanks,

@adrianlizarraga
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@quic-tirupath
Contributor Author

quic-tirupath commented Nov 22, 2025

Hi @adrianlizarraga,
I rebased the PR and removed the dependency on ORT Core's QDQ selector.
Could you please trigger CI and help merge it after a successful CI run?

@adrianlizarraga
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@adrianlizarraga
Contributor

Hi @quic-tirupath, you may have to sync with the latest main branch to get CIs to pass.

quic-tirupath and others added 4 commits November 24, 2025 08:25
 - ONNX models exported with older opset versions contain the Gelu operator
   decomposed into multiple operators (Div, Erf, Add, Mul).
 - QNN doesn't support the Erf operator, but it does support the Gelu operator.
 - Because QNN doesn't support Erf, graphs containing the Gelu pattern are
   partitioned between the QNN and CPU EPs, which degrades inference time.
 - Identifying the Gelu pattern and fusing it into a single QNN Gelu node
   improves inference time.
@quic-tirupath force-pushed the dev/tirupath/erf_gelu_qnn_fusion branch from e881532 to 5812d30 on November 24, 2025 16:25
@quic-tirupath
Contributor Author

> Hi @quic-tirupath, you may have to sync with the latest main branch to get CIs to pass.

@adrianlizarraga
I rebased the PR on tip. Could you please re-trigger CI and help merge it?
Thanks,

@quic-tirupath
Contributor Author

> Hi @quic-tirupath, you may have to sync with the latest main branch to get CIs to pass.

@adrianlizarraga
I rebased the PR on tip. Could you please re-trigger CI and help merge it?

@adrianlizarraga
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@adrianlizarraga merged commit e8bcd0d into microsoft:main on Nov 24, 2025
90 of 97 checks passed
