
Added fallback to preload cudnn dlls from nvidia cudnn venv package or torch venv package#1119

Open
hthadicherla wants to merge 1 commit into main from hthadicherla/add-ort-preload-dll-cudnn

Conversation

@hthadicherla
Contributor

@hthadicherla hthadicherla commented Mar 25, 2026

What does this PR do?

Type of change: Bug fix

A QA team testing the modelopt 0.43 release pointed out that we could install the nvidia-cudnn PyPI packages and use ort.preload_dlls() to load the DLLs from the Python venv, instead of searching only the system path.

Here is the info about the onnxruntime.preload_dlls() function:
[screenshot of the onnxruntime.preload_dlls() documentation]

So this PR adds a fallback to the system-path cuDNN search that preloads the DLLs via onnxruntime, and raises an exception only if that also fails.
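The fallback described above can be sketched as follows. This is a minimal illustration, not the actual modelopt code: `find_in_system_path` is a hypothetical callable standing in for the existing system-path search, and `ort` stands in for the onnxruntime module (stubbed here, since onnxruntime is an external dependency).

```python
import logging

logger = logging.getLogger(__name__)


def check_for_libcudnn(find_in_system_path, ort=None):
    """Sketch of the fallback flow: system path first, then preload_dlls()."""
    if find_in_system_path():
        return True
    # Fallback: ask onnxruntime to preload CUDA/cuDNN DLLs from site-packages
    # (e.g. the nvidia-cudnn-cu12 or torch packages installed in the venv).
    if ort is not None and hasattr(ort, "preload_dlls"):
        try:
            ort.preload_dlls()
            return True
        except Exception as e:
            logger.warning("onnxruntime.preload_dlls() also failed: %s", e)
    raise FileNotFoundError(
        "cuDNN not found on the system path; install the nvidia-cudnn PyPI "
        "package (or torch) in the venv so preload_dlls() can find it."
    )
```

A stub such as `types.SimpleNamespace(preload_dlls=lambda: None)` can be passed as `ort` to exercise each branch without a GPU installed.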

Testing

Tested quantization by installing the nvidia-cudnn-cu12 package and removing the cuDNN DLLs from the system path. Working as expected.

Summary by CodeRabbit

  • Bug Fixes
    • Improved cuDNN/CUDA discovery to attempt loading missing libraries from Python site-packages when not found on the system path.
    • Added clearer runtime messages: warnings before fallback attempts, info on successful loads, and warnings on fallback failures.
    • Updated failure message to include guidance on installing appropriate cuDNN packages and to only error after fallback attempts fail.

@hthadicherla hthadicherla requested a review from a team as a code owner March 25, 2026 06:44
@hthadicherla hthadicherla requested a review from ajrasane March 25, 2026 06:44
@coderabbitai
Contributor

coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Walkthrough

Added a fallback in _check_for_libcudnn() that, if the cuDNN library isn’t found via system library paths, attempts to preload DLLs via onnxruntime.preload_dlls(); logging was extended and the final error message now mentions both env var paths and site-packages installation.

Changes

Cohort / File(s) Summary
cuDNN Library Resolution Logic
modelopt/onnx/quantization/ort_utils.py
Updated _check_for_libcudnn() to try onnxruntime.preload_dlls() when system lookup fails; added warning/info logs around preload attempts and exceptions; adjusted control flow to return True on successful preload and updated FileNotFoundError text to reference env vars and site-packages (pip) installation guidance.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant _check_for_libcudnn
    participant OS_Loader as "OS library search (PATH/LD_LIBRARY_PATH)"
    participant ONNXRT as "onnxruntime.preload_dlls()"

    Caller->>_check_for_libcudnn: invoke
    _check_for_libcudnn->>OS_Loader: search for libcudnn
    alt found
        OS_Loader-->>_check_for_libcudnn: path to libcudnn
        _check_for_libcudnn-->>Caller: return True
    else not found
        OS_Loader-->>_check_for_libcudnn: not found
        _check_for_libcudnn->>ONNXRT: attempt preload_dlls()
        alt preload succeeds
            ONNXRT-->>_check_for_libcudnn: success
            _check_for_libcudnn-->>Caller: return True
        else preload fails or raises
            ONNXRT-->>_check_for_libcudnn: exception / failure
            _check_for_libcudnn-->>Caller: raise FileNotFoundError (mentions env var + preload guidance)
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a fallback mechanism to preload cuDNN DLLs from Python venv packages when system PATH search fails.
Security Anti-Patterns ✅ Passed The PR changes do not violate security anti-patterns defined in SECURITY.md. Code safely calls onnxruntime.preload_dlls() with proper exception handling.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch hthadicherla/add-ort-preload-dll-cudnn

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Contributor

github-actions bot commented Mar 25, 2026

PR Preview Action v1.8.1


🚀 View preview at
https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1119/

Built to branch gh-pages at 2026-03-25 11:38 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Around line 76-93: The error text incorrectly states that
onnxruntime.preload_dlls() failed even when preload_dlls doesn't exist; update
the failure path in the block around ort.preload_dlls to distinguish two cases:
if hasattr(ort, "preload_dlls") is False, log/error that preload_dlls is
unavailable in this ORT version (mention ort.preload_dlls is not present) and
include env_variable/lib_pattern in the message; if hasattr is True but the
try/except catches an exception from ort.preload_dlls(), preserve the existing
warning and then raise the FileNotFoundError with a message that clearly says
preload_dlls was attempted and failed (referencing ort.preload_dlls,
logger.warning, logger.error, env_variable, lib_pattern, and the final
FileNotFoundError).
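The two failure cases the review asks to distinguish can be sketched as below. This is a hypothetical helper, not the modelopt code: `env_variable` and `lib_pattern` stand in for the variables the review references, and `ort` is stubbed rather than the real onnxruntime module.

```python
import logging

logger = logging.getLogger(__name__)


def try_preload(ort, env_variable="CUDNN_PATH", lib_pattern="cudnn*.dll"):
    """Distinguish 'preload_dlls unavailable' from 'preload_dlls failed'."""
    if not hasattr(ort, "preload_dlls"):
        # Case 1: older onnxruntime builds don't ship preload_dlls at all.
        logger.error(
            "ort.preload_dlls is not available in this onnxruntime version; "
            "set %s or place %s on the system path.",
            env_variable,
            lib_pattern,
        )
        return "unavailable"
    try:
        ort.preload_dlls()
        return "loaded"
    except Exception as e:
        # Case 2: preload_dlls exists, was attempted, and raised.
        logger.warning(
            "onnxruntime.preload_dlls() was attempted and failed: %s", e
        )
        return "failed"
```

Returning a status string here keeps the sketch testable; the actual fix would feed each case into its own FileNotFoundError message.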

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 26fd6dd9-298b-47d5-bb9c-a56c9bebd6c5

📥 Commits

Reviewing files that changed from the base of the PR and between ae965a9 and 56f5c33.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/ort_utils.py

@codecov

codecov bot commented Mar 25, 2026

Codecov Report

❌ Patch coverage is 54.54545% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.21%. Comparing base (ae965a9) to head (c9f7052).
⚠️ Report is 1 commit behind head on main.

Files with missing lines Patch % Lines
modelopt/onnx/quantization/ort_utils.py 54.54% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1119      +/-   ##
==========================================
+ Coverage   70.18%   70.21%   +0.02%     
==========================================
  Files         228      228              
  Lines       25952    25960       +8     
==========================================
+ Hits        18215    18227      +12     
+ Misses       7737     7733       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…a-cudnn-cu12 package incase the dlls don't exist in system path

Signed-off-by: Hrishith Thadicherla <hthadicherla@nvidia.com>
@hthadicherla hthadicherla force-pushed the hthadicherla/add-ort-preload-dll-cudnn branch from 56f5c33 to c9f7052 on March 25, 2026 11:35
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@modelopt/onnx/quantization/ort_utils.py`:
- Around line 77-86: preload_dlls() can silently do nothing, so update the
preload check in ort_utils.py to call ort.preload_dlls(verbose=True) and then
verify the runtime actually loaded GPU providers by inspecting
ort.get_all_providers() (check for 'CUDAExecutionProvider' and/or
'CUDNNExecutionProvider'); only return True when those providers are present,
otherwise log a warning with the provider list and return False so we don't
report success when DLLs weren't loaded.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ce2ec865-46e8-45b6-b906-3214ad906a93

📥 Commits

Reviewing files that changed from the base of the PR and between 56f5c33 and c9f7052.

📒 Files selected for processing (1)
  • modelopt/onnx/quantization/ort_utils.py

Comment on lines +77 to +86
        try:
            ort.preload_dlls()
            logger.info(
                "onnxruntime.preload_dlls() succeeded; CUDA/cuDNN DLLs preloaded from site-packages."
                " Please check that this is the correct version needed for your ORT version at"
                " https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements."
            )
            return True
        except Exception as e:
            logger.warning(f"onnxruntime.preload_dlls() also failed: {e}")

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

onnxruntime preload_dlls return value documentation

💡 Result:

onnxruntime.preload_dlls is a Python function available in onnxruntime-gpu package since version 1.21.0. It preloads CUDA, cuDNN, and MSVC runtime DLLs (on Windows) or shared libraries (on Linux) to ensure compatibility, especially with PyTorch. Function signature: onnxruntime.preload_dlls(cuda=True, cudnn=True, msvc=True, directory=None, verbose=False) It returns None (void function). No return value is documented or shown in usage examples across official docs, PRs, and source snippets. Calling it performs side effects (loading DLLs) and may print loaded DLLs if verbose=True, but does not return any value. Official documentation: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#preload-dlls (result 4 from first search). Source introduced in PR #23674: microsoft/onnxruntime#23674 (multiple results). Example usage: import onnxruntime onnxruntime.preload_dlls(verbose=True) # Prints list of loaded DLLs if successful, returns None

preload_dlls() may silently fail without raising an exception, causing incorrect success reporting.

According to the official ONNX Runtime documentation, preload_dlls() is a void function that returns None and does not raise exceptions when DLLs are missing — it fails gracefully and continues. With the default parameters used here (no verbose=True), missing cuDNN will produce no warning or error, causing the function to return True even when the library was never loaded.

Either call preload_dlls(verbose=True) to detect missing libraries in the logging output, validate that the libraries were actually loaded after the call, or remove this unreliable check entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/onnx/quantization/ort_utils.py` around lines 77 - 86, preload_dlls()
can silently do nothing, so update the preload check in ort_utils.py to call
ort.preload_dlls(verbose=True) and then verify the runtime actually loaded GPU
providers by inspecting ort.get_all_providers() (check for
'CUDAExecutionProvider' and/or 'CUDNNExecutionProvider'); only return True when
those providers are present, otherwise log a warning with the provider list and
return False so we don't report success when DLLs weren't loaded.
