Conversation

@GrigoryEvko
Contributor

Description

Fixes runtime library loading failures when building with CUDA 13 by replacing hardcoded CUDA 12 references with dynamic version detection.

Related to #26516 which updates CUDA 13 build pipelines, but this PR fixes the Python runtime code that was still hardcoded to CUDA 12.

Problem

The build system correctly detects CUDA 13 via CMake, but the runtime Python code had CUDA 12 hardcoded in multiple locations, causing "CUDA 12 not found" errors on CUDA 13 systems.

Solution

Modified onnxruntime/__init__.py and setup.py to dynamically use the detected CUDA version instead of hardcoded "12" strings.

Changes

  • Dynamic CUDA version extraction from build info
  • Library paths now use f-strings with cuda_major_version
  • Added CUDA 13 support to extras_require and dependency exclusions
  • Fixed TensorRT RTX package to use correct CUDA version
  • Updated version validation to accept CUDA 12+
  • Fixed PyTorch compatibility checks to compare versions dynamically

Impact

  • CUDA 13 builds now load correct libraries
  • Backward compatible with CUDA 12
  • Forward compatible with future CUDA versions

Testing

Verified with CUDA 13.0 build that library paths resolve correctly and preload_dlls() loads CUDA 13 libraries without errors.

The runtime Python code had CUDA 12 hardcoded in multiple locations,
causing library loading failures when building with CUDA 13.

Changes in onnxruntime/__init__.py:
- Dynamic library path generation using detected CUDA version
- Updated version validation to accept CUDA 12+
- Fixed PyTorch compatibility check to compare CUDA versions dynamically
- Updated diagnostic checks to report correct nvidia packages
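The version-validation and PyTorch-compatibility items above can be sketched like this. Both function names and the exact comparison logic are assumptions for illustration; only the intent (accept CUDA 12+, compare majors dynamically rather than against a literal "12") comes from the PR.

```python
from typing import Optional


def cuda_version_supported(cuda_version: str) -> bool:
    """Accept CUDA 12 and anything newer, instead of requiring exactly 12."""
    return int(cuda_version.split(".")[0]) >= 12


def torch_cuda_matches(build_cuda: str, torch_cuda: Optional[str]) -> bool:
    """Compare CUDA major versions dynamically (torch reports e.g. '12.8')."""
    if torch_cuda is None:  # CPU-only PyTorch build
        return False
    return build_cuda.split(".")[0] == torch_cuda.split(".")[0]
```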

Changes in setup.py:
- Added is_cuda_version_13 flag and proper version parsing
- Added CUDA 13 library versions to dependency exclusion list
- Added CUDA 13 extras_require with nvidia-*-cu13 packages
- Fixed TensorRT RTX to use dynamic CUDA version

Fixes runtime "CUDA 12 not found" errors on CUDA 13 systems while
maintaining backward compatibility with CUDA 12 builds.
@GrigoryEvko
Contributor Author

@microsoft-github-policy-service agree

- Extract CUDA major version parsing into _extract_cuda_major_version() helper
- Add _get_cufft_version() for CUDA-version-specific cufft mapping (11 for CUDA 12.x, 12 for CUDA 13.x)
- Replace all try-except-pass blocks with contextlib.suppress() (fixes RUFF/SIM105)
- Apply dynamic cufft version to both Windows and Linux DLL paths
- Add contextlib import to setup.py and onnxruntime/__init__.py

This eliminates code duplication and ensures correct library versions across CUDA versions.
…if blocks

- Replace separate is_cuda_version_12/13 conditions with single dynamic block
- Use cuda_major_version variable with f-strings for package names
- Apply correct cufft version mapping (11.0 for CUDA 12, 12.0 for CUDA 13)
- Works automatically for future CUDA versions
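The deduplicated setup.py block described above might look roughly like this. `cuda_major_version` is the variable named in the commit; the package list and the way the version is detected are illustrative, not the actual setup.py contents.

```python
cuda_major_version = 13  # in the real setup.py this is parsed from build info

# cuFFT wheel major per the commit: 11.0 for CUDA 12, 12.0 for CUDA 13.
cufft_major = 11 if cuda_major_version == 12 else 12

# One dynamic block replaces the separate is_cuda_version_12/13 branches;
# f-strings produce nvidia-*-cu12 or nvidia-*-cu13 names automatically.
extras_require = {
    "cuda": [
        f"nvidia-cuda-runtime-cu{cuda_major_version}",
        f"nvidia-cublas-cu{cuda_major_version}",
        f"nvidia-cufft-cu{cuda_major_version}",
    ],
}
```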
@GrigoryEvko GrigoryEvko requested a review from xadupre November 7, 2025 21:33
@tianleiwu
Contributor

@GrigoryEvko
Contributor Author

Maybe this way? Deduplicated some code.

@GrigoryEvko GrigoryEvko requested a review from tianleiwu November 8, 2025 23:15
@tianleiwu
Contributor

/azp run Linux QNN CI Pipeline, Win_TRT_Minimal_CUDA_Test_CI, Windows ARM64 QNN CI Pipeline, Windows GPU Doc Gen CI Pipeline, Windows x64 QNN CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@tianleiwu tianleiwu merged commit 3c5f177 into microsoft:main Nov 9, 2025
90 checks passed