
Conversation

@tianleiwu (Contributor) commented Nov 6, 2025

Changes

Update the CUDA 13 Python packaging pipeline:
(1) Use fatbin compress mode = size, which significantly reduces the package size.
(2) Update CMAKE_CUDA_ARCHITECTURES for CUDA 13. Since the compressed binaries are smaller, we can afford to include more architectures (see the sketch after this list).
(3) Fix the CUDA 13 packaging pipeline:

  • Use the correct manylinux Docker image (CUDA 13 instead of CUDA 12). The new Linux image ships CUDA 13.0.2 and cuDNN 9.14.
  • Pass the CUDA version properly when running build_linux_python_package.sh in Docker. (CUDA_VERSION in the Docker image was 12.8.1; we now pass "12.8" from the YAML for consistency.)
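As a rough sketch of items (1) and (2), assuming a recent CMake and an nvcc from CUDA 12.8 or newer (which introduced the --compress-mode flag); the actual pipeline wires these settings through the build scripts and may differ in detail:

```cmake
# Sketch only: size-optimized fatbin compression plus the wider CUDA 13
# architecture list shown in the wheel-size table below.
cmake_minimum_required(VERSION 3.28)

# Real SASS for each listed architecture, plus PTX for 120 as forward compatibility
# (the "75;80;86;89;90a;100a;120a + 120" combination from the table).
set(CMAKE_CUDA_ARCHITECTURES "75-real;80-real;86-real;89-real;90a-real;100a-real;120a-real;120-virtual")

project(cuda13_wheel_sketch LANGUAGES CXX CUDA)

# Ask nvcc to optimize the embedded fatbin for size rather than decompression speed.
string(APPEND CMAKE_CUDA_FLAGS " --compress-mode=size")
```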

Note that the compress mode and CUDA architecture settings are unchanged for CUDA 12.8, so the CUDA 12 wheel is larger than the CUDA 13 wheel. We can update the CUDA 12.8 settings in a separate PR if needed.

The NuGet pipeline for CUDA 13 needs additional code changes; this PR only fixes the Python packaging pipeline.

Python GPU Wheel Size (CUDA Architectures + PTX)

| CUDA | Windows | Linux |
| ---- | ------- | ----- |
| 12.8 | 221 MB (52;61;75;86;89 + 90 PTX) | 271 MB (60;70;75;80;86;90a + 90 PTX) |
| 13.0 | 186 MB (75;80;86;89;90a;100a;120a + 120 PTX) | 191 MB (75;80;86;89;90a;100a;120a + 120 PTX) |

@tianleiwu marked this pull request as draft November 6, 2025 22:40
@GrigoryEvko (Contributor)

Yes please. Nightly feeds are no longer publicly available, nobody has raised this issue, and all CUDA 13 builds currently hard-code the CUDA 12 cuBLAS and other libraries.

@GrigoryEvko (Contributor)

Also, this PR is required to fix CUDA 13 builds: #26518

@tianleiwu marked this pull request as ready for review November 7, 2025 20:13
@tianleiwu requested a review from snnn November 9, 2025 05:53
tianleiwu pushed a commit that referenced this pull request Nov 9, 2025
## Description
Fixes runtime library loading failures when building with CUDA 13 by
replacing hardcoded CUDA 12 references with dynamic version detection.

Related to #26516, which updates the CUDA 13 build pipelines; this PR fixes
the Python runtime code that was still hardcoded to CUDA 12.

## Problem
The build system correctly detects CUDA 13 via CMake, but the runtime
Python code had CUDA 12 hardcoded in multiple locations, causing "CUDA
12 not found" errors on CUDA 13 systems.

## Solution
Modified onnxruntime/__init__.py and setup.py to dynamically use the
detected CUDA version instead of hardcoded "12" strings.
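A minimal illustrative sketch of that idea, with hypothetical helper names (the real change lives in onnxruntime/__init__.py and setup.py and may be structured differently):

```python
# Hypothetical sketch: derive the CUDA major version at runtime and build
# library names from it instead of hardcoding "12".
import re


def cuda_major_from_build_info(build_info: str) -> str:
    """Extract the CUDA major version (e.g. "13") from a build-info string."""
    match = re.search(r"CUDA\s+(\d+)\.\d+", build_info)
    return match.group(1) if match else "12"  # fall back to CUDA 12 if not found


def cuda_library_names(cuda_major: str) -> list[str]:
    """Shared-library names parameterized by the detected major version."""
    return [
        f"libcudart.so.{cuda_major}",
        f"libcublas.so.{cuda_major}",
        f"libcublasLt.so.{cuda_major}",
    ]


if __name__ == "__main__":
    major = cuda_major_from_build_info("... CUDA 13.0, cuDNN 9.14 ...")
    print(cuda_library_names(major))  # ['libcudart.so.13', 'libcublas.so.13', 'libcublasLt.so.13']
```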

## Changes
- Dynamic CUDA version extraction from build info
- Library paths now use f-strings with cuda_major_version
- Added CUDA 13 support to extras_require and dependency exclusions (see the sketch after this list)
- Fixed TensorRT RTX package to use correct CUDA version
- Updated version validation to accept CUDA 12+
- Fixed PyTorch compatibility checks to compare versions dynamically
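For the extras_require point above, a rough sketch with placeholder package names (the actual dependency lists in setup.py may differ):

```python
# Hypothetical extras_require construction keyed on the detected CUDA major version;
# the package names below are illustrative placeholders, not necessarily the ones used.
cuda_major = "13"  # in the real setup.py this would come from the build configuration

extras_require = {
    "cuda": [
        f"nvidia-cuda-runtime-cu{cuda_major}",
        f"nvidia-cublas-cu{cuda_major}",
        f"nvidia-cudnn-cu{cuda_major}",
    ],
}

print(extras_require)
```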

## Impact
- CUDA 13 builds now load correct libraries
- Backward compatible with CUDA 12
- Forward compatible with future CUDA versions

## Testing
Verified with CUDA 13.0 build that library paths resolve correctly and
preload_dlls() loads CUDA 13 libraries without errors.
@tianleiwu enabled auto-merge (squash) November 10, 2025 03:30
@tianleiwu merged commit 8fceca0 into main Nov 10, 2025
123 of 127 checks passed
@tianleiwu deleted the tlwu/fix_cuda13_pipeline branch November 10, 2025 19:47