[Core] Report CUDA versions when NVRTC compilation fails by timmoon10 · Pull Request #2842 · NVIDIA/TransformerEngine

timmoon10 · 2026-04-07T00:48:30Z

Description

NVRTC compilation involves three CUDA versions:

Compile-time CUDA: used to compile Transformer Engine
CUDA Runtime: linked during runtime and visible to libnvrtc.so
Run-time CUDA headers: included in run-time NVRTC compilations

If the user's system is misconfigured, these CUDA versions may be inconsistent and cause strange errors (e.g. #1018). This PR reports each of the CUDA versions to help with debugging.

Closes #1018.

Type of change

Documentation change (change only to the documentation, either a fix or a new content)
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Infra/Build change
Code refactoring

Changes

Report CUDA versions whe NVRTC compilation fails

Checklist:

I have read and followed the contributing guidelines
The functionality is complete
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes

When NVRTC kernel compilation fails, detect whether the linked NVRTC library and the CUDA headers used for compilation are from different CUDA versions, and if so emit an actionable note to stderr pointing the user toward NVTE_CUDA_INCLUDE_DIR / CUDA_HOME / LD_LIBRARY_PATH. The header version is obtained by compiling a tiny probe program that embeds CUDA_VERSION (from cuda.h) into a static_assert failure message, so the macro is resolved by the actual preprocessor rather than by parsing header text. All probe failures are silent; the check is purely informational and never causes a premature error. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com>

Still buggy, include_directory_version returns CUDA runtime version instead of header version. Signed-off-by: Tim Moon <tmoon@nvidia.com>

The NVRTC probe approach was broken: NVRTC pre-defines CUDART_VERSION to its own version before processing any includes, so the probe always returned the NVRTC version regardless of the headers on the include path. Fix by reading cuda_runtime_api.h as text and parsing the "#define CUDART_VERSION <integer>" line directly. This is immune to NVRTC's internal macro management, and the format has been stable across all CUDA versions. Also decode raw CUDA version integers to "major.minor" strings in the error message for readability. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com>

Test that the CUDA include directory is found and that its version matches the compile-time CUDART_VERSION. Also export transformer_engine::cuda::* symbols and tighten the rtc export pattern in the version script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Tim Moon <tmoon@nvidia.com>

for more information, see https://pre-commit.ci

greptile-apps · 2026-04-07T00:53:47Z

Greptile Summary

This PR adds CUDA version diagnostics to NVRTC compilation failures, reporting compile-time CUDA (CUDA_VERSION), NVRTC runtime (nvrtcVersion()), and runtime header versions (include_directory_version()) to help debug misconfigured environments (e.g. issue #1018). A new include_directory_version() function parses CUDART_VERSION from cuda_runtime_api.h at runtime, and the ABI linker script is updated to export the new symbol for unit testing.

Confidence Score: 5/5

This PR is safe to merge — all prior review concerns have been addressed or accepted, and no new P0/P1 issues were found.

All substantive issues (mismatch warning with empty path when headers not found, lack of caching in include_directory_version, and wildcard ABI export) were identified and resolved in prior review threads. The remaining observations are purely P2 quality suggestions. The implementation is logically correct: CUDA version encoding (major1000+minor10) is consistent across all three version sources, the version_string lambda correctly handles negative values with '', and the new unit test provides good coverage.

No files require special attention; the libtransformer_engine.version wildcard change was explicitly approved by a senior developer.

Important Files Changed

Filename	Overview
transformer_engine/common/util/rtc.cpp	Logs compile-time CUDA, NVRTC runtime, and header versions on NVRTC compilation failure; emits mismatch warning when NVRTC version ≠ header version
transformer_engine/common/util/cuda_runtime.cpp	Adds include_directory_version() that parses CUDART_VERSION from cuda_runtime_api.h; not cached unlike sibling functions cudart_version()/cublas_version()
transformer_engine/common/util/cuda_runtime.h	Declares include_directory_version() with doc-comment matching sibling function style; clean addition
transformer_engine/common/libtransformer_engine.version	Widens cuda namespace export to transformer_engine::cuda::* wildcard; fixes rtc pattern to transformer_engine::rtc::*; intentional to expose utility functions for unit testing
tests/cpp/util/test_nvrtc.cpp	Adds CUDAHeaders test verifying include_directory() is non-empty and include_directory_version() == CUDART_VERSION

Sequence Diagram

sequenceDiagram
    participant C as compile()
    participant NVRTC as nvrtcCompileProgram
    participant IDV as include_directory_version()
    participant FS as cuda_runtime_api.h

    C->>NVRTC: nvrtcCompileProgram(...)
    NVRTC-->>C: NVRTC_ERROR_COMPILATION
    C->>C: read CUDA_VERSION (compile-time macro)
    C->>NVRTC: nvrtcVersion(&major, &minor)
    NVRTC-->>C: major, minor → nvrtc_version
    C->>IDV: cuda::include_directory_version()
    IDV->>IDV: cuda::include_directory(false)
    IDV->>FS: open & parse #define CUDART_VERSION
    FS-->>IDV: version int
    IDV-->>C: header_version
    C->>C: log all 3 versions via version_string()
    C->>C: if nvrtc_version != header_version → emit warning
    C->>NVRTC: nvrtcGetProgramLogSize / nvrtcGetProgramLog
    NVRTC-->>C: compilation log
    C->>C: std::cerr << full log
    C->>C: NVTE_CHECK_NVRTC(compile_result) → throw

_{Greploops — Automatically fix all review issues by running /greploops in Claude Code. It iterates: fix, push, re-review, repeat until 5/5 confidence.
Use the Greptile plugin for Claude Code to query reviews, search comments, and manage custom context directly from your terminal.}

_{Reviews (2): Last reviewed commit: "Tweak version message" | Re-trigger Greptile}

transformer_engine/common/util/rtc.cpp

greptile-apps · 2026-04-07T00:53:52Z

transformer_engine/common/util/cuda_runtime.cpp

+int include_directory_version(bool required) {
+  // Header path
+  const auto &include_dir = cuda::include_directory(false);
+  if (include_dir.empty()) {
+    if (required) {
+      NVTE_ERROR(
+          "Could not detect version of CUDA Toolkit headers "
+          "(CUDA Toolkit headers not found).");
+    }
+    return -1;
+  }
+
+  // Parse CUDART_VERSION from cuda_runtime_api.h.
+  const auto header_path = std::filesystem::path(include_dir) / "cuda_runtime_api.h";
+  std::ifstream header_file(header_path);
+  if (header_file.is_open()) {
+    const std::string define_prefix = "#define CUDART_VERSION ";
+    std::string line;
+    while (std::getline(header_file, line)) {
+      const auto pos = line.find(define_prefix);
+      if (pos == std::string::npos) {
+        continue;
+      }
+      try {
+        const int version = std::stoi(line.substr(pos + define_prefix.size()));
+        if (version > 0) {
+          return version;
+        }
+      } catch (...) {
+        continue;
+      }
+    }
+  }
+
+  if (required) {
+    NVTE_ERROR(
+        "Could not detect version of CUDA Toolkit headers "
+        "(Could not parse CUDART_VERSION from ",
+        header_path.string(), ").");
+  }
+  return -1;
+}


Result not cached, unlike sibling functions

include_directory_version() re-opens and re-parses cuda_runtime_api.h on every call. The sibling functions cudart_version() and cublas_version() in the same file both cache their result with a static local variable. While the function is currently only exercised on the failure path in rtc.cpp, caching the result would be consistent with the established pattern and avoids repeated I/O if the function is ever called more broadly (e.g. from tests):

int include_directory_version(bool required) { static int cached = [&]() -> int { // ... existing parsing logic ... }(); // handle `required` separately for the error message ... }

Alternatively, add a static int cached_version = -1; if (cached_version >= 0) return cached_version; guard at the top.

greptile-apps · 2026-04-07T00:53:53Z

transformer_engine/common/libtransformer_engine.version

-			transformer_engine::cuda::supports_multicast*;
-			transformer_engine::cuda::stream_priority_range*;
-			transformer_engine::cuda::current_device*;
+			transformer_engine::cuda::*;


Wildcard export broadens ABI surface

Changing the explicit list of transformer_engine::cuda::sm_count*, sm_arch*, etc. to the blanket transformer_engine::cuda::* will export every symbol in the cuda namespace — including any future internal helpers — as part of the library's public ABI. This may be intentional to avoid updating the linker script whenever a new function is added, but it is worth confirming the trade-off, since once a symbol is exported it must be maintained for ABI compatibility.

I agree, we shouldn't do that. Also, why do we need to modify this file in this PR at all? The newly added function is not part of the public API anyway.

You're right — the new include_directory_version() function is only used internally (in rtc.cpp for error reporting and in the test suite). Since the tests link against the same library, the symbol doesn't need to be exported in the version script.

The wildcard change appears to be unnecessary. You can revert the modifications to libtransformer_engine.version entirely — the explicit symbol list should work as-is without any changes for this PR.

We need to expose the utility functions if we want to test them individually in the C++ unit tests.

Hmmm, this actually shows a separate issue (that we should probably tackle in a separate PR and merge this one without the changes to the build and without the unit test) that the C++ tests as they are currently do not have the visibility into the internal functions. On one hand that is good, since it makes it possible to test that the API surface is correct and that we can do everything we want by just calling the right exposed functions, but on the other it makes it impossible to have targeted tests for those internal implementations without exposing them to the world. I think we should rethink this relationship to have either one or the other as the target of the C++ unit tests (and I would argue that the framework level integration already tackles the API exposure testing so we could make the C++ tests more coupled with the internals of the library itself).

timmoon10 · 2026-04-07T00:54:38Z

transformer_engine/common/util/cuda_runtime.cpp

+    return -1;
+  }
+
+  // Parse CUDART_VERSION from cuda_runtime_api.h.


I don't really like how we parse the header as a text file. However, when I tried compiling a test program with NVRTC it would override the header's CUDART_VERSION macro with the CUDA Runtime version.

timmoon10 · 2026-04-07T00:55:43Z

/te-ci

transformer_engine/common/util/rtc.cpp

@ptrendx

Suggestion from @ptrendx Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

timmoon10 and others added 4 commits April 4, 2026 01:54

Move CUDA header version check to CUDA runtime utils

1984669

Still buggy, include_directory_version returns CUDA runtime version instead of header version. Signed-off-by: Tim Moon <tmoon@nvidia.com>

timmoon10 requested a review from Oleg-Goncharov April 7, 2026 00:48

timmoon10 added the enhancement New feature or request label Apr 7, 2026

pre-commit-ci bot and others added 2 commits April 7, 2026 00:50

[pre-commit.ci] auto fixes from pre-commit.com hooks

2f0dae3

for more information, see https://pre-commit.ci

Merge branch 'main' into tmoon/nvrtc-version-check

ed4eaed

greptile-apps bot reviewed Apr 7, 2026

View reviewed changes

timmoon10 commented Apr 7, 2026

View reviewed changes

transformer_engine/common/util/rtc.cpp Outdated Show resolved Hide resolved

Tweak version message

2b90938

Suggestion from @ptrendx Signed-off-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Core] Report CUDA versions when NVRTC compilation fails#2842

[Core] Report CUDA versions when NVRTC compilation fails#2842
timmoon10 wants to merge 7 commits intoNVIDIA:mainfrom
timmoon10:tmoon/nvrtc-version-check

timmoon10 commented Apr 7, 2026

Uh oh!

greptile-apps bot commented Apr 7, 2026 •

edited

Loading

Uh oh!

Uh oh!

greptile-apps bot Apr 7, 2026

Uh oh!

greptile-apps bot Apr 7, 2026

Uh oh!

ptrendx Apr 7, 2026

Uh oh!

greptile-apps bot Apr 7, 2026

Uh oh!

timmoon10 Apr 7, 2026

Uh oh!

ptrendx Apr 7, 2026

Uh oh!

timmoon10 Apr 7, 2026

Uh oh!

timmoon10 commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

timmoon10 commented Apr 7, 2026

Description

Type of change

Changes

Checklist:

Uh oh!

greptile-apps bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Sequence Diagram

Uh oh!

Uh oh!

greptile-apps bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

ptrendx Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps bot Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

timmoon10 Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

ptrendx Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

timmoon10 Apr 7, 2026

Choose a reason for hiding this comment

Uh oh!

timmoon10 commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

greptile-apps bot commented Apr 7, 2026 •

edited

Loading