Commit 5140331

build: Restrict prebuilt static libraries to tensor_kernels on x86_64
1 parent 6a52484 commit 5140331

3 files changed: +204 -93 lines changed

lib/kvbm-kernels/README.md

Lines changed: 32 additions & 7 deletions
@@ -159,7 +159,7 @@ with 128 blocks, that's the difference between 50μs and 5s of added latency per
 ├── build.rs # NVCC build script (sm80+sm90 by default)
 ├── cuda/
 │ ├── tensor_kernels.cu # Batched CUDA kernels + memcpy fallback
-│ └── prebuilt/ # Prebuilt .fatbin files with MD5 checksums
+│ └── prebuilt/ # Prebuilt .fatbin, .a (static libs), and .md5 checksums
 ├── src/
 │ ├── lib.rs # Rust facade for the kernels
 │ └── tensor_kernels.rs # FFI wrappers + integration tests
@@ -230,21 +230,22 @@ path (block ⇄ universal ⇄ operational), and asserts lossless round-trips.

 #### Prebuilt Kernels

-By default, the build system uses prebuilt `.fatbin` files from `cuda/prebuilt/`
-if `nvcc` is not available. To force building from source:
+By default, the build system automatically:
+- **Uses prebuilt** `.fatbin` and `.a` files from `cuda/prebuilt/` if `nvcc` is **not available**
+- **Builds from source** if `nvcc` is **available**
+
+To force using prebuilt kernels even when nvcc is available:

 ```bash
-# Disable prebuilt kernels
-export DYNAMO_USE_PREBUILT_KERNELS=false
-cargo build
+cargo build --features prebuilt-kernels
 ```

 After modifying CUDA source, regenerate prebuilt kernels and update checksums:

 ```bash
 # This rebuilds tensor_kernels.cu and updates MD5 hashes
 cargo build --release
-# Commit the updated cuda/prebuilt/tensor_kernels.{fatbin,md5}
+# Commit the updated cuda/prebuilt/tensor_kernels.{fatbin,a,md5}
 ```

 **Important:** If you change `CUDA_ARCHS` or update your nvcc version, you need to
@@ -260,6 +261,30 @@ cargo build --release
 The build system only checks if the `.cu` source has changed, not build configuration.
 This prevents CI from regenerating non-reproducible `.a` files unnecessarily.

+##### Architecture Limitations
+
+**Prebuilt mode currently only supports x86_64 architecture.**
+
+Static libraries (`.a` files) contain compiled host-side C++ code and are CPU architecture-specific.
+The prebuilt `libtensor_kernels.a` is built for x86_64. On ARM (aarch64) or other architectures,
+you must install `nvcc` and build `tensor_kernels` from source.
+
+The build will fail with a clear error message if you attempt prebuilt mode on ARM:
+
+```
+╔════════════════════════════════════════════════════════════════════════╗
+║ Prebuilt mode is not supported on aarch64 architecture ║
+║ ║
+║ Static libraries (.a files) are CPU architecture-specific. ║
+║ Prebuilt libtensor_kernels.a is only available for x86_64. ║
+║ ║
+║ Please install nvcc to build from source, or use an x86_64 system. ║
+╚════════════════════════════════════════════════════════════════════════╝
+```
+
+**Note:** Only `tensor_kernels.cu` requires a static library (`.a`) for FFI linking. The
+`vectorized_copy.cu` kernel loads at runtime via `.fatbin` and works on all architectures.
+
 ---

 ### Python Bindings & Tests
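
The selection rules described in the README changes above (build from source when `nvcc` is present, fall back to prebuilt artifacts otherwise, and refuse prebuilt mode off x86_64) can be sketched as a small decision function. This is a minimal illustration, not the crate's actual `build.rs`: the names `BuildMode`, `choose_build_mode`, `nvcc_available`, and `mode_from_environment` are hypothetical, and probing for `nvcc` by running `nvcc --version` is an assumption. The `CARGO_FEATURE_PREBUILT_KERNELS` and `CARGO_CFG_TARGET_ARCH` variables are the ones Cargo sets for build scripts when the `prebuilt-kernels` feature is enabled and for the target architecture.

```rust
use std::process::Command;

/// How the kernels would be obtained (hypothetical names, for illustration only).
#[derive(Debug, PartialEq, Eq)]
enum BuildMode {
    /// Compile tensor_kernels.cu with nvcc.
    FromSource,
    /// Link the checked-in artifacts from cuda/prebuilt/.
    Prebuilt,
}

/// Pure decision logic mirroring the README text:
/// - nvcc present and prebuilt not forced -> build from source
/// - otherwise prebuilt, which is only valid on x86_64
fn choose_build_mode(
    nvcc_available: bool,
    force_prebuilt: bool,
    target_arch: &str,
) -> Result<BuildMode, String> {
    if nvcc_available && !force_prebuilt {
        return Ok(BuildMode::FromSource);
    }
    if target_arch != "x86_64" {
        return Err(format!(
            "Prebuilt mode is not supported on {} architecture; \
             install nvcc to build from source, or use an x86_64 system",
            target_arch
        ));
    }
    Ok(BuildMode::Prebuilt)
}

/// Assumed probe: nvcc counts as available if `nvcc --version` runs successfully.
fn nvcc_available() -> bool {
    Command::new("nvcc")
        .arg("--version")
        .output()
        .map(|out| out.status.success())
        .unwrap_or(false)
}

/// Reads the inputs the way a build script could, then applies the rules above.
fn mode_from_environment() -> Result<BuildMode, String> {
    // Cargo exports CARGO_FEATURE_<NAME> for each enabled feature.
    let force_prebuilt = std::env::var_os("CARGO_FEATURE_PREBUILT_KERNELS").is_some();
    // Cargo exports the *target* architecture; fall back to the host outside Cargo.
    let target_arch = std::env::var("CARGO_CFG_TARGET_ARCH")
        .unwrap_or_else(|_| std::env::consts::ARCH.to_string());
    choose_build_mode(nvcc_available(), force_prebuilt, &target_arch)
}
```

Keeping the decision in a pure function separate from the environment probing makes the x86_64 restriction easy to unit-test without a CUDA toolchain installed. A build script following this sketch would call `mode_from_environment()` and panic with the returned message on `Err`.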
