
Conversation

@oandreeva-nv (Contributor) commented Dec 3, 2025

Overview:

Resurrection of #4356 with fixes:

cudaMemcpyBatchAsync needs to be called differently depending on the CUDA version

Details:

The files changed/introduced in this PR can be divided into several groups; the main ones are:

Group 1: New KVBM Kernels Library

The core of this PR - a new standalone CUDA kernels library:

lib/kvbm-kernels/                                                                          
 ├── Cargo.toml                                    # [A] Package definition
 ├── build.rs                                      # [A] Build script (nvcc/prebuilt)
 ├── README.md                                     # [A] Documentation          
 ├── src/                                                                                   
 │   ├── lib.rs                                    # [A] Rust module
 │   └── tensor_kernels.rs                         # [A] Rust FFI wrappers     
 └── cuda/                                                                                  
     ├── tensor_kernels.cu                         # [A] CUDA layout conversions        
     ├── vectorized_copy.cu                        # [A] CUDA optimized memcpy
     └── prebuilt/                                                                          
         ├── tensor_kernels.fatbin                 # [A] Precompiled GPU binary         
         ├── libtensor_kernels.a                   # [A] Static library (host + device)
         ├── tensor_kernels.md5                    # [A] Checksum (.cu + .fatbin)
         ├── vectorized_copy.fatbin                # [R] Moved from lib/llm
         └── vectorized_copy.md5                   # [A] Checksum (.cu + .fatbin)  

Since static libraries (.a) are architecture-dependent, I only provide them for x86_64.

Current Behavior (which should be what you want); see the build.rs sketch after the two cases below:

When building from source on ARM with nvcc:

  1. cc::Build compiles all .cu files and creates a library in OUT_DIR
  2. This library gets linked into your Rust binary for the current build
  3. You CAN build and run on ARM just fine
  4. We skip generate_prebuilt_artifacts for tensor_kernels, so we don't save the ARM .a file to cuda/prebuilt/

When using prebuilt mode on ARM (no nvcc):

  • Fails with a clear error, because the shipped cuda/prebuilt/libtensor_kernels.a is x86_64-only
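
A rough sketch of how build.rs makes this choice (simplified and illustrative, not the actual script; the env var, feature name, and file names match the ones used elsewhere in this PR):

// Simplified sketch of the build-mode selection in build.rs; illustrative only.
use std::process::Command;

fn is_nvcc_available() -> bool {
    Command::new("nvcc").arg("--version").output().is_ok()
}

fn main() {
    // Precedence (per the review notes): feature flag, then env var, then nvcc auto-detection.
    let prebuilt_requested = std::env::var("CARGO_FEATURE_PREBUILT_KERNELS").is_ok()
        || std::env::var("DYNAMO_USE_PREBUILT_KERNELS").is_ok();

    if !prebuilt_requested && is_nvcc_available() {
        // Source build: nvcc compiles the .cu files into a library in OUT_DIR,
        // which is linked into the crate; this works on any architecture with nvcc.
        cc::Build::new()
            .cuda(true)
            .file("cuda/tensor_kernels.cu")
            .file("cuda/vectorized_copy.cu")
            .compile("tensor_kernels");
    } else {
        // Prebuilt mode: validate cuda/prebuilt/libtensor_kernels.a against its .md5,
        // copy it into OUT_DIR, and emit link directives (x86_64-only; other
        // architectures without nvcc fail here with a clear error).
    }
    println!("cargo:rerun-if-changed=cuda");
}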

Group 2: Python/PyTorch Bindings

Expose CUDA kernels to Python for testing:

lib/bindings/kvbm/                                                                                                                                                                       
├── Cargo.toml                                    # [M] Added kernels dependency                                                                                                         
├── Cargo.lock                                    # [M] Updated deps                                                                                                                     
├── python/kvbm/__init__.py                       # [M] Export kernels module                                                                                                            
├── src/                                                                                                                                                                                 
│   ├── lib.rs                                    # [M] Register kernels submodule                                                                                                       
│   └── kernels.rs                                # [A] PyO3 bindings (NEW)
└── tests/
    ├── test_tensor_kernels.py                    # [A] PyTorch validation tests (NEW)
    └── README.md                                 # [A] Testing architecture docs (NEW - we just added this!)

Purpose: PyO3 bindings to test kernels through PyTorch
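
Roughly, the wiring looks like this on the Rust side (a sketch assuming PyO3's current Bound-module API; argument lists and error handling are elided, so this is not a copy of the actual kernels.rs/lib.rs):

// Sketch of how the kernels submodule is exposed (illustrative only).
use pyo3::prelude::*;

#[pyfunction]
fn block_to_universal(/* tensors, layout, device index, ... */) -> PyResult<()> {
    // validate tensors, marshal device pointers, call the FFI wrapper, synchronize
    Ok(())
}

// kernels.rs: register the four pyfunctions on a submodule.
pub fn add_to_module(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_function(wrap_pyfunction!(block_to_universal, m)?)?;
    // ... universal_to_block, block_to_operational, operational_to_block ...
    Ok(())
}

// lib.rs (behind the block-manager feature): attach the submodule to _core so
// Python can do `from kvbm._core import kernels`, re-exported as kvbm.kernels.
fn register_kernels(py: Python<'_>, core: &Bound<'_, PyModule>) -> PyResult<()> {
    let kernels = PyModule::new(py, "kernels")?;
    add_to_module(&kernels)?;
    core.add_submodule(&kernels)?;
    Ok(())
}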

Group 3: Integration with LLM Engine

Connect open-sourced vectorized copy to existing Dynamo code:

  lib/llm/
  ├── build.rs                                      # [M] Update default location for vectorized copy
  └── src/block_manager/block/transfer/cuda.rs      # [M] Updated debug prints for better debugging

Purpose: Replace old implementation with new kernel library

Group 4: Build & CI Infrastructure

Enable building and testing:

  .github/workflows/pre-merge-rust.yml              # [M] Add kvbm-kernels to CI matrix
  container/Dockerfile.vllm                          # [M] Add CUDA dev tools (nvcc, nvlink, etc.)

Purpose: CI/CD and container support for kernel compilation in dev containers

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added CUDA-accelerated tensor kernel operations for KV block manager conversions between block, universal, and operational representations
    • Support for multiple data types (float16, bfloat16, float32, float64) and block layouts with configurable backends
  • Chores

    • Extended CI workflows for container validation and code formatting verification
    • Added kvbm-kernels as workspace component; updated cudarc dependency
  • Tests

    • Added comprehensive tensor kernel regression test suite validating round-trip conversions and error handling


@oandreeva-nv oandreeva-nv requested review from a team as code owners December 3, 2025 22:45
copy-pr-bot bot commented Dec 3, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@oandreeva-nv (Contributor, Author) commented:

/ok to test f2353f3


coderabbitai bot commented Dec 3, 2025

Walkthrough

This PR introduces a comprehensive KVBM (KV Block Manager) kernels subsystem featuring CUDA tensor operations for converting between block-structured, universal, and operational tensor layouts. A new lib/kvbm-kernels crate provides CUDA kernels, Rust wrappers, and Python bindings. Workflows are updated to include the crate in build and validation pipelines.

Changes

• Workflow Configuration
  Files: .github/workflows/container-validation-dynamo.yml, .github/workflows/pre-merge-rust.yml
  Summary: Added --enable-kvbm flag to container build step; expanded Rust matrix to include lib/kvbm-kernels; added rustfmt toolchain and code formatting verification step.

• Workspace & Bindings Manifests
  Files: Cargo.toml, lib/bindings/kvbm/Cargo.toml
  Summary: Registered lib/kvbm-kernels as workspace member; added optional kvbm_kernels dependency to kvbm bindings with block-manager feature wiring; updated cudarc from 0.16.2 to 0.17.1.

• KVBM Kernels Crate Core
  Files: lib/kvbm-kernels/Cargo.toml, lib/kvbm-kernels/build.rs, lib/kvbm-kernels/src/lib.rs, lib/kvbm-kernels/src/tensor_kernels.rs
  Summary: Established new crate with dual rlib/cdylib output; build script handles prebuilt kernel validation via MD5 hashing and CUDA compilation; Rust module exposes public enums (TensorDataType, BlockLayout, OperationalCopyDirection, OperationalCopyBackend) and unsafe FFI wrappers (universal_from_block, block_from_universal, operational_copy); a rough sketch of this surface follows the list.

• CUDA Tensor Kernels
  Files: lib/kvbm-kernels/cuda/tensor_kernels.cu, lib/kvbm-kernels/cuda/prebuilt/tensor_kernels.md5
  Summary: Implements three kernels (block_to_universal_kernel, universal_to_block_kernel, operational_pack/unpack_kernel) with multi-type support (F16, BF16, F32, F64); exposes C-callable entry points with CUDA 12.9+ memcpy batch support; adds prebuilt MD5 hash file for artifact validation.

• Python Bindings & Integration
  Files: lib/bindings/kvbm/python/kvbm/__init__.py, lib/bindings/kvbm/src/kernels.rs
  Summary: Exports kernels submodule from PyO3 bindings; implements four pyfunctions (block_to_universal, universal_to_block, block_to_operational, operational_to_block) with CUDA context management, tensor validation, pointer marshalling, and error handling.

• Tests & Documentation
  Files: lib/bindings/kvbm/tests/test_tensor_kernels.py, lib/kvbm-kernels/README.md, lib/kvbm-kernels/Dockerfile
  Summary: Comprehensive PyTorch/CUDA regression tests covering block↔universal/operational round-trips, dtype/layout combinations, backend variations, and error conditions; documentation covering layouts, API usage, build/test workflows, and troubleshooting; Docker setup for reproducible CUDA builds.
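
For orientation, the surface described under "KVBM Kernels Crate Core" might look roughly like this; the names come from the summary above, while the discriminant values, variant spellings, and wrapper signature are assumptions:

// Sketch only: not the crate's actual API.
#[repr(i32)]
pub enum TensorDataType { F16 = 0, BF16 = 1, F32 = 2, F64 = 3 }

#[repr(i32)]
pub enum BlockLayout { Nhd = 0, Hnd = 1 }

// There is also an OperationalCopyDirection enum for the pack/unpack direction.
#[repr(i32)]
pub enum OperationalCopyBackend { Auto = 0, KernelOnly = 1, MemcpyAsync = 2, MemcpyBatch = 3 }

// Unsafe wrapper: the caller guarantees the device pointer arrays and stream are valid.
pub unsafe fn universal_from_block(/* dims, dtype, layout, pointer arrays, stream */) -> Result<(), i32> {
    // converts the enums to i32 flags and calls launch_universal_from_block over FFI
    todo!()
}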

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60–90 minutes

  • build.rs: Prebuilt kernel validation logic with MD5 hash computation and comparison; conditional CUDA compilation path requires verification of arch flag handling and linking directives.
  • lib/kvbm-kernels/cuda/tensor_kernels.cu: Multi-layout index calculation kernels and backend dispatch logic (KernelOnly, MemcpyAsync, MemcpyBatch) with CUDA version-specific branching; validate correctness of grid/block dimensions and pointer offset logic.
  • lib/bindings/kvbm/src/kernels.rs: Extensive tensor metadata extraction and validation; pointer buffer preparation; ensure proper CUDA synchronization and exception mapping.
  • lib/kvbm-kernels/src/tensor_kernels.rs: FFI wrapper correctness and enum-to-i32 flag conversions; internal test coverage for round-trip data integrity.

Poem

🐰 CUDA kernels hop and play,
Blocks transform in every way!
From universal to ops so sleek,
With Rust and Python—it's unique!
🚀 Tensors now have wings to fly,
Building faster, reaching sky! ✨

Pre-merge checks

✅ Passed checks (3 passed)
• Title check: ✅ Passed. The title clearly and specifically describes the main change: re-introducing KVBM kernels after a previous revert, capturing the primary objective of the PR.
• Docstring Coverage: ✅ Passed. Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
• Description check: ✅ Passed. The PR description provides comprehensive details across all required sections: a clear overview of the resurrection of a previous PR with CUDA version fixes, detailed breakdown of four change groups (KVBM Kernels Library, Python/PyTorch Bindings, Integration with LLM Engine, and Build/CI Infrastructure), file structures with purpose annotations, and guidance for reviewers on where to start.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (12)
lib/kvbm-kernels/Dockerfile (2)

21-21: Consider pinning Rust toolchain version for reproducibility.

Using --default-toolchain stable may lead to different Rust versions across builds. Consider pinning to a specific version (e.g., 1.83.0) for reproducible builds.


27-30: Consider copying Cargo.lock for reproducible dependency versions.

If this crate has a Cargo.lock file, copying it ensures consistent dependency versions across builds.

 COPY Cargo.toml ./
+COPY Cargo.lock ./
 COPY src ./src
 COPY build.rs ./build.rs
 COPY cuda ./cuda
lib/kvbm-kernels/cuda/prebuilt/tensor_kernels.md5 (1)

1-3: Consider documenting what each hash validates.

The file contains three MD5 hashes without context. Consider using a format like <hash> <filename> (standard md5sum output format) or adding a header comment to indicate which files these hashes correspond to.

+# MD5 checksums for prebuilt kernel artifacts
+# Format: <hash>  <filename>
-7aac008ed704fe198ef5056a4e502069
-a7d1649c148ee0366de6d19896a80116
-4234cfd2ef4b283592a37a9ceed94666
+7aac008ed704fe198ef5056a4e502069  tensor_kernels.fatbin
+a7d1649c148ee0366de6d19896a80116  <corresponding_file>
+4234cfd2ef4b283592a37a9ceed94666  <corresponding_file>
lib/bindings/kvbm/python/kvbm/__init__.py (1)

14-17: Consider guarding kernels import for non‑block‑manager builds

Exposing kvbm.kernels via from kvbm._core import kernels as kernels is the right hook for the new submodule. If there’s any scenario where _core is built without the block-manager feature, this import will raise at import time; you could optionally make it defensive, e.g.:

try:
    from kvbm._core import kernels as kernels
except ImportError:
    kernels = None  # or omit export / re-raise with a clearer message

Not required if block-manager is guaranteed to be enabled for all builds, but worth considering.

.github/workflows/pre-merge-rust.yml (1)

69-73: cargo fmt -- --check is now run twice per directory

Both the tests and clippy jobs install rustfmt and run cargo fmt -- --check for each matrix.dir. Functionally fine, but it doubles formatting work for the overlapping dirs (., lib/bindings/python, etc.). If CI time becomes a concern, you could centralize the formatting check in just one of the jobs (or a tiny dedicated job) and keep clippy/tests focused on their respective tasks.

Also applies to: 120-123

lib/bindings/kvbm/tests/test_tensor_kernels.py (2)

30-50: Unused unpacking variables in _make_blocks.

The variables nh, nt, and hd are extracted but never used in this function. Consider using underscores to indicate intentional non-use, which also satisfies the Ruff linter.

 def _make_blocks(universal: torch.Tensor, layout: str) -> List[torch.Tensor]:
-    nh, nl, no, nt, hd = universal.shape
+    _nh, nl, no, _nt, _hd = universal.shape
     blocks = []
     for layer in range(nl):

104-118: Add strict=True to zip() calls for safer iteration.

Multiple zip() calls lack the strict= parameter. Adding strict=True would catch length mismatches early rather than silently truncating, improving test reliability.

-    for produced, expected in zip(outputs, universals):
+    for produced, expected in zip(outputs, universals, strict=True):
         assert torch.allclose(produced, expected, atol=atol, rtol=rtol)
-    for produced_set, expected_set in zip(blocks, expected_blocks):
-        for produced, expected in zip(produced_set, expected_set):
+    for produced_set, expected_set in zip(blocks, expected_blocks, strict=True):
+        for produced, expected in zip(produced_set, expected_set, strict=True):
             assert torch.allclose(produced, expected, atol=atol, rtol=rtol)

Similar changes should be applied at lines 157, 171-172.

lib/kvbm-kernels/build.rs (1)

68-70: Consider more robust nvcc detection.

The current check only verifies that nvcc --version can be spawned. It doesn't distinguish between nvcc not found vs. nvcc crashing. Consider checking exit status for more precise detection.

 fn is_nvcc_available() -> bool {
-    Command::new("nvcc").arg("--version").output().is_ok()
+    Command::new("nvcc")
+        .arg("--version")
+        .output()
+        .map(|o| o.status.success())
+        .unwrap_or(false)
 }
lib/bindings/kvbm/src/kernels.rs (4)

4-4: Consider scoping the unsafe_op_in_unsafe_fn allow more narrowly.

The file-level #![allow(unsafe_op_in_unsafe_fn)] suppresses warnings for the entire file. Consider applying it only to the specific functions that need it, or adding // SAFETY: comments to document the unsafe invariants.


226-362: Consider extracting shared validation logic.

block_to_universal and universal_to_block share nearly identical validation and setup code (lines 232-301 vs 375-444). Consider extracting the common pattern into a helper to reduce duplication and maintenance burden.

A helper like prepare_universal_transfer(py, universals, blocks, layout) could return validated infos and prepared pointer buffers, with the direction-specific kernel call remaining in each function.

Also applies to: 369-505


516-660: Similar duplication in operational functions.

block_to_operational and operational_to_block also share significant code. The same refactoring suggestion applies — extract common validation and pointer preparation into a shared helper.

Also applies to: 670-814


608-616: Host pointers created but may not be needed for all backends.

block_ptrs_host and operational_ptrs_host are always allocated, but they may only be needed for specific backends (e.g., MemcpyBatch). Consider lazy allocation based on the selected backend to reduce unnecessary allocations.

Also applies to: 762-770

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 96f8008 and f2353f3.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/kvbm/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (16)
  • .github/workflows/container-validation-dynamo.yml (1 hunks)
  • .github/workflows/pre-merge-rust.yml (2 hunks)
  • Cargo.toml (1 hunks)
  • lib/bindings/kvbm/Cargo.toml (2 hunks)
  • lib/bindings/kvbm/python/kvbm/__init__.py (1 hunks)
  • lib/bindings/kvbm/src/kernels.rs (1 hunks)
  • lib/bindings/kvbm/src/lib.rs (2 hunks)
  • lib/bindings/kvbm/tests/test_tensor_kernels.py (1 hunks)
  • lib/kvbm-kernels/Cargo.toml (1 hunks)
  • lib/kvbm-kernels/Dockerfile (1 hunks)
  • lib/kvbm-kernels/README.md (1 hunks)
  • lib/kvbm-kernels/build.rs (1 hunks)
  • lib/kvbm-kernels/cuda/prebuilt/tensor_kernels.md5 (1 hunks)
  • lib/kvbm-kernels/cuda/tensor_kernels.cu (1 hunks)
  • lib/kvbm-kernels/src/lib.rs (1 hunks)
  • lib/kvbm-kernels/src/tensor_kernels.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-12-02T18:13:40.065Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 4698
File: .github/workflows/container-validation-dynamo.yml:68-68
Timestamp: 2025-12-02T18:13:40.065Z
Learning: In the ai-dynamo/dynamo repository, backend-specific tests (vllm, sglang, trtllm) are intentionally excluded from the container-validation-dynamo.yml workflow using "not (vllm or sglang or trtllm)" because they run in a separate container-validation-backends.yml workflow that has dedicated jobs for each backend. This separation keeps framework-agnostic tests separate from backend-specific tests.

Applied to files:

  • .github/workflows/pre-merge-rust.yml
  • .github/workflows/container-validation-dynamo.yml
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • .github/workflows/container-validation-dynamo.yml
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • .github/workflows/container-validation-dynamo.yml
📚 Learning: 2025-12-03T01:14:42.094Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 4657
File: container/Dockerfile.vllm:264-265
Timestamp: 2025-12-03T01:14:42.094Z
Learning: In container/Dockerfile.vllm, the recursive chmod -R g+w operation during early user setup (at lines 264-265, when creating the dynamo user and initializing /workspace, /home/dynamo, /opt/dynamo) is an intentional exception to the pattern of avoiding recursive operations, as it handles pre-existing paths and dotfiles created by useradd -m before bulk content is copied.

Applied to files:

  • .github/workflows/container-validation-dynamo.yml
📚 Learning: 2025-09-16T17:16:07.820Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 3051
File: container/templates/Dockerfile.vllm.j2:221-233
Timestamp: 2025-09-16T17:16:07.820Z
Learning: In the dynamo project, when converting Dockerfiles to Jinja2 templates, the primary goal is backward compatibility - generated Dockerfiles must be identical to the originals. Security improvements or other enhancements that would change the generated output are out of scope for templating PRs and should be addressed separately.

Applied to files:

  • .github/workflows/container-validation-dynamo.yml
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
Repo: ai-dynamo/dynamo PR: 2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • .github/workflows/container-validation-dynamo.yml
📚 Learning: 2025-11-15T06:06:37.067Z
Learnt from: ryanolson
Repo: ai-dynamo/dynamo PR: 4356
File: lib/kvbm-kernels/src/python.rs:23-34
Timestamp: 2025-11-15T06:06:37.067Z
Learning: In lib/kvbm-kernels/src/python.rs, the CUDA context is hard-coded to device 0 in get_context(). This is a known limitation that should be addressed when Python bindings are re-enabled by either: (1) enforcing device_index == 0 validation with clear error messages, or (2) implementing per-device context management using a map/vec of OnceCell<Arc<CudaContext>> keyed by device_index.

Applied to files:

  • lib/kvbm-kernels/README.md
  • lib/bindings/kvbm/src/lib.rs
  • lib/kvbm-kernels/Cargo.toml
  • lib/bindings/kvbm/src/kernels.rs
📚 Learning: 2025-08-18T20:51:51.324Z
Learnt from: PeaBrane
Repo: ai-dynamo/dynamo PR: 2465
File: lib/runtime/src/pipeline/network/egress/push_router.rs:0-0
Timestamp: 2025-08-18T20:51:51.324Z
Learning: The runtime crate cannot depend on the llm crate due to architectural dependency constraints, preventing imports from lib/llm into lib/runtime.

Applied to files:

  • lib/bindings/kvbm/src/lib.rs
📚 Learning: 2025-10-17T22:57:25.021Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 3700
File: lib/llm/src/block_manager/numa_allocator.rs:46-64
Timestamp: 2025-10-17T22:57:25.021Z
Learning: The dynamo library (specifically the llm/block_manager module) is developed to run on Linux only, so Linux-specific system calls and APIs can be used without platform guards.

Applied to files:

  • lib/bindings/kvbm/src/lib.rs
  • lib/bindings/kvbm/Cargo.toml
📚 Learning: 2025-09-19T18:21:03.693Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2746
File: lib/llm/src/grpc/service/tensor.rs:269-447
Timestamp: 2025-09-19T18:21:03.693Z
Learning: In lib/llm/src/grpc/service/tensor.rs, GuanLuo prefers using unsafe Vec::from_raw_parts operations to avoid data copying for performance reasons when converting raw tensor data from Vec<u8> to typed vectors like Vec<u32>, Vec<f32>, etc., even when it involves safety risks.

Applied to files:

  • lib/kvbm-kernels/src/lib.rs
  • lib/kvbm-kernels/src/tensor_kernels.rs
📚 Learning: 2025-09-10T22:47:29.325Z
Learnt from: oandreeva-nv
Repo: ai-dynamo/dynamo PR: 2989
File: lib/llm/src/block_manager/block/transfer/cuda.rs:528-539
Timestamp: 2025-09-10T22:47:29.325Z
Learning: CUDA kernel symbol names in cuModuleGetFunction calls must exactly match the function name in the kernel source code, even if it uses different spelling conventions (like British vs American English) compared to other parts of the codebase.

Applied to files:

  • lib/kvbm-kernels/cuda/tensor_kernels.cu
🧬 Code graph analysis (4)
lib/bindings/kvbm/src/lib.rs (1)
lib/bindings/kvbm/src/kernels.rs (1)
  • add_to_module (816-822)
lib/bindings/kvbm/python/kvbm/__init__.py (1)
lib/bindings/kvbm/src/lib.rs (1)
  • _core (25-53)
lib/kvbm-kernels/src/lib.rs (1)
lib/kvbm-kernels/src/tensor_kernels.rs (3)
  • block_from_universal (148-176)
  • operational_copy (186-218)
  • universal_from_block (116-144)
lib/kvbm-kernels/src/tensor_kernels.rs (1)
lib/kvbm-kernels/cuda/tensor_kernels.cu (6)
  • launch_universal_from_block (391-415)
  • launch_universal_from_block (392-394)
  • launch_block_from_universal (417-441)
  • launch_block_from_universal (418-420)
  • launch_operational_copy (450-615)
  • launch_operational_copy (451-454)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/4742/merge) by oandreeva-nv.
lib/kvbm-kernels/cuda/tensor_kernels.cu

[error] 1-1: clang-format hook failed in pre-commit: files were modified by this hook. Re-run 'pre-commit run --all-files' to format, then commit the updated changes. Command: 'pre-commit run --show-diff-on-failure --color=always --all-files'.

🪛 LanguageTool
lib/kvbm-kernels/README.md

[grammar] ~144-~144: Ensure spelling is correct
Context: ... tensors must be CUDA accessible by the specificed device and match the expected shapes an...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🪛 markdownlint-cli2 (0.18.1)
lib/kvbm-kernels/README.md

46-46: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


177-177: Tables should be surrounded by blank lines

(MD058, blanks-around-tables)

🪛 Ruff (0.14.7)
lib/bindings/kvbm/tests/test_tensor_kernels.py

38-38: Unpacked variable nh is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


38-38: Unpacked variable nt is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


38-38: Unpacked variable hd is never used

Prefix it with an underscore or any other dummy variable pattern

(RUF059)


48-48: Avoid specifying long messages outside the exception class

(TRY003)


104-104: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


116-116: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


117-117: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


157-157: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


171-171: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)


172-172: zip() without an explicit strict= parameter

Add explicit value for parameter strict=

(B905)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: tests (.)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: tests (lib/bindings/python)
  • GitHub Check: tests (lib/runtime/examples)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (22)
lib/kvbm-kernels/cuda/tensor_kernels.cu (2)

491-494: Silent success when block_ptrs_device is null may mask errors.

Returning cudaSuccess when block_ptrs_device is null silently skips the kernel work. If this is intentional for a "host-only" path, consider adding a comment. Otherwise, this could mask caller errors.


139-176: LGTM!

The index decomposition correctly maps the linear residual to the 5D [nh, nl, no, nt, hd] coordinate space in row-major order, and the block_inner_offset correctly handles both NHD and HND layouts.
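
For reference, the row-major decomposition described here corresponds to the following host-side sketch (the kernel performs the equivalent arithmetic per thread on the device):

// Reference-only sketch of decomposing a linear index into [nh, nl, no, nt, hd]
// coordinates in row-major order (last dimension varies fastest).
fn decompose(mut idx: usize, dims: [usize; 5]) -> [usize; 5] {
    let [_nh, nl, no, nt, hd] = dims;
    let mut coord = [0usize; 5];
    coord[4] = idx % hd; idx /= hd;
    coord[3] = idx % nt; idx /= nt;
    coord[2] = idx % no; idx /= no;
    coord[1] = idx % nl; idx /= nl;
    coord[0] = idx; // whatever remains is the head index
    coord
}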

lib/kvbm-kernels/src/tensor_kernels.rs (3)

22-60: LGTM!

The enum values correctly match their CUDA counterparts, and #[repr(i32)] ensures proper FFI compatibility. Based on learnings, exact matching of enum values across the FFI boundary is critical for correct kernel dispatch.


62-106: LGTM!

The FFI function signatures correctly translate the C pointer types. void* const* maps to *const *mut c_void and const void* const* maps to *const *const c_void.


220-531: Test coverage is good for the basic path.

The roundtrip test validates Block↔Universal↔Operational conversions with data integrity checks. Consider adding coverage for other data types (F16, BF16) and HND layout in follow-up work.

Cargo.toml (1)

19-19: LGTM!

The lib/kvbm-kernels crate is correctly added to workspace members but excluded from default-members, which is appropriate since it requires CUDA toolchain to build.

.github/workflows/container-validation-dynamo.yml (1)

46-46: Enabling kvbm in the container build is appropriate

Wiring --enable-kvbm into the validation image build aligns with the new kernels crate and bindings so tests can see the functionality. As long as container/build.sh cleanly handles this flag (no-op when disabled / unsupported), this looks good.

lib/bindings/kvbm/src/lib.rs (1)

16-19: kernels submodule wiring into _core looks correct

The new mod kernels plus the PyModule::new("kernels") / add_submodule block under the same #[cfg(feature = "block-manager")] gate is consistent with the rest of the bindings and with the kvbm.__init__ export. This gives a clean kvbm._core.kernels surface for the Python package to re-export.

Also applies to: 45-50

.github/workflows/pre-merge-rust.yml (1)

95-96: Adding lib/kvbm-kernels to the clippy matrix is a good call

Including lib/kvbm-kernels in the clippy matrix.dir ensures the new kernels crate is linted explicitly, in addition to any workspace-level checks from ..

lib/kvbm-kernels/src/lib.rs (1)

4-9: Top‑level re‑exports from tensor_kernels give a clean API

Re-exporting BlockLayout, OperationalCopy*, TensorDataType, and the three core functions at the crate root while still exposing tensor_kernels as a module provides both convenience and flexibility for users. This layout looks solid.

lib/kvbm-kernels/Cargo.toml (1)

4-31: Crate manifest for dynamo-kvbm-kernels looks well-structured

The manifest cleanly separates features (testing-cuda, prebuilt-kernels), uses workspace metadata, and exposes both rlib and cdylib for reuse from Rust and bindings. Build-time deps (dynamo-config, cc, md5) are appropriate for the NVCC-based build.rs flow.

lib/bindings/kvbm/Cargo.toml (1)

22-25: Cudarc version mismatch: bindings override workspace default with older 0.17.1

The feature wiring for kvbm_kernels is correct, but the cudarc version consistency check reveals a problem. The workspace defines cudarc = "0.17.8" at the root level, but lib/bindings/kvbm/Cargo.toml (line 72) and lib/bindings/python/Cargo.toml (line 72) both explicitly pin cudarc = "0.17.1". This creates a version mismatch that could lead to duplicate or conflicting CUDA runtime dependencies in Cargo.lock. Either align both binding crates to use { workspace = true } or update the workspace default to 0.17.1 if there's a reason to stay on the older version.

⛔ Skipped due to learnings
Learnt from: ryanolson
Repo: ai-dynamo/dynamo PR: 4356
File: lib/kvbm-kernels/src/python.rs:23-34
Timestamp: 2025-11-15T06:06:37.067Z
Learning: In lib/kvbm-kernels/src/python.rs, the CUDA context is hard-coded to device 0 in get_context(). This is a known limitation that should be addressed when Python bindings are re-enabled by either: (1) enforcing device_index == 0 validation with clear error messages, or (2) implementing per-device context management using a map/vec of OnceCell<Arc<CudaContext>> keyed by device_index.
lib/bindings/kvbm/tests/test_tensor_kernels.py (3)

18-27: LGTM!

The _tolerances helper correctly provides dtype-aware tolerances. The relaxed tolerances for fp16/bf16 are appropriate given the reduced precision of these formats.


53-66: LGTM!

The _call_with_backend helper correctly handles unsupported backends by translating cudaErrorNotSupported into pytest skips rather than test failures, which is appropriate for optional hardware features.


257-264: LGTM!

The empty batch test correctly verifies that the kernels handle edge cases gracefully without launching CUDA operations. Testing None return values confirms the expected no-op behavior.

lib/kvbm-kernels/build.rs (3)

44-66: LGTM!

The build mode determination logic is well-structured with clear precedence: feature flag → environment variable → nvcc auto-detection. The warning messages provide helpful diagnostics.


220-275: Generating artifacts into source tree during build.

generate_prebuilt_artifacts writes files to cuda/prebuilt/ within the source tree. This is unconventional for build scripts which typically only write to OUT_DIR. This behavior could:

  1. Fail in read-only source directories
  2. Cause unexpected git changes

If intentional for caching prebuilt artifacts, consider documenting this behavior.


277-286: LGTM!

The compute_file_hash function is straightforward and provides clear error messages on failure. MD5 is acceptable here since this is for integrity checking of build artifacts, not security.
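
As a rough picture of that flow (an assumed shape using the md5 crate's compute API, not the actual helper):

// Minimal sketch: hash a file and compare it against a recorded checksum line.
fn file_md5(path: &std::path::Path) -> std::io::Result<String> {
    let bytes = std::fs::read(path)?;
    Ok(format!("{:x}", md5::compute(bytes)))
}

fn matches_recorded(path: &std::path::Path, recorded: &str) -> std::io::Result<bool> {
    Ok(file_md5(path)? == recorded.trim())
}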

lib/bindings/kvbm/src/kernels.rs (4)

65-113: LGTM!

The tensor_info function provides thorough validation of tensor properties including CUDA residency, contiguity, device type, and dtype. The error messages are clear and actionable.


116-173: LGTM!

The collect_block_pointers function provides comprehensive validation with clear error messages for length, dtype, device, shape, and numel mismatches. The internal error check for missing expected parameters is a good defensive measure.


816-822: LGTM!

The add_to_module function correctly registers all four kernel functions for the Python module.


333-336: The pointer casting is correct and matches the FFI contract exactly. The universal_from_block function expects universal_device_ptrs: *const *mut c_void and block_device_ptrs: *const *const c_void, which lines 334-336 provide. Similarly, block_from_universal expects the opposite mutability, which lines 477-479 correctly provide. The casting pattern through usize is idiomatic Rust for FFI, and the guards (_block_guard, _univ_guard) ensure pointer validity during the FFI call. Device pointers from cudarc are guaranteed to be valid, and the memory layout is handled by the underlying CUDA runtime.

Likely an incorrect or invalid review comment.

Comment on lines +602 to +609
case OperationalCopyBackend::Auto:
default:
status = launch_kernel();
if (status == cudaErrorNotSupported || status == cudaErrorInvalidValue) {
status = launch_memcpy_batch();
}
if (status == cudaErrorNotSupported || status == cudaErrorInvalidValue) {
status = launch_memcpy_async();
}
break;

⚠️ Potential issue | 🟠 Major

Fallback on cudaErrorInvalidValue may hide dtype/parameter errors.

Auto mode falls back to memcpy paths when the kernel returns cudaErrorInvalidValue. However, if the error is due to an invalid dtype, the memcpy paths will copy raw bytes without type awareness, potentially masking the real issue. Consider only falling back on cudaErrorNotSupported.

     case OperationalCopyBackend::Auto:
     default:
       status = launch_kernel();
-      if (status == cudaErrorNotSupported || status == cudaErrorInvalidValue) {
+      if (status == cudaErrorNotSupported) {
         status = launch_memcpy_batch();
       }
-      if (status == cudaErrorNotSupported || status == cudaErrorInvalidValue) {
+      if (status == cudaErrorNotSupported) {
         status = launch_memcpy_async();
       }
       break;
🤖 Prompt for AI Agents
In lib/kvbm-kernels/cuda/tensor_kernels.cu around lines 602 to 611, the fallback
logic currently treats cudaErrorInvalidValue the same as cudaErrorNotSupported
and will try memcpy paths, which can mask real dtype/parameter errors; change
the checks so only cudaErrorNotSupported triggers falling back to
launch_memcpy_batch() and launch_memcpy_async(), and allow cudaErrorInvalidValue
to be returned (or propagated) immediately instead of attempting memcpy
fallbacks; ensure any surrounding control flow and return value handling still
correctly propagates the final cuda error.

@oandreeva-nv (Contributor, Author) commented:

/ok to test f2353f3

@oandreeva-nv oandreeva-nv force-pushed the oandreeva/kvbm-kernels branch from f2353f3 to a412159 Compare December 4, 2025 20:12
@oandreeva-nv oandreeva-nv requested a review from a team as a code owner December 4, 2025 23:16
@oandreeva-nv (Contributor, Author) commented:

/ok to test 74e9eed

ryanolson and others added 11 commits December 8, 2025 16:29
@oandreeva-nv (Contributor, Author) commented:

/ok to test 6a7fe7f

  Problem:
  When using prebuilt kernels (DYNAMO_USE_PREBUILT_KERNELS=1), the build
  would fail at link time with "undefined symbol" errors for the three
  extern "C" functions: launch_universal_from_block,
  launch_block_from_universal, and launch_operational_copy.

  Root Cause:
  The prebuilt path only copied .fatbin files to OUT_DIR. FATBIN files
  contain GPU device code but NOT the host functions (extern "C" functions
  that run on CPU). The linker could not resolve these host function symbols,
  causing undefined reference errors.

  Solution:
  Generate static libraries (.a files) alongside .fatbin files when building
  from source. These .a files contain both host and device code in a linkable
  format. In prebuilt mode, ship and link against these .a files instead of
  relying on .fatbin alone.

  Changes:
  - build.rs: Modified generate_prebuilt_artifacts() to:
    * Compile .cu files to object files (.o) with nvcc -c
    * Create static libraries (.a) from object files using ar
    * Update .md5 hashes to include .a file validation (3 lines instead of 2)

  - build.rs: Modified build_with_prebuilt_kernels() to:
    * Validate and copy .a files to OUT_DIR
    * Link against static libraries with cargo:rustc-link-lib=static
    * Add CUDA runtime (cudart) and C++ stdlib (stdc++) linking
Result:
  Prebuilt kernel mode now works identically to building from source.
  Tests pass with DYNAMO_USE_PREBUILT_KERNELS=1. No CUDA code changes needed.

  Prebuilt artifacts now include:
  - *.fatbin (GPU code for potential runtime loading)
  - lib*.a (linkable static libraries with host+device code)
  - *.md5 (validation hashes for .cu, .fatbin, and .a files)
  Prevents CI from regenerating non-reproducible .a files unnecessarily.
  Only regenerates when .cu source changes or artifacts are missing.
  Delete .md5 files to force regeneration after changing CUDA_ARCHS.
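
For reference, the linking side of the change described above boils down to build.rs emitting directives along these lines (a sketch consistent with the commit message; exact paths and names may differ):

// Sketch of the prebuilt-mode link setup described above (illustrative).
fn link_prebuilt(out_dir: &str) {
    // libtensor_kernels.a was validated against its .md5 and copied into OUT_DIR.
    println!("cargo:rustc-link-search=native={}", out_dir);
    println!("cargo:rustc-link-lib=static=tensor_kernels");
    // Host code in the .a needs the CUDA runtime and the C++ standard library;
    // in practice a search path for the CUDA toolkit's lib directory is also emitted.
    println!("cargo:rustc-link-lib=dylib=cudart");
    println!("cargo:rustc-link-lib=dylib=stdc++");
}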
ryanolson added a commit that referenced this pull request Dec 10, 2025
Merged PR #4742 (oandreeva/kvbm-kernels) which re-introduces the KVBM kernels
library after it was reverted from main.

Key changes:
- Added lib/kvbm-kernels with CUDA tensor kernel operations
- Integrated kernels module into kvbm bindings (kept alongside v2 module)
- Updated workspace to include kvbm-kernels
- Added prebuilt CUDA binaries and static libraries for x86_64
- Updated vLLM Dockerfile with CUDA dev tools
- Moved vectorized_copy.fatbin to kvbm-kernels library

Conflicts resolved:
- Cargo.toml: Added kvbm-kernels to workspace members
- lib/bindings/kvbm/Cargo.toml: Added kvbm_kernels dependency to block-manager feature
- lib/bindings/kvbm/src/lib.rs: Added kernels module alongside existing v2 module
- lib/bindings/kvbm/python/kvbm/__init__.py: Exported both kernels and v2 modules
- Cargo.lock files: Accepted PR version and will regenerate

Signed-off-by: Ryan Olson <[email protected]>
@oandreeva-nv oandreeva-nv marked this pull request as draft December 11, 2025 23:56
