Add: paged attention unroll scene test with 4D input shapes #380
Draft
chenshengxin2026 wants to merge 1 commit into hw-native-sys:main from
Conversation
Contributor
chenshengxin2026
commented
Mar 27, 2026
- New paged_attention_unroll_4dims test under tensormap_and_ringbuffer
- Query and output tensors use 4D format (batch, seq_len, num_heads, head_dim)
- 6 kernels: QK/PV matmul (AIC), softmax_prepare/online_update (AIV), hub stubs
- Orchestration with N_UNROLL=64, 4 tasks per group, online softmax accumulation
- Golden wraps shared paged_attention_golden with 4D reshape adapter
- Three test cases: varying batch/heads/head_dim at production scale (bfloat16)
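The "online softmax accumulation" mentioned in the orchestration bullet can be sketched in NumPy. This is a hypothetical illustration of the streaming-softmax recurrence (running max, rescaled denominator, rescaled weighted-value accumulator), not the actual AIV kernel code; all names here are my own:

```python
import numpy as np

def online_softmax_update(m_prev, l_prev, acc_prev, scores, values):
    """One unroll step of online (streaming) softmax accumulation.

    m_prev:   running row-max,              shape (rows, 1)
    l_prev:   running softmax denominator,  shape (rows, 1)
    acc_prev: running weighted-value sum,   shape (rows, head_dim)
    scores:   QK^T scores for this chunk,   shape (rows, chunk)
    values:   V rows for this chunk,        shape (chunk, head_dim)
    """
    m_new = np.maximum(m_prev, scores.max(axis=-1, keepdims=True))
    correction = np.exp(m_prev - m_new)   # rescale old accumulators to the new max
    p = np.exp(scores - m_new)            # unnormalized probabilities for this chunk
    l_new = l_prev * correction + p.sum(axis=-1, keepdims=True)
    acc_new = acc_prev * correction + p @ values
    return m_new, l_new, acc_new
```

After the last chunk, the attention output is `acc / l`; processing the score matrix in chunks this way gives the same result as one full softmax, which is what lets the kernels walk KV blocks one unroll group at a time.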
Code Review
This pull request introduces a paged attention implementation with unrolling support, featuring specialized AIC kernels for the matrix multiplications (QK and PV) and AIV kernels for the vector operations (softmax prepare and online update). It also includes an orchestration layer that manages task scheduling and memory layout. Review feedback suggests two changes: making the golden computation logic in golden.py more robust against tensor-reshaping pitfalls, and correcting a documentation inconsistency regarding the N_UNROLL value in the orchestration code.
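The "4D reshape adapter" the golden uses, and the review's concern about reshape robustness, can be illustrated with a minimal sketch. Here `golden_fn` is a placeholder for the shared `paged_attention_golden`, whose real signature is not shown in this PR; validating the shape explicitly before reshaping (rather than passing `-1` and hoping) is one way to address the robustness comment:

```python
import numpy as np

def golden_4d_adapter(query_4d, golden_fn, **kwargs):
    """Hypothetical adapter: flatten a 4D (batch, seq_len, num_heads, head_dim)
    query to the 3D layout a shared golden expects, then restore 4D on output."""
    if query_4d.ndim != 4:
        raise ValueError(f"expected 4D query, got shape {query_4d.shape}")
    batch, seq_len, num_heads, head_dim = query_4d.shape
    # Reshape with explicit sizes so a layout mismatch fails loudly
    # instead of silently scrambling tokens across heads.
    q3 = query_4d.reshape(batch * seq_len, num_heads, head_dim)
    out3 = golden_fn(q3, **kwargs)
    return out3.reshape(batch, seq_len, num_heads, head_dim)
```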
tests/st/a2a3/tensormap_and_ringbuffer/paged_attention_unroll_4dims/golden.py
...p_and_ringbuffer/paged_attention_unroll_4dims/kernels/orchestration/paged_attention_orch.cpp
2125ab0 to d8ccf87
uint32_t value_cache_shapes[4] = {(uint32_t)total_blocks_count, (uint32_t)block_size, (uint32_t)kv_head_num, (uint32_t)head_dim};
uint32_t out_shapes[4] = {(uint32_t)batch, (uint32_t)seq_len, (uint32_t)num_heads, (uint32_t)head_dim};

Tensor query = make_tensor_external(query_ptr, query_shapes, 4, data_type, false);
Collaborator
If the shape here is consistent with what the golden returns, you can call the to_tensor() interface directly.