
Add: paged attention unroll scene test with 4D input shapes #380

Draft

chenshengxin2026 wants to merge 1 commit into hw-native-sys:main from chenshengxin2026:add-paged-attention-unroll-4dims-st

Conversation

@chenshengxin2026
Contributor

  • New paged_attention_unroll_4dims test under tensormap_and_ringbuffer
  • Query and output tensors use 4D format (batch, seq_len, num_heads, head_dim)
  • 6 kernels: QK/PV matmul (AIC), softmax_prepare/online_update (AIV), hub stubs
  • Orchestration with N_UNROLL=64, 4 tasks per group, online softmax accumulation
  • Golden wraps shared paged_attention_golden with 4D reshape adapter
  • Three test cases: varying batch/heads/head_dim at production scale (bfloat16)
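The "online softmax accumulation" in the orchestration bullet above can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the AIV kernel's actual API: each unrolled chunk folds into a running max, running denominator, and running weighted sum, so the full softmax row is never materialized. The names `online_softmax_update` and `attention_online` are hypothetical.

```python
import numpy as np

def online_softmax_update(state, scores_chunk, values_chunk):
    """Fold one chunk of attention scores/values into the running state."""
    m_prev, d_prev, acc_prev = state
    m_new = np.maximum(m_prev, scores_chunk.max(axis=-1, keepdims=True))
    scale = np.exp(m_prev - m_new)            # rescale previous contributions
    p = np.exp(scores_chunk - m_new)          # probabilities w.r.t. new max
    d_new = d_prev * scale + p.sum(axis=-1, keepdims=True)
    acc_new = acc_prev * scale + p @ values_chunk
    return m_new, d_new, acc_new

def attention_online(scores, values, chunk=4):
    """Chunked softmax(scores) @ values via the online update."""
    m = np.full((scores.shape[0], 1), -np.inf)
    d = np.zeros((scores.shape[0], 1))
    acc = np.zeros((scores.shape[0], values.shape[1]))
    for i in range(0, scores.shape[1], chunk):
        m, d, acc = online_softmax_update(
            (m, d, acc), scores[:, i:i + chunk], values[i:i + chunk])
    return acc / d

np.random.seed(0)
scores = np.random.randn(2, 8)
values = np.random.randn(8, 3)
e = np.exp(scores - scores.max(-1, keepdims=True))
ref = (e / e.sum(-1, keepdims=True)) @ values   # one-shot reference
out = attention_online(scores, values)
```

The same algebra is what lets the orchestration process N_UNROLL-sized slices of the KV cache while keeping only O(head_dim) state per query row.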


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a paged attention implementation with unrolling support, featuring specialized AIC kernels for matrix multiplications (QK and PV) and AIV kernels for vector operations (Softmax and Online Update). It also includes an orchestration layer to manage task scheduling and memory layout. Review feedback suggests improving the robustness of the golden computation logic in golden.py to avoid potential issues with tensor reshaping and correcting a documentation inconsistency regarding the N_UNROLL value in the orchestration code.
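The "4D reshape adapter" mentioned in the summary can be sketched as a thin wrapper: flatten the (batch, seq_len, num_heads, head_dim) query into the shape the shared golden expects, run it, then restore the 4D layout for comparison with kernel output. This is a hypothetical illustration; the real shared `paged_attention_golden` and its signature live in the repo's golden.py, and the placeholder below stands in for it.

```python
import numpy as np

def paged_attention_golden_3d(query_3d):
    # Placeholder for the shared 3D reference implementation.
    return query_3d * 2.0

def paged_attention_golden_4d(query_4d):
    """Adapter: 4D query -> shared 3D golden -> 4D output.

    Using explicit dimensions (rather than -1) in both reshapes keeps the
    adapter robust: a shape mismatch raises immediately instead of silently
    reinterpreting the layout.
    """
    b, s, h, d = query_4d.shape
    out_3d = paged_attention_golden_3d(query_4d.reshape(b * s, h, d))
    return out_3d.reshape(b, s, h, d)

q = np.arange(2 * 1 * 4 * 8, dtype=np.float32).reshape(2, 1, 4, 8)
out = paged_attention_golden_4d(q)
```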

@chenshengxin2026 force-pushed the add-paged-attention-unroll-4dims-st branch from 2125ab0 to d8ccf87 on March 27, 2026 08:35
```cpp
uint32_t value_cache_shapes[4] = {(uint32_t)total_blocks_count, (uint32_t)block_size, (uint32_t)kv_head_num, (uint32_t)head_dim};
uint32_t out_shapes[4] = {(uint32_t)batch, (uint32_t)seq_len, (uint32_t)num_heads, (uint32_t)head_dim};

Tensor query = make_tensor_external(query_ptr, query_shapes, 4, data_type, false);
```
Collaborator


If the shape here is the same as what the golden returns, you can call the to_tensor() interface directly.

@ChaoWao marked this pull request as draft on March 30, 2026 01:16
