Skip to content

fix: run a5sim bgemm through the shared pipe path#397

Open
zhoubot wants to merge 1 commit intohw-native-sys:mainfrom
zhoubot:codex/issue-26-a5sim-bgemm-fix
Open

fix: run a5sim bgemm through the shared pipe path#397
zhoubot wants to merge 1 commit intohw-native-sys:mainfrom
zhoubot:codex/issue-26-a5sim-bgemm-fix

Conversation

@zhoubot
Copy link
Copy Markdown

@zhoubot zhoubot commented Mar 30, 2026

Summary

  • remove the __CPU_SIM-only GM fallback from the A5 BGEMM mixed kernel so sim and hardware use the same [A, B, C] TPUSH/TPOP interface
  • export per-dispatch CPU-sim execution context and task-cookie hooks from the A5 sim runtime so PTO ISA can share pipe state correctly across AIC/AIV DSOs
  • fix a5sim kernel compilation to preserve the requested core role (__DAV_CUBE__ vs __DAV_VEC__) instead of building two no-op host kernels

Root Cause

There were two separate issues in the end-to-end a5sim path:

  1. The example kernel still had a CPU-only ABI split, so simulation was not exercising the same TPUSH/TPOP call interface as A5 hardware.
  2. The host-side sim compiler ignored core_type, so both compiled kernel binaries were missing __DAV_CUBE__ / __DAV_VEC__. That left the mixed BGEMM kernels effectively empty even after PTO-ISA pipe support was fixed.

This PR switches the example back to the shared cross-platform interface and publishes the runtime hooks that the PTO-ISA fix consumes.

Validation

source .venv313/bin/activate
export PTO_ISA_ROOT=/Users/zhoubot/github/pto-isa
export CC=/opt/homebrew/bin/gcc-15
export CXX=/opt/homebrew/bin/g++-15
python examples/scripts/run_example.py \
  -k tests/st/a5/tensormap_and_ringbuffer/bgemm/kernels \
  -g tests/st/a5/tensormap_and_ringbuffer/bgemm/golden.py \
  -p a5sim \
  --build

Result:

  • TEST PASSED
  • C: PASS (131072/131072 elements matched)

Related

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request enhances the CPU simulation environment by introducing a mechanism to manage execution context, task cookies, and shared storage. It implements thread-local variables and a global storage map to track simulation state, and updates the kernel compiler to handle core-specific definitions and platform-specific linker flags. These changes enable the unification of kernel code by removing simulation-specific branches in test kernels. I have no feedback to provide.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dde86eb8a1

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

uint64_t start_time = get_sys_cnt_aicore();

// Execute the task
pto_cpu_sim_set_task_cookie(reinterpret_cast<uint64_t>(payload->args));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Guard CPU-sim task-cookie hook behind __CPU_SIM

aicore_execute now calls pto_cpu_sim_set_task_cookie(...) unconditionally, but the only implementation in this repo is in the sim host runner (src/a5/platform/sim/host/device_runner.cpp). This executor source is shared by the A5 runtime build (src/a5/runtime/tensormap_and_ringbuffer/build_config.py) and is linked for onboard via src/a5/platform/onboard/aicore/CMakeLists.txt, so non-sim A5 builds/runtime binaries will carry an unresolved CPU-sim symbol path. This should be gated to CPU simulation (or have a non-sim definition) to avoid breaking onboard builds.

Useful? React with 👍 / 👎.

@zhoubot zhoubot force-pushed the codex/issue-26-a5sim-bgemm-fix branch 3 times, most recently from 8f28c20 to 61ee9e8 Compare March 31, 2026 02:27
@zhoubot zhoubot force-pushed the codex/issue-26-a5sim-bgemm-fix branch from 61ee9e8 to da3d1e0 Compare March 31, 2026 03:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant