fix(handshake): add store-store barriers and sim bootstrap assertion#420
fix(handshake): add store-store barriers and sim bootstrap assertion#420yanghaoran29 wants to merge 1 commit intohw-native-sys:mainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces the OUT_OF_ORDER_STORE_BARRIER() macro to ensure proper memory visibility during core handshakes and emergency shutdowns across multiple executor components. Additionally, it defines the __CPU_SIM macro for simulation builds and adds a validation check to enforce a 1:1 mapping between logical and physical core IDs in simulation mode. I have no feedback to provide.
8f2cc0d to
f5b95a7
Compare
f5b95a7 to
87b2f05
Compare
Handshake Struct Memory Ordering FixesBackgroundThe Struct definition (using struct Handshake {
volatile uint32_t aicpu_ready; // AICPU -> AICore
volatile uint64_t task; // AICPU -> AICore (init only)
volatile CoreType core_type; // AICore -> AICPU
volatile uint64_t perf_records_addr; // AICPU -> AICore (profiling)
volatile uint32_t physical_core_id; // AICore -> AICPU
volatile uint32_t aicpu_regs_ready; // AICPU -> AICore
volatile uint32_t aicore_regs_ready; // AICore -> AICPU
volatile uint32_t aicore_done; // AICore -> AICPU
...
} __attribute__((aligned(64)));The handshake protocol has four phases:
Problem ManifestationExcerpt from one runtime log:
FixesFix 1: Missing store-store barrier in Phase 1 (AICPU side,
|
Five fixes to the AICPU-AICore handshake protocol, applied to both a2a3 and a5:
AICore side (Phase 2a): add OUT_OF_ORDER_STORE_BARRIER() between writing physical_core_id and aicore_regs_ready=1 in all aicore_executor.cpp files. Without it, aarch64 weak memory ordering allows the signal store to become visible before physical_core_id, causing AICPU to index registers with an uninitialized value.
AICPU side (Phase 1): add OUT_OF_ORDER_STORE_BARRIER() between writing task and aicpu_ready=1 in handshake_all_cores() for aicpu_build_graph and tensormap_and_ringbuffer (a2a3 and a5). Without it, AICore may cache a null payload pointer immediately after seeing aicpu_ready=1, causing all subsequent kernel dispatches to dereference a null pointer.
AICPU emergency_shutdown: add OUT_OF_ORDER_STORE_BARRIER() before aicpu_regs_ready=1 in all three a2a3 runtimes (aicpu_build_graph, tensormap_and_ringbuffer, host_build_graph) to ensure any preceding stores are globally visible before the signal is observed by AICore.
Sim bootstrap assertion: add #ifdef __CPU_SIM check that physical_core_id == worker_id in handshake_all_cores(). Sim assigns a 1:1 mapping; a mismatch indicates a broken platform bootstrap. __CPU_SIM is added to sim/aicpu/CMakeLists.txt for AICPU sim builds.
memory_barrier.h: define OUT_OF_ORDER_STORE_BARRIER() (dmb ishst) for both a2a3 and a5 platforms so AICPU code can use the same macro as AICore.