[Bug] Two ring-buffer allocator defects in pto_ring_buffer.h: heap wrap-around off-by-one and DepListPool sentinel collision #429
Platform
a2a3 (Ascend 910B/C hardware)
Runtime Variant
tensormap_and_ringbuffer
Description
Two related allocation logic defects in src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_ring_buffer.h:
Bug 1 — Heap wrap-around: strict > should be >= in try_bump_heap (line ~270)
PTO2TaskAllocator::try_bump_heap uses tail > alloc_size (strict greater-than) when checking whether to wrap the heap pointer to the beginning of the ring. When tail == alloc_size, there is exactly enough space at [0, alloc_size), but the condition rejects it, so the allocator spin-waits forever (deadlock).
Fix: change tail > alloc_size → tail >= alloc_size.
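A minimal sketch of the comparison in isolation, assuming (as described above) that the free region at wrap time is [0, tail), with tail being the offset of the oldest live allocation. can_wrap_buggy and can_wrap_fixed are hypothetical stand-ins for the check inside try_bump_heap, not its actual code. The sketch also assumes the allocator tracks fullness separately; if the real code treats top == tail as "empty", a post-wrap top equal to tail would need handling alongside the comparison change.

```cpp
#include <cstdint>

// Hypothetical model of the wrap decision in PTO2TaskAllocator::try_bump_heap;
// illustrative names, not the real code.
// Assumptions: when the heap wraps, the free region is [0, tail), where tail is
// the offset of the oldest live allocation, and fullness is tracked separately.

// Current check: strict greater-than rejects an exact fit (tail == alloc_size).
constexpr bool can_wrap_buggy(std::uint64_t tail, std::uint64_t alloc_size) {
    return tail > alloc_size;
}

// Proposed check: [0, alloc_size) fits whenever tail >= alloc_size.
constexpr bool can_wrap_fixed(std::uint64_t tail, std::uint64_t alloc_size) {
    return tail >= alloc_size;
}

// Exact fit: 256 free bytes at the start of the ring and 256 requested.
// The old check refuses; if nothing frees the allocation at tail, the
// caller's spin-wait never exits.
static_assert(!can_wrap_buggy(256, 256), "old check rejects an exact fit");
static_assert(can_wrap_fixed(256, 256), "new check accepts an exact fit");

// Both checks still reject a request that is one byte too large.
static_assert(!can_wrap_buggy(255, 256) && !can_wrap_fixed(255, 256),
              "insufficient space is rejected either way");
```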
Bug 2 — DepListPool: top % capacity returns the NULL sentinel slot (index 0)
PTO2DepListPool::alloc() computes the physical index as idx = top % capacity (line ~444). When top is a multiple of capacity (e.g. top=8, capacity=8), the result is 0, which is the slot init() reserved as the NULL sentinel. Handing out &entries_[0] overwrites the sentinel and breaks pto2_dep_pool_get() for callers that rely on index 0 as the nullptr chain terminator.
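Fix: offset the usable range so index 0 is never handed out, e.g. idx = (top % (capacity - 1)) + 1, so that only slots 1..capacity-1 are usable. Starting top at 1 is not enough on its own, because top % capacity returns 0 again once top reaches capacity.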
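The index arithmetic can be sketched in isolation. slot_buggy, slot_fixed and never_returns_sentinel are hypothetical helpers that model the idx computation in PTO2DepListPool::alloc(), treating top as a monotonically increasing allocation counter; slot_fixed implements the "modulus over capacity - 1, then shift by one" variant of the fix. This is an illustration under those assumptions, not the pool's actual code.

```cpp
#include <cstdint>

// Hypothetical model of the slot-index computation in PTO2DepListPool::alloc().
// 'top' is a monotonically increasing allocation counter; slot 0 is the NULL
// sentinel reserved by init(). Names are illustrative only.

// Current mapping: returns 0 whenever top is a multiple of capacity.
constexpr std::uint32_t slot_buggy(std::uint64_t top, std::uint32_t capacity) {
    return static_cast<std::uint32_t>(top % capacity);
}

// One possible fix: only slots 1..capacity-1 are usable, so the modulus runs
// over (capacity - 1) and the result is shifted up by one.
// Requires capacity >= 2 (the sentinel plus at least one usable slot).
constexpr std::uint32_t slot_fixed(std::uint64_t top, std::uint32_t capacity) {
    return static_cast<std::uint32_t>(top % (capacity - 1u)) + 1u;
}

// Walks 'cycles' full passes over the pool and checks that slot 0 is never
// returned and every index stays in bounds (C++14 constexpr loop).
constexpr bool never_returns_sentinel(std::uint32_t capacity, std::uint64_t cycles) {
    for (std::uint64_t top = 0; top < cycles * capacity; ++top) {
        const std::uint32_t idx = slot_fixed(top, capacity);
        if (idx == 0 || idx >= capacity) return false;
    }
    return true;
}

// top = 8 with capacity = 8 collides with the sentinel under the old mapping.
static_assert(slot_buggy(8, 8) == 0, "old mapping returns the sentinel slot");
static_assert(slot_fixed(8, 8) == 2, "8 % 7 + 1 == 2");
static_assert(never_returns_sentinel(8, 4), "slot 0 is never handed out");
```

With this mapping the pool holds at most capacity - 1 live entries, so whatever check alloc() uses to detect a full pool must also compare against capacity - 1; otherwise a new allocation could land on a slot that is still in use.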
Steps to Reproduce
- Run the ring-buffer unit tests that exercise heap wrap-around and dep-list pool full-cycle allocation (added in PR #427, "Add: unit tests for C++ runtime modules and Python utilities"):
  ctest -R test_ring_buffer
- The HeapWrapAround test case deadlocks (Bug 1).
- The DepListPoolSentinelCollision test case fails with sentinel corruption (Bug 2).
Expected Behavior
- try_bump_heap should wrap the heap pointer to the start of the ring when tail == alloc_size, allowing the allocation to succeed.
- PTO2DepListPool::alloc() should never return &entries_[0] (the NULL sentinel slot); the sentinel must remain intact for the lifetime of the pool.
Actual Behavior
- Bug 1: When tail == alloc_size, the allocator fails to wrap, enters an infinite spin-wait, and the test times out (deadlock).
- Bug 2: When the pool has been fully cycled (top reaches a multiple of capacity), alloc() returns &entries_[0], overwriting the NULL sentinel and corrupting the dep-list chain termination logic.
Git Commit ID
Host Platform
Linux (aarch64)
Additional Context
Both defects were discovered while writing unit tests for the ring buffer subsystem. Related PR: #427