
[Bug] Two ring-buffer allocator defects in pto_ring_buffer.h: heap wrap-around off-by-one and DepListPool sentinel collision #429

@chenshengxin2026


Platform

a2a3 (Ascend 910B/C hardware)

Runtime Variant

tensormap_and_ringbuffer

Description

Two related allocation logic defects in src/a2a3/runtime/tensormap_and_ringbuffer/runtime/pto_ring_buffer.h:

Bug 1 — Heap wrap-around: strict > should be >= in try_bump_heap (line ~270)

PTO2TaskAllocator::try_bump_heap uses tail > alloc_size (strict greater-than) when checking whether to wrap the heap pointer to the beginning of the ring. When tail == alloc_size there is exactly enough space at [0, alloc_size), but the condition incorrectly rejects it, so the allocator spins until deadlock.

Fix: change tail > alloc_size to tail >= alloc_size.
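The issue does not show the body of try_bump_heap, so the following is only a minimal sketch of the wrap decision under stated assumptions. The function try_bump_heap_model and its parameters are hypothetical. It models only the case where the live region is [tail, top) and the request does not fit between top and the end of the heap, and it places the buggy and fixed comparisons side by side.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical, simplified model of the wrap decision in try_bump_heap.
// Assumes the live region is [tail, top) with tail <= top (not yet wrapped).
// Returns the offset of the new block, or std::nullopt when no space is accepted
// (the real allocator would spin-wait and retry in that case).
std::optional<uint64_t> try_bump_heap_model(uint64_t top, uint64_t tail,
                                            uint64_t heap_size,
                                            uint64_t alloc_size,
                                            bool fixed) {
    // Case 1: the block fits between top and the end of the heap.
    if (top + alloc_size <= heap_size) {
        return top;
    }
    // Case 2: wrap to offset 0; the block [0, alloc_size) must end at or before tail.
    // The space [top, heap_size) is skipped.
    bool fits_at_start = fixed ? (tail >= alloc_size)   // fixed: exact fit accepted
                               : (tail > alloc_size);   // buggy: exact fit rejected
    if (fits_at_start) {
        return 0;
    }
    return std::nullopt;
}
```

For example, with heap_size = 1024, top = 1000, tail = 256 and alloc_size = 256, the buggy comparison rejects the wrap even though [0, 256) is free, while the fixed comparison returns offset 0. After an exact-fit wrap the new top equals tail; the sketch assumes, as the issue implies, that the real allocator can distinguish this full state from an empty ring.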

Bug 2 — DepListPool: top % capacity returns the NULL sentinel slot (index 0)

PTO2DepListPool::alloc() computes the physical index as idx = top % capacity (line ~444). When top is a multiple of capacity (e.g. top=8, capacity=8) the result is 0 — the same slot that init() reserved as the NULL sentinel. Handing out &entries_[0] overwrites the sentinel, breaking pto2_dep_pool_get() for callers that rely on a nullptr chain terminator at index 0.

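Fix: offset the usable range so index 0 is never handed out, e.g. idx = (top % (capacity - 1)) + 1, which maps every top value into slots 1..capacity-1. Starting top at 1 is not sufficient by itself, because top % capacity returns 0 again as soon as top reaches a multiple of capacity.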
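A minimal sketch of the index mapping, assuming (as the issue states) that slot 0 is reserved by init() as the NULL sentinel and that a next index of 0 terminates a chain. The DepEntry layout and the functions buggy_dep_index, fixed_dep_index and sentinel_survives are hypothetical illustrations, not the real PTO2DepListPool API.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical entry layout; the real PTO2 dep-list entry is not shown in the issue.
struct DepEntry {
    int32_t task_id;
    int32_t next;  // index of the next entry; 0 terminates the chain (NULL sentinel)
};

// Mapping described in the issue: yields 0 whenever top is a multiple of capacity.
uint32_t buggy_dep_index(uint64_t top, uint32_t capacity) {
    return static_cast<uint32_t>(top % capacity);
}

// One possible fix: slot 0 stays reserved and only slots 1..capacity-1 are
// handed out. Requires capacity >= 2.
uint32_t fixed_dep_index(uint64_t top, uint32_t capacity) {
    return static_cast<uint32_t>(top % (capacity - 1)) + 1;
}

// Performs `rounds` allocations with the fixed mapping, overwriting each slot,
// and reports whether the sentinel in slot 0 is still intact afterwards.
bool sentinel_survives(uint32_t capacity, uint64_t rounds) {
    std::vector<DepEntry> entries(capacity, DepEntry{0, 0});
    entries[0] = DepEntry{-1, 0};  // reserved sentinel, as init() is said to do
    for (uint64_t top = 0; top < rounds; ++top) {
        DepEntry& e = entries[fixed_dep_index(top, capacity)];
        e.task_id = static_cast<int32_t>(top);
        e.next = 0;
    }
    return entries[0].task_id == -1 && entries[0].next == 0;
}
```

With capacity = 8, buggy_dep_index(8, 8) returns 0 (the sentinel slot), whereas fixed_dep_index cycles through 1..7. Because one slot is permanently reserved, the usable capacity becomes capacity - 1; any fullness check in the real pool would presumably need to account for that as well.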

Steps to Reproduce

  1. Run the ring-buffer unit tests that exercise allocation wrap-around and dep-list pool full-cycle allocation (added in PR #427, "Add: unit tests for C++ runtime modules and Python utilities"):
    ctest -R test_ring_buffer
    
  2. The HeapWrapAround test case deadlocks (Bug 1).
  3. The DepListPoolSentinelCollision test case fails with sentinel corruption (Bug 2).

Expected Behavior

  • try_bump_heap should wrap the heap pointer to the start of the ring when tail == alloc_size, allowing allocation to succeed.
  • PTO2DepListPool::alloc() should never return &entries_[0] (the NULL sentinel slot); the sentinel must remain intact for the lifetime of the pool.

Actual Behavior

  • Bug 1: When tail == alloc_size the allocator fails to wrap, enters an infinite spin-wait, and the test times out / deadlocks.
  • Bug 2: When the pool has been fully cycled (top reaches a multiple of capacity), alloc() returns &entries_[0], overwriting the NULL sentinel and corrupting the dep-list chain termination logic.

Git Commit ID

9aa9681

Host Platform

Linux (aarch64)

Additional Context

Both defects were discovered while writing unit tests for the ring buffer subsystem. Related PR: #427

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    Status

    Done

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions