
IBGDA companion QP + counter infrastructure#989

Closed
dmwu wants to merge 3 commits into meta-pytorch:main from dmwu:export-D94937308


dmwu commented Mar 8, 2026

Summary:
Add companion QP and counter infrastructure to the IBGDA transport layer. Each peer gets a companion QP, created with core_direct=true so that it supports WAIT WQEs, which is used to track NIC completions via loopback RDMA atomics. A sink buffer registered with ibv_reg_mr_iova2 at iova=0 (zero-based addressing) receives the return values of RDMA atomics, which are discarded.
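The completion-tracking idea above can be modeled on the host as follows. This is a minimal single-threaded C++ sketch, not the PR's implementation: `CompanionQpModel`, `SinkRegion`, and `fetch_add_to_sink` are hypothetical names, and the sketch assumes that the companion QP holds one loopback fetch-and-add behind a WAIT WQE until the matching data WQE completes, so the counter ends up equal to the number of completed operations. It also shows why a zero-based sink is convenient: every atomic can direct its returned old value to the same offset 0, where it is simply overwritten.

```cpp
// Host-side model of completion tracking with a companion QP.
// All names are hypothetical; the real code runs on the GPU and NIC.
#include <cassert>
#include <cstdint>

// Models an MR registered with iova = 0 (e.g. via ibv_reg_mr_iova2):
// work requests address it by offset, starting at 0.
struct SinkRegion {
  std::uint64_t iova_base = 0;  // registered IOVA
  std::uint64_t slot = 0;       // 8 bytes that absorb atomic return values
};

// Models a NIC fetch-and-add: adds `delta` to *target and writes the
// previous value into the local sink at `local_iova`.
inline void fetch_add_to_sink(std::uint64_t* target, std::uint64_t delta,
                              SinkRegion& sink, std::uint64_t local_iova) {
  assert(local_iova == sink.iova_base);  // every atomic reuses offset 0
  sink.slot = *target;                   // returned old value, never read
  *target += delta;
}

// Models the companion QP: once a data WQE completes (the WAIT WQE is
// released), a loopback fetch-and-add bumps the local counter by one.
struct CompanionQpModel {
  std::uint64_t counter = 0;
  SinkRegion sink;

  void on_data_wqe_complete() { fetch_add_to_sink(&counter, 1, sink, 0); }

  // A waiter only compares the counter against the count it expects.
  bool reached(std::uint64_t expected) const { return counter >= expected; }
};
```

Because the sink's contents are never read, a single 8-byte slot suffices no matter how many atomics are in flight.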

Device-side additions to P2pIbgdaTransportDevice:

  • signal_remote(): issues an RDMA atomic to a caller-provided remote signal buffer
  • signal_remote_with_fence(): same as signal_remote(), with IBV_SEND_FENCE set for data ordering
  • put_signal_counter_remote(): compound operation combining a data write, a signal, and a counter update
  • signal_counter_remote(): signal plus counter update, without a data write
  • wait_local(work, timeout): polls the CQ for completion at a specific WQE index, with an optional timeout
  • All signal and counter buffer addresses are caller-provided (window-owned)
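The ordering inside the compound operations can be illustrated with a plain sequential model. This is a hedged sketch, not the device API: `put_signal_counter_model`, `signal_counter_model`, `RemoteData`, `SignalWord`, and `CounterWord` are hypothetical stand-ins, and the sketch assumes the signal is an additive atomic on a peer word while the counter is a local word advanced after the operation. In the real transport the fence is what keeps the signal behind the data write; a single-threaded model preserves that order trivially and only documents it.

```cpp
// Sequential model of put_signal_counter_remote / signal_counter_remote.
// Names and buffer shapes are hypothetical; buffers are caller-provided.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

struct RemoteData { std::uint8_t* ptr; };    // peer data region
struct SignalWord { std::uint64_t* ptr; };   // peer signal word
struct CounterWord { std::uint64_t* ptr; };  // local completion counter

inline void put_signal_counter_model(const std::uint8_t* src, std::size_t n,
                                     RemoteData dst, SignalWord sig,
                                     std::uint64_t sig_add, CounterWord ctr,
                                     std::uint64_t ctr_add) {
  assert(src != nullptr && dst.ptr != nullptr);
  std::memcpy(dst.ptr, src, n);  // 1) data write
  *sig.ptr += sig_add;           // 2) signal atomic, fenced behind the write
  *ctr.ptr += ctr_add;           // 3) counter atomic marking completion
}

// Same as above without the data write.
inline void signal_counter_model(SignalWord sig, std::uint64_t sig_add,
                                 CounterWord ctr, std::uint64_t ctr_add) {
  *sig.ptr += sig_add;
  *ctr.ptr += ctr_add;
}
```

Keeping every buffer as an explicit argument mirrors the PR's rule that signal and counter addresses come from the caller rather than from the transport.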

Reviewed By: siyengar

Differential Revision: D94937308

meta-cla bot added the CLA Signed label Mar 8, 2026
dmwu added 3 commits March 9, 2026 09:05
Summary: Pull Request resolved: meta-pytorch#987

Differential Revision: D94633071
Summary:

[pipes-ibgda] Signal ownership: move signal buffers from transport to caller

Refactor the IBGDA transport layer so that signal buffers are owned by the
window layer (HostWindow/DeviceWindow), not by P2pIbgdaTransportDevice.
This separation is foundational for the unified dual NVL+IBGDA window design,
where the window allocates, RDMA-registers, and exchanges all signal/counter
buffers across peers, while the transport provides only QPs for RDMA operations.
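The ownership split can be sketched as two layers. This is a minimal host-side model under stated assumptions: `WindowModel`, `TransportModel`, `LocalBuf`, and `RemoteBuf` are hypothetical (the real `IbgdaLocalBuffer`/`IbgdaRemoteBuffer` fields may differ), memory is ordinary host memory, and the exchange step is a direct assignment. The window allocates signal words and records the remote handles it learns; the transport holds no signal memory and acts only on buffers passed to it. `reset_signal` is modeled as a write to the remote word, matching the commit's note that resets go through RDMA rather than a local store.

```cpp
// Host-side model of window-owned signal buffers and a stateless transport.
// All names and field layouts are hypothetical.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct LocalBuf { void* ptr; std::uint32_t lkey; };
struct RemoteBuf { void* ptr; std::uint32_t rkey; };

// Window layer: allocates one signal word per peer and stores the remote
// handles obtained from the (modeled) exchange.
class WindowModel {
 public:
  explicit WindowModel(int num_peers)
      : signals_(static_cast<std::size_t>(num_peers), 0u),
        remote_(static_cast<std::size_t>(num_peers)) {}

  LocalBuf local_signal(int peer) {
    assert(peer >= 0 && static_cast<std::size_t>(peer) < signals_.size());
    return LocalBuf{&signals_[peer], kFakeLkey};
  }
  void set_remote_signal(int peer, RemoteBuf rb) { remote_[peer] = rb; }
  RemoteBuf remote_signal(int peer) const { return remote_[peer]; }

 private:
  static constexpr std::uint32_t kFakeLkey = 0x1234;  // placeholder key
  std::vector<std::uint64_t> signals_;  // never resized, so pointers stay valid
  std::vector<RemoteBuf> remote_;
};

// Transport layer: owns no signal memory; every call names its buffer.
struct TransportModel {
  void signal(RemoteBuf dst, std::uint64_t add) {
    *static_cast<std::uint64_t*>(dst.ptr) += add;  // models the RDMA atomic
  }
  void reset_signal(RemoteBuf dst) {
    *static_cast<std::uint64_t*>(dst.ptr) = 0;  // models the RDMA inline write
  }
  std::uint64_t read_signal(LocalBuf src) const {
    return *static_cast<std::uint64_t*>(src.ptr);
  }
};
```

With this split, buffer lifetime follows the window, and the same transport object can serve any window that hands it valid handles.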

Changes to P2pIbgdaTransportDevice:
- Remove signal buffer members; all signal APIs (put_signal, wait_signal,
  read_signal, reset_signal) now take explicit IbgdaLocalBuffer/IbgdaRemoteBuffer
  arguments so the caller controls buffer lifetime and placement.
- reset_signal() rewritten as an RDMA inline write (doca_gpu_dev_verbs_p<uint64_t>)
  to a remote signal buffer, replacing the old local volatile store. Signals are
  only ever updated by remote RDMA atomics, so resetting must also go through RDMA.
- Add kDefaultDeviceTimeoutCycles (~5-7 s at typical GPU clock rates) as a compile-time
  default timeout for device-side wait operations, preventing indefinite hangs.
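The "~5-7 s" figure follows from seconds = cycles / clock rate. The sketch below assumes a hypothetical budget of 10^10 cycles (the PR does not state the constant's value) and SM clocks between roughly 1.4 and 2.0 GHz, which gives about 7.1 s and exactly 5.0 s. It also models a bounded wait on the host: poll a readiness predicate while reading a monotonically increasing cycle source, and give up once the budget is spent. On the GPU the cycle source would typically be `clock64()`; here both the predicate and the clock are injected so the logic can be tested.

```cpp
// Timeout arithmetic and a host model of a cycle-bounded wait.
// kAssumedTimeoutCycles is hypothetical, chosen to match "~5-7 s".
#include <cassert>
#include <cstdint>
#include <functional>

constexpr std::uint64_t kAssumedTimeoutCycles = 10'000'000'000ULL;

// Seconds represented by a cycle budget at a given clock rate (Hz).
constexpr double cycles_to_seconds(std::uint64_t cycles, double clock_hz) {
  return static_cast<double>(cycles) / clock_hz;
}

// Returns true if `ready` becomes true before `timeout_cycles` elapse,
// false on timeout. Unsigned subtraction is safe for a monotonic source.
inline bool bounded_wait(const std::function<bool()>& ready,
                         const std::function<std::uint64_t()>& now_cycles,
                         std::uint64_t timeout_cycles) {
  const std::uint64_t start = now_cycles();
  while (!ready()) {
    if (now_cycles() - start >= timeout_cycles) return false;  // timed out
  }
  return true;
}
```

Expressing the timeout in cycles keeps the device-side check to a subtraction and a comparison, at the cost of the wall-clock duration varying with the GPU's clock rate.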

Callers updated:
- MultipeerIbgdaDeviceTransport, MultipeerIbgdaTransportCuda: pass explicit
  signal buffers through kernel launch wrappers.
- MultipeerIbgdaTransportTest: all existing test cases updated to allocate and
  pass signal buffers explicitly.
- IbgdaBenchmark: all existing benchmarks updated to pass signal buffers.

Reviewed By: siyengar

Differential Revision: D94937310

meta-codesync bot commented Mar 9, 2026

This pull request has been merged in ecf383a.


Labels: CLA Signed, fb-exported, Merged, meta-exported
