

@hylin2002
Collaborator

Summary

This PR introduces configurations for various Ironwood topologies (ranging from 2x2x1 to 4x4x4) and updates the time extraction logic for accurate benchmarking.

Key Changes

  • New Topology Configurations: Added combined 3-in-1 configurations covering all-gather, psum (all-reduce), and all-to-all for multiple topologies in Ironwood/configs/collectives/.
  • Updated Time Extraction Logic: Modified Ironwood/src/benchmark_utils.py to correctly accumulate execution time for split psum operations.

Reason for Logic Change

Previously, a single JAX psum operation generated two distinct HLOs (psum and ffi_call). The old logic treated these as two separate collective operations (e.g., if num_runs=5, it extracted 10 events).

This caused a ZeroDivisionError: the ffi_call completes in nearly 0 ms, so dividing by its measured duration fails. From a collectives perspective, these two HLOs represent a single logical step, so their execution times are now aggregated to prevent the error and ensure accurate timing.
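The aggregation idea can be sketched as follows. This is a minimal illustration, not the code in benchmark_utils.py: it assumes profiled events arrive in trace order as (name, duration) pairs and that each psum's companion ffi_call immediately follows it. `Event`, `aggregate_split_psum`, and `bandwidth_gbps` are hypothetical names, and the bandwidth formula is only an example of a metric that divides by duration.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One profiled HLO event (hypothetical structure, not the real trace format)."""
    name: str
    duration_ms: float


def aggregate_split_psum(events: list[Event]) -> list[float]:
    """Merge each psum event with the ffi_call that immediately follows it.

    Returns one duration (ms) per logical collective step.
    """
    step_durations: list[float] = []
    prev_was_psum = False
    for event in events:
        if prev_was_psum and event.name.startswith("ffi_call"):
            # Second half of a split psum: fold its time into the previous step.
            step_durations[-1] += event.duration_ms
            prev_was_psum = False
        else:
            # Any other event starts a new logical step.
            step_durations.append(event.duration_ms)
            prev_was_psum = event.name.startswith("psum")
    return step_durations


def bandwidth_gbps(num_bytes: int, duration_ms: float) -> float:
    """Hypothetical metric: raises ZeroDivisionError when duration_ms is 0."""
    return num_bytes / (duration_ms * 1e-3) / 1e9


if __name__ == "__main__":
    num_runs = 5
    # Each run emits a psum followed by a near-instant ffi_call (assumed timings).
    events = [
        e
        for _ in range(num_runs)
        for e in (Event("psum", 1.25), Event("ffi_call", 0.0))
    ]
    print(len(events))  # 10 raw events, as with the old logic
    steps = aggregate_split_psum(events)
    print(len(steps))  # 5 logical steps, one per run
    print([bandwidth_gbps(1 << 20, d) for d in steps])
```

Merging on adjacency keeps the sketch simple; a real trace may need a more robust key (for example, a shared op or step identifier) to pair the two HLOs.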

@hylin2002 force-pushed the add-collectives-config branch from a9815ad to e8526fb on January 21, 2026 06:08.
