@linamy85 (Collaborator) commented Jan 20, 2026

Adds baseline benchmarking for host-to-device (H2D) and device-to-host (D2H) transfers. This benchmark is still a work in progress, so the implementation and results may change.

Example output:

```
+ export TPU_VISIBLE_CHIPS=0
+ TPU_VISIBLE_CHIPS=0
+ bash ./Ironwood/scripts/run_host_device_benchmark.sh --config Ironwood/configs/host_device/host_device_single_chip.yaml
--- Starting Host-Device Transfer Benchmark (H2D/D2H) ---
*****************************************************
*  WARNING: THIS BENCHMARK IS A WORK IN PROGRESS    *
*  Results may be unstable or subject to change.    *
*****************************************************
Interleaved: false
--- Running Config: Ironwood/configs/host_device/host_device_single_chip.yaml ---

==============================Starting benchmark 'host_device'==============================

Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 1, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_0'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 1 MB on 2 devices, 20 runs
Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 16, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_1'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 16 MB on 2 devices, 20 runs
Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 128, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_2'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 128 MB on 2 devices, 20 runs
Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 256, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_3'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 256 MB on 2 devices, 20 runs
Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 512, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_4'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 512 MB on 2 devices, 20 runs
Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 1024, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_5'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 1024 MB on 2 devices, 20 runs
Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 2048, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_6'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 2048 MB on 2 devices, 20 runs
Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 4096, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_7'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 4096 MB on 2 devices, 20 runs
Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 8192, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_8'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 8192 MB on 2 devices, 20 runs
Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 16384, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_9'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 16384 MB on 2 devices, 20 runs
Running benchmark: host_device with params: {'num_devices': 2, 'smart_chunking': False, 'data_size_mb': 32768, 'num_runs': 20, 'trace_dir': '../microbenchmarks/host_device/single_chip/trace/benchmark_10'}
tpu_devices [TpuDevice(id=0, process_index=0, coords=(0,0,0), core_on_chip=0), TpuDevice(id=1, process_index=0, coords=(0,0,0), core_on_chip=1)]
Simple Benchmark: 32768 MB on 2 devices, 20 runs
Metrics written to CSV at ../microbenchmarks/host_device/single_chip/t_host_device_MV1MSGY5MO.tsv.
--- Finished Config: Ironwood/configs/host_device/host_device_single_chip.yaml ---

--- All Benchmarks Finished ---
```
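The benchmark script itself is not shown here. As a rough illustration of what a baseline H2D/D2H measurement involves, below is a minimal single-device sketch in JAX. It assumes `jax.device_put` for the H2D copy and `jax.device_get` for the D2H copy; the helpers `time_host_device_transfer` and `bandwidth_gib_per_s` are hypothetical names, and this is not the PR's implementation (it ignores `num_devices`, `smart_chunking`, and `trace_dir`).

```python
import time

import jax
import numpy as np


def time_host_device_transfer(data_size_mb: int, num_runs: int, device=None):
    """Hypothetical helper (not the PR's code): time H2D and D2H copies.

    Returns two lists of per-run wall-clock times in seconds: (h2d, d2h).
    """
    device = device if device is not None else jax.devices()[0]
    num_bytes = data_size_mb * 1024 * 1024
    # uint8 so that one element is exactly one byte.
    host_array = np.ones(num_bytes, dtype=np.uint8)

    h2d_times, d2h_times = [], []
    for _ in range(num_runs):
        # H2D: device_put is asynchronous, so block until the copy completes.
        start = time.perf_counter()
        device_array = jax.device_put(host_array, device)
        device_array.block_until_ready()
        h2d_times.append(time.perf_counter() - start)

        # D2H: device_get returns a host NumPy array once the copy is done.
        start = time.perf_counter()
        back_on_host = jax.device_get(device_array)
        d2h_times.append(time.perf_counter() - start)

        # Drop references so each run transfers from a fresh device array.
        del device_array, back_on_host
    return h2d_times, d2h_times


def bandwidth_gib_per_s(data_size_mb: int, seconds: float) -> float:
    """Convert one transfer time to GiB/s, treating data_size_mb as MiB."""
    return (data_size_mb / 1024) / seconds
```

For example, `time_host_device_transfer(data_size_mb=16, num_runs=20)` roughly mirrors the 16 MB step of the sweep above, but on a single device. Creating a fresh device array on every run is deliberate: a JAX array may cache its host copy after the first read, which could make later D2H timings look artificially fast.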

@chishuen self-requested a review January 20, 2026 07:10

@chishuen (Collaborator)

Could you add a short description of what this PR enables? Thanks

@linamy85 force-pushed the feature/simple-host-device-baseline branch 3 times, most recently from 1292a3a to c8eafc0, January 20, 2026 09:26
@linamy85 force-pushed the feature/simple-host-device-baseline branch from c8eafc0 to 007d810, January 20, 2026 09:41
@linamy85 force-pushed the feature/simple-host-device-baseline branch from 007d810 to 9720b56, January 20, 2026 09:48

@hylin2002 (Collaborator) left a comment

LGTM

@linamy85 merged commit a497d26 into AI-Hypercomputer:main Jan 20, 2026
2 checks passed