Hi, thanks for the great work!
I'd like to ask if there are plans to add FlashAttention-3–style Hopper kernels (WGMMA + TMA) to the block-sparse attention backend.
FA3 gives a large speedup on H100/H200, but current block-sparse kernels seem closer to FA2, so there’s still a noticeable gap on Hopper even at high sparsity.
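To make the gap easier to reproduce, here's a rough timing sketch (my own, not from this repo) comparing dense FA2 vs. FA3 on an H100. It assumes the `flash_attn` (FA2) and Hopper `flash_attn_interface` (FA3) packages are installed, and it deliberately doesn't call the block-sparse backend since I'm not sure of its exact entry points:

```python
# Rough dense FA2 vs. FA3 timing sketch on Hopper (assumed package names below).
import torch
from flash_attn import flash_attn_func as fa2_func             # FA2 kernels
from flash_attn_interface import flash_attn_func as fa3_func   # FA3 Hopper kernels (assumed import path)

def bench(fn, q, k, v, iters=50):
    # Warm up, then time with CUDA events.
    for _ in range(5):
        fn(q, k, v, causal=True)
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(q, k, v, causal=True)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per call

# Shapes chosen arbitrarily for illustration: (batch, seqlen, nheads, headdim).
batch, seqlen, nheads, headdim = 4, 8192, 16, 128
q, k, v = (torch.randn(batch, seqlen, nheads, headdim,
                       device="cuda", dtype=torch.bfloat16) for _ in range(3))

print(f"FA2 dense: {bench(fa2_func, q, k, v):.3f} ms")
print(f"FA3 dense: {bench(fa3_func, q, k, v):.3f} ms")
```

In my understanding, the FA3 number above is roughly the ceiling a Hopper-native block-sparse kernel could approach at high sparsity, which is why the current FA2-style block-sparse path leaves performance on the table even when most blocks are skipped.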
Thanks!