Hello,
I noticed that the current implementation of SageAttention3 lacks support for the backward pass. This is surprising, since the SageAttention3 paper includes figures and data explicitly benchmarking backward-pass performance.
Since many users (myself included) intend to use this library for training and fine-tuning models rather than just inference, the missing gradient support is a major blocker; a minimal repro is sketched below.
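
To make the request concrete, here is a rough sketch of the kind of training step that is currently blocked. The `sageattn` import and signature below follow the README for earlier SageAttention releases; if SageAttention3 exposes a different entry point, the same pattern applies:

```python
import torch
from sageattention import sageattn  # assumed entry point; adjust for the SageAttention3 API

# Toy attention inputs: (batch, heads, seq_len, head_dim) in "HND" layout
q = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16, requires_grad=True)
k = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16, requires_grad=True)
v = torch.randn(1, 8, 1024, 64, device="cuda", dtype=torch.float16, requires_grad=True)

# Forward pass works today
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)

# This is the blocked step: without a registered backward kernel,
# autograd cannot produce gradients through the attention op
out.sum().backward()
print(q.grad)  # expected: gradients w.r.t. q once backward support lands
```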
Could you please prioritize adding backward pass support to align the repository with the capabilities presented in the paper?
Thank you!
