Open
Changes from 4 commits
Commits
46 commits
db44fc2
[PyTorch][CP] Fix THD AllGather CP: offset-based approach with proper…
sudhakarsingh27 Apr 7, 2026
1a5ca4c
[PyTorch][CP] Enable THD+all_gather tests in test_attention_with_cp
sudhakarsingh27 Apr 7, 2026
b4db9eb
[PyTorch][Fused Attn] Fix max_logit masking for non-zero-starting cu_…
sudhakarsingh27 Apr 7, 2026
7491ab6
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 7, 2026
b957725
some cleanup of ag+thd impl and gate e e te test for flash+ag+thd
sudhakarsingh27 Apr 10, 2026
c89173c
Merge branch 'cp_thd_swa_with_ag' of github.com:sudhakarsingh27/Trans…
sudhakarsingh27 Apr 10, 2026
18e41bd
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into cp_th…
sudhakarsingh27 Apr 10, 2026
0b48746
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 10, 2026
608106d
improve the logic and remove for loop from the code
sudhakarsingh27 Apr 13, 2026
4b95130
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 13, 2026
15af3af
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into cp_th…
sudhakarsingh27 Apr 13, 2026
5bec5b3
Merge branch 'cp_thd_swa_with_ag' of github.com:sudhakarsingh27/Trans…
sudhakarsingh27 Apr 13, 2026
89b1066
AG+THD SWA: extend KV visibility for right window and rename a2a-spec…
sudhakarsingh27 Apr 16, 2026
55fc2cd
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 16, 2026
f499f59
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into cp_th…
sudhakarsingh27 Apr 20, 2026
2569a65
Merge branch 'cp_thd_swa_with_ag' of github.com:sudhakarsingh27/Trans…
sudhakarsingh27 Apr 20, 2026
4e4212f
resolved merge conflicts with main
sudhakarsingh27 Apr 23, 2026
10e4cfc
[PyTorch] Add pad_between_seqs support for FlashAttention 3 with CP
sudhakarsingh27 Apr 24, 2026
2a49dee
[PyTorch] Add pad_between_seqs tests for CP and non-CP FlashAttention
sudhakarsingh27 Apr 24, 2026
34e3d62
[QA] Add CP deterministic tests to L3 and support TE_PATH in FA test
sudhakarsingh27 Apr 24, 2026
4745f98
[PyTorch] Fix FA3 deterministic gate to match upstream backward const…
sudhakarsingh27 Apr 24, 2026
4be004f
[PyTorch] Disable FlashAttention 4 for pad_between_seqs with THD
sudhakarsingh27 Apr 24, 2026
c476f15
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash…
sudhakarsingh27 Apr 24, 2026
a2b0f1b
[QA] Fix cutlass-dsl utils shadow in FA versions test
sudhakarsingh27 Apr 25, 2026
0ee22c7
merge conflicts with main
sudhakarsingh27 Apr 26, 2026
dfc1472
Merge branch 'main' of https://github.com/NVIDIA/TransformerEngine in…
sudhakarsingh27 Apr 26, 2026
ac38d4f
merge flash attn pad bw seqs
sudhakarsingh27 Apr 26, 2026
b94e175
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash…
sudhakarsingh27 Apr 28, 2026
7ebe3d9
fixes after merging with flash_attn_pad_bw_seqs branch
sudhakarsingh27 Apr 28, 2026
ddaa196
Merge branch 'main' of https://github.com/NVIDIA/TransformerEngine in…
sudhakarsingh27 Apr 28, 2026
fc9182f
skip tests which OOM in deterministic+backward+hopper+large_configs a…
sudhakarsingh27 Apr 29, 2026
636666f
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash…
sudhakarsingh27 Apr 29, 2026
7928bc9
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 29, 2026
1585ebb
Merge branch 'flash_attn_pad_bw_seqs' of github.com:sudhakarsingh27/T…
sudhakarsingh27 Apr 29, 2026
7ecad01
[PyTorch][CP] Replace Python-loop THD reorder with kernel-backed perm…
sudhakarsingh27 Apr 29, 2026
d8bf5c5
Merge remote-tracking branch 'sudhakar_repo/flash_attn_pad_bw_seqs' i…
sudhakarsingh27 Apr 29, 2026
cc104d3
[PyTorch][CP] Fix AllGather SBHD forward: set cu_seqlens_kv_per_step
sudhakarsingh27 Apr 29, 2026
2464f43
make cp det and nondet tests run in parallel whenever possible
sudhakarsingh27 Apr 30, 2026
26e9f6f
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Apr 30, 2026
611d876
[PyTorch][CP] Fix THD AllGather forward stream race on k_ag/v_ag
sudhakarsingh27 Apr 30, 2026
0aae820
Merge branch 'cp_thd_swa_with_ag' of github.com:sudhakarsingh27/Trans…
sudhakarsingh27 Apr 30, 2026
789ccf0
Merge branch 'main' into flash_attn_pad_bw_seqs
sudhakarsingh27 May 1, 2026
0a32185
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into flash…
sudhakarsingh27 May 4, 2026
c33cf2d
Merge branch 'flash_attn_pad_bw_seqs' of github.com:sudhakarsingh27/T…
sudhakarsingh27 May 4, 2026
353361a
Merge remote-tracking branch 'sudhakar_repo/flash_attn_pad_bw_seqs' i…
sudhakarsingh27 May 4, 2026
a1062d9
Merge branch 'main' of github.com:NVIDIA/TransformerEngine into cp_th…
sudhakarsingh27 May 5, 2026
4 changes: 0 additions & 4 deletions tests/pytorch/attention/test_attention_with_cp.py
@@ -99,8 +99,6 @@ def test_cp_with_flash_attention(dtype, model, qkv_format, cp_comm_type):
     if cp_comm_type == "all_gather" and config.attn_bias_type != "no_bias":
         pytest.skip("CP implementation with KV all-gather does not support bias yet!")
     if qkv_format == "thd":
-        if cp_comm_type == "all_gather":
-            pytest.skip("CP implementation with KV all-gather does not support THD format yet!")
         if cp_comm_type == "a2a+p2p":
             pytest.skip(
                 "CP implementation with QKVO A2A+P2P (Hierarchical A2A) does not support THD format"
@@ -267,8 +265,6 @@ def test_cp_with_fused_attention(
     if qkv_format == "thd" and config.attn_bias_type == "post_scale_bias":
         pytest.skip("THD format does not support post_scale_bias yet!")
     if qkv_format == "thd":
-        if cp_comm_type == "all_gather":
-            pytest.skip("CP implementation with KV all-gather does not support THD format yet!")
Review comment (Collaborator):
A general comment: please run the CP test file offline with "test_essential=False", because the essential tests may not cover everything.

         if cp_comm_type == "a2a+p2p":
             pytest.skip(
                 "CP implementation with QKVO A2A+P2P (Hierarchical A2A) does not support THD format"
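The net effect of the two hunks above on the skip gating can be sketched as a small pure function. This is only an illustration, not code from the PR: `skip_reason` is a hypothetical helper, it covers only the conditions visible in the `test_cp_with_flash_attention` hunk, and the real tests call `pytest.skip` inline and apply further checks that the diff does not show. The A2A+P2P message is copied only as far as it appears in the diff.

```python
from typing import Optional


def skip_reason(qkv_format: str, cp_comm_type: str, attn_bias_type: str) -> Optional[str]:
    """Hypothetical helper: return a skip message, or None if the combination should run.

    Mirrors only the conditions visible in the diff for test_cp_with_flash_attention.
    """
    # KV all-gather still skips any configuration that uses an attention bias.
    if cp_comm_type == "all_gather" and attn_bias_type != "no_bias":
        return "CP implementation with KV all-gather does not support bias yet!"
    # THD with hierarchical A2A is still skipped (message as far as the diff shows it).
    if qkv_format == "thd" and cp_comm_type == "a2a+p2p":
        return "CP implementation with QKVO A2A+P2P (Hierarchical A2A) does not support THD format"
    # The THD + all_gather skip was deleted by this change, so that case now runs.
    return None


if __name__ == "__main__":
    for combo in [("thd", "all_gather", "no_bias"), ("thd", "a2a+p2p", "no_bias")]:
        print(combo, "->", skip_reason(*combo))
```

Under these assumptions, THD with `all_gather` and no bias is no longer skipped, while THD with `a2a+p2p` and any `all_gather` configuration with a bias still are.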