Test Mosaic Tutorial Post 2.11 by sekyondaMeta · Pull Request #3809 · pytorch/tutorials

sekyondaMeta · 2026-03-30T18:42:54Z

Test mosaic tutorial post 2.11 release

Disabled GPT-2 dropout (resid_pdrop=0, attn_pdrop=0, embd_pdrop=0) in run_training_ac() to work around a PyTorch 2.11 bug where the CUDA dropout kernel crashes during gradient checkpointing recomputation (#3774). Dropout has no impact on this tutorial's purpose of memory profiling with Mosaic.
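For reference, the workaround described above could look roughly like the sketch below. It assumes the tutorial builds its model from Hugging Face `transformers`. The helper name `build_gpt2_without_dropout` and the `NO_DROPOUT` dict are hypothetical and are not taken from the tutorial's `run_training_ac()`.

```python
# Overrides that zero out the three GPT-2 dropout probabilities named in this PR.
NO_DROPOUT = {"resid_pdrop": 0.0, "attn_pdrop": 0.0, "embd_pdrop": 0.0}


def build_gpt2_without_dropout():
    """Hypothetical helper: a GPT-2 model with dropout disabled and
    activation (gradient) checkpointing turned on."""
    # Imported lazily so the overrides above can be inspected without
    # transformers installed.
    from transformers import GPT2Config, GPT2LMHeadModel

    config = GPT2Config(**NO_DROPOUT)      # default GPT-2 sizes, dropout off
    model = GPT2LMHeadModel(config)        # randomly initialized weights
    model.gradient_checkpointing_enable()  # recompute activations in backward
    return model
```

With all three probabilities at zero, the dropout layers become no-ops, so the recomputation pass during `loss.backward()` has no random dropout mask to regenerate.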

pytorch-bot · 2026-03-30T18:42:57Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/tutorials/3809

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2eaef42 with merge base 084b358:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Disable dropout (resid_pdrop=0, attn_pdrop=0, embd_pdrop=0) in the run_training_ac function to avoid SystemError from _VF.dropout returning NULL during backward recomputation of GPT2Block. Dropout is irrelevant to the memory profiling purpose of this tutorial. Issue: #3774

sekyondaMeta · 2026-04-01T19:01:33Z

@basilwong here is an attempt to get this to work. It is a band-aid at best and still needs an actual fix.

Test Mosaic Tutorial Post 2.11

1cf72d3

meta-cla bot added the cla signed label Mar 30, 2026

sekyondaMeta added 5 commits March 30, 2026 19:00

Update validate_tutorials_built.py

3c8fc59

Fix activation checkpointing crash by using use_reentrant=False

6e486af

Switches gradient_checkpointing_enable() to use non-reentrant checkpointing, which properly preserves dropout RNG state during recomputation and resolves the SystemError during loss.backward(). Issue: #3774
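As an illustration of what non-reentrant checkpointing means, here is a minimal sketch that calls `torch.utils.checkpoint.checkpoint` directly with `use_reentrant=False`. It is a stand-in rather than the tutorial's code: the small `block` module and the function name `run_checkpointed_step` are hypothetical, and the commit above was later reverted in this PR.

```python
def run_checkpointed_step():
    """Hypothetical demo: one forward/backward pass through a block
    containing dropout, using non-reentrant activation checkpointing."""
    # Imported lazily so the sketch can be loaded without torch installed.
    import torch
    from torch.utils.checkpoint import checkpoint

    block = torch.nn.Sequential(
        torch.nn.Linear(8, 8),
        torch.nn.ReLU(),
        torch.nn.Dropout(p=0.1),  # active because modules start in train mode
    )
    x = torch.randn(4, 8, requires_grad=True)

    # Activations inside `block` are not stored; they are recomputed
    # when backward() reaches this segment.
    out = checkpoint(block, x, use_reentrant=False)
    out.sum().backward()
    return x.grad
```

Calling `run_checkpointed_step()` returns the gradient with respect to the input, which has the same shape as `x`.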

Merge branch 'main' into test_mosaic_tutorial_2.11

c473f6a

Revert "Fix activation checkpointing crash by using use_reentrant=False"

f92d01b

This reverts commit 6e486af.

Test Mosaic Tutorial Post 2.11 #3809
sekyondaMeta wants to merge 6 commits into main from test_mosaic_tutorial_2.11
