Set Default Behavior to Stop Training Upon Convergence by michaelmckinsey1 · Pull Request #16 · LBANN/ScaFFold

michaelmckinsey1 · 2026-02-06T02:36:09Z

Set the default behavior of the benchmark to stop training once it reaches a target Dice score (0.95 by default) instead of training for a fixed number of epochs.
Create a testing config that runs for only 10 epochs.
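A minimal sketch of the stopping rule described above, assuming the trainer evaluates once per epoch and compares an epoch-level validation Dice score against the threshold. ScaFFold's actual trainer is not shown in this PR page; the names train_until_converged, train_one_epoch, evaluate, target_dice and max_epochs are hypothetical. The strict ">" comparison and the log message mirror the line removed in the trainer.py diff, and the optional max_epochs cap stands in, as an assumption, for the short 10-epoch testing config.

```python
from typing import Callable, Optional


def train_until_converged(
    train_one_epoch: Callable[[int], None],  # hypothetical: trains for one epoch
    evaluate: Callable[[], float],  # hypothetical: returns epoch-averaged val Dice
    target_dice: float = 0.95,  # default threshold described in this PR
    max_epochs: Optional[int] = None,  # e.g. 10 for a short testing run
) -> int:
    """Train until the validation Dice score exceeds target_dice.

    Returns the number of epochs actually run. When max_epochs is set,
    training also stops after that many epochs even without convergence.
    """
    epoch = 0
    while max_epochs is None or epoch < max_epochs:
        train_one_epoch(epoch)
        epoch += 1
        val_score = evaluate()
        # Stop as soon as the epoch-level score clears the threshold.
        if val_score > target_dice:
            print(
                f"val_score of {val_score} is > threshold of {target_dice}. "
                "Benchmark run complete. Wrapping up..."
            )
            break
    return epoch
```

For example, if evaluate() returns 0.50, 0.80 and then 0.96, the loop stops after the third epoch. Leaving max_epochs unset means a run that never converges keeps training, which is why a capped config is useful for quick tests.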

PatrickRMiles · 2026-02-19T19:13:38Z

ScaFFold/utils/trainer.py

-                        f"val_score of {val_score} is > threshold of 0.95. Benchmark run complete. Wrapping up..."
-                    )
-                    return 0
+                dice_score_train = dice_sum


I think we should actually use val_score here in place of dice_sum. dice_sum is the per-batch dice score, whereas val_score is averaged over all batches in an epoch. So just replace this line with dice_score_train = val_score and then this PR is good to go.
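To illustrate why the reviewer prefers val_score, here is a small sketch contrasting a single batch's Dice score with the mean over every batch in an epoch. It assumes val_score is the plain mean of per-batch Dice scores, as the reviewer describes; the helper names dice_score and epoch_val_score, the eps smoothing term, and the flat 0/1 mask representation are illustrative assumptions, not ScaFFold code.

```python
from typing import Iterable, Sequence, Tuple


def dice_score(pred: Sequence[int], target: Sequence[int], eps: float = 1e-6) -> float:
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for flat binary masks."""
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # eps keeps the ratio defined when both masks are empty.
    return (2.0 * intersection + eps) / (total + eps)


def epoch_val_score(batches: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Mean of the per-batch Dice scores over every batch in the epoch."""
    total = 0.0
    count = 0
    for pred, target in batches:
        batch_dice = dice_score(pred, target)  # depends on this batch alone
        total += batch_dice
        count += 1
    return total / count if count else 0.0
```

With one perfectly predicted batch and one completely missed batch, the per-batch scores are about 1.0 and about 0.0, so a check on a single batch's score would either stop too early or not at all, depending on which batch came last. The epoch mean of about 0.5 reflects the model's overall quality, which is what the convergence threshold is meant to test.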

michaelmckinsey1 added 7 commits January 29, 2026 14:55

init (02487d5)
debug (40d7760)
testing (2f18e4d)
Enable configuring n_categories (370c20e)
set checkpoint interval (2cc6b3f)
Merge remote-tracking branch 'origin/checkpoint-interval' into procruns (0c8155f)
cleanup (75a1b87)

michaelmckinsey1 self-assigned this Feb 6, 2026

michaelmckinsey1 requested a review from PatrickRMiles February 6, 2026 02:36

lint (07c295c)

michaelmckinsey1 changed the title from "Enable Checkpoint Interval and Set Default Behavior to Stop Training Upon Convergence" to "Set Default Behavior to Stop Training Upon Convergence" Feb 6, 2026

Merge remote-tracking branch 'upstream/main' into procruns (c9ea4c1)

PatrickRMiles requested changes Feb 19, 2026

Update trainer.py (8202ef4)

PatrickRMiles approved these changes Feb 19, 2026

Create benchmark_testing.yml (51211f3)

michaelmckinsey1 merged commit a1f1bef into LBANN:main Feb 19, 2026
1 check passed

michaelmckinsey1 merged 11 commits into LBANN:main from michaelmckinsey1:procruns

2 participants
