[Auto-Recovery] Add checkpoint tests#1920

Open
yf225 wants to merge 1 commit into yf225/stack/96 from yf225/stack/90

Conversation


@yf225 yf225 commented Apr 2, 2026

Stacked PRs:


[Auto-Recovery] Add checkpoint tests

Add test_autotuner_checkpoint.py with 17 tests covering checkpoint
save/load cycles for all search algorithms (PatternSearch,
LFBOPatternSearch, LFBOTreeSearch, DE, DESurrogateHybrid).

@meta-cla meta-cla bot added the CLA Signed label Apr 2, 2026
@yf225 yf225 force-pushed the yf225/stack/90 branch 3 times, most recently from 2c1c3e3 to c104092 April 2, 2026 20:18
yf225 added a commit that referenced this pull request Apr 2, 2026
…oint

Fixes #1330. Internal customers have had a lot of pain with IMA errors, and they also feel that spawn mode adds too much overhead, making autotuning take extra long. This PR stack adds an auto-recovery feature that checkpoints regularly (which is useful by itself for the server crash scenarios mentioned in #1330) and then automatically starts a new autotune process from the previously saved checkpoint if there is an IMA error (next PR).

stack-info: PR: #1920, branch: yf225/stack/90
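
The checkpoint-then-restart flow described in the commit message above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the implementation in this PR stack: `SimulatedIMAError`, `run_search`, `run_with_auto_recovery`, and the JSON checkpoint format are hypothetical, and the restart happens in the same process via an exception, whereas the commit message describes starting a new autotune process.

```python
import json
import tempfile
from pathlib import Path


class SimulatedIMAError(RuntimeError):
    """Hypothetical stand-in for the IMA error mentioned above."""


def run_search(ckpt: Path, total_generations: int, pending_crashes: set) -> dict:
    """Run (or resume) a toy search, checkpointing after every generation."""
    if ckpt.exists():
        state = json.loads(ckpt.read_text())  # resume from the saved state
    else:
        state = {"generation": 0, "best": None}

    while state["generation"] < total_generations:
        gen = state["generation"]
        if gen in pending_crashes:
            pending_crashes.discard(gen)  # each injected crash fires once
            raise SimulatedIMAError(f"crash in generation {gen}")
        score = (gen * 7) % 5  # toy "benchmark" result; lower is better
        if state["best"] is None or score < state["best"]:
            state["best"] = score
        state["generation"] = gen + 1
        ckpt.write_text(json.dumps(state))  # checkpoint regularly
    return state


def run_with_auto_recovery(ckpt: Path, total_generations: int,
                           crash_at: set, max_restarts: int = 5):
    """Restart the search from the last checkpoint after each failure."""
    pending = set(crash_at)  # copy so the caller's set is not mutated
    restarts = 0
    while True:
        try:
            return run_search(ckpt, total_generations, pending), restarts
        except SimulatedIMAError:
            restarts += 1
            if restarts > max_restarts:
                raise
```

For example, calling `run_with_auto_recovery(Path(tmpdir) / "tune.json", 6, {2, 4})` inside a `tempfile.TemporaryDirectory()` block restarts twice and still completes all six generations, because each restart picks up from the generation recorded in the checkpoint rather than from zero.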
yf225 added a commit that referenced this pull request Apr 4, 2026
Add test_autotuner_checkpoint.py with 17 tests covering checkpoint
save/load cycles for all search algorithms (PatternSearch,
LFBOPatternSearch, LFBOTreeSearch, DE, DESurrogateHybrid).

stack-info: PR: #1920, branch: yf225/stack/90
@yf225 yf225 force-pushed the yf225/stack/90 branch 2 times, most recently from e6d7085 to 99ee67e April 4, 2026 04:25
@yf225 yf225 changed the title from "[Autotuner] Auto-checkpoint feature and ability to resume from checkpoint" to "[Auto-Recovery] Add checkpoint tests" Apr 4, 2026
@yf225 yf225 changed the base branch from main to yf225/stack/96 April 4, 2026 04:25
@yf225 yf225 changed the base branch from yf225/stack/96 to main April 4, 2026 21:20
@yf225 yf225 changed the base branch from main to yf225/stack/96 April 4, 2026 21:20
@yf225 yf225 changed the base branch from yf225/stack/96 to main April 4, 2026 21:58
@yf225 yf225 changed the base branch from main to yf225/stack/96 April 4, 2026 21:58
@yf225 yf225 changed the base branch from yf225/stack/96 to main April 4, 2026 21:58
@yf225 yf225 changed the base branch from main to yf225/stack/96 April 4, 2026 21:58
@yf225 yf225 changed the base branch from yf225/stack/96 to main April 4, 2026 22:06
@yf225 yf225 changed the base branch from main to yf225/stack/96 April 4, 2026 22:06
@yf225 yf225 changed the base branch from yf225/stack/96 to main April 4, 2026 23:06
@yf225 yf225 changed the base branch from main to yf225/stack/96 April 4, 2026 23:06

Labels

CLA Signed This label is managed by the Meta Open Source bot.
