
Regression stability + new regression tests #2480

Draft
scarlehoff wants to merge 10 commits into master from regression_stability


@scarlehoff (Member) commented Jun 3, 2026
  • Add a test for vp-setupfit
  • Improve the stability of the regression tests
  • Include a theory covmat test
  • Generate the data in a dedicated worker
  • Set the tolerance with respect to said worker
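As an illustration of the last point, here is a minimal sketch of checking a new regression result against the output of the reference worker. It assumes the regression outputs can be flattened into a list of floats. The function names and the `denom_floor` parameter are hypothetical and are not part of validphys or any existing test helper.

```python
from typing import Sequence


def max_relative_deviation(
    reference: Sequence[float],
    candidate: Sequence[float],
    denom_floor: float = 1e-12,
) -> float:
    """Largest |candidate - reference| / max(|reference|, denom_floor).

    The deviation is measured relative to the reference worker's values
    (asymmetric on purpose); denom_floor avoids dividing by zero.
    """
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    return max(
        (abs(c - r) / max(abs(r), denom_floor) for r, c in zip(reference, candidate)),
        default=0.0,
    )


def within_tolerance(
    reference: Sequence[float],
    candidate: Sequence[float],
    rtol: float,
    denom_floor: float = 1e-12,
) -> bool:
    """True if every entry agrees with the reference worker to relative tolerance rtol."""
    return max_relative_deviation(reference, candidate, denom_floor) <= rtol


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0]          # hypothetical output of the dedicated worker
    candidate = [1.0, 2.0000001, 3.0]    # hypothetical output of a CI worker
    print(within_tolerance(reference, candidate, rtol=1e-6))
    print(within_tolerance(reference, candidate, rtol=1e-8))
```

Measuring the deviation relative to the reference (rather than symmetrically) keeps the dedicated worker as the single source of truth.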

Fixes #2464

@scarlehoff scarlehoff added the redo-regressions Recompute the regression data label Jun 3, 2026
@scarlehoff scarlehoff force-pushed the regression_stability branch from b2fd1ad to b1fc97f Compare June 3, 2026 13:20
@scarlehoff scarlehoff added redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data labels Jun 3, 2026
@scarlehoff scarlehoff force-pushed the regression_stability branch from b1fc97f to e3ae1d8 Compare June 3, 2026 13:42
@scarlehoff scarlehoff added redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data labels Jun 3, 2026
@scarlehoff scarlehoff force-pushed the regression_stability branch from a78efb4 to 9979d66 Compare June 3, 2026 14:07
@scarlehoff scarlehoff added redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data labels Jun 3, 2026
@scarlehoff scarlehoff force-pushed the regression_stability branch from a727524 to 0b6df3c Compare June 3, 2026 18:59
@scarlehoff scarlehoff added buildmaster redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data buildmaster labels Jun 3, 2026
@scarlehoff scarlehoff force-pushed the regression_stability branch from ef46748 to bd1f82d Compare June 3, 2026 19:38
@scarlehoff scarlehoff added redo-regressions Recompute the regression data devtools Build, automation and workflow and removed redo-regressions Recompute the regression data labels Jun 3, 2026
@scarlehoff scarlehoff force-pushed the regression_stability branch from 882ce46 to 32b2e7a Compare June 4, 2026 07:04
@scarlehoff scarlehoff added devtools Build, automation and workflow and removed devtools Build, automation and workflow labels Jun 4, 2026
@scarlehoff scarlehoff force-pushed the regression_stability branch from 22afeea to 3dd49bb Compare June 4, 2026 10:05
@scarlehoff scarlehoff added redo-regressions Recompute the regression data and removed devtools Build, automation and workflow redo-regressions Recompute the regression data labels Jun 4, 2026
@scarlehoff scarlehoff added redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data labels Jun 4, 2026
@scarlehoff scarlehoff added redo-regressions Recompute the regression data and removed redo-regressions Recompute the regression data labels Jun 4, 2026
@scarlehoff (Member Author)

OK, so the good news is that, after many rounds of labelling and unlabelling, I now have two samples that produce different results and make the test fail.
The failure seems to come from workers in different Azure regions, and there is no way to choose among them. There is also no guarantee that all workers within the same region are identical.

There are two options:

  1. Run the regression in our own hardware.
  2. Use those two samples to set the tolerances

I think the solution I'll go for is:

  • Regenerate the regression data on our own hardware (if our GitHub plan allows it...)
  • Use the two samples to select a tolerance compatible with the observed drift relative to our own hardware.

This should be the most robust option.
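A minimal sketch of the second step, deriving a tolerance from the samples that disagreed, might look like the following. It assumes each sample can be flattened into a list of floats aligned with the reference produced on our own hardware. The rule itself (a safety factor times the worst observed drift, with a lower bound) and the names `tolerance_from_samples`, `safety_factor` and `minimum` are hypothetical choices for illustration, not what the PR implements.

```python
from typing import Sequence


def max_relative_deviation(
    reference: Sequence[float],
    candidate: Sequence[float],
    denom_floor: float = 1e-12,
) -> float:
    """Largest |candidate - reference| / max(|reference|, denom_floor)."""
    if len(reference) != len(candidate):
        raise ValueError("reference and candidate must have the same length")
    return max(
        (abs(c - r) / max(abs(r), denom_floor) for r, c in zip(reference, candidate)),
        default=0.0,
    )


def tolerance_from_samples(
    reference: Sequence[float],
    samples: Sequence[Sequence[float]],
    safety_factor: float = 2.0,   # hypothetical margin on top of the observed drift
    minimum: float = 1e-8,        # hypothetical lower bound on the tolerance
) -> float:
    """Relative tolerance large enough to accept every observed sample, with a margin."""
    if not samples:
        raise ValueError("need at least one sample")
    worst = max(max_relative_deviation(reference, s) for s in samples)
    return max(safety_factor * worst, minimum)


if __name__ == "__main__":
    reference = [1.0, 2.0]                      # hypothetical self-hosted output
    samples = [[1.0, 2.0], [1.000001, 2.0]]     # hypothetical outputs from two Azure workers
    print(tolerance_from_samples(reference, samples))
```

The safety factor leaves some room for workers that have not been sampled yet, while the lower bound keeps the test from becoming impossibly strict when the samples happen to agree exactly.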

Sorry for the spam to whoever is still subscribed to this issue/PR, but making the CI do it for me in a loop was the lowest-effort way to fish for the errors 😅

@scarlehoff scarlehoff marked this pull request as draft June 4, 2026 19:59
@Radonirinaunimi (Member)

> Regenerate the regression in our own hardware (if our github plan allows for it...)

For this, I think we can host our own runner on the Nikhef cluster (or somewhere else), as we did for the pineko regression tests.

@scarlehoff (Member Author)

Ah, fantastic. We should be able to use the same one, I think. I was worried that custom runners were a paid feature; I didn't realize we were already using one.

@Radonirinaunimi (Member)

Btw, I will be waiting for this before resuming #2478.


Labels: redo-regressions (Recompute the regression data)
Projects: None yet
Development: successfully merging this pull request may close these issues:
  • Improve reproducibility of regression tests (and add one with a theory covmat)

2 participants