Feat calibration diagnostic by JemmaLDaniel · Pull Request #207 · instadeepai/winnow

JemmaLDaniel · 2026-05-24T15:31:00Z

Description

This PR adds a tail calibration diagnostic workflow for non-parametric FDR (winnow diagnose-calibration), and documentation. On a labelled holdout, Winnow calibrates scores, derives the operating confidence cutoff at the nominal FDR target, estimates sTECE and TECE on the tail via isotonic regression, writes a JSON report, and optionally saves a tail-only reliability diagram.


JemmaLDaniel · 2026-05-24T15:40:52Z

Tests pass locally but will not run remotely due to new restrictions on GitHub Actions. I will take a look and fix this separately.

BioGeek · 2026-05-24T18:51:29Z

 1. `winnow train` – Performs confidence calibration on a dataset of annotated PSMs, outputting the fitted model checkpoint.
 2. `winnow compute-features` – Computes and outputs the feature set for a dataset of PSMs.
 3. `winnow predict` – Performs confidence calibration using a fitted model checkpoint (defaults to a pretrained general model from Hugging Face), estimates and controls FDR using the calibrated confidence scores.
+4. `winnow diagnose-calibration` – On a labelled holdout, estimates tail calibration error (sTECE/TECE) at the FDR operating threshold and writes a reliability diagram.


Explain the terms (or link to an explanation of the terms) sTECE/TECE.
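The PR text does not define these terms, so the following is only one plausible reading, offered as a sketch rather than Winnow's actual definition. It assumes the signed quantity is the mean calibrated confidence minus the observed accuracy over PSMs at or above the cutoff, and that the unsigned quantity is a count-weighted mean of absolute per-bin gaps on the same tail. The function names `signed_tail_gap` and `binned_tail_gap`, the equal-width binning, and `n_bins` are all assumptions made for illustration; the PR itself estimates these via isotonic regression.

```python
import numpy as np


def signed_tail_gap(scores, labels, conf_cutoff):
    """Hypothetical: mean confidence minus observed accuracy on the tail."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    mask = scores >= conf_cutoff  # tail = PSMs at or above the cutoff
    if not mask.any():
        raise ValueError("no PSMs at or above the cutoff")
    return float(scores[mask].mean() - labels[mask].mean())


def binned_tail_gap(scores, labels, conf_cutoff, n_bins=5):
    """Hypothetical unsigned variant: weighted mean |confidence - accuracy| per bin."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=float)
    mask = scores >= conf_cutoff
    s, y = scores[mask], labels[mask]
    if s.size == 0:
        raise ValueError("no PSMs at or above the cutoff")
    # Equal-width bins spanning [conf_cutoff, 1.0] (an assumption of this sketch).
    edges = np.linspace(conf_cutoff, 1.0, n_bins + 1)
    # Interior edges map each score to a bin index in 0..n_bins-1.
    idx = np.digitize(s, edges[1:-1])
    total = 0.0
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            gap = abs(s[in_bin].mean() - y[in_bin].mean())
            total += in_bin.sum() / s.size * gap
    return float(total)
```

Under this reading, a positive signed value would indicate overconfidence in the tail (confidence exceeds accuracy), and the unsigned value can never be smaller than the magnitude of the signed one.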

BioGeek · 2026-05-25T13:47:18Z

+        iso=iso,
+    )
+    stece_empirical = signed_tail_ece_empirical(
+        np.asarray(scores, dtype=float), labels_f, conf_cutoff


run_calibration_diagnostic passes full-length scores together with tail-filtered labels_f into signed_tail_ece_empirical, so labels[mask] raises IndexError whenever any PSM falls below conf_cutoff.

This affects any realistic input where conf_cutoff actually filters something (e.g. 1000 PSMs, 200 above the cutoff). Inside signed_tail_ece_empirical, mask = scores >= conf_cutoff has length 1000, but labels_f has length 200, so labels_f[mask] raises IndexError: boolean index did not match indexed array along axis 0; size of axis is 200 but size of corresponding boolean axis is 1000. The unit tests miss this because their fixtures (uniform(conf_cutoff, 1.0)) keep every score at or above the cutoff, so the lengths happen to match.

To reproduce:

import numpy as np
from winnow.calibration.diagnostics import run_calibration_diagnostic

# Realistic: scores spread above and below the cutoff
scores = np.linspace(0.1, 1.0, 1000)
labels = (np.random.default_rng(0).uniform(size=1000) < scores).astype(bool)

run_calibration_diagnostic(
    scores=scores,
    labels=labels,
    conf_cutoff=0.6,
    nominal_fdr=0.05,
    tolerance=0.005,
    label_source="sequence",
    label_column="correct",
    min_tail_psms=50,
)
# IndexError: boolean index did not match indexed array along axis 0;
# size of axis is 445 but size of corresponding boolean axis is 1000

Fix: pass the already-filtered tail arrays, not the full ones.

Suggested change

-        np.asarray(scores, dtype=float), labels_f, conf_cutoff
+        tail_scores, labels_f, conf_cutoff

And add a regression test where len(tail) < len(scores).
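A rough, self-contained illustration of the mismatch and of what such a regression test could check. It does not import Winnow: `signed_tail_gap_stub` is a hypothetical stand-in that, like the function described above, builds the mask from its own `scores` argument and indexes `labels` with it. Its return value (mean confidence minus accuracy) is an assumption and is not taken from the PR.

```python
import numpy as np


def signed_tail_gap_stub(scores, labels, conf_cutoff):
    """Hypothetical stand-in: masks internally, as the reviewed function does."""
    mask = scores >= conf_cutoff
    return float(scores[mask].mean() - labels[mask].astype(float).mean())


rng = np.random.default_rng(0)
scores = np.linspace(0.1, 1.0, 1000)      # spread above and below the cutoff
labels = rng.uniform(size=1000) < scores
conf_cutoff = 0.6

tail_mask = scores >= conf_cutoff
tail_scores = scores[tail_mask]           # already-filtered tail arrays
labels_f = labels[tail_mask]

# Buggy call pattern: full-length scores with tail-filtered labels.
try:
    signed_tail_gap_stub(scores, labels_f, conf_cutoff)
    buggy_raised = False
except IndexError:
    buggy_raised = True

# Fixed call pattern: both arrays are the tail, so the inner mask is all True.
fixed = signed_tail_gap_stub(tail_scores, labels_f, conf_cutoff)

# Reference: unfiltered arrays of equal length give the same answer.
reference = signed_tail_gap_stub(scores, labels, conf_cutoff)
```

The key property for the regression test is that the fixture contains scores below the cutoff, so the tail is strictly shorter than the input.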

BioGeek · 2026-05-25T13:56:09Z

+    if len(df) > 0 and isinstance(df["prediction"].iloc[0], str):
+        df["prediction"] = df["prediction"].apply(metrics._split_peptide)
+
+    def _row_correct(row: pd.Series) -> bool:


For data_loader=mztab with diagnostics.label_source=sequence, Polars list columns reach pandas as numpy.ndarray values, but _row_correct() only accepts Python list. As a result, fully matching MZTab sequence/prediction pairs are marked incorrect and the calibration diagnostic is computed against all-false labels.

To reproduce:

import numpy as np
import pandas as pd
from winnow.calibration.diagnostics import compute_correct_from_sequence

masses = {"A": 71.037114, "G": 57.021464}
meta = pd.DataFrame({
    "sequence": [np.array(["A", "G"], dtype=object)],
    "prediction": [np.array(["A", "G"], dtype=object)],
})
print(compute_correct_from_sequence(meta, masses).tolist())

Current result:

[False]

Expected:

[True]

Suggested fix: normalize sequence-like values before the type check, or accept both list and np.ndarray.

def _as_token_list(value: object) -> list[str] | None:
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, list):
        return value
    return None

def _row_correct(row: pd.Series) -> bool:
    sequence = _as_token_list(row["sequence"])
    prediction = _as_token_list(row["prediction"])
    if sequence is None or prediction is None:
        return False
    num_matches = metrics._novor_match(sequence, prediction)
    return num_matches == len(sequence) == len(prediction)

And add a regression test with np.array(["A", "G"], dtype=object) for both sequence and prediction, asserting [True].
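A minimal sketch of the normalization idea and the cases such a test might cover, written without importing Winnow. `_count_positionwise_matches` is a hypothetical stand-in for `metrics._novor_match` that simply counts equal tokens at the same position; the real function presumably does mass-aware matching, so only the type handling here mirrors the suggestion above.

```python
import numpy as np


def _as_token_list(value):
    """Normalize a sequence-like cell to a Python list, or None if unsupported."""
    if isinstance(value, np.ndarray):
        return value.tolist()
    if isinstance(value, list):
        return value
    return None


def _count_positionwise_matches(sequence, prediction):
    """Hypothetical stand-in for metrics._novor_match (not the real algorithm)."""
    return sum(a == b for a, b in zip(sequence, prediction))


def row_correct(sequence, prediction):
    """True only when both inputs are sequence-like and match token for token."""
    seq = _as_token_list(sequence)
    pred = _as_token_list(prediction)
    if seq is None or pred is None:
        return False
    return _count_positionwise_matches(seq, pred) == len(seq) == len(pred)


# The MZTab-style case from the report: object arrays rather than lists.
array_case = row_correct(
    np.array(["A", "G"], dtype=object),
    np.array(["A", "G"], dtype=object),
)
```

Covering the mixed list/ndarray case and a length mismatch alongside the pure ndarray case would guard against the check silently regressing to list-only.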

JemmaLDaniel added 3 commits May 24, 2026 17:26

feat: add calibration tail error diagnostic command

560c29a

test: add tests for calibration diagnostic

01413cb

docs: update CLI, configuration docs and README with calibration diagnostic

eb6cbe4

JemmaLDaniel requested a review from BioGeek May 24, 2026 15:31

JemmaLDaniel self-assigned this May 24, 2026

JemmaLDaniel added the enhancement New feature or request label May 24, 2026

chore: add config

4ab3bc5

BioGeek requested changes May 25, 2026

JemmaLDaniel wants to merge 4 commits into main from feat-calibration-diagnostic
