
Add w parameter to progressive_val_score and iter_progressive_val_score#1762

Open
satishkc7 wants to merge 7 commits into online-ml:main from satishkc7:feature/progressive-val-score-w-param

Conversation


@satishkc7 commented on Mar 23, 2026

Summary

Adds per-sample weight support to progressive_val_score and iter_progressive_val_score by allowing the dataset to yield (x, y, w) triples instead of the usual (x, y) pairs.

Previously, learn_one was always called with the default weight (w=1.0). This change lets users supply a per-sample w directly from the dataset iterator, which is more practical for real-world use cases like time-decayed weighting or cost-sensitive learning.

Usage

from itertools import count

from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()
sample_weights = (0.99 ** i for i in count())  # e.g. exponential time decay

# Dataset yields (x, y, w) triples — w is per-sample
weighted_dataset = (
    (x, y, weight)
    for (x, y), weight in zip(datasets.Phishing(), sample_weights)
)

evaluate.progressive_val_score(
    model=model,
    dataset=weighted_dataset,
    metric=metrics.ROCAUC(),
    print_every=200,
)

Samples that don't include a weight (plain (x, y) pairs) default to w=1.0, so existing code is fully backward compatible.

Changes

  • _progressive_validation wraps the dataset with _iter_dataset() which detects 3-tuples and stores per-sample weights keyed by sample index
  • The stored weight is forwarded to learn_one when the ground truth is revealed, for models that accept a w parameter
  • Removed the static w: float = 1.0 parameter from all three public/private functions
  • Added Notes section to docstrings explaining the (x, y, w) API
  • Added tests/test_progressive_validation_weights.py with 4 tests covering plain pairs, weighted triples, mixed input, and models without a w param

Backward compatibility

Fully backward compatible — datasets yielding plain (x, y) pairs continue to work unchanged.

Closes #1502

Exposes the sample weight parameter w to the progressive validation
evaluation functions so users can pass instance weights when calling
learn_one. The parameter defaults to 1.0 to preserve existing behavior.

Closes online-ml#1502
Use inspect.signature to check once before the evaluation loop whether
the model's learn_one accepts a w parameter. Only pass w=w when the model
supports it, so models without a w argument (e.g. AMFClassifier) are
not broken.
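The one-time signature check this commit describes can be sketched as follows (a standalone illustration; the helper name and the toy model classes are hypothetical, not River's internals):

```python
import inspect

def model_accepts_w(model) -> bool:
    """Check once, before the evaluation loop, whether learn_one takes w."""
    return "w" in inspect.signature(model.learn_one).parameters

class WithW:
    def learn_one(self, x, y, w=1.0): ...

class WithoutW:  # e.g. a model like AMFClassifier that has no w argument
    def learn_one(self, x, y): ...
```

Doing the check once up front keeps `inspect` out of the per-sample hot path.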
@MaxHalford
Member

Hey there!

I'm not sure I follow. When would it be useful to always use the same weight for each sample? Wouldn't it be more practical if the dataset iterator could include a w variable?

@satishkc7
Author

That's a great point; a static weight for all samples isn't very useful. Would the right approach be to have the dataset yield (x, y, w) tuples where w is a per-sample float? Or did you have a different pattern in mind? Happy to update the PR accordingly.

@MaxHalford
Member

Yes, that's what I have in mind, though I'm not exactly sure what the API would look like. But you can take a stab at it! May I ask why you opened this PR in the first place? Do you have a use case?

@satishkc7
Author

The use case I had in mind is datasets where samples have unequal importance, e.g. time-decayed weighting or class imbalance where minority samples should count more in the metric. I'll prototype the (x, y, w) approach where the dataset optionally yields a third element and progressive_val_score detects and passes it through to the metric's update call. I'll update the PR once I have something working!

@MaxHalford
Member

Thanks for the details. I was curious whether you had a real-life use case available too :)

… validation

Replace the static w=1.0 parameter with per-sample weight support.
The dataset can now yield (x, y, w) triples instead of (x, y) pairs.
Weights are extracted before passing to simulate_qa and forwarded to
learn_one for models that accept a w parameter.

- Remove static w param from _progressive_validation, iter_progressive_val_score,
  and progressive_val_score
- Add _iter_dataset() wrapper that detects (x, y, w) triples and stores
  sample_weights keyed by simulate_qa's sample index
- Add Notes section to docstrings explaining the (x, y, w) API
- Add tests/test_progressive_validation_weights.py with 4 tests covering
  plain pairs, weighted triples, mixed input, and models without w param

Closes online-ml#1502
@satishkc7
Author

Done! Here's the approach I went with:

API: dataset can now yield either (x, y) pairs or (x, y, w) triples. The weight w is a per-sample float that is forwarded to learn_one for models that accept a w parameter. Samples that don't supply a weight default to w=1.0.

No new parameters were added to progressive_val_score or iter_progressive_val_score — the weight is picked up transparently from the dataset iterator.

Example:

from river import datasets, evaluate, linear_model, metrics, preprocessing

model = preprocessing.StandardScaler() | linear_model.LogisticRegression()

# Wrap a dataset to yield (x, y, w) triples with time-decayed weights
def decayed_phishing():
    dataset = list(datasets.Phishing())
    n = len(dataset)
    for i, (x, y) in enumerate(dataset):
        w = (i + 1) / n  # later samples get higher weight
        yield x, y, w

evaluate.progressive_val_score(
    model=model,
    dataset=decayed_phishing(),
    metric=metrics.ROCAUC(),
    print_every=200,
)

Implementation details:

  • _iter_dataset() unwraps triples, stores w in a sample_weights dict keyed by sample index
  • At learn time, sample_weights.pop(i, 1.0) retrieves the weight before calling learn_one
  • Models whose learn_one signature has no w parameter receive the call without it

Added 4 unit tests in tests/test_progressive_validation_weights.py covering: plain pairs defaulting to 1.0, triples with per-sample weights, mixed datasets, and models that don't accept w.
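The default-to-1.0 retrieval in the second bullet is plain `dict.pop` with a fallback value; a standalone sketch (toy data, not River code):

```python
# Weights stored by _iter_dataset, keyed by sample index (illustrative values)
sample_weights = {0: 0.3, 2: 0.7}

w0 = sample_weights.pop(0, 1.0)  # 0.3: this sample came as an (x, y, w) triple
w1 = sample_weights.pop(1, 1.0)  # 1.0: plain (x, y) pair falls back to the default
```

`pop` also removes the consumed entry, so answered samples don't linger in the dict.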


@JiwaniZakir left a comment


The sample_weights dict in _progressive_validation is populated for every sample in _iter_dataset() but only cleared via pop on line ~110 when use_label is True. In delayed or look-ahead scenarios where many samples are queued before answers arrive, this dict can grow to hold the entire dataset in memory with no upper bound — worth either documenting the memory implication or using a collections.deque/bounded structure.

There's also a subtle collision risk on line ~109: if kwargs (forwarded from simulate_qa) happens to contain a "w" key — for instance, if someone passes extra stream metadata — then model.learn_one(x, y, w=w, **kwargs) will raise TypeError: got multiple values for keyword argument 'w'. The _model_accepts_w guard doesn't protect against this; you'd want to either explicitly remove w from kwargs before the call or document that w is a reserved key in the extra-kwargs convention.

The test's _fake_simulate_qa in test_progressive_validation_weights.py assumes simulate_qa enumerates the wrapped iterator starting at 0 sequentially, which is how _iter_dataset's own enumerate assigns keys to sample_weights. If stream.simulate_qa ever resets its index or introduces gaps (e.g., skipping flagged samples), the index alignment silently breaks and pop falls back to 1.0 with no error — a test exercising a non-trivial delay scenario would make this contract explicit.
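The collision risk in the second point is easy to reproduce in isolation (toy function, independent of River):

```python
def learn_one(x, y, w=1.0, **kwargs):
    return w

# Stream metadata that happens to carry a "w" key
stream_kwargs = {"w": 0.5, "source": "stream"}

try:
    learn_one({}, True, w=2.0, **stream_kwargs)
    collided = False
except TypeError:
    # "got multiple values for keyword argument 'w'"
    collided = True
```

A signature guard like `_model_accepts_w` cannot catch this, because the call fails at argument-binding time regardless of what the signature check decided.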

- Replace sample_weights dict with a deque; weight is popped on the
  question event and stored inside preds[i] alongside the prediction.
  Memory is now bounded by the delay window, not the whole dataset.
- Strip 'w' from simulate_qa kwargs before learn_one to prevent
  TypeError when stream metadata contains a reserved 'w' key.
- Add tests for full-delay weight matching and kwargs collision.
@satishkc7
Author

Thanks for the thorough review! I've addressed all three points:

  1. Memory leak - Replaced the sample_weights dict with a collections.deque. The weight is now popped on the question event and stored inside preds[i] alongside the prediction, so memory is bounded by the delay window rather than the full dataset.

  2. kwargs collision - Added a learn_kwargs = {k: v for k, v in kwargs.items() if k != 'w'} guard before every learn_one call. The metadata w is silently stripped; the sample weight takes precedence.

  3. Fragile index alignment - Since weights now travel with preds[i] rather than a separate dict, index alignment is no longer an issue. Added two new tests: test_weights_correct_under_full_delay (all answers arrive after all questions) and test_kwargs_w_key_does_not_cause_collision.
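The guard from point 2 is a one-line dict comprehension; a standalone sketch (the function name here is illustrative):

```python
def strip_reserved_w(kwargs: dict) -> dict:
    # The per-sample weight takes precedence; drop any metadata "w" key
    # so learn_one never receives the argument twice.
    return {k: v for k, v in kwargs.items() if k != "w"}

learn_kwargs = strip_reserved_w({"w": 0.5, "source": "stream"})
```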

Comment on lines +45 to +55
sample_weights: dict[int, float] = {}

def _iter_dataset():
    for idx, item in enumerate(dataset):
        if len(item) == 3:
            x, y, w = item
            sample_weights[idx] = w
        else:
            x, y = item
            sample_weights[idx] = 1.0
        yield x, y
Member


Is this really necessary? Can't you .pop w from **kwargs instead?

Expose per-sample weighting as a first-class parameter on both
progressive_val_score and iter_progressive_val_score.

Users can now pass w=callable(x, y) -> float directly instead of
wrapping the dataset to yield (x, y, w) triples. Tuple weights still
take precedence so mixed datasets behave predictably.

Also fixes a variable-shadowing bug: the local sample weight stored
in preds[i] was named 'w', which silently overwrote the w callable
parameter in the enclosing scope on every Case-2 iteration. Renamed
to 'sample_w' throughout.

Two new tests cover the callable path and the tuple-takes-precedence
rule.
@satishkc7
Author

Updated the PR with the w parameter added directly to progressive_val_score and iter_progressive_val_score (and the internal _progressive_validation).

API:

evaluate.progressive_val_score(
    model=model,
    dataset=dataset,
    metric=metrics.ROCAUC(),
    w=lambda x, y: (x['timestamp'] + 1) / n,  # time-decay example
)

Rules:

  • w is an optional callable(x, y) -> float
  • Dataset (x, y, w) triples take precedence over the callable, so mixed datasets behave predictably
  • Also fixed a variable-shadowing bug: the local sample weight stored in preds[i] was named w, silently overwriting the callable on each answer event. Renamed to sample_w throughout.

Two new tests: test_w_callable_applied_to_xy_pairs and test_tuple_weight_takes_precedence_over_w_callable.
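The shadowing bug described above is a general Python pitfall; a minimal standalone reproduction with hypothetical names:

```python
def buggy(weights):
    w = weights  # the optional weights callable (named 'w' in the old code)
    for item in [({"f": 1}, True, 0.5)]:
        x, y, w = item  # BUG: tuple unpacking rebinds 'w' to the float weight
    return w            # now 0.5, the callable is lost

def fixed(weights):
    w = weights
    for item in [({"f": 1}, True, 0.5)]:
        x, y, sample_w = item  # renamed: the callable survives the loop
    return w

def decay(x, y):
    return 1.0
```

After one weighted sample, `buggy` has silently replaced the callable with a float, so every later plain `(x, y)` pair would be weighted by the previous sample's weight instead of the callable's output.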


@MaxHalford left a comment


Getting there! You'll have to fix some conflicts when you rebase.

measure_time=False,
measure_memory=False,
yield_predictions=False,
w: typing.Callable[[dict, typing.Any], float] | None = None,
Member


I'd prefer it if you renamed the parameter to weights

# avoids any reliance on index alignment between _iter_dataset and simulate_qa.
weight_queue: collections.deque[float] = collections.deque()

def _iter_dataset():
Member


This is not needed if _model_accepts_w is False and w is None, right?

- Rename w parameter to weights in _progressive_validation,
  iter_progressive_val_score, and progressive_val_score
- Add _needs_weights guard so weight_queue and _iter_dataset are only
  created when the model accepts a w parameter or a weights callable
  is provided, avoiding unnecessary overhead on the default path
- Integrate weight support into upstream fast path (no delay/moment)
- Rebase onto upstream main which refactored predict/learn closures
  and added a fast path
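The `_needs_weights` guard described in the second bullet could look roughly like this (a sketch with hypothetical toy models, not the actual River code):

```python
import collections
import inspect

def needs_weights(model, weights) -> bool:
    # Set up the weight plumbing only when it can actually be used:
    # either the model's learn_one accepts w, or a weights callable was given.
    accepts_w = "w" in inspect.signature(model.learn_one).parameters
    return accepts_w or weights is not None

class Plain:
    def learn_one(self, x, y): ...

class Weighted:
    def learn_one(self, x, y, w=1.0): ...

# Default path: no deque, no dataset wrapper, no extra overhead.
queue = collections.deque() if needs_weights(Plain(), None) else None
```

This keeps the unweighted fast path identical to upstream while still enabling weights when either source of them is present.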


Development

Successfully merging this pull request may close these issues.

feature req: add w parameter (ex sample_weight) to progressive_val_score

4 participants