Merged
79 commits
9d30f49
feat: try to migrate changes to RatioPath from RationAI masks
AdamBajger Jan 9, 2026
139a616
refactor: split into files
AdamBajger Jan 9, 2026
d081bf7
Update ratiopath/masks/write_big_tiff.py
vejtek Jan 9, 2026
e8bcb45
fix: add numpy and jaxtyping imports to mask builder modules
AdamBajger Jan 9, 2026
6b64a1e
Update ratiopath/masks/mask_builders/receptive_field_manipulation.py
AdamBajger Jan 9, 2026
8abdde4
fix: overlap naming convention
AdamBajger Jan 9, 2026
888f25c
fix: all imports correction
AdamBajger Jan 9, 2026
8dda4d3
refactor: move __all__ exports to top in mask builder module __init__
AdamBajger Jan 9, 2026
dbcf3cc
fix: typo bracket
AdamBajger Jan 9, 2026
f3fb817
chore: ruff check and format
AdamBajger Jan 9, 2026
47591fb
fix: test filenames
AdamBajger Jan 9, 2026
fca93c9
fix: update geopandas dependency version and refactor GeoJSONParser l…
Adames4 Jan 9, 2026
1561609
fix: update lock + freeze sync
matejpekar Jan 10, 2026
d9d48a4
fix: mypy
matejpekar Jan 10, 2026
689c3b3
fix: unlink overlaps file in tests which was previously left linked
AdamBajger Jan 13, 2026
8454458
Update tests/test_mask_builders.py
AdamBajger Jan 15, 2026
9397277
Update ratiopath/masks/mask_builders/__init__.py
AdamBajger Jan 15, 2026
e5a501e
Initial plan
Copilot Jan 15, 2026
85571e6
fix: correct module and class names in example code
Copilot Jan 15, 2026
07a0342
fix: update docstring Args to match constructor signature
Copilot Jan 15, 2026
42f5415
Initial plan
Copilot Jan 15, 2026
bdb27b6
fix: correct typo in test docstring (SImple → Simple)
Copilot Jan 15, 2026
8e5f852
Apply suggestions from code review
AdamBajger Jan 15, 2026
260fee9
Update tests/test_mask_builders.py
AdamBajger Jan 15, 2026
7a20461
Initial plan
Copilot Jan 15, 2026
66bb90f
chore: run ruff format to fix linting issues
Copilot Jan 15, 2026
f513f0c
docs: fix example code
AdamBajger Jan 19, 2026
a97d28e
refactor: remove obsolete field
AdamBajger Jan 19, 2026
24957fa
chore: replace ellipsis by pass
AdamBajger Jan 21, 2026
6a9a2a1
Initial plan
Copilot Jan 21, 2026
39394b7
fix: correct docstring example - remove duplicate import, add numpy, …
Copilot Jan 21, 2026
cd39060
fix: update all mask builder docstring examples with correct API sign…
Copilot Jan 21, 2026
f046eba
docs: add mask builders documentation and run ruff format
Copilot Jan 21, 2026
f3e0d08
docs: validate mkdocs builds successfully
Copilot Jan 21, 2026
6280a7d
chore: add site/ to gitignore and remove from git
Copilot Jan 21, 2026
89179e0
docs: clarify generate_tiles_from_slide is a placeholder function
Copilot Jan 21, 2026
6126326
fix: remove unnecessary pass to satisfylinter ruff
AdamBajger Jan 21, 2026
a07286a
Initial plan
Copilot Jan 21, 2026
e131b37
chore: run ruff format to fix linting errors
Copilot Jan 21, 2026
3b39943
fix: add explicit dtype parameter
AdamBajger Jan 22, 2026
490297c
docs: add docstrings
AdamBajger Jan 22, 2026
45882e7
fix: inheritance param mismatches
AdamBajger Jan 22, 2026
348694e
fix: ruff formatting and linting
AdamBajger Jan 22, 2026
8ddbc51
docs: fix OpenSLide level_dimensions use in examples
AdamBajger Jan 23, 2026
5cd85ad
fix: enhance memory setup in AutoScalingAveragingClippingNumpyMemMapM…
AdamBajger Jan 23, 2026
d65d8be
chore: ruff format
AdamBajger Jan 23, 2026
84635ce
docs: add a short remark about the memmap tempfile behaviour
AdamBajger Feb 2, 2026
9a245f1
feat: implement a Factory class for composing mask builders dynamically
AdamBajger Feb 2, 2026
0e44bc6
refactor: naming and typing
AdamBajger Feb 3, 2026
aa78fdd
feat: debugging tests
AdamBajger Feb 3, 2026
ee3a39d
exp: wip refactor slightly, fix tests on Windows platform
AdamBajger Feb 3, 2026
a46b207
fix: fix errors, update names
AdamBajger Feb 3, 2026
2ca441a
chore: roll back random change I dont remember making
AdamBajger Feb 3, 2026
2ffb1ae
refactor: use better variable name
AdamBajger Feb 3, 2026
b8fb06e
fix: memmap deletion logic to not raise exceptions during garbage col…
AdamBajger Feb 3, 2026
5c4b751
fix: overlap counter dtype
AdamBajger Feb 3, 2026
44c6332
fix: minor code redundancy
AdamBajger Feb 3, 2026
55e80e5
docs: update docs and formatting using ruff
AdamBajger Feb 4, 2026
35bfcc8
fix: mypy warnings
AdamBajger Feb 4, 2026
b806fff
chore: remove trailing space to make ruff happy
AdamBajger Feb 4, 2026
1fca80f
fix: mypy warnings for code someone else did
AdamBajger Feb 4, 2026
1b4ad69
refactor: code structure for improved readability and maintainability
matejpekar Mar 13, 2026
312c818
fix: documentation
matejpekar Mar 13, 2026
c46e1de
fix: ruff
matejpekar Mar 13, 2026
3b81f0b
fix: improved storage + docstrings
matejpekar Mar 13, 2026
6e85a13
Update docs/reference/masks/mask_builders.md
matejpekar Mar 13, 2026
594c398
Update tests/test_mask_builders.py
matejpekar Mar 13, 2026
d3646bb
fix: tests and naming
matejpekar Mar 13, 2026
7570184
fix: update version
matejpekar Mar 13, 2026
141ba3b
fix: version
matejpekar Mar 13, 2026
3f9a3c4
fix: negative values
matejpekar Mar 13, 2026
f0c3c6d
refactor: simplify Aggregator interface to single-tile updates and re…
matejpekar Mar 31, 2026
6772577
refactor: remove unused safely_instantiate utility function
matejpekar Mar 31, 2026
6b5eb79
chore: remove outdated file
matejpekar Mar 31, 2026
411aedc
chore: merge branch 'main' into mask-builders
matejpekar Mar 31, 2026
2bc50a8
fix: minor mistakes
matejpekar Mar 31, 2026
858a065
Apply suggestions from code review
matejpekar Mar 31, 2026
3550b69
Strip ICC profile data before saving TIFF
matejpekar Apr 9, 2026
2d2717c
fix: edge clipping
matejpekar Apr 17, 2026
5 changes: 4 additions & 1 deletion .gitignore
@@ -19,4 +19,7 @@ wheels/
.mypy_cache/

# VS Code
-.vscode/
+.vscode/
+
+# MkDocs
+site/
2 changes: 1 addition & 1 deletion .ruff.toml
@@ -1,6 +1,6 @@
fix = true
line-length = 88
-target-version = "py311"
+target-version = "py312"

[format]
docstring-code-format = true # Enable reformatting of code snippets in docstrings.
181 changes: 181 additions & 0 deletions docs/reference/masks/mask_builders.md
@@ -0,0 +1,181 @@
# Mask Builders

Mask builders are tools for assembling feature masks from neural network predictions or other tile-level data. They handle the complexity of combining overlapping tiles, scaling between coordinate spaces, and managing memory for large output masks using a flexible strategy-based architecture.

## Overview

When processing whole-slide images with neural networks, you often need to:

1. Extract tiles from a slide
2. Run inference to get predictions or features for each tile
3. Assemble these predictions back into a full-resolution mask

Mask builders automate step 3, handling:

- **Coordinate scaling**: Converting from source WSI coordinates to mask coordinates — including automatic GCD-based compression when tiles and strides share common factors.
- **Overlap handling**: Averaging or taking the maximum when tiles overlap.
- **Memory management**: Using in-memory arrays or memory-mapped files for large masks.
- **Scalar expansion**: Broadcasting scalar per-tile predictions `(B, C)` into spatial tiles automatically.
- **Edge clipping**: Removing border artifacts from model output tiles at update time.

## MaskBuilder

::: ratiopath.masks.mask_builders.MaskBuilder

The `MaskBuilder` is the central orchestrator. You configure it by providing:

- `source_extents`: Spatial dimensions of the source WSI (H, W, ...).
- `source_tile_extent`: Spatial dimensions of the model input tiles.
- `output_tile_extent`: Spatial dimensions of the model output tiles (can differ from input due to pooling/stride).
- `stride`: Stride between tiles in source resolution.
- `storage`: Where the mask is stored — `"inmemory"` (RAM) or `"memmap"` (disk-backed).
- `aggregation`: How overlapping tiles are merged — `MeanAggregator` (default) or `MaxAggregator`.

The mask shape is computed automatically from the source extents, tile extents, and stride using GCD-based compression for efficient memory use.
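
The docs above do not spell out the compression rule, so the following is only a sketch of one plausible reading for the scalar-output case: a single axis is expressed in units of `gcd(tile, stride)`, so that both the tile footprint and the stride become whole numbers of mask pixels. The helper name `compressed_geometry` and the rounding of the mask extent are assumptions, not the library's API.

```python
import math


def compressed_geometry(source_extent: int, tile_extent: int, stride: int):
    """Hypothetical helper: describe one axis in units of gcd(tile, stride)."""
    unit = math.gcd(tile_extent, stride)  # source pixels per mask pixel
    tile_px = tile_extent // unit  # tile footprint in mask pixels
    stride_px = stride // unit  # stride in mask pixels
    # Assumption: round up so that partially covered borders still fit.
    mask_extent = math.ceil(source_extent / unit)
    return unit, tile_px, stride_px, mask_extent


unit, tile_px, stride_px, mask_extent = compressed_geometry(2000, 512, 256)
print(unit, tile_px, stride_px, mask_extent)
```

Under this reading, a 512-pixel tile with a 256-pixel stride needs only two mask pixels per tile and one per stride step, which is where the memory saving would come from.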

## Components

### Storage Strategies

::: ratiopath.masks.mask_builders.InMemory
::: ratiopath.masks.mask_builders.MemMap
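
To illustrate the difference between the two storage strategies, here is a minimal NumPy-only sketch. It does not use `InMemory` or `MemMap` themselves; it assumes the disk-backed variant behaves like a `.npy` memory map created with `np.lib.format.open_memmap`, which may not match the library's internals.

```python
import tempfile
from pathlib import Path

import numpy as np

shape = (1, 64, 64)  # (C, H, W)

# RAM-backed accumulator: fast, but limited by available memory.
in_memory = np.zeros(shape, dtype=np.float32)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "mask.npy"
    # Disk-backed accumulator: the OS pages data in and out on demand.
    disk_backed = np.lib.format.open_memmap(
        path, mode="w+", dtype=np.float32, shape=shape
    )
    disk_backed[:, 10:20, 10:20] += 1.0  # same indexing as a regular array
    disk_backed.flush()  # push pending writes to the file
    total = float(disk_backed.sum())
    # Drop the mapping before the directory is removed (required on Windows).
    del disk_backed
```

Both arrays support the same slicing and in-place arithmetic, so code that updates the accumulator does not need to know which one it was given.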

### Aggregation Strategies

::: ratiopath.masks.mask_builders.MeanAggregator
::: ratiopath.masks.mask_builders.MaxAggregator
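
The difference between the two strategies is easiest to see on two overlapping tiles. The sketch below reproduces the accumulate-then-divide and running-maximum logic with plain NumPy arrays; the tile values, shapes, and the zero-initialized accumulators are illustrative assumptions.

```python
import numpy as np

tile_a = np.full((1, 4, 4), 0.2, dtype=np.float32)
tile_b = np.full((1, 4, 4), 0.8, dtype=np.float32)
# Top-left (y, x) of each tile in mask space; the tiles overlap in columns 2-3.
placements = [((0, 0), tile_a), ((0, 2), tile_b)]

mean_acc = np.zeros((1, 4, 6), dtype=np.float32)
counter = np.zeros((1, 4, 6), dtype=np.uint16)
max_acc = np.zeros((1, 4, 6), dtype=np.float32)

for (y, x), tile in placements:
    window = (slice(None), slice(y, y + 4), slice(x, x + 4))
    mean_acc[window] += tile  # mean: sum now, divide at the end
    counter[window] += 1  # how many tiles touched each pixel
    max_acc[window] = np.maximum(max_acc[window], tile)  # max: keep the larger

# Clip the counter to 1 so that untouched pixels do not divide by zero.
mean_mask = mean_acc / counter.clip(min=1)
```

In the overlapping columns the mean result is roughly 0.5, while the max result stays at 0.8.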

## Examples

### Averaging Scalar Predictions

**Use case**: You have scalar predictions (e.g., class probabilities) for each tile. Each prediction is uniformly expanded to fill the tile's footprint, and overlapping regions are averaged.

```python
import numpy as np
import openslide
from ratiopath.masks.mask_builders import MaskBuilder, MeanAggregator
import matplotlib.pyplot as plt

# Set up tiling parameters
LEVEL = 3
tile_extents = (512, 512)
tile_strides = (256, 256)
slide = openslide.OpenSlide("path/to/slide.mrxs")
slide_w, slide_h = slide.level_dimensions[LEVEL]

# output_tile_extent=(1, 1) means scalar data — the builder
# broadcasts (B, C) → (B, C, 1, 1) and upscales automatically.
mask_builder = MaskBuilder(
    source_extents=(slide_h, slide_w),
    source_tile_extent=tile_extents,
    output_tile_extent=(1, 1),
    stride=tile_strides,
    n_channels=1,
    storage="inmemory",
    aggregation=MeanAggregator,
    dtype=np.float32,
)

# Process tiles. `generate_tiles_from_slide` and `model` are placeholders
# for your own tile loader and inference model.
for tiles, xs, ys in generate_tiles_from_slide(slide, LEVEL, tile_extents, tile_strides):
    features = model.predict(tiles)  # features shape: (B, 1)
    coords_batch = np.stack([ys, xs], axis=1)  # shape: (B, 2)
    mask_builder.update_batch(features, coords_batch)

# Finalize — MeanAggregator returns {"mask": ..., "overlap_counter": ...}
results = mask_builder.finalize()
assembled_mask = results["mask"]
overlap_counter = results["overlap_counter"]

plt.imshow(assembled_mask[0], cmap="gray")
plt.show()

# Always clean up to release storage resources
mask_builder.cleanup()
```
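
The scalar expansion used in this example can be pictured with plain NumPy broadcasting. This is a sketch of the idea only; the footprint of `(2, 2)` mask pixels is an assumed value, and the builder may implement the expansion differently.

```python
import numpy as np

batch = np.array([[0.1], [0.9]], dtype=np.float32)  # (B, C) = (2, 1)
footprint = (2, 2)  # assumed tile size in mask pixels

as_spatial = batch[:, :, None, None]  # (B, C, 1, 1)
# broadcast_to returns a read-only view; no data is copied.
expanded = np.broadcast_to(as_spatial, (*batch.shape, *footprint))
print(expanded.shape)
```

Every pixel in a tile's footprint carries the same scalar, so averaging over overlaps blends neighboring tile predictions.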

---

### Max Aggregation with Edge Clipping (MemMap)

**Use case**: You have high-resolution feature maps. You want to preserve the maximum signal where tiles overlap, remove border pixels from each tile edge to avoid artifacts, and use disk storage because the mask is very large.

```python
import numpy as np
from ratiopath.masks.mask_builders import MaskBuilder, MaxAggregator

# Dense output — output tiles match input tiles in spatial size
mask_builder = MaskBuilder(
    source_extents=(10000, 10000),
    source_tile_extent=(512, 512),
    output_tile_extent=(512, 512),
    stride=(256, 256),
    n_channels=3,
    storage="memmap",
    aggregation=MaxAggregator,
    dtype=np.float32,
    filename="large_mask.npy",  # persisted to disk
)

for tiles, coords in tile_generator:
    predictions = model.predict(tiles)  # (B, 3, 512, 512)
    # edge_clipping=4 removes 4px from each edge of every tile
    mask_builder.update_batch(predictions, coords, edge_clipping=4)

# MaxAggregator returns the accumulator NDArray directly
assembled_mask = mask_builder.finalize()
mask_builder.cleanup()
```
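
As a rough illustration of what edge clipping does to a single tile, the sketch below trims a border from every spatial edge and shifts the tile's top-left coordinate by the same amount so that the remaining pixels stay in place. The helper `clip_edges` is hypothetical, and it assumes coordinates and tile pixels share one resolution (the dense case above).

```python
import numpy as np


def clip_edges(tile: np.ndarray, coords: np.ndarray, edge_clipping: int):
    """Hypothetical helper: drop `edge_clipping` pixels from each spatial edge."""
    if edge_clipping == 0:
        return tile, coords
    inner = (slice(edge_clipping, -edge_clipping),) * (tile.ndim - 1)
    # Keep all channels, trim every spatial axis, and move the origin inward.
    return tile[(slice(None), *inner)], coords + edge_clipping


tile = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)  # (C, H, W)
clipped, shifted = clip_edges(tile, np.array([16, 32]), edge_clipping=2)
print(clipped.shape, shifted)
```

An 8×8 tile clipped by 2 pixels leaves a 4×4 core, placed 2 pixels further down and to the right.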

---

### Auto-Scaling Coordinates (Different Input/Output Resolution)

**Use case**: Your model's output tiles have different spatial dimensions than the input tiles (e.g., due to stride or pooling). The builder auto-scales coordinates between source and mask resolution.

```python
import numpy as np
from ratiopath.masks.mask_builders import MaskBuilder, MeanAggregator

# Model takes 512×512 input tiles, produces 128×128 output tiles (4× downsampled)
mask_builder = MaskBuilder(
    source_extents=(2000, 2000),
    source_tile_extent=(512, 512),
    output_tile_extent=(128, 128),
    stride=(256, 256),
    n_channels=1,
    storage="inmemory",
    aggregation=MeanAggregator,
    dtype=np.float32,
)

# Coordinates are always in SOURCE resolution — the builder
# handles the conversion to mask resolution internally.
for tiles, coords in tile_generator:
    predictions = model.predict(tiles)  # (B, 1, 128, 128)
    mask_builder.update_batch(predictions, coords)

results = mask_builder.finalize()
mask_builder.cleanup()
```
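
The source-to-mask conversion in this example amounts to multiplying by the ratio of output to input tile size. A minimal sketch, assuming a plain per-axis ratio with integer division (the builder's exact rounding is not documented here):

```python
import numpy as np

source_tile_extent = np.array([512, 512])
output_tile_extent = np.array([128, 128])

# (B, 2) top-left tile coordinates in source resolution, ordered (y, x).
coords_source = np.array([[0, 0], [0, 256], [256, 512]])

# 128 / 512 = 1/4, so every source coordinate shrinks by a factor of 4.
coords_mask = coords_source * output_tile_extent // source_tile_extent
print(coords_mask.tolist())
```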

## Coordinate System Notes

`MaskBuilder.update_batch` expects coordinates as an array of shape `(B, N)` where:

- `B` is the batch size.
- `N` is the number of spatial dimensions (typically 2 for height and width).

Note the order: `[ys, xs]` not `[xs, ys]`, as the first dimension represents height (y) and the second represents width (x), matching the NumPy `(C, H, W)` convention used by the builder.
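
A short sketch of why the order matters: with `(C, H, W)` arrays, the first spatial index selects a row (y) and the second a column (x). The array sizes below are arbitrary.

```python
import numpy as np

xs = np.array([0, 256, 512])
ys = np.array([0, 0, 256])

coords_batch = np.stack([ys, xs], axis=1)  # (B, 2), each row is (y, x)

mask = np.zeros((1, 512, 1024), dtype=np.float32)  # (C, H, W)
y, x = coords_batch[2]
mask[0, y, x] = 1.0  # row 256, column 512
```

Swapping the two columns would index row 512, which is out of bounds for this 512-row mask.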

## Lifecycle

Always call `cleanup()` when you are done with a `MaskBuilder` to release storage resources. This is especially important for `MemMap` storage, which holds file handles:

```python
mask_builder = MaskBuilder(...)
# ... update_batch calls ...
results = mask_builder.finalize()
mask_builder.cleanup()
```
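
One way to make sure `cleanup()` runs even when an update or `finalize()` raises is a `try`/`finally` block. The sketch below uses a hypothetical stand-in class, `DummyBuilder`, that only mimics the `finalize()`/`cleanup()` method names; it is not the real `MaskBuilder`.

```python
class DummyBuilder:
    """Hypothetical stand-in exposing the same finalize()/cleanup() names."""

    def __init__(self) -> None:
        self.cleaned_up = False

    def finalize(self) -> None:
        raise RuntimeError("simulated failure during finalize")

    def cleanup(self) -> None:
        self.cleaned_up = True


builder = DummyBuilder()
try:
    try:
        builder.finalize()
    finally:
        # Runs whether or not finalize() succeeded.
        builder.cleanup()
except RuntimeError:
    pass  # the failure still propagates; it is swallowed here only for the demo
```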
3 changes: 3 additions & 0 deletions mkdocs.yml
@@ -19,9 +19,12 @@ nav:
- read_slide_tile: reference/tiling/read_slide_tile.md
- tilers: reference/tiling/tilers.md
- utils: reference/tiling/utils.md
- Masks:
- mask_builders: reference/masks/mask_builders.md
- Parsers:
- ASAPParser: reference/parsers/asap.md
- GeoJSONParser: reference/parsers/geojson.md
- Darwin7JSONParser: reference/parsers/darwin.md
- Augmentations:
- estimate_stain_vectors: reference/augmentations/estimate_stain_vectors.md
- StainAugmentor: reference/augmentations/stain_augmentor.md
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "ratiopath"
-version = "1.3.1"
+version = "1.4.0"
description = "A library for efficient processing and analysis of whole-slide pathology images."
authors = [
{ name = "Matěj Pekár", email = "matejpekar@mail.muni.cz" },
3 changes: 2 additions & 1 deletion ratiopath/masks/__init__.py
@@ -1,4 +1,5 @@
from ratiopath.masks.tissue_mask import tissue_mask
+from ratiopath.masks.write_big_tiff import write_big_tiff


-__all__ = ["tissue_mask"]
+__all__ = ["tissue_mask", "write_big_tiff"]
17 changes: 17 additions & 0 deletions ratiopath/masks/mask_builders/__init__.py
@@ -0,0 +1,17 @@
from ratiopath.masks.mask_builders.aggregation import (
    Aggregator,
    MaxAggregator,
    MeanAggregator,
)
from ratiopath.masks.mask_builders.mask_builder import MaskBuilder
from ratiopath.masks.mask_builders.storage import InMemory, MemMap


__all__ = [
    "Aggregator",
    "InMemory",
    "MaskBuilder",
    "MaxAggregator",
    "MeanAggregator",
    "MemMap",
]
136 changes: 136 additions & 0 deletions ratiopath/masks/mask_builders/aggregation.py
@@ -0,0 +1,136 @@
from __future__ import annotations

from abc import ABC, abstractmethod
from pathlib import Path
from typing import TYPE_CHECKING, Any, TypedDict, cast

import numpy as np
from numpy.typing import NDArray


if TYPE_CHECKING:
    from collections.abc import Callable


class Aggregator[DType: np.generic, R](ABC):
    """Abstract base class for aggregation strategies."""

    def __init__(self, storage: NDArray[DType], **kwargs: Any) -> None:
        return

    @abstractmethod
    def update(
        self, accumulator: NDArray[DType], sample: np.ndarray, coords: NDArray[np.int64]
    ) -> None:
        """Update the accumulator with a single tile sample."""

    @abstractmethod
    def finalize(self, accumulator: NDArray[DType]) -> R:
        """Finalize the mask assembly and return the result."""

    def cleanup(self) -> None:
        """Optional cleanup method to release resources if needed."""
        return

    def _get_acc_slices(
        self, coords: NDArray[np.int64], mask_tile_extents: NDArray[np.int64]
    ) -> tuple[slice, ...]:
        """Compute slice objects for accumulator indexing.

        Args:
            coords: Array of shape (N,) with top-left coordinates in N dimensions.
            mask_tile_extents: Array of shape (N,) with tile size in mask space for each dimension.

        Returns:
            Tuple containing N slice objects for indexing into the accumulator.
        """
        tile_end_coords = coords + mask_tile_extents
        return tuple(
            slice(int(start), int(end))
            for start, end in zip(coords, tile_end_coords, strict=True)
        )


class MeanAggregatorResults[Dtype: np.generic](TypedDict):
    mask: NDArray[Dtype]
    overlap_counter: NDArray[np.uint16]


class MeanAggregator[DType: np.generic](
    Aggregator[DType, MeanAggregatorResults[DType]]
):
    """Aggregator that implements averaging aggregation for overlapping tiles.

    This aggregator accumulates tiles by addition and tracks the overlap count at each pixel.
    During finalization, the accumulated values are divided by the overlap count to compute
    the average value at each position. This is useful for:
    - Smoothly blending overlapping tile predictions
    - Reducing edge artifacts in sliding window processing
    - Computing ensemble averages from multiple passes

    The aggregator allocates an additional `overlap_counter` accumulator with shape (1, *SpatialDims)
    to track how many tiles contributed to each pixel position.
    """

    def __init__(
        self,
        storage: NDArray[DType],
        filename: Path | str | None = None,
        overlap_counter_filename: Path | str | None = None,
        **kwargs: Any,
    ) -> None:
        overlap_filename = overlap_counter_filename
        if overlap_filename is None and filename is not None:
            path = Path(filename)
            overlap_filename = path.with_suffix(f".overlaps{path.suffix}")

        storage_cls = cast("Callable[..., NDArray[np.uint16]]", type(storage))
        self.overlap_counter = storage_cls(
            filename=overlap_filename,
            shape=(1, *storage.shape[1:]),
            dtype=np.uint16,
            **kwargs,
        )

    def update(
        self, accumulator: NDArray[DType], sample: np.ndarray, coords: NDArray[np.int64]
    ) -> None:
        mask_tile_extents = np.asarray(sample.shape[1:], dtype=np.int64)
        acc_slices = self._get_acc_slices(coords, mask_tile_extents)
        accumulator[:, *acc_slices] += sample  # type: ignore[misc]
        self.overlap_counter[:, *acc_slices] += 1

    def finalize(self, accumulator: NDArray[DType]) -> MeanAggregatorResults[DType]:
        accumulator /= self.overlap_counter.clip(min=1)  # type: ignore[misc]
        return {
            "mask": accumulator,
            "overlap_counter": self.overlap_counter,
        }

    def cleanup(self) -> None:
        if hasattr(self, "overlap_counter"):
            if hasattr(self.overlap_counter, "close"):
                self.overlap_counter.close()
            del self.overlap_counter


class MaxAggregator[DType: np.generic](Aggregator[DType, NDArray[DType]]):
    """Aggregator that implements maximum aggregation for overlapping tiles.

    This aggregator keeps only the maximum value at each pixel position when tiles overlap.
    No additional storage is required, and finalization is a no-op since the accumulator
    already contains the final max values. This is useful for:
    - Maximum intensity projection
    - Keeping the highest confidence prediction across overlapping tiles
    - Peak detection across multiple scales
    """

    def update(
        self, accumulator: NDArray[DType], sample: np.ndarray, coords: NDArray[np.int64]
    ) -> None:
        mask_tile_extents = np.asarray(sample.shape[1:], dtype=np.int64)
        acc_slices = self._get_acc_slices(coords, mask_tile_extents)
        accumulator[:, *acc_slices] = np.maximum(accumulator[:, *acc_slices], sample)

    def finalize(self, accumulator: NDArray[DType]) -> NDArray[DType]:
        return accumulator