Mask builders #42
Merged
79 commits
9d30f49
feat: try to migrate changes to RatioPath from RationAI masks
AdamBajger 139a616
refactor: split into files
AdamBajger d081bf7
Update ratiopath/masks/write_big_tiff.py
vejtek e8bcb45
fix: add numpy and jaxtyping imports to mask builder modules
AdamBajger 6b64a1e
Update ratiopath/masks/mask_builders/receptive_field_manipulation.py
AdamBajger 8abdde4
fix: overlap naming convention
AdamBajger 888f25c
fix: all imports correction
AdamBajger 8dda4d3
refactor: move __all__ exports to top in mask builder module __init__
AdamBajger dbcf3cc
fix: typo bracket
AdamBajger f3fb817
chore: ruff check and format
AdamBajger 47591fb
fix: test filenames
AdamBajger fca93c9
fix: update geopandas dependency version and refactor GeoJSONParser l…
Adames4 1561609
fix: update lock + freeze sync
matejpekar d9d48a4
fix: mypy
matejpekar 689c3b3
fix: unlink overlaps file in tests which was previously left linked
AdamBajger 8454458
Update tests/test_mask_builders.py
AdamBajger 9397277
Update ratiopath/masks/mask_builders/__init__.py
AdamBajger e5a501e
Initial plan
Copilot 85571e6
fix: correct module and class names in example code
Copilot 07a0342
fix: update docstring Args to match constructor signature
Copilot 42f5415
Initial plan
Copilot bdb27b6
fix: correct typo in test docstring (SImple → Simple)
Copilot 8e5f852
Apply suggestions from code review
AdamBajger 260fee9
Update tests/test_mask_builders.py
AdamBajger 7a20461
Initial plan
Copilot 66bb90f
chore: run ruff format to fix linting issues
Copilot f513f0c
docs: fix example code
AdamBajger a97d28e
refactor: remove obsolete field
AdamBajger 24957fa
chore: replace ellipsis by pass
AdamBajger 6a9a2a1
Initial plan
Copilot 39394b7
fix: correct docstring example - remove duplicate import, add numpy, …
Copilot cd39060
fix: update all mask builder docstring examples with correct API sign…
Copilot f046eba
docs: add mask builders documentation and run ruff format
Copilot f3e0d08
docs: validate mkdocs builds successfully
Copilot 6280a7d
chore: add site/ to gitignore and remove from git
Copilot 89179e0
docs: clarify generate_tiles_from_slide is a placeholder function
Copilot 6126326
fix: remove unnecessary pass to satisfylinter ruff
AdamBajger a07286a
Initial plan
Copilot e131b37
chore: run ruff format to fix linting errors
Copilot 3b39943
fix: add explicit dtype parameter
AdamBajger 490297c
docs: add docstrings
AdamBajger 45882e7
fix: inheritance param mismatches
AdamBajger 348694e
fix: ruff formatting and linting
AdamBajger 8ddbc51
docs: fix OpenSLide level_dimensions use in examples
AdamBajger 5cd85ad
fix: enhance memory setup in AutoScalingAveragingClippingNumpyMemMapM…
AdamBajger d65d8be
chore: ruff format
AdamBajger 84635ce
docs: add a short remark about the memmap tempfile behaviour
AdamBajger 9a245f1
feat: implement a Factory class for composing mask builders dynamically
AdamBajger 0e44bc6
refactor: naming and typing
AdamBajger aa78fdd
feat: debugging tests
AdamBajger ee3a39d
exp: wip refactor slightly, fix tests on Windows platform
AdamBajger a46b207
fix: fix errors, update names
AdamBajger 2ca441a
chore: roll back random change I dont remember making
AdamBajger 2ffb1ae
refactor: use better variable name
AdamBajger b8fb06e
fix: memmap deletion logic to not raise exceptions during garbage col…
AdamBajger 5c4b751
fix: overlap counter dtype
AdamBajger 44c6332
fix: minor code redundancy
AdamBajger 55e80e5
docs: update docs and formatting using ruff
AdamBajger 35bfcc8
fix: mypy warnings
AdamBajger b806fff
chore: remove trailing space to make ruff happy
AdamBajger 1fca80f
fix: mypy warnings for code someone else did
AdamBajger 1b4ad69
refactor: code structure for improved readability and maintainability
matejpekar 312c818
fix: documentation
matejpekar c46e1de
fix: ruff
matejpekar 3b81f0b
fix: improved storage + docstrings
matejpekar 6e85a13
Update docs/reference/masks/mask_builders.md
matejpekar 594c398
Update tests/test_mask_builders.py
matejpekar d3646bb
fix: tests and naming
matejpekar 7570184
fix: update version
matejpekar 141ba3b
fix: version
matejpekar 3f9a3c4
fix: negative values
matejpekar f0c3c6d
refactor: simplify Aggregator interface to single-tile updates and re…
matejpekar 6772577
refactor: remove unused safely_instantiate utility function
matejpekar 6b5eb79
chore: remove outdated file
matejpekar 411aedc
chore: merge branch 'main' into mask-builders
matejpekar 2bc50a8
fix: minor mistakes
matejpekar 858a065
Apply suggestions from code review
matejpekar 3550b69
Strip ICC profile data before saving TIFF
matejpekar 2d2717c
fix: edge clipping
matejpekar
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,4 +19,7 @@ wheels/ | |
| .mypy_cache/ | ||
|
|
||
| # VS Code | ||
| .vscode/ | ||
| .vscode/ | ||
|
|
||
| # MkDocs | ||
| site/ | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,181 @@ | ||
| # Mask Builders | ||
|
|
||
| Mask builders are tools for assembling feature masks from neural network predictions or other tile-level data. They handle the complexity of combining overlapping tiles, scaling between coordinate spaces, and managing memory for large output masks using a flexible strategy-based architecture. | ||
|
|
||
| ## Overview | ||
|
|
||
| When processing whole-slide images with neural networks, you often need to: | ||
|
|
||
| 1. Extract tiles from a slide | ||
| 2. Run inference to get predictions or features for each tile | ||
| 3. Assemble these predictions back into a full-resolution mask | ||
|
|
||
| Mask builders automate step 3, handling: | ||
|
|
||
| - **Coordinate scaling**: Converting from source WSI coordinates to mask coordinates — including automatic GCD-based compression when tiles and strides share common factors. | ||
| - **Overlap handling**: Averaging or taking the maximum when tiles overlap. | ||
| - **Memory management**: Using in-memory arrays or memory-mapped files for large masks. | ||
| - **Scalar expansion**: Broadcasting scalar per-tile predictions `(B, C)` into spatial tiles automatically. | ||
| - **Edge clipping**: Removing border artifacts from model output tiles at update time. | ||
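The scalar expansion bullet can be illustrated with plain NumPy. This is a minimal sketch of the broadcasting idea only, not the library's implementation; `expand_scalars` and the 2×2 tile footprint are hypothetical names and values chosen for illustration.

```python
import numpy as np


def expand_scalars(preds: np.ndarray, tile_extent: tuple[int, int]) -> np.ndarray:
    """Broadcast per-tile scalars of shape (B, C) to tiles of shape (B, C, H, W)."""
    batch, channels = preds.shape
    # Add two singleton spatial axes, then broadcast them to the tile footprint.
    expanded = preds[:, :, None, None]
    return np.broadcast_to(expanded, (batch, channels, *tile_extent)).copy()


preds = np.array([[0.25], [0.75]], dtype=np.float32)  # (B=2, C=1)
tiles = expand_scalars(preds, (2, 2))
print(tiles.shape)  # (2, 1, 2, 2)
```

Every pixel of a tile carries that tile's scalar, so overlapping tiles can then be merged like any dense prediction.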
|
|
||
| ## MaskBuilder | ||
|
|
||
| ::: ratiopath.masks.mask_builders.MaskBuilder | ||
|
|
||
| The `MaskBuilder` is the central orchestrator. You configure it by providing: | ||
|
|
||
| - `source_extents`: Spatial dimensions of the source WSI (H, W, ...). | ||
| - `source_tile_extent`: Spatial dimensions of the model input tiles. | ||
| - `output_tile_extent`: Spatial dimensions of the model output tiles (can differ from input due to pooling/stride). | ||
| - `stride`: Stride between tiles in source resolution. | ||
| - `storage`: Where the mask is stored — `"inmemory"` (RAM) or `"memmap"` (disk-backed). | ||
| - `aggregation`: How overlapping tiles are merged — `MeanAggregator` (default) or `MaxAggregator`. | ||
|
|
||
| The mask shape is computed automatically from the source extents, tile extents, and stride using GCD-based compression for efficient memory use. | ||
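One plausible reading of "GCD-based compression", shown for a single dimension: dividing by the greatest common divisor of tile extent and stride gives the coarsest grid on which both remain whole numbers. The helper `compress_extents` is hypothetical, and the actual `MaskBuilder` computation (which also takes `output_tile_extent` into account) is not shown in this excerpt and may differ.

```python
import math


def compress_extents(source_extent: int, tile_extent: int, stride: int) -> tuple[int, int, int]:
    """Return (mask_extent, mask_tile_extent, mask_stride) for one dimension."""
    g = math.gcd(tile_extent, stride)
    # Integer ceiling division so a partial cell at the border is still covered.
    mask_extent = (source_extent + g - 1) // g
    return mask_extent, tile_extent // g, stride // g


print(compress_extents(10000, 512, 256))  # (40, 2, 1)
```

With 512-pixel tiles and a 256-pixel stride, each tile covers two mask cells and consecutive tiles are one cell apart, so a 10000-pixel axis needs only 40 cells under this assumption.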
|
|
||
| ## Components | ||
|
|
||
| ### Storage Strategies | ||
|
|
||
| ::: ratiopath.masks.mask_builders.InMemory | ||
| ::: ratiopath.masks.mask_builders.MemMap | ||
|
|
||
| ### Aggregation Strategies | ||
|
|
||
| ::: ratiopath.masks.mask_builders.MeanAggregator | ||
| ::: ratiopath.masks.mask_builders.MaxAggregator | ||
|
|
||
| ## Examples | ||
|
|
||
| ### Averaging Scalar Predictions | ||
|
|
||
| **Use case**: You have scalar predictions (e.g., class probabilities) for each tile. Each prediction is uniformly expanded to fill the tile's footprint, and overlapping regions are averaged. | ||
|
|
||
| ```python | ||
| import numpy as np | ||
| import openslide | ||
| from ratiopath.masks.mask_builders import MaskBuilder, MeanAggregator | ||
| import matplotlib.pyplot as plt | ||
|
|
||
| # Set up tiling parameters | ||
| LEVEL = 3 | ||
| tile_extents = (512, 512) | ||
| tile_strides = (256, 256) | ||
| slide = openslide.OpenSlide("path/to/slide.mrxs") | ||
| slide_w, slide_h = slide.level_dimensions[LEVEL] | ||
|
|
||
| # output_tile_extent=(1, 1) means scalar data — the builder | ||
| # broadcasts (B, C) → (B, C, 1, 1) and upscales automatically. | ||
| mask_builder = MaskBuilder( | ||
|     source_extents=(slide_h, slide_w), | ||
|     source_tile_extent=tile_extents, | ||
|     output_tile_extent=(1, 1), | ||
|     stride=tile_strides, | ||
|     n_channels=1, | ||
|     storage="inmemory", | ||
|     aggregation=MeanAggregator, | ||
|     dtype=np.float32, | ||
| ) | ||
|
|
||
| # Process tiles (generate_tiles_from_slide and model are placeholders you supply) | ||
| for tiles, xs, ys in generate_tiles_from_slide(slide, LEVEL, tile_extents, tile_strides): | ||
|     features = model.predict(tiles)  # features shape: (B, 1) | ||
|     coords_batch = np.stack([ys, xs], axis=1)  # shape: (B, 2) | ||
|     mask_builder.update_batch(features, coords_batch) | ||
|
|
||
| # Finalize — MeanAggregator returns {"mask": ..., "overlap_counter": ...} | ||
| results = mask_builder.finalize() | ||
| assembled_mask = results["mask"] | ||
| overlap_counter = results["overlap_counter"] | ||
|
|
||
| plt.imshow(assembled_mask[0], cmap="gray") | ||
| plt.show() | ||
|
|
||
| # Always clean up to release storage resources | ||
| mask_builder.cleanup() | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ### Max Aggregation with Edge Clipping (MemMap) | ||
|
|
||
| **Use case**: You have high-resolution feature maps. You want to preserve the maximum signal where tiles overlap, remove border pixels from each tile edge to avoid artifacts, and use disk storage because the mask is very large. | ||
|
|
||
| ```python | ||
| import numpy as np | ||
| from ratiopath.masks.mask_builders import MaskBuilder, MaxAggregator | ||
|
|
||
| # Dense output — output tiles match input tiles in spatial size | ||
| mask_builder = MaskBuilder( | ||
|     source_extents=(10000, 10000), | ||
|     source_tile_extent=(512, 512), | ||
|     output_tile_extent=(512, 512), | ||
|     stride=(256, 256), | ||
|     n_channels=3, | ||
|     storage="memmap", | ||
|     aggregation=MaxAggregator, | ||
|     dtype=np.float32, | ||
|     filename="large_mask.npy",  # persisted to disk | ||
| ) | ||
|
|
||
| # tile_generator and model are placeholders you supply | ||
| for tiles, coords in tile_generator: | ||
|     predictions = model.predict(tiles)  # (B, 3, 512, 512) | ||
|     # edge_clipping=4 removes 4px from each edge of every tile | ||
|     mask_builder.update_batch(predictions, coords, edge_clipping=4) | ||
|
|
||
| # MaxAggregator returns the accumulator NDArray directly | ||
| assembled_mask = mask_builder.finalize() | ||
| mask_builder.cleanup() | ||
| ``` | ||
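What removing border pixels means for a single tile can be sketched as follows. The helper `clip_edges` is hypothetical, and the origin shift is an assumption about how clipped tiles stay aligned in a dense-output setup like the one above, where mask and source resolution coincide. It is not the library's code.

```python
import numpy as np


def clip_edges(tile: np.ndarray, coords: np.ndarray, k: int) -> tuple[np.ndarray, np.ndarray]:
    """Drop k border pixels from a (C, H, W) tile and shift its (y, x) origin by k."""
    if k == 0:
        return tile, coords
    # The clipped tile starts k pixels further down and to the right.
    return tile[:, k:-k, k:-k], coords + k


tile = np.ones((3, 512, 512), dtype=np.float32)
clipped, origin = clip_edges(tile, np.array([256, 0]), k=4)
print(clipped.shape, origin.tolist())  # (3, 504, 504) [260, 4]
```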
|
|
||
| --- | ||
|
|
||
| ### Auto-Scaling Coordinates (Different Input/Output Resolution) | ||
|
|
||
| **Use case**: Your model's output tiles have different spatial dimensions than the input tiles (e.g., due to stride or pooling). The builder auto-scales coordinates between source and mask resolution. | ||
|
|
||
| ```python | ||
| import numpy as np | ||
| from ratiopath.masks.mask_builders import MaskBuilder, MeanAggregator | ||
|
|
||
| # Model takes 512×512 input tiles, produces 128×128 output tiles (4× downsampled) | ||
| mask_builder = MaskBuilder( | ||
|     source_extents=(2000, 2000), | ||
|     source_tile_extent=(512, 512), | ||
|     output_tile_extent=(128, 128), | ||
|     stride=(256, 256), | ||
|     n_channels=1, | ||
|     storage="inmemory", | ||
|     aggregation=MeanAggregator, | ||
|     dtype=np.float32, | ||
| ) | ||
|
|
||
| # Coordinates are always in SOURCE resolution — the builder | ||
| # handles the conversion to mask resolution internally. | ||
| for tiles, coords in tile_generator:  # tile_generator and model are placeholders | ||
|     predictions = model.predict(tiles)  # (B, 1, 128, 128) | ||
|     mask_builder.update_batch(predictions, coords) | ||
|
|
||
| results = mask_builder.finalize() | ||
| mask_builder.cleanup() | ||
| ``` | ||
|
|
||
| ## Coordinate System Notes | ||
|
|
||
| `update_batch` expects coordinates as an array of shape `(B, N)`, where: | ||
|
|
||
| - `B` is the batch size. | ||
| - `N` is the number of spatial dimensions (typically 2 for height and width). | ||
|
|
||
| Note the order: `[ys, xs]` not `[xs, ys]`, as the first dimension represents height (y) and the second represents width (x), matching the NumPy `(C, H, W)` convention used by the builder. | ||
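A short illustration of the `[ys, xs]` ordering, using hypothetical tile positions in source pixels:

```python
import numpy as np

# Hypothetical top-left tile positions in source pixels.
xs = np.array([0, 256, 512])
ys = np.array([0, 0, 256])

# Height (y) first, width (x) second: row i is (y_i, x_i).
coords_batch = np.stack([ys, xs], axis=1)
print(coords_batch.shape)  # (3, 2)
print(coords_batch[2].tolist())  # [256, 512]
```

Stacking as `[xs, ys]` instead would silently transpose every tile position. On non-square slides that can place tiles outside the mask.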
|
|
||
| ## Lifecycle | ||
|
|
||
| Always call `cleanup()` when you are done with a `MaskBuilder` to release storage resources. This matters most for `MemMap` storage, which holds file handles: | ||
|
|
||
| ```python | ||
| mask_builder = MaskBuilder(...) | ||
| # ... update_batch calls ... | ||
| results = mask_builder.finalize() | ||
| mask_builder.cleanup() | ||
| ``` | ||
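If inference can raise, a small context manager makes sure `cleanup()` still runs. This is a sketch: `cleaning_up` is a hypothetical helper, and `FakeBuilder` is a stand-in that only mimics the `cleanup()` method so the example runs without the library.

```python
from collections.abc import Iterator
from contextlib import contextmanager
from typing import Any


@contextmanager
def cleaning_up(builder: Any) -> Iterator[Any]:
    """Yield the builder and call its cleanup() even if the body raises."""
    try:
        yield builder
    finally:
        builder.cleanup()


class FakeBuilder:
    """Stand-in exposing only cleanup(); used for this demo, not part of ratiopath."""

    def __init__(self) -> None:
        self.cleaned = False

    def cleanup(self) -> None:
        self.cleaned = True


fake = FakeBuilder()
try:
    with cleaning_up(fake) as builder:
        raise RuntimeError("inference failed")
except RuntimeError:
    pass
print(fake.cleaned)  # True
```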
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| from ratiopath.masks.tissue_mask import tissue_mask | ||
| from ratiopath.masks.write_big_tiff import write_big_tiff | ||
|
|
||
|
|
||
| __all__ = ["tissue_mask"] | ||
| __all__ = ["tissue_mask", "write_big_tiff"] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| from ratiopath.masks.mask_builders.aggregation import ( | ||
| Aggregator, | ||
| MaxAggregator, | ||
| MeanAggregator, | ||
| ) | ||
| from ratiopath.masks.mask_builders.mask_builder import MaskBuilder | ||
| from ratiopath.masks.mask_builders.storage import InMemory, MemMap | ||
|
|
||
|
|
||
| __all__ = [ | ||
| "Aggregator", | ||
| "InMemory", | ||
| "MaskBuilder", | ||
| "MaxAggregator", | ||
| "MeanAggregator", | ||
| "MemMap", | ||
| ] |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,136 @@ | ||
| from __future__ import annotations | ||
|
|
||
| from abc import ABC, abstractmethod | ||
| from pathlib import Path | ||
| from typing import TYPE_CHECKING, Any, TypedDict, cast | ||
|
|
||
| import numpy as np | ||
| from numpy.typing import NDArray | ||
|
|
||
|
|
||
| if TYPE_CHECKING: | ||
|     from collections.abc import Callable | ||
|
|
||
|
|
||
|
|
||
| class Aggregator[DType: np.generic, R](ABC): | ||
|     """Abstract base class for aggregation strategies.""" | ||
|
|
||
|     def __init__(self, storage: NDArray[DType], **kwargs: Any) -> None: | ||
|         return | ||
|
|
||
|     @abstractmethod | ||
|     def update( | ||
|         self, accumulator: NDArray[DType], sample: np.ndarray, coords: NDArray[np.int64] | ||
|     ) -> None: | ||
|         """Update the accumulator with a single tile sample.""" | ||
|
|
||
|     @abstractmethod | ||
|     def finalize(self, accumulator: NDArray[DType]) -> R: | ||
|         """Finalize the mask assembly and return the result.""" | ||
|
|
||
|     def cleanup(self) -> None: | ||
|         """Optional cleanup method to release resources if needed.""" | ||
|         return | ||
|
|
||
|     def _get_acc_slices( | ||
|         self, coords: NDArray[np.int64], mask_tile_extents: NDArray[np.int64] | ||
|     ) -> tuple[slice, ...]: | ||
|         """Compute slice objects for accumulator indexing. | ||
|
|
||
|         Args: | ||
|             coords: Array of shape (N,) with top-left coordinates in N dimensions. | ||
|             mask_tile_extents: Array of shape (N,) with tile size in mask space for each dimension. | ||
|
|
||
|         Returns: | ||
|             Tuple containing N slice objects for indexing into the accumulator. | ||
|         """ | ||
|         tile_end_coords = coords + mask_tile_extents | ||
|         return tuple( | ||
|             slice(int(start), int(end)) | ||
|             for start, end in zip(coords, tile_end_coords, strict=True) | ||
|         ) | ||
|
|
||
|
|
||
| class MeanAggregatorResults[Dtype: np.generic](TypedDict): | ||
|     mask: NDArray[Dtype] | ||
|     overlap_counter: NDArray[np.uint16] | ||
|
|
||
|
|
||
| class MeanAggregator[DType: np.generic]( | ||
|     Aggregator[DType, MeanAggregatorResults[DType]] | ||
| ): | ||
|     """Aggregator that implements averaging aggregation for overlapping tiles. | ||
|
|
||
|     This aggregator accumulates tiles by addition and tracks the overlap count at each pixel. | ||
|     During finalization, the accumulated values are divided by the overlap count to compute | ||
|     the average value at each position. This is useful for: | ||
|     - Smoothly blending overlapping tile predictions | ||
|     - Reducing edge artifacts in sliding window processing | ||
|     - Computing ensemble averages from multiple passes | ||
|
|
||
|     The aggregator allocates an additional `overlap_counter` accumulator with shape (1, *SpatialDims) | ||
|     to track how many tiles contributed to each pixel position. | ||
|     """ | ||
|
|
||
|     def __init__( | ||
|         self, | ||
|         storage: NDArray[DType], | ||
|         filename: Path | str | None = None, | ||
|         overlap_counter_filename: Path | str | None = None, | ||
|         **kwargs: Any, | ||
|     ) -> None: | ||
|         overlap_filename = overlap_counter_filename | ||
|         if overlap_filename is None and filename is not None: | ||
|             path = Path(filename) | ||
|             overlap_filename = path.with_suffix(f".overlaps{path.suffix}") | ||
|
|
||
|         storage_cls = cast("Callable[..., NDArray[np.uint16]]", type(storage)) | ||
|         self.overlap_counter = storage_cls( | ||
|             filename=overlap_filename, | ||
|             shape=(1, *storage.shape[1:]), | ||
|             dtype=np.uint16, | ||
|             **kwargs, | ||
|         ) | ||
|
|
||
|     def update( | ||
|         self, accumulator: NDArray[DType], sample: np.ndarray, coords: NDArray[np.int64] | ||
|     ) -> None: | ||
|         mask_tile_extents = np.asarray(sample.shape[1:], dtype=np.int64) | ||
|         acc_slices = self._get_acc_slices(coords, mask_tile_extents) | ||
|         accumulator[:, *acc_slices] += sample  # type: ignore[misc] | ||
|         self.overlap_counter[:, *acc_slices] += 1 | ||
|
|
||
|
|
||
|     def finalize(self, accumulator: NDArray[DType]) -> MeanAggregatorResults[DType]: | ||
|         accumulator /= self.overlap_counter.clip(min=1)  # type: ignore[misc] | ||
|         return { | ||
|             "mask": accumulator, | ||
|             "overlap_counter": self.overlap_counter, | ||
|         } | ||
|
|
||
|     def cleanup(self) -> None: | ||
|         if hasattr(self, "overlap_counter"): | ||
|             if hasattr(self.overlap_counter, "close"): | ||
|                 self.overlap_counter.close() | ||
|             del self.overlap_counter | ||
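The sum-then-divide scheme of `MeanAggregator` can be traced on a tiny example with plain NumPy arrays standing in for the storage classes. The tile values and positions are made up. The last lines show how the default overlap-counter filename is derived from `filename` via `Path.with_suffix`.

```python
from pathlib import Path

import numpy as np

acc = np.zeros((1, 2, 4), dtype=np.float32)  # running sum, shape (C, H, W)
count = np.zeros((1, 2, 4), dtype=np.uint16)  # number of tiles covering each pixel

# Two constant 2x3 tiles that overlap in columns 1 and 2.
for value, x in ((1.0, 0), (3.0, 1)):
    acc[:, 0:2, x : x + 3] += value
    count[:, 0:2, x : x + 3] += 1

# Clipping the counter to at least 1 avoids division by zero on uncovered pixels.
mean = acc / count.clip(min=1)
print(mean[0, 0].tolist())  # [1.0, 2.0, 2.0, 3.0]

path = Path("large_mask.npy")
overlap_path = path.with_suffix(f".overlaps{path.suffix}")
print(overlap_path.name)  # large_mask.overlaps.npy
```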
|
|
||
|
|
||
|
|
||
| class MaxAggregator[DType: np.generic](Aggregator[DType, NDArray[DType]]): | ||
|     """Aggregator that implements maximum aggregation for overlapping tiles. | ||
|
|
||
|     This aggregator keeps only the maximum value at each pixel position when tiles overlap. | ||
|     No additional storage is required, and finalization is a no-op since the accumulator | ||
|     already contains the final max values. This is useful for: | ||
|     - Maximum intensity projection | ||
|     - Keeping the highest confidence prediction across overlapping tiles | ||
|     - Peak detection across multiple scales | ||
|     """ | ||
|
|
||
|     def update( | ||
|         self, accumulator: NDArray[DType], sample: np.ndarray, coords: NDArray[np.int64] | ||
|     ) -> None: | ||
|         mask_tile_extents = np.asarray(sample.shape[1:], dtype=np.int64) | ||
|         acc_slices = self._get_acc_slices(coords, mask_tile_extents) | ||
|         accumulator[:, *acc_slices] = np.maximum(accumulator[:, *acc_slices], sample) | ||
|
|
||
|     def finalize(self, accumulator: NDArray[DType]) -> NDArray[DType]: | ||
|         return accumulator | ||
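The element-wise maximum update can likewise be traced on two overlapping one-row tiles. This sketch uses a zero-initialised NumPy array as the accumulator, and the tile values are invented. How the library initialises its storage, and therefore how negative values behave, is not visible in this excerpt.

```python
import numpy as np

acc = np.zeros((1, 1, 4), dtype=np.float32)  # (C, H, W), assumed to start at zero

tiles = (
    (np.array([[[2.0, 9.0, 4.0]]], dtype=np.float32), 0),  # covers columns 0..2
    (np.array([[[5.0, 1.0, 7.0]]], dtype=np.float32), 1),  # covers columns 1..3
)
for tile, x in tiles:
    region = acc[:, :, x : x + 3]
    # Keep the larger of the stored value and the new sample at every pixel.
    acc[:, :, x : x + 3] = np.maximum(region, tile)

print(acc[0, 0].tolist())  # [2.0, 9.0, 4.0, 7.0]
```

In the overlapping columns the first tile wins (9 over 5, 4 over 1). The result does not depend on the order in which tiles arrive.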