Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion ratiopath/parsers/__init__.py
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
from ratiopath.parsers.asap_parser import ASAPParser
from ratiopath.parsers.darwin7_json_parser import Darwin7JSONParser
from ratiopath.parsers.geojson_parser import GeoJSONParser
from ratiopath.parsers.json_multi_class_mask import JSONMultiClassMask


__all__ = ["ASAPParser", "Darwin7JSONParser", "GeoJSONParser"]
__all__ = ["ASAPParser", "Darwin7JSONParser", "GeoJSONParser", "JSONMultiClassMask"]
45 changes: 45 additions & 0 deletions ratiopath/parsers/json_multi_class_mask.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
import json
from pathlib import Path

import numpy as np
from PIL import Image, ImageDraw


class JSONMultiClassMask:
"""Parses JSON annotations and renders them into a multi-class mask array.

Expects class to value mapping on input.
"""

def __init__(
self,
json_path: Path,
mask_size: tuple[int, int],
scale_factor: float,
target_groups: dict[str, int],
):
self.json_path = json_path
self.mask_size = mask_size # (width, height)
self.scale = scale_factor
self.target_groups = target_groups

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's good practice to validate input parameters to ensure they meet the expected criteria. The mask_size should be a tuple of two positive integers. Adding this check improves the robustness of the class by preventing potential errors later if invalid dimensions are provided.

Suggested change
if not (isinstance(mask_size, tuple) and len(mask_size) == 2 and all(isinstance(d, int) and d > 0 for d in mask_size)):
raise ValueError("mask_size must be a tuple of two positive integers (width, height).")
self.mask_size = mask_size # (width, height)

def __call__(self) -> np.ndarray:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The scale_factor should ideally be a positive value to ensure meaningful scaling. A non-positive scale factor could lead to unexpected or undefined behavior in the coordinate transformation. Adding a validation check here improves the robustness of the class.

Suggested change
def __call__(self) -> np.ndarray:
if scale_factor <= 0:
raise ValueError("scale_factor must be a positive value.")
self.scale = scale_factor

Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new parser/mask renderer introduces non-trivial behavior (JSON schema expectations, scaling, polygon rasterization, class ID mapping) but there are no tests covering it. Since the repo already has a dedicated tests/test_parsers.py suite for other parsers, please add unit tests for JSONMultiClassMask (e.g., basic polygon fill, multiple classes, scaling factor, and ignoring unknown classes).

Copilot uses AI. Check for mistakes.
Comment on lines +8 to +26
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add full Google-style API docstrings for the new public parser.

Line 8 and Line 26 expose a new public API, but the current docs are too thin for mkdocstrings and miss structured sections (Args, Returns, Raises, Examples).

✍️ Suggested docstring update
+"""JSON parser that rasterizes polygon annotations into multi-class masks."""
+
 class JSONMultiClassMask:
-    """Parses JSON annotations and renders them into a multi-class mask array.
-    
-    Expects class to value mapping on input.
-    """
+    """Parse JSON polygon annotations into a multi-class mask.
+
+    Args:
+        json_path: Path to the JSON annotation file.
+        mask_size: Output mask size as ``(width, height)``.
+        scale_factor: Multiplicative factor applied to polygon coordinates.
+        target_groups: Mapping from annotation class name to integer mask value.
+
+    Examples:
+        >>> from pathlib import Path
+        >>> from ratiopath.parsers import JSONMultiClassMask
+        >>> parser = JSONMultiClassMask(
+        ...     json_path=Path("sample.json"),
+        ...     mask_size=(1024, 1024),
+        ...     scale_factor=1.0,
+        ...     target_groups={"tumor": 1, "stroma": 2},
+        ... )
+        >>> mask = parser()
+    """
@@
     def __call__(self) -> np.ndarray:
+        """Render the configured JSON annotations into a NumPy mask.
+
+        Returns:
+            A 2D NumPy array containing class IDs per pixel.
+
+        Raises:
+            FileNotFoundError: If ``json_path`` does not exist.
+            json.JSONDecodeError: If the JSON file is malformed.
+            ValueError: If annotation payload contains invalid polygon data.
+        """
         mask_img = Image.new("L", self.mask_size, 0)

As per coding guidelines, **/*.py: "Add documentation for any new public functions, classes, or modules you create", "For API documentation, ensure docstrings are clear enough for mkdocstrings output", and "Use Google-style docstrings with clear Args:, Returns:, Yields:, Raises:, and Examples: sections when relevant".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/parsers/json_multi_class_mask.py` around lines 8 - 26, Add a full
Google-style API docstring for the public class JSONMultiClassMask (and its
__call__ behavior) that replaces the terse module docstring: describe the class
purpose, include an Args: section documenting json_path: Path, mask_size:
tuple[int, int] (width, height), scale_factor: float, target_groups: dict[str,
int]; a Returns: section describing the numpy.ndarray mask shape and dtype; a
Raises: section listing possible exceptions (e.g., FileNotFoundError,
ValueError) and when they are raised; and an Examples: snippet showing how to
instantiate JSONMultiClassMask and call it to obtain the mask. Ensure the
docstring is attached to the class definition (so mkdocstrings picks it up) and
mentions __call__ semantics rather than duplicating logic in __init__.

mask_img = Image.new("L", self.mask_size, 0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For Pillow 11.1.0, what is the valid fill value range for Image mode "L" in ImageDraw.polygon, and what happens if fill is outside 0-255?

💡 Result:

For Pillow 11.1.0, the valid fill value range for Image mode "L" in ImageDraw.polygon is integers 0-255. "L" mode is 8-bit pixels, grayscale, where each pixel uses the full range 0-255.

Citations:


🏁 Script executed:

cat -n ratiopath/parsers/json_multi_class_mask.py | head -60

Repository: RationAI/ratiopath

Length of output: 1662


🌐 Web query:

PIL Pillow ImageDraw.polygon fill value out of range 8-bit mode L what happens truncation modulo

💡 Result:

In Pillow's ImageDraw.polygon for an 8-bit "L" mode image, if the fill value exceeds the valid range (0-255), it undergoes truncation to the lower 8 bits. This is equivalent to a modulo 256 operation for positive integers, as the C implementation casts the integer ink value to UINT8, discarding higher bits.

Citations:


Validate class IDs against 8-bit mask limits to prevent class collision.

Line 27 creates an "L" mode mask (8-bit grayscale, values 0–255), but Line 41 can pass arbitrary integer class values from target_groups without validation. Values outside 0–255 are truncated to the lower 8 bits (equivalent to value % 256), causing distinct class IDs to collapse into the same pixel value in the output mask.

For example, class IDs 256 and 0 would both render as 0; 257 and 1 would both render as 1.

Add validation in the constructor to ensure all target_groups values are in the valid range:

Suggested guard
     def __init__(
         self,
         json_path: Path,
         mask_size: tuple[int, int],
         scale_factor: float,
         target_groups: dict[str, int],
     ):
+        invalid = {k: v for k, v in target_groups.items() if not (0 <= v <= 255)}
+        if invalid:
+            raise ValueError(
+                f"target_groups values must be in [0, 255] for mode 'L'. Got: {invalid}"
+            )
         self.json_path = json_path
         self.mask_size = mask_size  # (width, height)
         self.scale = scale_factor
         self.target_groups = target_groups
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/parsers/json_multi_class_mask.py` at line 27, The mask is created
in "L" mode (8-bit) but target_groups can contain integers outside 0–255,
causing collisions; in the class constructor (the __init__ that accepts
target_groups) validate every value in target_groups is an int within 0..255 and
raise a clear ValueError listing any invalid IDs, or alternatively
normalize/check types before use; update the constructor that initializes
target_groups (referencing target_groups and mask_img creation) to perform this
guard so values written later (used in mask pixel assignment) cannot overflow
the 8-bit mask.

draw = ImageDraw.Draw(mask_img)
Comment on lines +27 to +28
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mask is created in PIL mode "L" (8-bit), so class IDs are limited to 0–255 and values outside that range will be clipped/wrapped when drawing and when converting to NumPy. If target_groups can contain IDs >255 (common for multi-class), consider using a higher-bit image mode (e.g., "I;16"/"I") and returning an appropriate NumPy dtype, or validate/enforce that all fill values are within 0–255.

Copilot uses AI. Check for mistakes.

with open(self.json_path) as f:
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open(self.json_path) relies on the platform default encoding; other parsers in this repo open JSON as UTF-8 explicitly. Use encoding="utf-8" here as well to avoid mis-decoding non-ASCII annotation files on systems with a different default encoding.

Suggested change
with open(self.json_path) as f:
with open(self.json_path, encoding="utf-8") as f:

Copilot uses AI. Check for mistakes.
data = json.load(f)
Comment on lines +30 to +31
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To make the file loading more robust and user-friendly, it's beneficial to explicitly handle FileNotFoundError and json.JSONDecodeError. This provides clearer error messages to the user if the JSON file is missing or malformed, improving the debugging experience.

        try:
            with open(self.json_path) as f:
                data = json.load(f)
        except FileNotFoundError:
            raise FileNotFoundError(f"JSON file not found at {self.json_path}")
        except json.JSONDecodeError:
            raise ValueError(f"Invalid JSON format in file {self.json_path}")


for item in data.get("items", []):
class_name = item.get("name")

if class_name in self.target_groups:
coords = item.get("coordinates", [])
if not coords:
continue

fill_value = self.target_groups[class_name]
scaled_coords = [(c[0] * self.scale, c[1] * self.scale) for c in coords]
draw.polygon(scaled_coords, fill=fill_value)
Comment on lines +42 to +43
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Harden coordinate parsing to avoid crashing on malformed items.

Line 42 assumes every coordinate is an indexable 2-item numeric pair. A single malformed polygon point can raise TypeError/IndexError and abort the entire parse.

✅ Suggested robust parsing
-                scaled_coords = [(c[0] * self.scale, c[1] * self.scale) for c in coords]
+                try:
+                    scaled_coords = [
+                        (float(c[0]) * self.scale, float(c[1]) * self.scale)
+                        for c in coords
+                        if isinstance(c, (list, tuple)) and len(c) >= 2
+                    ]
+                except (TypeError, ValueError) as exc:
+                    raise ValueError(
+                        f"Invalid coordinates for class '{class_name}' in {self.json_path}"
+                    ) from exc
+
+                if len(scaled_coords) < 3:
+                    continue
                 draw.polygon(scaled_coords, fill=fill_value)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
scaled_coords = [(c[0] * self.scale, c[1] * self.scale) for c in coords]
draw.polygon(scaled_coords, fill=fill_value)
try:
scaled_coords = [
(float(c[0]) * self.scale, float(c[1]) * self.scale)
for c in coords
if isinstance(c, (list, tuple)) and len(c) >= 2
]
except (TypeError, ValueError) as exc:
raise ValueError(
f"Invalid coordinates for class '{class_name}' in {self.json_path}"
) from exc
if len(scaled_coords) < 3:
continue
draw.polygon(scaled_coords, fill=fill_value)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/parsers/json_multi_class_mask.py` around lines 42 - 43, The
comprehension building scaled_coords in json_multi_class_mask.py assumes every
item in coords is a 2-element numeric sequence and will crash on malformed
entries; update the parsing in the block that creates scaled_coords (and before
calling draw.polygon) to validate and sanitize each point: for each c in coords
ensure it is indexable with at least two elements and that c[0] and c[1] can be
converted to numbers (e.g., float), skip or log any invalid points, and only
include validated (x* self.scale, y* self.scale) pairs in scaled_coords so
draw.polygon receives a clean list and the parser doesn't raise
TypeError/IndexError on bad input.


return np.array(mask_img)
Loading