Skip to content

chore: JSONMultiClassMask#47

Closed
borisim wants to merge 3 commits into
mainfrom
JSONMultiClassMask
Closed

chore: JSONMultiClassMask#47
borisim wants to merge 3 commits into
mainfrom
JSONMultiClassMask

Conversation

@borisim
Copy link
Copy Markdown

@borisim borisim commented Mar 25, 2026

Summary by CodeRabbit

  • New Features
    • Added support for loading JSON annotation files and converting polygon data to rasterized image masks with multi-class support and customizable class mapping.

@borisim borisim requested review from a team and Copilot March 25, 2026 13:52
@gemini-code-assist
Copy link
Copy Markdown

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new core utility for handling image segmentation annotations. It provides a robust way to convert structured JSON data containing polygon coordinates and class names into a numerical mask array, enabling downstream machine learning tasks to easily consume annotated data.

Highlights

  • New Feature: JSON Multi-Class Mask Generation: Introduced JSONMultiClassMask, a new utility class designed to parse JSON annotation files and render them into multi-class image masks.
  • Annotation Processing: The class processes polygon coordinates from JSON data, scales them, and draws them onto a mask image using specified class-to-value mappings.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Mar 25, 2026

Warning

Rate limit exceeded

@borisim has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 18 minutes and 44 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ff31f826-afa9-47af-a40e-40338dd6359b

📥 Commits

Reviewing files that changed from the base of the PR and between b1b0ef2 and f888896.

📒 Files selected for processing (1)
  • ratiopath/parsers/json_multi_class_mask.py
📝 Walkthrough

Walkthrough

The changes introduce a new JSONMultiClassMask parser class that loads JSON annotation files, processes polygon coordinates with scaling, filters items by target groups, and rasterizes them into single-channel image masks returned as NumPy arrays. The class is exported from the parsers module.

Changes

Cohort / File(s) Summary
Module Export Updates
ratiopath/parsers/__init__.py
Added JSONMultiClassMask to module imports and __all__ export list for public accessibility.
JSON Multi-Class Mask Parser
ratiopath/parsers/json_multi_class_mask.py
New callable class that loads JSON annotations, filters by target groups, scales and rasterizes polygons into 8-bit single-channel image masks as NumPy arrays.

Sequence Diagram

sequenceDiagram
    actor User
    participant JSONMultiClassMask
    participant JSONFile as JSON File
    participant ImageMask as Image Mask
    participant Output as NumPy Array

    User->>JSONMultiClassMask: __call__()
    JSONMultiClassMask->>JSONFile: Load JSON from path
    JSONFile-->>JSONMultiClassMask: Annotation data
    JSONMultiClassMask->>JSONMultiClassMask: Filter items by target_groups
    JSONMultiClassMask->>JSONMultiClassMask: Extract & scale polygon coordinates
    JSONMultiClassMask->>ImageMask: Create "L" mode image (zeros)
    JSONMultiClassMask->>ImageMask: Rasterize polygons with mapped class values
    ImageMask-->>JSONMultiClassMask: Rasterized mask
    JSONMultiClassMask->>Output: Convert to NumPy array
    Output-->>User: Return mask array
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A JSON hop, coordinates to scale,
Polygons drawn across the pale,
Masks rasterized, class values bright,
From annotations to NumPy light! ✨
The parser leaps with masked delight!

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'chore: JSONMultiClassMask' is vague and lacks descriptive detail about what the change accomplishes. Clarify the title to describe the actual change, such as 'Add JSONMultiClassMask parser for rasterizing multi-class polygon annotations' or 'Introduce JSONMultiClassMask class for JSON annotation loading and masking'.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch JSONMultiClassMask

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new JSONMultiClassMask class designed to parse JSON annotations and render them into a multi-class mask array. The class initializes with a JSON file path, mask size, scale factor, and target group mappings, then processes the JSON data to draw scaled polygons onto an image, returning a NumPy array. Feedback suggests enhancing the class's robustness by adding input validation for mask_size and scale_factor in the constructor, and by implementing explicit error handling for FileNotFoundError and json.JSONDecodeError during JSON file loading.

self.mask_size = mask_size # (width, height)
self.scale = scale_factor
self.target_groups = target_groups

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

It's good practice to validate input parameters to ensure they meet the expected criteria. The mask_size should be a tuple of two positive integers. Adding this check improves the robustness of the class by preventing potential errors later if invalid dimensions are provided.

Suggested change
if not (isinstance(mask_size, tuple) and len(mask_size) == 2 and all(isinstance(d, int) and d > 0 for d in mask_size)):
raise ValueError("mask_size must be a tuple of two positive integers (width, height).")
self.mask_size = mask_size # (width, height)

self.scale = scale_factor
self.target_groups = target_groups

def __call__(self) -> np.ndarray:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The scale_factor should ideally be a positive value to ensure meaningful scaling. A non-positive scale factor could lead to unexpected or undefined behavior in the coordinate transformation. Adding a validation check here improves the robustness of the class.

Suggested change
def __call__(self) -> np.ndarray:
if scale_factor <= 0:
raise ValueError("scale_factor must be a positive value.")
self.scale = scale_factor

Comment on lines +30 to +31
with open(self.json_path) as f:
data = json.load(f)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

To make the file loading more robust and user-friendly, it's beneficial to explicitly handle FileNotFoundError and json.JSONDecodeError. This provides clearer error messages to the user if the JSON file is missing or malformed, improving the debugging experience.

        try:
            with open(self.json_path) as f:
                data = json.load(f)
        except FileNotFoundError:
            raise FileNotFoundError(f"JSON file not found at {self.json_path}")
        except json.JSONDecodeError:
            raise ValueError(f"Invalid JSON format in file {self.json_path}")

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new JSON-driven annotation rasterizer to the ratiopath.parsers area, intended to convert polygon annotations into a multi-class mask array for downstream tiling / ML workflows.

Changes:

  • Introduces JSONMultiClassMask to load a JSON file, scale polygon coordinates, and rasterize polygons into a class-ID mask.
  • Uses Pillow drawing APIs to render polygons and returns the result as a NumPy array.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +27 to +28
mask_img = Image.new("L", self.mask_size, 0)
draw = ImageDraw.Draw(mask_img)
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mask is created in PIL mode "L" (8-bit), so class IDs are limited to 0–255 and values outside that range will be clipped/wrapped when drawing and when converting to NumPy. If target_groups can contain IDs >255 (common for multi-class), consider using a higher-bit image mode (e.g., "I;16"/"I") and returning an appropriate NumPy dtype, or validate/enforce that all fill values are within 0–255.

Copilot uses AI. Check for mistakes.
mask_img = Image.new("L", self.mask_size, 0)
draw = ImageDraw.Draw(mask_img)

with open(self.json_path) as f:
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

open(self.json_path) relies on the platform default encoding; other parsers in this repo open JSON as UTF-8 explicitly. Use encoding="utf-8" here as well to avoid mis-decoding non-ASCII annotation files on systems with a different default encoding.

Suggested change
with open(self.json_path) as f:
with open(self.json_path, encoding="utf-8") as f:

Copilot uses AI. Check for mistakes.
self.scale = scale_factor
self.target_groups = target_groups

def __call__(self) -> np.ndarray:
Copy link

Copilot AI Mar 25, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This new parser/mask renderer introduces non-trivial behavior (JSON schema expectations, scaling, polygon rasterization, class ID mapping) but there are no tests covering it. Since the repo already has a dedicated tests/test_parsers.py suite for other parsers, please add unit tests for JSONMultiClassMask (e.g., basic polygon fill, multiple classes, scaling factor, and ignoring unknown classes).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ratiopath/parsers/json_multi_class_mask.py`:
- Around line 42-43: The comprehension building scaled_coords in
json_multi_class_mask.py assumes every item in coords is a 2-element numeric
sequence and will crash on malformed entries; update the parsing in the block
that creates scaled_coords (and before calling draw.polygon) to validate and
sanitize each point: for each c in coords ensure it is indexable with at least
two elements and that c[0] and c[1] can be converted to numbers (e.g., float),
skip or log any invalid points, and only include validated (x* self.scale, y*
self.scale) pairs in scaled_coords so draw.polygon receives a clean list and the
parser doesn't raise TypeError/IndexError on bad input.
- Around line 8-26: Add a full Google-style API docstring for the public class
JSONMultiClassMask (and its __call__ behavior) that replaces the terse module
docstring: describe the class purpose, include an Args: section documenting
json_path: Path, mask_size: tuple[int, int] (width, height), scale_factor:
float, target_groups: dict[str, int]; a Returns: section describing the
numpy.ndarray mask shape and dtype; a Raises: section listing possible
exceptions (e.g., FileNotFoundError, ValueError) and when they are raised; and
an Examples: snippet showing how to instantiate JSONMultiClassMask and call it
to obtain the mask. Ensure the docstring is attached to the class definition (so
mkdocstrings picks it up) and mentions __call__ semantics rather than
duplicating logic in __init__.
- Line 27: The mask is created in "L" mode (8-bit) but target_groups can contain
integers outside 0–255, causing collisions; in the class constructor (the
__init__ that accepts target_groups) validate every value in target_groups is an
int within 0..255 and raise a clear ValueError listing any invalid IDs, or
alternatively normalize/check types before use; update the constructor that
initializes target_groups (referencing target_groups and mask_img creation) to
perform this guard so values written later (used in mask pixel assignment)
cannot overflow the 8-bit mask.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: af80c26e-fe40-49ec-899f-ca035526253b

📥 Commits

Reviewing files that changed from the base of the PR and between 4f8c69b and b1b0ef2.

📒 Files selected for processing (2)
  • ratiopath/parsers/__init__.py
  • ratiopath/parsers/json_multi_class_mask.py

Comment on lines +8 to +26
class JSONMultiClassMask:
"""Parses JSON annotations and renders them into a multi-class mask array.

Expects class to value mapping on input.
"""

def __init__(
self,
json_path: Path,
mask_size: tuple[int, int],
scale_factor: float,
target_groups: dict[str, int],
):
self.json_path = json_path
self.mask_size = mask_size # (width, height)
self.scale = scale_factor
self.target_groups = target_groups

def __call__(self) -> np.ndarray:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Add full Google-style API docstrings for the new public parser.

Line 8 and Line 26 expose a new public API, but the current docs are too thin for mkdocstrings and miss structured sections (Args, Returns, Raises, Examples).

✍️ Suggested docstring update
+"""JSON parser that rasterizes polygon annotations into multi-class masks."""
+
 class JSONMultiClassMask:
-    """Parses JSON annotations and renders them into a multi-class mask array.
-    
-    Expects class to value mapping on input.
-    """
+    """Parse JSON polygon annotations into a multi-class mask.
+
+    Args:
+        json_path: Path to the JSON annotation file.
+        mask_size: Output mask size as ``(width, height)``.
+        scale_factor: Multiplicative factor applied to polygon coordinates.
+        target_groups: Mapping from annotation class name to integer mask value.
+
+    Examples:
+        >>> from pathlib import Path
+        >>> from ratiopath.parsers import JSONMultiClassMask
+        >>> parser = JSONMultiClassMask(
+        ...     json_path=Path("sample.json"),
+        ...     mask_size=(1024, 1024),
+        ...     scale_factor=1.0,
+        ...     target_groups={"tumor": 1, "stroma": 2},
+        ... )
+        >>> mask = parser()
+    """
@@
     def __call__(self) -> np.ndarray:
+        """Render the configured JSON annotations into a NumPy mask.
+
+        Returns:
+            A 2D NumPy array containing class IDs per pixel.
+
+        Raises:
+            FileNotFoundError: If ``json_path`` does not exist.
+            json.JSONDecodeError: If the JSON file is malformed.
+            ValueError: If annotation payload contains invalid polygon data.
+        """
         mask_img = Image.new("L", self.mask_size, 0)

As per coding guidelines, **/*.py: "Add documentation for any new public functions, classes, or modules you create", "For API documentation, ensure docstrings are clear enough for mkdocstrings output", and "Use Google-style docstrings with clear Args:, Returns:, Yields:, Raises:, and Examples: sections when relevant".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/parsers/json_multi_class_mask.py` around lines 8 - 26, Add a full
Google-style API docstring for the public class JSONMultiClassMask (and its
__call__ behavior) that replaces the terse module docstring: describe the class
purpose, include an Args: section documenting json_path: Path, mask_size:
tuple[int, int] (width, height), scale_factor: float, target_groups: dict[str,
int]; a Returns: section describing the numpy.ndarray mask shape and dtype; a
Raises: section listing possible exceptions (e.g., FileNotFoundError,
ValueError) and when they are raised; and an Examples: snippet showing how to
instantiate JSONMultiClassMask and call it to obtain the mask. Ensure the
docstring is attached to the class definition (so mkdocstrings picks it up) and
mentions __call__ semantics rather than duplicating logic in __init__.

self.target_groups = target_groups

def __call__(self) -> np.ndarray:
mask_img = Image.new("L", self.mask_size, 0)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For Pillow 11.1.0, what is the valid fill value range for Image mode "L" in ImageDraw.polygon, and what happens if fill is outside 0-255?

💡 Result:

For Pillow 11.1.0, the valid fill value range for Image mode "L" in ImageDraw.polygon is integers 0-255. "L" mode is 8-bit pixels, grayscale, where each pixel uses the full range 0-255.

Citations:


🏁 Script executed:

cat -n ratiopath/parsers/json_multi_class_mask.py | head -60

Repository: RationAI/ratiopath

Length of output: 1662


🌐 Web query:

PIL Pillow ImageDraw.polygon fill value out of range 8-bit mode L what happens truncation modulo

💡 Result:

In Pillow's ImageDraw.polygon for an 8-bit "L" mode image, if the fill value exceeds the valid range (0-255), it undergoes truncation to the lower 8 bits. This is equivalent to a modulo 256 operation for positive integers, as the C implementation casts the integer ink value to UINT8, discarding higher bits.

Citations:


Validate class IDs against 8-bit mask limits to prevent class collision.

Line 27 creates an "L" mode mask (8-bit grayscale, values 0–255), but Line 41 can pass arbitrary integer class values from target_groups without validation. Values outside 0–255 are truncated to the lower 8 bits (equivalent to value % 256), causing distinct class IDs to collapse into the same pixel value in the output mask.

For example, class IDs 256 and 0 would both render as 0; 257 and 1 would both render as 1.

Add validation in the constructor to ensure all target_groups values are in the valid range:

Suggested guard
     def __init__(
         self,
         json_path: Path,
         mask_size: tuple[int, int],
         scale_factor: float,
         target_groups: dict[str, int],
     ):
+        invalid = {k: v for k, v in target_groups.items() if not (0 <= v <= 255)}
+        if invalid:
+            raise ValueError(
+                f"target_groups values must be in [0, 255] for mode 'L'. Got: {invalid}"
+            )
         self.json_path = json_path
         self.mask_size = mask_size  # (width, height)
         self.scale = scale_factor
         self.target_groups = target_groups
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/parsers/json_multi_class_mask.py` at line 27, The mask is created
in "L" mode (8-bit) but target_groups can contain integers outside 0–255,
causing collisions; in the class constructor (the __init__ that accepts
target_groups) validate every value in target_groups is an int within 0..255 and
raise a clear ValueError listing any invalid IDs, or alternatively
normalize/check types before use; update the constructor that initializes
target_groups (referencing target_groups and mask_img creation) to perform this
guard so values written later (used in mask pixel assignment) cannot overflow
the 8-bit mask.

Comment on lines +42 to +43
scaled_coords = [(c[0] * self.scale, c[1] * self.scale) for c in coords]
draw.polygon(scaled_coords, fill=fill_value)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Harden coordinate parsing to avoid crashing on malformed items.

Line 42 assumes every coordinate is an indexable 2-item numeric pair. A single malformed polygon point can raise TypeError/IndexError and abort the entire parse.

✅ Suggested robust parsing
-                scaled_coords = [(c[0] * self.scale, c[1] * self.scale) for c in coords]
+                try:
+                    scaled_coords = [
+                        (float(c[0]) * self.scale, float(c[1]) * self.scale)
+                        for c in coords
+                        if isinstance(c, (list, tuple)) and len(c) >= 2
+                    ]
+                except (TypeError, ValueError) as exc:
+                    raise ValueError(
+                        f"Invalid coordinates for class '{class_name}' in {self.json_path}"
+                    ) from exc
+
+                if len(scaled_coords) < 3:
+                    continue
                 draw.polygon(scaled_coords, fill=fill_value)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
scaled_coords = [(c[0] * self.scale, c[1] * self.scale) for c in coords]
draw.polygon(scaled_coords, fill=fill_value)
try:
scaled_coords = [
(float(c[0]) * self.scale, float(c[1]) * self.scale)
for c in coords
if isinstance(c, (list, tuple)) and len(c) >= 2
]
except (TypeError, ValueError) as exc:
raise ValueError(
f"Invalid coordinates for class '{class_name}' in {self.json_path}"
) from exc
if len(scaled_coords) < 3:
continue
draw.polygon(scaled_coords, fill=fill_value)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ratiopath/parsers/json_multi_class_mask.py` around lines 42 - 43, The
comprehension building scaled_coords in json_multi_class_mask.py assumes every
item in coords is a 2-element numeric sequence and will crash on malformed
entries; update the parsing in the block that creates scaled_coords (and before
calling draw.polygon) to validate and sanitize each point: for each c in coords
ensure it is indexable with at least two elements and that c[0] and c[1] can be
converted to numbers (e.g., float), skip or log any invalid points, and only
include validated (x* self.scale, y* self.scale) pairs in scaled_coords so
draw.polygon receives a clean list and the parser doesn't raise
TypeError/IndexError on bad input.

@Adames4 Adames4 closed this Mar 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants