feat(data)!: unify data.source and data.labels.source resolvers #128

Open
stanlrt wants to merge 5 commits into 123-create-the-errorwarning-tracking-logic from 127-unify-data-source-resolvers

@stanlrt (Collaborator) commented May 8, 2026

Closes #127.

Summary

data.source and data.labels.source previously used different resolvers. Sample names like isic2018 worked for data.source (via _resolve_sample) but failed for data.labels.source (which only handled URL/local-path through get_source_path). This PR unifies the resolution path so both fields accept URL, local path, or sample name.

Changes

  • samples.py: introduce SAMPLE_LABELS registry; _resolve_sample now writes a labels.csv into the cache dir for samples that ship ground-truth labels (imagenet_samples only, with real ImageNet class indices). New resolve_sample_labels_path() helper.
  • data.py: get_source_path(source, kind="data" | "labels") falls through to sample lookup when the source matches a SAMPLE_SOURCES key. kind="data" returns the cache directory; kind="labels" returns the bundled labels.csv (raises if the sample has no labels). _load_labels calls it with kind="labels".
  • configs/data/imagenet_samples.yaml: ship default labels block pointing at the sample-bundled CSV.
  • configs/config.yaml: switch default data from isic2018 to imagenet_samples. Default raitap run no longer fires the "No ground-truth labels" warnings on metrics + robustness because the model (ViT-B/32, ImageNet-pretrained) now has matching ground-truth labels.
  • docs/modules/data/configuration.md: clarify both fields accept sample names.
  • Tests: cover the three resolver branches and the no-labels-for-this-sample error.
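
The resolver behaviour listed above could look roughly like the sketch below. It is an illustration, not the code in this PR: only the names `get_source_path`, `SAMPLE_SOURCES`, `SAMPLE_LABELS`, and the `kind` parameter come from the description, while `CACHE_ROOT`, the registry contents, the URL check, the branch order, and the exception type are assumptions.

```python
from __future__ import annotations

from pathlib import Path
from typing import Literal

# Assumed cache location; the real cache directory is not given in this PR.
CACHE_ROOT = Path(".cache") / "samples"

# Hypothetical registry contents, reduced to bare names for illustration.
SAMPLE_SOURCES = {"isic2018", "imagenet_samples"}
SAMPLE_LABELS = {"imagenet_samples"}  # samples that ship ground-truth labels


def get_source_path(
    source: str, kind: Literal["data", "labels"] = "data"
) -> Path | str:
    """Resolve a URL, sample name, or local path (sketch, not the PR's code)."""
    # Branch 1: URL. A real resolver would presumably download the file;
    # this sketch simply returns the URL unchanged.
    if source.startswith(("http://", "https://")):
        return source

    # Branch 2: named demo sample.
    if source in SAMPLE_SOURCES:
        cache_dir = CACHE_ROOT / source
        if kind == "data":
            return cache_dir
        if source not in SAMPLE_LABELS:
            raise ValueError(f"Sample {source!r} does not ship ground-truth labels")
        return cache_dir / "labels.csv"

    # Branch 3: anything else is treated as a local path.
    return Path(source)
```

Under these assumptions, `get_source_path("imagenet_samples", kind="labels")` resolves to the cached `labels.csv`, while `get_source_path("isic2018", kind="labels")` raises because that sample has no bundled labels.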

Notes

  • Schema unchanged. data.source and data.labels.source remain separate fields with distinct meanings: source resolves to input data, labels.source to a labels file. The unification is at the resolver layer only.
  • Built on top of the follow-up work for #123 (Create the error/warning tracking logic), so this PR targets 123-create-the-errorwarning-tracking-logic rather than main.

Copilot AI review requested due to automatic review settings May 8, 2026 23:27
@stanlrt stanlrt self-assigned this May 8, 2026
@stanlrt stanlrt linked an issue May 8, 2026 that may be closed by this pull request
4 tasks

Copilot AI left a comment

Pull request overview

Warning

The PR changes the default config (configs/config.yaml) from data: isic2018 to data: imagenet_samples, which changes out-of-the-box behavior. If this is intended as a breaking change, the PR title should include ! (e.g. feat(data)!: or feat(data/config)!:) or the description should explicitly call out BREAKING CHANGE.

This PR unifies resolution for data.source and data.labels.source so both accept URLs, local paths, and named demo samples, and adds sample-bundled ground-truth labels for imagenet_samples via a cached labels.csv.

Changes:

  • Add a per-sample labels registry and materialize labels.csv into the demo-sample cache for samples that have trustworthy ground-truth labels.
  • Extend get_source_path(..., kind="data"|"labels") to resolve named demo samples for both data and labels (with an error for samples that don’t ship labels).
  • Update default configs/docs and add tests for the new resolver branches.
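
As a rough illustration of the first bullet, materializing a `labels.csv` into a sample's cache directory might be sketched as below. Everything here is hypothetical: the shape of `SAMPLE_LABELS` (sample name mapped to file name and class index), the placeholder entries, the CSV column names, and the helper name `write_sample_labels` are assumptions, not the PR's implementation.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Hypothetical registry shape: sample name -> {image file name: class index}.
# The entries are placeholders, not real file names or ImageNet indices.
SAMPLE_LABELS = {
    "imagenet_samples": {"placeholder_a.jpg": 0, "placeholder_b.jpg": 1},
}


def write_sample_labels(sample: str, cache_dir: Path) -> Path | None:
    """Write labels.csv for samples that ship labels; return None otherwise."""
    labels = SAMPLE_LABELS.get(sample)
    if labels is None:
        return None  # this sample has no trustworthy ground-truth labels

    cache_dir.mkdir(parents=True, exist_ok=True)
    out = cache_dir / "labels.csv"
    with out.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["filename", "label"])  # assumed column names
        writer.writerows(sorted(labels.items()))
    return out
```

Returning `None` for samples without labels keeps the "no labels for this sample" decision in the caller, which can then raise a descriptive error.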

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| src/raitap/data/samples.py | Adds SAMPLE_LABELS and writes labels.csv into the sample cache; exposes resolve_sample_labels_path(). |
| src/raitap/data/data.py | Routes data.labels.source through get_source_path(kind="labels") and extends get_source_path to support sample names for labels. |
| src/raitap/data/tests/test_data.py | Adds tests covering sample-name resolution for data/labels and the no-labels error path. |
| src/raitap/configs/data/imagenet_samples.yaml | Configures a default labels block pointing to the sample-bundled CSV. |
| src/raitap/configs/config.yaml | Switches the default dataset from isic2018 to imagenet_samples. |
| docs/modules/data/configuration.md | Documents that both source and labels.source accept URLs/paths/sample names and explains caching behavior. |

Comment thread src/raitap/data/samples.py Outdated
Comment thread src/raitap/data/data.py
Comment thread src/raitap/data/data.py
Comment thread docs/modules/data/configuration.md
Comment thread src/raitap/data/tests/test_data.py

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.

Comment thread src/raitap/data/data.py Outdated
Comment thread src/raitap/data/data.py Outdated
Comment thread src/raitap/configs/config.yaml
@stanlrt changed the title from "feat(data): unify data.source and data.labels.source resolvers" to "feat(data)!: unify data.source and data.labels.source resolvers" May 9, 2026
Development

Successfully merging this pull request may close these issues.

Unify data source resolvers (data.source vs data.labels.source)

2 participants