feat(data)!: unify data.source and data.labels.source resolvers #128
Open
stanlrt wants to merge 5 commits into 123-create-the-errorwarning-tracking-logic
Pull request overview
Warning
The PR changes the default config (configs/config.yaml) from data: isic2018 to data: imagenet_samples, which changes out-of-the-box behavior. If this is intended as a breaking change, the PR title should include ! (e.g. feat(data)!: or feat(data/config)!:) or the description should explicitly call out BREAKING CHANGE.
This PR unifies resolution for data.source and data.labels.source so both accept URLs, local paths, and named demo samples, and adds sample-bundled ground-truth labels for imagenet_samples via a cached labels.csv.
Changes:
- Add a per-sample labels registry and materialize `labels.csv` into the demo-sample cache for samples that have trustworthy ground-truth labels.
- Extend `get_source_path(..., kind="data"|"labels")` to resolve named demo samples for both data and labels (with an error for samples that don't ship labels).
- Update default configs/docs and add tests for the new resolver branches.
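As a rough illustration of the unified resolver described in the list above, the sketch below dispatches on URL, local path, and sample name. It is not the code from this PR: the registry contents, `CACHE_ROOT`, the check order, and the exception types are all assumptions, and downloading is left out.

```python
from __future__ import annotations

from pathlib import Path
from typing import Literal
from urllib.parse import urlparse

# Hypothetical registries; the real SAMPLE_SOURCES / SAMPLE_LABELS in
# src/raitap/data/samples.py may be shaped differently.
SAMPLE_SOURCES = {"imagenet_samples": "...", "isic2018": "..."}
SAMPLE_LABELS = {"imagenet_samples": "..."}  # only samples that ship labels
CACHE_ROOT = Path.home() / ".cache" / "raitap-demo"  # assumed cache location


def get_source_path(source: str, kind: Literal["data", "labels"] = "data") -> Path | str:
    """Resolve a URL, local path, or sample name (illustrative only)."""
    # 1. URLs: the real resolver presumably downloads and caches; here we pass through.
    if urlparse(source).scheme in ("http", "https"):
        return source
    # 2. Existing local paths are returned as-is.
    if Path(source).exists():
        return Path(source)
    # 3. Fall through to the named demo samples.
    if source in SAMPLE_SOURCES:
        sample_dir = CACHE_ROOT / source
        if kind == "data":
            return sample_dir
        if source not in SAMPLE_LABELS:
            raise ValueError(f"Sample {source!r} does not ship ground-truth labels")
        return sample_dir / "labels.csv"
    raise FileNotFoundError(f"Cannot resolve {kind} source {source!r}")
```

With these assumed registries, `get_source_path("imagenet_samples", kind="labels")` yields a path ending in `labels.csv`, while `get_source_path("isic2018", kind="labels")` raises because that sample has no entry in the labels registry.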
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| `src/raitap/data/samples.py` | Adds `SAMPLE_LABELS` and writes `labels.csv` into the sample cache; exposes `resolve_sample_labels_path()`. |
| `src/raitap/data/data.py` | Routes `data.labels.source` through `get_source_path(kind="labels")` and extends `get_source_path` to support sample names for labels. |
| `src/raitap/data/tests/test_data.py` | Adds tests covering sample-name resolution for data/labels and the no-labels error path. |
| `src/raitap/configs/data/imagenet_samples.yaml` | Configures a default labels block pointing to the sample-bundled CSV. |
| `src/raitap/configs/config.yaml` | Switches the default dataset from `isic2018` to `imagenet_samples`. |
| `docs/modules/data/configuration.md` | Documents that both `source` and `labels.source` accept URLs/paths/sample names and explains caching behavior. |
Closes #127.
Summary
`data.source` and `data.labels.source` previously used different resolvers. Sample names like `isic2018` worked for `data.source` (via `_resolve_sample`) but failed for `data.labels.source` (which only handled URLs and local paths through `get_source_path`). This PR unifies the resolution path so both fields accept a URL, a local path, or a sample name.
Changes
- `samples.py`: introduce the `SAMPLE_LABELS` registry; `_resolve_sample` now writes a `labels.csv` into the cache dir for samples that ship ground-truth labels (`imagenet_samples` only, with real ImageNet class indices). New `resolve_sample_labels_path()` helper.
- `data.py`: `get_source_path(source, kind="data" | "labels")` falls through to sample lookup when the source matches a `SAMPLE_SOURCES` key. `kind="data"` returns the cache directory; `kind="labels"` returns the bundled `labels.csv` (raises if the sample has no labels). `_load_labels` calls it with `kind="labels"`.
- `configs/data/imagenet_samples.yaml`: ship a default `labels` block pointing at the sample-bundled CSV.
- `configs/config.yaml`: switch the default `data` from `isic2018` to `imagenet_samples`. A default `raitap` run no longer fires the "No ground-truth labels" warnings on metrics and robustness, because the model (ViT-B/32, ImageNet-pretrained) now has matching ground-truth labels.
- `docs/modules/data/configuration.md`: clarify that both fields accept sample names.
Notes
- `data.source` and `data.labels.source` remain separate fields with distinct meanings: `source` resolves to input data, `labels.source` to a labels file. The unification is at the resolver layer only.
- Targets `123-create-the-errorwarning-tracking-logic` rather than `main`.
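For reviewers who want a picture of the `labels.csv` materialization mentioned under Changes, here is a minimal sketch. It assumes a registry that maps each sample name to `{filename: class index}` and a two-column CSV layout; the helper name `write_sample_labels`, the filenames, and the column names are hypothetical, and the actual `_resolve_sample` may differ.

```python
from __future__ import annotations

import csv
import tempfile
from pathlib import Path

# Hypothetical registry: sample name -> {image filename: ImageNet-1k class index}.
# Filenames are made up for illustration; 1 and 281 are the standard indices
# for "goldfish" and "tabby cat".
SAMPLE_LABELS = {"imagenet_samples": {"goldfish.jpg": 1, "tabby_cat.jpg": 281}}


def write_sample_labels(sample: str, cache_dir: Path) -> Path | None:
    """Write labels.csv into cache_dir if the sample ships labels, else return None."""
    labels = SAMPLE_LABELS.get(sample)
    if labels is None:
        return None  # e.g. a sample without trustworthy ground truth
    cache_dir.mkdir(parents=True, exist_ok=True)
    labels_path = cache_dir / "labels.csv"
    with labels_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["filename", "label"])  # column names are an assumption
        writer.writerows(sorted(labels.items()))
    return labels_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(write_sample_labels("imagenet_samples", Path(tmp) / "imagenet_samples"))
        print(write_sample_labels("isic2018", Path(tmp) / "isic2018"))
```

Writing the file next to the cached images keeps `kind="data"` and `kind="labels"` resolving into the same cache directory, which matches the behavior described in the `data.py` bullet.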