4 changes: 4 additions & 0 deletions README.md
@@ -38,6 +38,10 @@ After installation, the console scripts (`rate-extract`, `rate-evaluate`) are in
python -m rate_eval.cli.evaluate [OPTIONS]
```

## Tutorial

For a step-by-step walkthrough of feature extraction and evaluation (including how to add a custom dataset), see [TUTORIAL.md](TUTORIAL.md).

## Evaluate Pillar0 on Merlin Abdominal CT Dataset

```bash
277 changes: 277 additions & 0 deletions TUTORIAL.md
@@ -0,0 +1,277 @@
# RATE-Evals Tutorial

A step-by-step walkthrough of feature extraction and evaluation using the
RATE-Evals pipeline. By the end you will know how to:

1. Extract vision embeddings from a model using the CLI
2. Use the Python API to extract features directly
3. Run disease-finding evaluation on those embeddings
4. Add a custom dataset of your own

## Prerequisites

- Python 3.8+
- GPU recommended (CPU works but is slower)
- Install the project:

```bash
uv sync
```

> **Note**: You do **not** need the `rad-vision-engine` (`rve`) package for
> this tutorial. The `DummyDataset` used below is self-contained.
> If you hit an `rve` import error elsewhere, see
> [Troubleshooting](#troubleshooting).

## Setup

Generate the synthetic image and labels used throughout the tutorial:

```bash
python tutorial/setup_tutorial.py
```

This creates two files:

| File | Purpose |
|------|---------|
| `assets/CXR145_IM-0290-1001.png` | 1024x1024 grayscale synthetic chest X-ray (required by `DummyDataset`) |
| `tutorial/dummy_labels.json` | Labels for 100 dummy studies with 3 binary findings |
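
To sanity-check the generated labels, load the JSON and inspect the first entry (a quick check, separate from the pipeline; the accession format comes from `setup_tutorial.py` below):

```python
import json

with open("tutorial/dummy_labels.json") as f:
    labels = json.load(f)

print(f"Number of studies: {len(labels)}")  # expect 100
# Each value holds qa_results -> findings, a list of {question: "yes"/"no"} dicts.
print(labels["dummy_study_0000"])
```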

---

## Part 1: Extract Features (CLI)

Use `rate-extract` to compute embeddings for the dummy dataset:

```bash
# Extract training split
uv run rate-extract \
--model pillar0 \
--dataset dummy \
--split train \
--batch-size 4 \
--output-dir cache/pillar0_dummy \
--max-samples 100
```

**Flags explained:**

| Flag | Meaning |
|------|---------|
| `--model pillar0` | Use the Pillar-0 vision foundation model |
| `--dataset dummy` | Use the built-in `DummyDataset` (reads the synthetic image) |
| `--split train` | Process the training split |
| `--batch-size 4` | Images per GPU batch (lower this if you hit OOM) |
| `--output-dir cache/pillar0_dummy` | Where to save embedding `.npz` files |
| `--max-samples 100` | Limit to 100 samples (matches our labels) |

Now extract the test split too:

```bash
uv run rate-extract \
--model pillar0 \
--dataset dummy \
--split test \
--batch-size 4 \
--output-dir cache/pillar0_dummy \
--max-samples 100
```

**Verify the output:**

```bash
ls cache/pillar0_dummy/train/
ls cache/pillar0_dummy/test/
# You should see .npz files containing the cached embeddings.
```
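
If you want to peek inside one of the cached files, a short NumPy check works; the key names inside the `.npz` are not documented here, so list `data.files` first rather than assuming them:

```python
import glob

import numpy as np

# Grab the first cached file from the train split.
path = sorted(glob.glob("cache/pillar0_dummy/train/*.npz"))[0]
data = np.load(path)
print(data.files)  # see which arrays this cache actually stores

for key in data.files:
    arr = data[key]
    if np.issubdtype(arr.dtype, np.floating):
        print(key, arr.shape, "NaNs:", bool(np.isnan(arr).any()))
    else:
        print(key, arr.shape, arr.dtype)
```

This doubles as a quick NaN check (see [Troubleshooting](#troubleshooting) for the built-in `--check-nan` flag).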

---

## Part 2: Extract Features (Python API)

For programmatic access you can call the model directly:

```python
from rate_eval import create_model, create_dataset, setup_pipeline

# Load configuration
config = setup_pipeline()

# Create model and dataset
model = create_model("pillar0", config)
dataset = create_dataset("dummy", config, split="train")

print(f"Dataset size : {len(dataset)}")
print(f"Accession [0]: {dataset.get_accession(0)}")

# Get a single sample (two-view chest X-ray tensor)
sample = dataset[0]
print(f"Sample shape : {sample.shape}")
# -> torch.Size([1, 2, 1024, 1024]) (C, views, H, W)

# Extract features (must pass modality matching the dataset)
features = model.extract_features(sample.unsqueeze(0), modality="chest_xray_two_view")
print(f"Embedding dim: {features.shape}")
```

This is useful when you want to integrate RATE-Evals into a larger pipeline or to
inspect intermediate representations: the snippet shows how the model turns an
image into a fixed-size embedding vector.
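
Building on this, here is a minimal sketch of caching embeddings yourself with a plain loop. It assumes only the calls shown above plus standard PyTorch inference practice; the single-file `.npz` layout is illustrative, not the format `rate-extract` itself writes:

```python
import numpy as np
import torch

embeddings, accessions = [], []
with torch.no_grad():  # inference only, no gradients needed
    for idx in range(len(dataset)):
        sample = dataset[idx].unsqueeze(0)  # add a batch dimension
        feats = model.extract_features(sample, modality="chest_xray_two_view")
        embeddings.append(feats.squeeze(0).cpu().numpy())
        accessions.append(dataset.get_accession(idx))

np.savez(
    "my_cache.npz",
    embeddings=np.stack(embeddings),
    accessions=np.array(accessions),
)
```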

---

## Part 3: Run Evaluation

Once you have cached embeddings for both `train` and `test` splits, run
evaluation:

```bash
uv run rate-evaluate \
--checkpoint-dir cache/pillar0_dummy \
--dataset-name dummy \
--labels-json tutorial/dummy_labels.json \
--output-dir results/pillar0_dummy \
evaluation.use_wandb=false
```

> `evaluation.use_wandb=false` is a Hydra-style override (no `--` prefix).
> It disables Weights & Biases logging so you can run offline.

**What gets produced:**

| File | Contents |
|------|----------|
| `results/pillar0_dummy/detailed_results.csv` | Per-question metrics (AUC, F1, accuracy, etc.) |
| `results/pillar0_dummy/summary_stats.json` | Aggregated metrics across all findings |
| `results/pillar0_dummy/training_stats.json` | Class distribution statistics from training |
| `results/pillar0_dummy/exam_probabilities.csv` | Per-exam predicted probabilities |

**Inspect the results:**

```python
import json, pandas as pd

summary = json.load(open("results/pillar0_dummy/summary_stats.json"))
print(f"Average AUC: {summary['avg_auc']:.3f}")

detailed = pd.read_csv("results/pillar0_dummy/detailed_results.csv")
print(detailed[["question", "auc", "f1", "num_positive", "num_negative"]])
```

> **Note**: Because the dummy dataset repeats the same synthetic image, the
> embeddings carry no discriminative signal and metrics will be near-random.
> With real data, expect meaningful AUC scores.

---

## Part 4: Add a Custom Dataset

To evaluate on your own data, follow these four steps:

### Step A: Create a Dataset Class

Your class must implement the following interface:

```python
class MyDataset:
def __init__(self, config, split="train", transforms=None, model_preprocess=None):
...

def __getitem__(self, idx):
"""Return a tensor of shape (C, views, H, W) or (D, H, W) for 3-D volumes."""
...

def __len__(self):
"""Total number of samples."""
...

def get_accession(self, idx):
"""Return a unique string identifier (accession) for sample idx."""
...

def get_all_accessions(self):
"""Return a list of all accession strings (fast, no data loading)."""
...
```

Place the file in `rate_eval/datasets/` and import it in
`rate_eval/datasets/__init__.py`.
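
As a concrete sketch, a minimal PNG-backed dataset satisfying this interface could look like the code below. It is hypothetical: the attribute-style config access (`config.data.root_dir`, `config.processing.target_size`) mirrors the YAML in Step B but depends on how your config object is built, and duplicating the single view is just one way to produce the expected shape:

```python
import os

import numpy as np
import torch
from PIL import Image


class MyDataset:
    def __init__(self, config, split="train", transforms=None, model_preprocess=None):
        self.root = config.data.root_dir
        self.size = tuple(config.processing.target_size)
        # Hook points for augmentation / model-specific preprocessing
        # (unused in this sketch).
        self.transforms = transforms
        self.model_preprocess = model_preprocess
        # One sample per PNG; the filename stem doubles as the accession.
        self.files = sorted(f for f in os.listdir(self.root) if f.endswith(".png"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.root, self.files[idx])).convert("L")
        img = img.resize(self.size)
        tensor = torch.from_numpy(np.array(img)).float() / 255.0  # (H, W)
        # Duplicate the single view to get a (C, views, H, W) tensor.
        return tensor.unsqueeze(0).unsqueeze(0).expand(1, 2, -1, -1).clone()

    def get_accession(self, idx):
        return os.path.splitext(self.files[idx])[0]

    def get_all_accessions(self):
        # Fast: derived from filenames, no image loading.
        return [os.path.splitext(f)[0] for f in self.files]
```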

### Step B: Create a Config YAML

Create `configs/dataset/my_dataset.yaml`:

```yaml
# @package dataset
name: my_dataset
modality: chest_xray_two_view # or abdomen_ct, chest_ct, brain_ct, etc.

data:
root_dir: /path/to/images

processing:
target_size: [1024, 1024]
image_mode: L # L=grayscale, RGB=colour

labels:
labels_json: /path/to/labels.json
```
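
Before wiring this into the pipeline, you can catch indentation or key typos by parsing the file directly (a quick check assuming PyYAML is available; the `# @package dataset` line is a Hydra directive inside a comment, so plain YAML parsing simply ignores it):

```python
import yaml

with open("configs/dataset/my_dataset.yaml") as f:
    cfg = yaml.safe_load(f)

assert cfg["name"] == "my_dataset"
assert "labels_json" in cfg["labels"]
print(cfg["processing"]["target_size"])  # -> [1024, 1024]
```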

### Step C: Register in Config

Add an entry to `configs/config.yaml` under the `datasets:` section:

```yaml
datasets:
# ... existing entries ...
my_dataset:
class: MyDataset
config: configs/dataset/my_dataset.yaml
```

### Step D: Prepare Labels JSON

Create a JSON file where each key is an accession string and each value contains
a `qa_results` object whose `findings` list holds single-entry question-answer dicts:

```json
{
"ACCESSION_001": {
"qa_results": {
"findings": [
{"Is there evidence of cardiomegaly?": "yes"},
{"Is there a pleural effusion?": "no"}
]
}
}
}
```

Answers should be `"yes"` or `"no"` (case-insensitive). The evaluator
discovers all unique questions automatically.
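
If your ground truth lives in a table, a short conversion script can emit this format. The input CSV here (`my_ground_truth.csv`, with an `accession` column plus one `"yes"`/`"no"` column per question) is purely illustrative:

```python
import json

import pandas as pd

df = pd.read_csv("my_ground_truth.csv")  # hypothetical input table
questions = [c for c in df.columns if c != "accession"]

labels = {}
for _, row in df.iterrows():
    findings = [{q: str(row[q]).lower()} for q in questions]
    labels[row["accession"]] = {"qa_results": {"findings": findings}}

with open("my_labels.json", "w") as f:
    json.dump(labels, f, indent=2)
```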

### Run It

```bash
uv run rate-extract --model pillar0 --dataset my_dataset --all-splits \
--batch-size 4 --output-dir cache/pillar0_my_dataset

uv run rate-evaluate --checkpoint-dir cache/pillar0_my_dataset \
--dataset-name my_dataset \
--labels-json /path/to/labels.json \
--output-dir results/pillar0_my_dataset \
evaluation.use_wandb=false
```

---

## Troubleshooting

| Problem | Solution |
|---------|----------|
| **`ModuleNotFoundError: rve`** | The `rve` (rad-vision-engine) package is only needed for RVE-format datasets. Use `DummyDataset` or NIfTI-based datasets instead, or install it: `git clone https://github.com/yalalab/rad-vision-engine ../rad-vision-engine && uv pip install -e ../rad-vision-engine` |
| **Out of memory (OOM)** | Reduce `--batch-size` (try 1 or 2) or use `--device cpu` |
| **NaN embeddings** | Add `--check-nan` to `rate-extract` for detailed diagnostics |
| **"Command not found"** | Add `~/.local/bin` to your `PATH` or use `uv run rate-extract` / `uv run rate-evaluate` |
| **HuggingFace auth errors** | Run `huggingface-cli login` for gated models (MedGemma, MedImageInsight) |
93 changes: 93 additions & 0 deletions tutorial/setup_tutorial.py
@@ -0,0 +1,93 @@
"""Generate synthetic data for the RATE-Evals tutorial.

Creates:
- assets/CXR145_IM-0290-1001.png Synthetic 1024x1024 grayscale chest X-ray
- tutorial/dummy_labels.json Labels for 100 dummy studies (3 binary findings)

Dependencies: Pillow (already a project dependency) + stdlib.

Usage:
python tutorial/setup_tutorial.py
"""

import json
import os
import random

from PIL import Image, ImageDraw, ImageFilter

SEED = 42
NUM_SAMPLES = 100
POSITIVE_RATE = 0.20  # ~20% positive rate per finding

QUESTIONS = [
"Is there evidence of cardiomegaly?",
"Is there a pleural effusion?",
"Is there lung consolidation?",
]

# abspath guards against an empty dirname when the script is run from inside tutorial/
_HERE = os.path.dirname(os.path.abspath(__file__))
ASSETS_DIR = os.path.join(_HERE, "..", "assets")
LABELS_PATH = os.path.join(_HERE, "dummy_labels.json")


def _create_synthetic_cxr(path: str, size: int = 1024) -> None:
"""Create a synthetic grayscale image that loosely resembles a chest X-ray."""
random.seed(SEED)
img = Image.new("L", (size, size), color=30)
draw = ImageDraw.Draw(img)

# Dark oval for the thoracic cavity
draw.ellipse(
[size // 6, size // 8, size - size // 6, size - size // 10],
fill=50,
)

# Lighter region in the centre (mediastinum)
cx, cy = size // 2, size // 2
draw.ellipse(
[cx - size // 8, cy - size // 4, cx + size // 8, cy + size // 4],
fill=80,
)

# Random speckle to give texture
for _ in range(2000):
x = random.randint(0, size - 1)
y = random.randint(0, size - 1)
v = random.randint(20, 90)
draw.point((x, y), fill=v)

img = img.filter(ImageFilter.GaussianBlur(radius=3))
os.makedirs(os.path.dirname(path), exist_ok=True)
img.save(path)
print(f"Created synthetic image: {path}")


def _create_dummy_labels(path: str) -> None:
"""Create labels JSON in qa_results format for dummy_study_0000..0099."""
random.seed(SEED)
labels = {}

for idx in range(NUM_SAMPLES):
accession = f"dummy_study_{idx:04d}"
findings = []
for q in QUESTIONS:
answer = "yes" if random.random() < POSITIVE_RATE else "no"
findings.append({q: answer})

labels[accession] = {"qa_results": {"findings": findings}}

os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
json.dump(labels, f, indent=2)
print(f"Created labels for {NUM_SAMPLES} studies: {path}")


def main() -> None:
image_path = os.path.join(ASSETS_DIR, "CXR145_IM-0290-1001.png")
_create_synthetic_cxr(image_path)
_create_dummy_labels(LABELS_PATH)
print("\nSetup complete! You can now follow TUTORIAL.md.")


if __name__ == "__main__":
main()