4 changes: 4 additions & 0 deletions README.md
@@ -38,6 +38,10 @@ After installation, the console scripts (`rate-extract`, `rate-evaluate`) are in
python -m rate_eval.cli.evaluate [OPTIONS]
```

## Tutorial

For a step-by-step walkthrough of feature extraction and evaluation (including how to add a custom dataset), see [TUTORIAL.md](TUTORIAL.md).

## Evaluate Pillar0 on Merlin Abdominal CT Dataset

```bash
277 changes: 277 additions & 0 deletions TUTORIAL.md
@@ -0,0 +1,277 @@
# RATE-Evals Tutorial

A step-by-step walkthrough of feature extraction and evaluation using the
RATE-Evals pipeline. By the end you will know how to:

1. Extract vision embeddings from a model using the CLI
2. Use the Python API to extract features directly
3. Run disease-finding evaluation on those embeddings
4. Add a custom dataset of your own

## Prerequisites

- Python 3.8+
- GPU recommended (CPU works but is slower)
- Install the project:

```bash
uv sync
```

> **Note**: You do **not** need the `rad-vision-engine` (`rve`) package for
> this tutorial. The `DummyDataset` used below is self-contained.
> If you hit an `rve` import error elsewhere, see
> [Troubleshooting](#troubleshooting).

## Setup

Generate the synthetic image and labels used throughout the tutorial:

```bash
python tutorial/setup_tutorial.py
```

This creates two files:

| File | Purpose |
|------|---------|
| `assets/CXR145_IM-0290-1001.png` | 1024x1024 grayscale synthetic chest X-ray (required by `DummyDataset`) |
| `tutorial/dummy_labels.json` | Labels for 100 dummy studies with 3 binary findings |
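
To sanity-check the generated labels, load the JSON and inspect the first entry (a quick check, separate from the pipeline; the accession format comes from `setup_tutorial.py` below):

```python
import json

with open("tutorial/dummy_labels.json") as f:
    labels = json.load(f)

print(f"Number of studies: {len(labels)}")  # expect 100
# Each value holds qa_results -> findings, a list of {question: "yes"/"no"} dicts.
print(labels["dummy_study_0000"])
```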

---

## Part 1: Extract Features (CLI)

Use `rate-extract` to compute embeddings for the dummy dataset:

```bash
# Extract training split
uv run rate-extract \
--model pillar0 \
--dataset dummy \
--split train \
--batch-size 4 \
--output-dir cache/pillar0_dummy \
--max-samples 100
```

**Flags explained:**

| Flag | Meaning |
|------|---------|
| `--model pillar0` | Use the Pillar-0 vision foundation model |
| `--dataset dummy` | Use the built-in `DummyDataset` (reads the synthetic image) |
| `--split train` | Process the training split |
| `--batch-size 4` | Images per GPU batch (lower this if you hit OOM) |
| `--output-dir cache/pillar0_dummy` | Where to save embedding `.npz` files |
| `--max-samples 100` | Limit to 100 samples (matches our labels) |

Now extract the test split too:

```bash
uv run rate-extract \
--model pillar0 \
--dataset dummy \
--split test \
--batch-size 4 \
--output-dir cache/pillar0_dummy \
--max-samples 100
```

**Verify the output:**

```bash
ls cache/pillar0_dummy/train/
ls cache/pillar0_dummy/test/
# You should see .npz files containing the cached embeddings.
```
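
If you want to peek inside one of the cached files, a short NumPy check works; the key names inside the `.npz` are not documented here, so list `data.files` first rather than assuming them:

```python
import glob

import numpy as np

# Grab the first cached file from the train split.
path = sorted(glob.glob("cache/pillar0_dummy/train/*.npz"))[0]
data = np.load(path)
print(data.files)  # see which arrays this cache actually stores

for key in data.files:
    arr = data[key]
    if np.issubdtype(arr.dtype, np.floating):
        print(key, arr.shape, "NaNs:", bool(np.isnan(arr).any()))
    else:
        print(key, arr.shape, arr.dtype)
```

This doubles as a quick NaN check (see [Troubleshooting](#troubleshooting) for the built-in `--check-nan` flag).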

---

## Part 2: Extract Features (Python API)

For programmatic access you can call the model directly:

```python
from rate_eval import create_model, create_dataset, setup_pipeline

# Load configuration
config = setup_pipeline()

# Create model and dataset
model = create_model("pillar0", config)
dataset = create_dataset("dummy", config, split="train")

print(f"Dataset size : {len(dataset)}")
print(f"Accession [0]: {dataset.get_accession(0)}")

# Get a single sample (two-view chest X-ray tensor)
sample = dataset[0]
print(f"Sample shape : {sample.shape}")
# -> torch.Size([1, 2, 1024, 1024]) (C, views, H, W)

# Extract features (must pass modality matching the dataset)
features = model.extract_features(sample.unsqueeze(0), modality="chest_xray_two_view")
print(f"Embedding dim: {features.shape}")
```

This is useful when you want to integrate RATE-Evals into a larger pipeline or to
inspect intermediate representations: the snippet shows how the model turns an
image into a fixed-size embedding vector.
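
Building on this, here is a minimal sketch of caching embeddings yourself with a plain loop. It assumes only the calls shown above plus standard PyTorch inference practice; the single-file `.npz` layout is illustrative, not the format `rate-extract` itself writes:

```python
import numpy as np
import torch

embeddings, accessions = [], []
with torch.no_grad():  # inference only, no gradients needed
    for idx in range(len(dataset)):
        sample = dataset[idx].unsqueeze(0)  # add a batch dimension
        feats = model.extract_features(sample, modality="chest_xray_two_view")
        embeddings.append(feats.squeeze(0).cpu().numpy())
        accessions.append(dataset.get_accession(idx))

np.savez(
    "my_cache.npz",
    embeddings=np.stack(embeddings),
    accessions=np.array(accessions),
)
```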

---

## Part 3: Run Evaluation

Once you have cached embeddings for both `train` and `test` splits, run
evaluation:

```bash
uv run rate-evaluate \
--checkpoint-dir cache/pillar0_dummy \
--dataset-name dummy \
--labels-json tutorial/dummy_labels.json \
--output-dir results/pillar0_dummy \
evaluation.use_wandb=false
```

> `evaluation.use_wandb=false` is a Hydra-style override (no `--` prefix).
> It disables Weights & Biases logging so you can run offline.

**What gets produced:**

| File | Contents |
|------|----------|
| `results/pillar0_dummy/detailed_results.csv` | Per-question metrics (AUC, F1, accuracy, etc.) |
| `results/pillar0_dummy/summary_stats.json` | Aggregated metrics across all findings |
| `results/pillar0_dummy/training_stats.json` | Class distribution statistics from training |
| `results/pillar0_dummy/exam_probabilities.csv` | Per-exam predicted probabilities |

**Inspect the results:**

```python
import json, pandas as pd

summary = json.load(open("results/pillar0_dummy/summary_stats.json"))
print(f"Average AUC: {summary['avg_auc']:.3f}")

detailed = pd.read_csv("results/pillar0_dummy/detailed_results.csv")
print(detailed[["question", "auc", "f1", "num_positive", "num_negative"]])
```

> **Note**: Because the dummy dataset repeats the same synthetic image, the
> embeddings carry no discriminative signal and metrics will be near-random.
> With real data, expect meaningful AUC scores.

---

## Part 4: Add a Custom Dataset

To evaluate on your own data, follow these four steps:

### Step A: Create a Dataset Class

Your class must implement the following interface:

```python
class MyDataset:
def __init__(self, config, split="train", transforms=None, model_preprocess=None):
...

def __getitem__(self, idx):
"""Return a tensor of shape (C, views, H, W) or (D, H, W) for 3-D volumes."""
...

def __len__(self):
"""Total number of samples."""
...

def get_accession(self, idx):
"""Return a unique string identifier (accession) for sample idx."""
...

def get_all_accessions(self):
"""Return a list of all accession strings (fast, no data loading)."""
...
```

Place the file in `rate_eval/datasets/` and import it in
`rate_eval/datasets/__init__.py`.
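
As a concrete sketch, a minimal PNG-backed dataset satisfying this interface could look like the code below. It is hypothetical: the attribute-style config access (`config.data.root_dir`, `config.processing.target_size`) mirrors the YAML in Step B but depends on how your config object is built, and duplicating the single view is just one way to produce the expected shape:

```python
import os

import numpy as np
import torch
from PIL import Image


class MyDataset:
    def __init__(self, config, split="train", transforms=None, model_preprocess=None):
        self.root = config.data.root_dir
        self.size = tuple(config.processing.target_size)
        # Hook points for augmentation / model-specific preprocessing
        # (unused in this sketch).
        self.transforms = transforms
        self.model_preprocess = model_preprocess
        # One sample per PNG; the filename stem doubles as the accession.
        self.files = sorted(f for f in os.listdir(self.root) if f.endswith(".png"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        img = Image.open(os.path.join(self.root, self.files[idx])).convert("L")
        img = img.resize(self.size)
        tensor = torch.from_numpy(np.array(img)).float() / 255.0  # (H, W)
        # Duplicate the single view to get a (C, views, H, W) tensor.
        return tensor.unsqueeze(0).unsqueeze(0).expand(1, 2, -1, -1).clone()

    def get_accession(self, idx):
        return os.path.splitext(self.files[idx])[0]

    def get_all_accessions(self):
        # Fast: derived from filenames, no image loading.
        return [os.path.splitext(f)[0] for f in self.files]
```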

### Step B: Create a Config YAML

Create `configs/dataset/my_dataset.yaml`:

```yaml
# @package dataset
name: my_dataset
modality: chest_xray_two_view # or abdomen_ct, chest_ct, brain_ct, etc.

data:
root_dir: /path/to/images

processing:
target_size: [1024, 1024]
image_mode: L # L=grayscale, RGB=colour

labels:
labels_json: /path/to/labels.json
```
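
Before wiring this into the pipeline, you can catch indentation or key typos by parsing the file directly (a quick check assuming PyYAML is available; the `# @package dataset` line is a Hydra directive inside a comment, so plain YAML parsing simply ignores it):

```python
import yaml

with open("configs/dataset/my_dataset.yaml") as f:
    cfg = yaml.safe_load(f)

assert cfg["name"] == "my_dataset"
assert "labels_json" in cfg["labels"]
print(cfg["processing"]["target_size"])  # -> [1024, 1024]
```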

### Step C: Register in Config

Add an entry to `configs/config.yaml` under the `datasets:` section:

```yaml
datasets:
# ... existing entries ...
my_dataset:
class: MyDataset
config: configs/dataset/my_dataset.yaml
```

### Step D: Prepare Labels JSON

Create a JSON file where each key is an accession string and each value contains
a `qa_results` object whose `findings` list holds single-entry question-answer dicts:

```json
{
"ACCESSION_001": {
"qa_results": {
"findings": [
{"Is there evidence of cardiomegaly?": "yes"},
{"Is there a pleural effusion?": "no"}
]
}
}
}
```

Answers should be `"yes"` or `"no"` (case-insensitive). The evaluator
discovers all unique questions automatically.
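
If your ground truth lives in a table, a short conversion script can emit this format. The input CSV here (`my_ground_truth.csv`, with an `accession` column plus one `"yes"`/`"no"` column per question) is purely illustrative:

```python
import json

import pandas as pd

df = pd.read_csv("my_ground_truth.csv")  # hypothetical input table
questions = [c for c in df.columns if c != "accession"]

labels = {}
for _, row in df.iterrows():
    findings = [{q: str(row[q]).lower()} for q in questions]
    labels[row["accession"]] = {"qa_results": {"findings": findings}}

with open("my_labels.json", "w") as f:
    json.dump(labels, f, indent=2)
```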

### Run It

```bash
uv run rate-extract --model pillar0 --dataset my_dataset --all-splits \
--batch-size 4 --output-dir cache/pillar0_my_dataset

uv run rate-evaluate --checkpoint-dir cache/pillar0_my_dataset \
--dataset-name my_dataset \
--labels-json /path/to/labels.json \
--output-dir results/pillar0_my_dataset \
evaluation.use_wandb=false
```

---

## Troubleshooting

| Problem | Solution |
|---------|----------|
| **`ModuleNotFoundError: rve`** | The `rve` (rad-vision-engine) package is only needed for RVE-format datasets. Use `DummyDataset` or NIfTI-based datasets instead, or install it: `git clone https://github.com/yalalab/rad-vision-engine ../rad-vision-engine && uv pip install -e ../rad-vision-engine` |
| **Out of memory (OOM)** | Reduce `--batch-size` (try 1 or 2) or use `--device cpu` |
| **NaN embeddings** | Add `--check-nan` to `rate-extract` for detailed diagnostics |
| **"Command not found"** | Add `~/.local/bin` to your `PATH` or use `uv run rate-extract` / `uv run rate-evaluate` |
| **HuggingFace auth errors** | Run `huggingface-cli login` for gated models (MedGemma, MedImageInsight) |
93 changes: 93 additions & 0 deletions tutorial/setup_tutorial.py
@@ -0,0 +1,93 @@
"""Generate synthetic data for the RATE-Evals tutorial.

Creates:
- assets/CXR145_IM-0290-1001.png Synthetic 1024x1024 grayscale chest X-ray
- tutorial/dummy_labels.json Labels for 100 dummy studies (3 binary findings)

Dependencies: Pillow (already a project dependency) + stdlib.

Usage:
python tutorial/setup_tutorial.py
"""

import json
import os
import random

from PIL import Image, ImageDraw, ImageFilter

SEED = 42
NUM_SAMPLES = 100
POSITIVE_RATE = 0.20  # ~20% positive rate per finding

QUESTIONS = [
"Is there evidence of cardiomegaly?",
"Is there a pleural effusion?",
"Is there lung consolidation?",
]

# abspath guards against an empty dirname when the script is run from inside tutorial/
_HERE = os.path.dirname(os.path.abspath(__file__))
ASSETS_DIR = os.path.join(_HERE, "..", "assets")
LABELS_PATH = os.path.join(_HERE, "dummy_labels.json")


def _create_synthetic_cxr(path: str, size: int = 1024) -> None:
"""Create a synthetic grayscale image that loosely resembles a chest X-ray."""
random.seed(SEED)
img = Image.new("L", (size, size), color=30)
draw = ImageDraw.Draw(img)

# Dark oval for the thoracic cavity
draw.ellipse(
[size // 6, size // 8, size - size // 6, size - size // 10],
fill=50,
)

# Lighter region in the centre (mediastinum)
cx, cy = size // 2, size // 2
draw.ellipse(
[cx - size // 8, cy - size // 4, cx + size // 8, cy + size // 4],
fill=80,
)

# Random speckle to give texture
for _ in range(2000):
x = random.randint(0, size - 1)
y = random.randint(0, size - 1)
v = random.randint(20, 90)
draw.point((x, y), fill=v)

img = img.filter(ImageFilter.GaussianBlur(radius=3))
os.makedirs(os.path.dirname(path), exist_ok=True)
img.save(path)
print(f"Created synthetic image: {path}")


def _create_dummy_labels(path: str) -> None:
"""Create labels JSON in qa_results format for dummy_study_0000..0099."""
random.seed(SEED)
labels = {}

for idx in range(NUM_SAMPLES):
accession = f"dummy_study_{idx:04d}"
findings = []
for q in QUESTIONS:
answer = "yes" if random.random() < POSITIVE_RATE else "no"
findings.append({q: answer})

labels[accession] = {"qa_results": {"findings": findings}}

os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "w") as f:
json.dump(labels, f, indent=2)
print(f"Created labels for {NUM_SAMPLES} studies: {path}")


def main() -> None:
image_path = os.path.join(ASSETS_DIR, "CXR145_IM-0290-1001.png")
_create_synthetic_cxr(image_path)
_create_dummy_labels(LABELS_PATH)
print("\nSetup complete! You can now follow TUTORIAL.md.")


if __name__ == "__main__":
main()