Supervised anomaly detection for the ANM2026 leaderboard assignment.
Best result: Leaderboard F1 = 0.5537 — submissions/submission_b_t0.30.json
For assignment context and submission format rules, see assignment_instructions.md.
A single RandomForestClassifier trained globally across all 1000 windows on 33 causal (past-only) features per timestep. The model is trained on each window's full train.npy and predicts against the hidden test.npy.
Model config: n_estimators=200, max_depth=None, min_samples_leaf=3, class_weight='balanced'
Threshold: 0.30 (predict anomaly if P(anomaly) > 0.30)
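As a rough illustration of how the configuration and threshold above fit together, the sketch below builds a classifier with the listed hyperparameters and applies the 0.30 cutoff to its predicted probabilities. The helper names (`build_model`, `predict_labels`), the `random_state` argument, and the synthetic data are assumptions made for this example; the repository's own code lives in src/models.py and may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

THRESHOLD = 0.30  # decision threshold from the config above


def build_model(seed=0):
    # Hyperparameters as listed above; random_state is an assumption
    # added here only to make the example reproducible.
    return RandomForestClassifier(
        n_estimators=200,
        max_depth=None,
        min_samples_leaf=3,
        class_weight="balanced",
        random_state=seed,
    )


def predict_labels(model, X, threshold=THRESHOLD):
    # Column 1 of predict_proba is P(class == 1) when classes_ is [0, 1].
    proba = model.predict_proba(X)[:, 1]
    return (proba > threshold).astype(int)


# Synthetic stand-in data (hypothetical): 33 features, roughly 5% anomalies.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 33))
y = (rng.random(2000) < 0.05).astype(int)
X[y == 1, 0] += 4.0  # shift anomalies on feature 0 so they are learnable

model = build_model().fit(X, y)
labels = predict_labels(model, X)
```

A threshold below 0.5 trades precision for recall, which tends to matter when anomalies are rare.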
| Group | Features |
|---|---|
| Global | raw value, global z-score (signed + abs), first difference (signed + abs) |
| Rolling z-score | causal windows 16, 32, 64, 128, 256 |
| Rolling robust z-score | causal windows 16, 32, 64, 128, 256 (median/MAD) |
| Rolling mean + std | causal windows 16, 32, 64, 128, 256 |
| Lag | lag-1 global z-score, lag-1 abs-difference |
| Change-point | mean-shift + robust-shift for k = 16, 64, 128 |
All rolling windows use edge-padding so the first timestep is never zero-padded. Global mean/std are computed from train.npy only and applied to both train and test.
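A minimal sketch of causal, edge-padded rolling features is shown below, assuming each series is a 1-D float array. The function names and the `eps` stabiliser are assumptions for illustration, and the MAD is left unscaled here; src/features.py may use a different formulation.

```python
import numpy as np


def causal_windows(x, w):
    # Prepend w-1 copies of x[0] (edge padding) so every timestep has a full
    # window ending at itself: row t holds x[t-w+1 .. t], never future values.
    padded = np.concatenate([np.full(w - 1, x[0]), x])
    return np.lib.stride_tricks.sliding_window_view(padded, w)


def rolling_zscore(x, w, eps=1e-8):
    win = causal_windows(x, w)
    return (x - win.mean(axis=1)) / (win.std(axis=1) + eps)


def rolling_robust_zscore(x, w, eps=1e-8):
    # Median/MAD variant: less sensitive to outliers inside the window.
    win = causal_windows(x, w)
    med = np.median(win, axis=1)
    mad = np.median(np.abs(win - med[:, None]), axis=1)
    return (x - med) / (mad + eps)


def global_zscore(x, train_mean, train_std, eps=1e-8):
    # The statistics are passed in so that test data reuses train's values.
    return (x - train_mean) / (train_std + eps)


# Toy series standing in for train.npy / test.npy (hypothetical values).
train = np.array([1.0, 1.2, 0.9, 1.1, 5.0, 1.0])
test = np.array([1.1, 0.8, 6.0])
mu, sd = train.mean(), train.std()  # computed from train only
train_z = global_zscore(train, mu, sd)
test_z = global_zscore(test, mu, sd)
train_rz = rolling_zscore(train, 4)
```

Because each window ends at the current timestep, changing later values leaves earlier features unchanged, which is the property that keeps the features past-only.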
    pip install -r requirements.txt

Requires Python 3.10+. Extract the dataset to student_dataset/ in the project root.
src/evaluate.py uses the baseline feature set (src/features.py, 19 features). Chronological 80/20 split, no shuffling.
    python src/evaluate.py

Expected output: mean val F1 ≈ 0.1967
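The chronological 80/20 split and the F1 metric can be sketched as follows. This is an illustrative version with assumed function names; how src/evaluate.py aggregates scores across windows is not shown here.

```python
import numpy as np


def chrono_split(X, y, train_frac=0.8):
    # First 80% of timesteps for training, last 20% for validation.
    # No shuffling, so validation points always come after training points.
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]


def f1_binary(y_true, y_pred):
    # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when there are no positives.
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0


# Toy example (hypothetical data): 10 timesteps, 1 feature.
X = np.arange(10, dtype=float).reshape(-1, 1)
y = np.array([0, 0, 0, 1, 0, 0, 0, 0, 1, 0])
X_tr, y_tr, X_va, y_va = chrono_split(X, y)
```

Shuffling before splitting would let rolling features from the validation period leak into training, which is why the split is kept in time order.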
The best result uses the 33-feature set defined in experiments/next_round.py (Experiment B: base 29 features + w256 rolling stats). To run all experiments A–E and reproduce the best submission (~10–15 min):
    python experiments/next_round.py

Experiment B wins the quality gate and automatically writes + validates submissions/submission_b_t0.30.json.
Expected local val F1 for Experiment B: ≈ 0.2147
The best submission is already present in the repo. To confirm it is valid without retraining:
    python src/validate_submission.py submissions/submission_b_t0.30.json --dataset_dir student_dataset

All 7 checks should pass (encoding, JSON structure, 1000 keys, array lengths, value ranges).
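For orientation, the kinds of checks named above could look roughly like the sketch below. It assumes a UTF-8 JSON object that maps each window key to a list of 0/1 labels, with expected lengths supplied by the caller; the real format is defined in assignment_instructions.md, and `check_submission` is a hypothetical helper, not the code in src/validate_submission.py.

```python
import json


def check_submission(raw_bytes, expected_lengths):
    """Return a list of problems; an empty list means every check passed.

    expected_lengths maps each window key to the required array length
    (assumed here to come from the length of that window's test series).
    """
    try:
        text = raw_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return ["file is not valid UTF-8"]
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value is not a JSON object"]

    problems = []
    missing = sorted(set(expected_lengths) - set(data))
    extra = sorted(set(data) - set(expected_lengths))
    if missing:
        problems.append(f"missing keys: {missing[:5]}")
    if extra:
        problems.append(f"unexpected keys: {extra[:5]}")

    for key, n in expected_lengths.items():
        if key not in data:
            continue  # already reported as missing
        arr = data[key]
        if not isinstance(arr, list):
            problems.append(f"{key}: value is not a list")
            continue
        if len(arr) != n:
            problems.append(f"{key}: length {len(arr)}, expected {n}")
        # Assumption: labels must be the integers 0 or 1.
        if any(not (type(v) is int and v in (0, 1)) for v in arr):
            problems.append(f"{key}: values outside {{0, 1}}")
    return problems


# Tiny in-memory example (hypothetical keys and lengths).
good = json.dumps({"0": [0, 1, 0], "1": [1, 1]}).encode("utf-8")
result = check_submission(good, {"0": 3, "1": 2})
```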
To generate a baseline submission:

    python src/generate_submission.py --output submissions/my_submission.json

To validate any submission file:

    # Step 1: standard JSON check
    python -m json.tool submissions/<file>.json > /dev/null
    # Step 2: full check (encoding, array lengths, value ranges)
    python src/validate_submission.py submissions/<file>.json --dataset_dir student_dataset

Project layout:

    ├── src/
    │   ├── features.py                # 19-feature causal feature computation (baseline)
    │   ├── models.py                  # RF config, cache building
    │   ├── evaluate.py                # Local 80/20 validation
    │   ├── train.py                   # Full-data training + feature importance
    │   ├── generate_submission.py     # Baseline submission generator (auto-validates)
    │   ├── validate_submission.py     # Submission validator (checks encoding, lengths, values)
    │   └── utils.py                   # Shared constants and I/O helpers
    │
    ├── submissions/
    │   ├── submission_supervised_v1.json         # RF baseline, 19 feats, t=0.30 (LB: 0.44)
    │   ├── submission_rf_changepoint_w128.json   # + changepoint + w128, t=0.30 (LB: 0.5235)
    │   └── submission_b_t0.30.json               # + w256, 33 feats, t=0.30 (LB: 0.5537) ← best
    │
    ├── experiments/
    │   ├── rf_changepoint_w128.py     # Added changepoint + w128 features (+0.0076 val)
    │   ├── next_round.py              # Experiments A–E: w256, CP scales, slope, HGBC
    │   ├── ensemble_rf_hgbc.py        # RF+HGBC ensemble (hurt LB, documented)
    │   ├── rf_calibration_sweep.py    # Final msl/n_estimators/threshold sweep
    │   └── *.csv                      # All experiment results
    │
    ├── submission_simple.json         # Class demo: test-only z-score baseline
    ├── submission_trained.json        # Class demo: training-based z-score baseline
    ├── report.md                      # Part B report source
    ├── assignment_instructions.md     # Assignment spec and submission format
    └── requirements.txt

| Submission | Config | Val F1 | LB Score |
|---|---|---|---|
| submission_trained.json | Global z-score threshold | 0.1618 | 0.30 |
| submission_supervised_v1.json | RF, 19 feats, t=0.30 | 0.1950 | 0.44 |
| submission_rf_changepoint_w128.json | RF, 29 feats, t=0.30 | 0.2076 | 0.5235 |
| submission_b_t0.30.json | RF, 33 feats, t=0.30 | 0.2147 | 0.5537 |

Val→LB scaling ratio: ~2.6× for the best submission (0.5537 / 0.2147 ≈ 2.58); the ratio is lower for the earlier submissions.
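
The ratio can be recomputed per submission from the table's numbers. The snippet below is a quick arithmetic check, not part of the repository; the `rows` dictionary simply restates the Val F1 and LB Score columns.

```python
# (val_f1, lb_score) pairs copied from the results table above.
rows = {
    "submission_trained.json": (0.1618, 0.30),
    "submission_supervised_v1.json": (0.1950, 0.44),
    "submission_rf_changepoint_w128.json": (0.2076, 0.5235),
    "submission_b_t0.30.json": (0.2147, 0.5537),
}

# Leaderboard score divided by local validation F1 for each submission.
ratios = {name: lb / val for name, (val, lb) in rows.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}x")
```

The ratio rises from about 1.85 for the z-score baseline to about 2.58 for the best submission, so local validation F1 is better read as a relative ranking signal than as a predictor of the absolute leaderboard score.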