feat: override-learned trust calibration (#88, v1) by harrymunro · Pull Request #122 · Aspegio/nelson

harrymunro · 2026-05-15T21:14:48Z

Summary

Implements the override-learning alternative from issue #88 (Confidence-weighted dynamic trust calibration): per-(task_type, ship_class) trust calibration learned from admiralty decisions (approved / modified / rejected) rather than from self-reported agent confidence.
Adds a new nelson_data_calibration.py module with an incremental update at stand-down, a full rebuild from cmd_index, and a stderr advisory at plan-approved when the historical sample reaches MIN_DECISIONS_FOR_ADVISORY = 3. When the precise (task_type, ship_class) bucket is under-sampled, the advisory falls back to the by_task_type rollup.
New CLI subcommands: nelson-data.py admiralty-decision (writes an admiralty_action_completed event with decision_type, task_type, and ship_class) and nelson-data.py trust-report (text output, plus --json).
Optional --task-type flag on the existing task subcommand. Backwards compatible: existing missions keep working with task_type=null, and the calibration store stays empty.
v1 is advisory-only: no station_tier mutation, no significance gating. Auto-elevation, FM-synthesized prose, and Fisher's exact gating are explicitly out of scope and left to a follow-up on issue #88 (Confidence-weighted dynamic trust calibration).
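
For reviewers who want a mental model of the bucket lookup and rollup fallback described above, here is a minimal sketch. It is not the code in nelson_data_calibration.py: the function names, the store layout, and the choice to count both modified and rejected as overrides are assumptions made for illustration. Only MIN_DECISIONS_FOR_ADVISORY = 3 and the by_task_type rollup name come from this description.

```python
from collections import Counter, defaultdict

MIN_DECISIONS_FOR_ADVISORY = 3  # threshold named in the PR description
OVERRIDES = {"modified", "rejected"}  # assumption: both count as overrides


def build_store(decisions):
    """Aggregate (task_type, ship_class, decision_type) tuples into counts."""
    store = {
        "buckets": defaultdict(Counter),       # keyed by (task_type, ship_class)
        "by_task_type": defaultdict(Counter),  # rollup across ship classes
    }
    for task_type, ship_class, decision_type in decisions:
        if task_type is None:
            # Missions without a task type contribute nothing to the store.
            continue
        store["buckets"][(task_type, ship_class)][decision_type] += 1
        store["by_task_type"][task_type][decision_type] += 1
    return store


def override_rate(counts):
    """Return (rate, n) for one Counter of decision types."""
    n = sum(counts.values())
    overrides = sum(counts[d] for d in OVERRIDES)
    return (overrides / n if n else 0.0), n


def advisory(store, task_type, ship_class):
    """Return an advisory fragment, or None when history is too thin."""
    candidates = (
        store["buckets"].get((task_type, ship_class)),  # precise bucket first
        store["by_task_type"].get(task_type),           # then the rollup
    )
    for counts in candidates:
        if counts is None:
            continue
        rate, n = override_rate(counts)
        if n >= MIN_DECISIONS_FOR_ADVISORY:
            return f"historical override rate {rate:.0%} (n={n})"
    return None
```

With three modified decisions recorded for auth_refactor on a frigate, the sketch yields an advisory for that bucket, and the same history also surfaces through the rollup for a ship class that has no decisions of its own.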

Test plan

pytest skills/nelson/scripts/test_nelson_data_calibration.py -v — 17 new tests covering admiralty-decision CLI, store aggregation/idempotency/edge cases, plan-approved advisory threshold + rollup fallback, and trust-report text/JSON/filter.
pytest skills/nelson/scripts/ — full suite, 370 tests, all green.
End-to-end smoke: 3 missions with --task-type auth_refactor + admiralty-decision --decision-type modified populate the bucket; the 4th mission's plan-approved prints Trust advisory: task 1 (auth_refactor on frigate) — historical override rate 100% (n=3). Consider raising station_tier. on stderr.
index --rebuild reconstructs the calibration store from existing missions.
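
The idempotency and rebuild checks in the test plan can be pictured with the sketch below. It assumes, purely for illustration, that the store remembers which mission IDs it has already counted and that events carry a "type" field; neither detail is confirmed by this PR, and the helper names are hypothetical. The event name and the decision_type, task_type, and ship_class fields are the ones listed in the summary.

```python
def empty_store():
    """A fresh calibration store: counted mission IDs plus per-bucket counts."""
    return {"missions": [], "buckets": {}}


def apply_mission(store, mission_id, events):
    """Incrementally fold one mission's decisions into the store.

    Re-applying a mission that was already counted is a no-op, which is
    what makes the update idempotent in this sketch.
    """
    if mission_id in store["missions"]:
        return store
    for event in events:
        if event.get("type") != "admiralty_action_completed":
            continue
        if event.get("task_type") is None:
            # Legacy missions without a task type leave the store untouched.
            continue
        key = f'{event["task_type"]}|{event["ship_class"]}'
        bucket = store["buckets"].setdefault(key, {})
        decision = event["decision_type"]
        bucket[decision] = bucket.get(decision, 0) + 1
    store["missions"].append(mission_id)
    return store


def rebuild(missions):
    """Reconstruct the store from scratch out of all known missions."""
    store = empty_store()
    for mission_id, events in missions.items():
        apply_mission(store, mission_id, events)
    return store
```

Under these assumptions, applying missions one at a time at stand-down and rebuilding from the full mission set arrive at the same store, which is the property the rebuild check relies on.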

Replaces the binary admiralty-action-required model with an override-learning trust calibration store. After each mission, admiralty decisions (approved / modified / rejected) are aggregated per (task_type, ship_class) bucket; at plan-approved time, stderr advisories surface tasks whose history meets the sample threshold. Advisory-only — no station_tier mutation, all schema additions optional and backwards compatible. Adds the admiralty-decision and trust-report subcommands, threads an optional --task-type through the task command, and wires the new memory store into stand-down (incremental) and index (rebuild).

harrymunro wants to merge 1 commit into main from worktree-tidy-churning-swan