feat: learn standing orders from mission patterns (#87) by harrymunro · Pull Request #120 · Aspegio/nelson

harrymunro · 2026-05-11T13:39:25Z

Summary

Adds a Darwin Gödel Machine-inspired pipeline (skills/nelson/scripts/nelson_data_patterns.py) that mines avoid patterns from accumulated mission data, scores them with Fisher's exact + log-odds, filters against existing orders and a dismissed archive, and synthesises candidate standing orders for human review (with a heuristic-stub fallback when no FM client is wired).
Wires three new CLI subcommands into nelson-data.py: detect-patterns, promote-candidate, dismiss-candidate. Promotion writes a new .md under references/standing-orders/ with an audit-lineage comment and adds a row to the SKILL.md lookup table; dismissal archives the fingerprint so re-runs cannot resurface it.
Surfaces CANDIDATE STANDING ORDERS (awaiting review): N in the Intelligence Brief (text + JSON) when the queue is non-empty.
35 new tests in test_nelson_data_patterns.py; full Python suite remains green (338 passing).
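The scoring step described above can be sketched with the standard library alone. This is a minimal illustration, not the code in nelson_data_patterns.py: the 2x2 table layout (pattern present/absent against mission failed/succeeded), the one-sided alternative, the 0.5 smoothing constant, and the function names are all assumptions made for the example.

```python
from math import comb, log


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical, for illustration):
        a = pattern present, mission failed
        b = pattern present, mission succeeded
        c = pattern absent,  mission failed
        d = pattern absent,  mission succeeded
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    k_max = min(row1, col1)
    tail = sum(
        comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, k_max + 1)
    )
    return tail / comb(n, col1)


def log_odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Log odds ratio with a 0.5 continuity correction so empty cells are safe."""
    return log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


if __name__ == "__main__":
    # 12 missions: the pattern appears in 6, all of which failed.
    p = fisher_exact_greater(6, 0, 0, 6)
    print(f"p-value = {p:.6f}, log-odds = {log_odds_ratio(6, 0, 0, 6):.3f}")
```

The p-value says how unlikely the split is by chance, and the sign of the log-odds says which way the association points, so a filter can use the two together.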

Closes #87.

Design notes

Ranking is confidence × (1 + novelty) rather than the plan's sigmoid(confidence) × (1 + novelty). Sigmoid maps [0, 1] confidence into [0.5, 0.73], which is too compressed to discriminate when the (1 + novelty) factor spans [1, 2]. Documented inline in _review_score.
Novelty uses token containment (fraction of a pattern's tokens covered by an existing order) rather than TF-IDF Jaccard — better fit for short avoid-phrases vs. full standing-order documents.
Add-only invariant: candidates may never modify or remove existing orders — mitigates the objective-hacking failure mode in DGM Appendix H (node 114 deleting hallucination-detection tokens to game the metric).
detect-patterns skips writing an empty queue file on first runs to avoid littering the memory dir.
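The first two design notes can be illustrated with a short sketch. It assumes whitespace tokenisation and defines novelty as one minus the best containment against any existing order; both are assumptions for the example, as are the function names (the PR only names `_review_score`).

```python
import math


def token_containment(pattern_tokens: set, order_tokens: set) -> float:
    """Fraction of the pattern's tokens that also occur in one existing order."""
    if not pattern_tokens:
        return 0.0
    return len(pattern_tokens & order_tokens) / len(pattern_tokens)


def novelty(pattern: str, existing_orders: list) -> float:
    """1.0 when no existing order covers the pattern, 0.0 when one covers it fully."""
    tokens = set(pattern.lower().split())
    covered = max(
        (token_containment(tokens, set(o.lower().split())) for o in existing_orders),
        default=0.0,
    )
    return 1.0 - covered


def review_score(confidence: float, novelty_value: float) -> float:
    """Linear ranking: confidence x (1 + novelty)."""
    return confidence * (1.0 + novelty_value)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


if __name__ == "__main__":
    orders = ["never skip integration tests before merging to main"]
    # A short phrase fully covered by a long order: containment is 1.0,
    # whereas plain Jaccard would be only 3/8 because of the length mismatch.
    print(novelty("skip integration tests", orders))
    # Sigmoid squeezes confidence in [0, 1] into roughly [0.5, 0.731].
    print(sigmoid(0.0), round(sigmoid(1.0), 3))
```

Containment is asymmetric on purpose: it asks whether the short phrase is already said by the long document, which Jaccard under-reports whenever the document is much longer than the phrase.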

Test plan

pytest skills/nelson/scripts/test_nelson_data_patterns.py — 35/35 pass
Full suite pytest skills/nelson/scripts/ — 338/338 pass
ruff check clean on new code
Integration on real worktree data (1 mission, below min_missions=10) — 0 candidates, no crash, no empty file
Synthetic 12-mission dataset — 1 candidate detected at confidence 0.998
Dismiss → re-detect — pattern not resurfaced
Promote — writes well-formed .md (header / Trigger / Symptoms / Remedy / Related orders / audit lineage) and appends row to SKILL.md table
FM-malformed-response and FM-raises-exception paths both fall back to the heuristic stub
Brief surfaces the candidate count when present and omits the line cleanly when zero

Implements a Darwin Gödel Machine-inspired pipeline that detects candidate standing orders from accumulated mission data and surfaces them for human review.

Pipeline (in scripts/nelson_data_patterns.py):
- Mine: cluster `avoid` text patterns across missions via Jaccard
- Score: Fisher's exact test for outcome correlation + log-odds ratio
- Filter: drop low-confidence patterns, patterns covered by existing orders (token containment), and previously dismissed candidates
- Synthesize: FM-assisted prose with heuristic-stub fallback if no client is wired or the response is malformed
- Persist: append to .nelson/memory/candidate-standing-orders.json

CLI: `nelson-data detect-patterns | promote-candidate | dismiss-candidate`. Promotion writes a new `.md` under references/standing-orders/ with an audit-lineage comment listing the evidence mission IDs, and appends a row to the SKILL.md lookup table. Dismissal moves the entry to a dismissed archive so re-runs cannot resurface it.

Hard constraint: candidates may only ADD new standing orders, never modify or remove existing ones. This mitigates the objective-hacking failure mode documented in DGM Appendix H.

The Intelligence Brief now surfaces a `CANDIDATE STANDING ORDERS (awaiting review): N` line when the queue is non-empty.

35 new tests cover statistical primitives, mining, novelty, end-to-end detection, FM-call robustness, ranking, promotion, dismissal, CLI smoke, and brief surfacing. Full suite remains green (338 passing).
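The Synthesize step's fallback behaviour might look roughly like the sketch below. The client is modelled as a plain callable returning a JSON string, and the required keys, stub wording, and function names are hypothetical; none of them are taken from the PR.

```python
import json
from typing import Callable, Optional

# Hypothetical schema for a synthesised candidate (an assumption for this sketch).
REQUIRED_KEYS = ("title", "trigger", "remedy")


def heuristic_stub(pattern_text: str) -> dict:
    """Deterministic fallback that needs no model call."""
    return {
        "title": f"Avoid: {pattern_text}",
        "trigger": pattern_text,
        "remedy": "Review this pattern before proceeding.",
        "source": "heuristic",
    }


def synthesise(pattern_text: str, fm_client: Optional[Callable[[str], str]] = None) -> dict:
    """Ask the FM for prose, falling back to the stub on any failure."""
    if fm_client is None:  # no client wired
        return heuristic_stub(pattern_text)
    try:
        draft = json.loads(fm_client(pattern_text))
    except Exception:  # client raised, or the response was not valid JSON
        return heuristic_stub(pattern_text)
    if not isinstance(draft, dict):
        return heuristic_stub(pattern_text)
    for key in REQUIRED_KEYS:
        value = draft.get(key)
        if not isinstance(value, str) or not value.strip():
            return heuristic_stub(pattern_text)  # malformed: missing or empty field
    return {**{k: draft[k] for k in REQUIRED_KEYS}, "source": "fm"}


if __name__ == "__main__":
    print(synthesise("skip integration tests")["source"])
    print(synthesise("skip integration tests", lambda _: "not json")["source"])
```

Catching broadly around the model call is a deliberate choice in this sketch, since the stub is always a valid result and a human still reviews every candidate.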

Multi-perspective code review of the original commit surfaced one CRITICAL, several HIGH-severity correctness/safety issues, missing regression tests for the headline DGM safety invariant, and undocumented CLI surface. This commit addresses them as a single fix pass, with no scope changes.

Pipeline correctness
- Cluster fingerprint is now stable under input reordering. Replaced the greedy single-pass clusterer with an order-independent union-find pass and computed the fingerprint over the union of all variant tokens. The canonical text is now the shortest variant so reruns pick a deterministic representative. Without this, the dismissed-candidate archive could not guarantee that dismissed patterns stay dismissed.
- Polarity guard at the candidacy gate. The filter now requires ``correlation < 0`` so a success-correlated avoid-text is never surfaced as an anti-pattern candidate (its remedy would be the inverse of what the data shows).
- ``_mine_event_sequences`` and ``_score_pattern`` skip records missing ``mission_id`` instead of raising ``KeyError``, so one corrupt entry no longer takes down the whole brief surfacing path.

Safety / file-system writes
- ``promote_candidate`` re-slugifies the candidate title at the promotion boundary so a hand-edited queue entry like ``../../../tmp/pwn`` cannot escape the standing-orders directory.
- The SKILL.md table insertion is now anchored to the ``## Standing Orders`` heading and only scans rows within that section, so a similar-shaped row elsewhere in SKILL.md (e.g. Damage Control) can no longer become the insertion point.
- Trigger text is flattened before insertion so newlines / pipes cannot break out of the table cell into arbitrary markdown (extra rows, headings, prose) that Claude would later read as skill instructions.
- The SKILL.md mutation is now pre-flighted: if the heading or table is missing, promotion fails before touching disk. If the SKILL.md write fails after the standing-order .md has been written, the .md is rolled back so promotion is transactional from the caller's perspective.
- Both ``.md`` writes (standing-order body and SKILL.md) now go through an atomic tempfile + os.replace helper so a crash mid-write cannot leave a torn file behind.

CLI consistency
- ``detect-patterns`` / ``promote-candidate`` / ``dismiss-candidate`` now accept ``--missions-dir`` and derive ``memory_dir`` as ``{missions_dir}/../memory``, the same rule that ``cmd_brief`` and ``nelson_data_memory`` use. Before, running ``detect-patterns`` from a non-default ``--missions-dir`` could write to a memory dir that ``brief`` would not read.
- ``--candidate-id`` is now a named flag on promote and dismiss, matching the flag-only convention used by every other subcommand.
- ``detect-patterns`` gains ``--json`` for machine-readable output.
- ``cmd_brief`` narrows its bare ``except Exception`` around the candidate count to ``(OSError, ValueError, json.JSONDecodeError)`` so a real bug in the candidates module surfaces in tests instead of being swallowed.

Tests
- 15 new regression tests covering: cluster ID stability under reordering; polarity gate filters success-correlated patterns; path-traversal title is rejected at promotion; newline / pipe injection in trigger is neutralised; SKILL.md row is inserted under the right section; promotion is idempotent on second call; missing SKILL.md raises before writing the standing-order .md; missing table aborts atomically; the add-only invariant (existing standing-order files and SKILL.md rows are untouched) is verified; first-run with zero candidates does not litter the memory dir; ``--missions-dir`` derivation works from a non-project CWD; missing ``mission_id`` records are skipped; ``--json`` emits a structured summary. Full Python suite 338 → 353 passing.

Documentation
- ``references/structured-data.md`` documents all three new subcommands (``detect-patterns``, ``promote-candidate``, ``dismiss-candidate``) with arguments, exit semantics, and the new ``--missions-dir`` rule.
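To make the clustering fix concrete, here is a minimal sketch of order-independent clustering with union-find, a fingerprint over the union of variant tokens, and the shortest variant as canonical text. The Jaccard threshold, whitespace tokenisation, SHA-256 truncated to 16 hex characters, and all names are assumptions for the example, not details taken from the commit.

```python
import hashlib
from itertools import combinations


def tokens(text: str) -> frozenset:
    return frozenset(text.lower().split())


def jaccard(a: frozenset, b: frozenset) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(texts: list, threshold: float = 0.5) -> list:
    """Group texts into connected components of the 'Jaccard >= threshold' graph."""
    items = sorted(set(texts))  # dedupe and sort so nothing depends on input order
    parent = list(range(len(items)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if jaccard(tokens(items[i]), tokens(items[j])) >= threshold:
            parent[find(i)] = find(j)  # union the two components

    groups = {}
    for i, text in enumerate(items):
        groups.setdefault(find(i), []).append(text)

    clusters = []
    for variants in groups.values():
        # Fingerprint over the union of all variant tokens, in sorted order.
        all_tokens = sorted(set().union(*(tokens(v) for v in variants)))
        digest = hashlib.sha256(" ".join(all_tokens).encode("utf-8")).hexdigest()
        clusters.append({
            "fingerprint": digest[:16],
            # Shortest variant wins; ties are broken alphabetically.
            "canonical": min(variants, key=lambda v: (len(v), v)),
            "variants": sorted(variants),
        })
    return sorted(clusters, key=lambda c: c["fingerprint"])


if __name__ == "__main__":
    avoid = ["skip the integration tests", "force push to main", "skip integration tests"]
    assert cluster(avoid) == cluster(list(reversed(avoid)))
    for c in cluster(avoid):
        print(c["fingerprint"], c["canonical"])
```

Because connected components do not depend on the order in which edges are merged, the same inputs yield the same fingerprints on every run, which is the property a dismissed archive keyed on fingerprints relies on.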

…e.md Replace the inline project-structure listing in both files with a short "Key references" section pointing at the new docs/project_structure.md and README.md. docs/project_structure.md captures the full repository layout in one place, including files previously missing from CLAUDE.md (circuit breakers, conflict radar, the full admiralty-templates and damage-control sets) and the learned-standing-orders pipeline from PR #120.

harrymunro added 2 commits May 11, 2026 10:39

harrymunro merged commit 9113af9 into main May 12, 2026
6 checks passed

harrymunro mentioned this pull request May 12, 2026

docs: extract project structure to docs/project_structure.md #121

Closed

2 tasks

harrymunro merged 2 commits into main from feat/issue-87-learned-standing-orders
