Self-repair Phase 3: contradiction detection on save (#347) #373

Merged
rockfordlhotka merged 4 commits into main from self-repair-phase-3-contradictions
May 8, 2026

@rockfordlhotka rockfordlhotka commented May 8, 2026

Closes #347.

Summary

  • Hot-path keyword detector resolves conflicting beliefs at memory-write time, narrowly scoped to claim/capability/* and feedback/* (other categories pay zero cost).
  • New MemoryEntry.SupersededBy marker hides superseded entries from SearchAsync/recall while keeping them on disk for audit; GetAsync still returns by id.
  • Dream service gains a per-cycle LLM-mediated contradiction sweep as a backstop for cases the keyword detector misses, with a guard that refuses to supersede a user-correction with a non-correction.

Acceptance criteria (issue #347)

  • Saving "calendar wrapper does pass arguments" supersedes the older "wrapper cannot pass arguments" claim — Phase3ContradictionEndToEndTests.Acceptance1_*.
  • Existing user-correction memories displace conflicting agent-self memories on the next dream sweep — DreamServiceContradictionSweepTests.Acceptance2_*.
  • Contradiction check is measurably narrow — no impact on saves outside claim/capability/* and feedback/*. Verified by MemoryContradictionDetectorTests.ResolveAsync_OutsideScopedCategories_ReturnsNoneWithoutScanningStore and Phase3ContradictionEndToEndTests.Acceptance3_*.

Resolution rules

  • Newer wins by default: older entries get SupersededBy = <new id>.
  • User correction always wins regardless of recency: when an existing user correction conflicts with an incoming non-correction entry, the incoming entry is saved with SupersededBy = <existing id>. Identification: category prefix feedback/from-user/ or tag correction.
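
The two rules can be sketched roughly as follows. This is an illustrative Python sketch, not the project's C# code: `Entry`, `is_user_correction`, and `resolve` are hypothetical names, and only the `feedback/from-user/` prefix and the `correction` tag are taken from the description above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entry:
    """Hypothetical stand-in for a memory entry; not the real MemoryEntry type."""
    id: str
    category: str
    tags: set = field(default_factory=set)
    superseded_by: Optional[str] = None


def is_user_correction(entry: Entry) -> bool:
    # Identification rule from the PR: category prefix or the "correction" tag.
    return entry.category.startswith("feedback/from-user/") or "correction" in entry.tags


def resolve(incoming: Entry, existing: Entry) -> None:
    """Mark exactly one of two contradicting entries as superseded."""
    if is_user_correction(existing) and not is_user_correction(incoming):
        # An existing user correction wins regardless of recency.
        incoming.superseded_by = existing.id
    else:
        # Default rule: the newer (incoming) entry wins.
        existing.superseded_by = incoming.id
```

Under these assumptions the incoming entry is still saved in both branches; in the first branch it simply arrives already marked as superseded, which keeps it on disk for audit.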

Commits

  1. d425b9b — base implementation: detector, storage, writer/tools wiring, dream sweep, directive, 21 unit/end-to-end tests.
  2. 94a8159 — bypass LLM extraction on scoped categories. Real-world test surfaced that MemoryTools.SaveMemory's LLM extraction was rewriting feedback/from-agent/... to agent-knowledge/..., defeating the scoping rule. Scoped categories are now contracts: saves under feedback/* or claim/capability/* skip extraction and persist verbatim. Tool description updated. 4 new tests.
  3. e019525 — stem plural-s and lower Jaccard threshold to 0.3. Real-world test showed "report" vs "reports" plus a trailing rationale clause yielded overlap of 0.33, below the original 0.4. Naive trailing-s stripping for tokens of length 4+ collapses singular/plural; threshold drop absorbs trailing clauses. Existing tests still pass; added a realistic-phrasing regression.
  4. 9401834 — relax the multi-candidate skip. Every entry in the candidate list already has the inverse valence of the incoming entry (filtered by the loop), so multi-match is not ambiguous — it just means the same rule was saved more than once before a reversal. The detector now supersedes them all. User-correction protection still trumps. Replaced the obsolete "ambiguous defers to sweep" test; added a regression test confirming user-correction protection in the multi-candidate path.
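
The overlap signal adjusted in commit e019525 can be illustrated with a small Python sketch. It assumes a simple regex tokenizer and a made-up stopword list, neither of which is shown in this PR; only the length-4+ trailing-s rule and the 0.3 threshold come from the commit notes.

```python
import re

# Hypothetical stopword list; the real detector's list is not shown in this PR.
STOPWORDS = {"a", "an", "the", "in", "of", "to", "be", "should", "they"}
MIN_FEEDBACK_OVERLAP = 0.3  # lowered from 0.4 in commit e019525


def tokenize(text: str) -> set:
    """Lowercase, drop stopwords, and collapse naive plurals."""
    tokens = set()
    for raw in re.findall(r"[a-z0-9]+", text.lower()):
        if raw in STOPWORDS:
            continue
        # Naive plural stemming: strip a trailing "s" from tokens of length 4+.
        if len(raw) >= 4 and raw.endswith("s"):
            raw = raw[:-1]
        tokens.add(raw)
    return tokens


def jaccard(a: str, b: str) -> float:
    """Size of the token intersection divided by the size of the union."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)
```

The stemming is deliberately crude (it also turns "status" into "statu"), but because both sides of the comparison go through the same function, the distortion cancels out for an overlap score.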

Production validation (deployed to staging K8s)

Five live confirmations against rockylhotka/rockbot-agent:0.10.40:

  • Hot-path supersession on opposite-valence feedback save: MemoryTools: marked <old> superseded by <new>
  • Scoped-category bypass preserves caller's category verbatim: scoped direct save ... (feedback/from-agent/status-reports, ...)
  • No false positive on an unrelated rule in the same category: "Status reports should be timestamped" saved with superseded=no while the existing TL;DR contradiction stayed undisturbed ✅
  • Dream contradiction sweep pass loaded directive at startup and ran on cadence (early-exit log on empty corpus): DreamService: contradiction sweep — only 0 claim/feedback entry/entries; skipping
  • Singular/plural overlap fix verified end-to-end (the very phrasing that originally failed) ✅

Out of scope

General-purpose contradiction detection across all memory categories — explicit non-goal in the design.

Behaviour change to note

This PR turns on the contradiction sweep dream pass by default (DreamOptions.ContradictionSweepEnabled = true). On the next dream cycle, it will scan claim/capability/* and feedback/* and may mark entries superseded based on the LLM's judgment. User-correction protection is enforced post-LLM, so a user correction can never be superseded by an agent-self entry.

Test plan

  • dotnet test RockBot.slnx — 1,733 passed, 0 failed (4 RabbitMQ + 11 Python integration tests skipped as expected).
  • CI: both build-and-test workflows green.
  • Live K8s validation against the real MemoryTools.SaveMemory LLM-callable surface.

🤖 Generated with Claude Code

rockfordlhotka and others added 4 commits May 8, 2026 16:00
Resolves conflicting beliefs at memory-write time, narrowly scoped to
capability claims (claim/capability/*) and feedback memories (feedback/*).
Saves outside those subtrees pay zero detection cost.

Hot path (deterministic, keyword-based):
- Capability claims: same (server, tool), opposite valence by negation
  marker set ("cannot|does not|blocked|not supported|...").
- Feedback: same rule subject (category subtree leaf), opposite directive
  (negation dominates affirmative when both keywords appear), Jaccard
  token overlap >= 0.4.
- User-tagged corrections (feedback/from-user/* or tag "correction")
  always win regardless of recency.
- Ambiguous matches defer to the dream sweep.
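
A rough Python sketch of the capability-claim check described above. The marker tuple contains only the four markers quoted in this message (the real set is longer and not reproduced here), and the dictionary shape and function names are hypothetical.

```python
# Only the four markers quoted in the commit message; the real set is longer.
NEGATION_MARKERS = ("cannot", "does not", "blocked", "not supported")


def is_negative(text: str) -> bool:
    """Return True when any negation marker appears in the text."""
    lowered = text.lower()
    return any(marker in lowered for marker in NEGATION_MARKERS)


def capability_claims_contradict(a: dict, b: dict) -> bool:
    """Same (server, tool) pair with opposite valence counts as a contradiction."""
    same_target = (a["server"], a["tool"]) == (b["server"], b["tool"])
    return same_target and is_negative(a["text"]) != is_negative(b["text"])
```

Plain substring matching is the simplest reading of "keyword-based"; it would also flag a word such as "unblocked", so a real implementation may well match on word boundaries instead.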

Storage:
- New MemoryEntry.SupersededBy init-only string?, round-trips through
  JSON serialisation.
- FileMemoryStore.SearchAsync hides superseded entries by default;
  MemorySearchCriteria.IncludeSuperseded opt-in for audit and the dream
  sweep itself. GetAsync still returns by id.
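
The visibility rule can be pictured with a toy in-memory store in Python. `InMemoryStore` and its method names are hypothetical and only mirror the behaviour described for `SearchAsync`, `GetAsync`, and `IncludeSuperseded`; the real store is file-backed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    """Hypothetical stand-in for MemoryEntry."""
    id: str
    content: str
    superseded_by: Optional[str] = None


class InMemoryStore:
    """Toy store; the real FileMemoryStore persists entries to disk."""

    def __init__(self) -> None:
        self._entries: Dict[str, Entry] = {}

    def save(self, entry: Entry) -> None:
        self._entries[entry.id] = entry

    def get(self, entry_id: str) -> Optional[Entry]:
        # Lookup by id ignores the superseded marker, so audits still work.
        return self._entries.get(entry_id)

    def search(self, term: str, include_superseded: bool = False) -> List[Entry]:
        # Superseded entries are hidden unless the caller opts in.
        return [
            e for e in self._entries.values()
            if term.lower() in e.content.lower()
            and (include_superseded or e.superseded_by is None)
        ]
```

Keeping the marker on the entry rather than deleting the entry means the default recall path stays clean while the full history remains available to the sweep and to audits.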

Wiring:
- CapabilityClaimWriter calls the detector before save and applies
  SupersededBy markers (or marks the incoming entry superseded when an
  existing user-correction wins).
- MemoryTools.SaveMemoryBackgroundAsync calls the detector only when the
  resolved entry's category starts with "feedback/", enforcing the
  narrow-scope rule on the LLM-callable save path.

Dream backstop:
- New per-cycle contradiction sweep pass loads only claim/capability/*
  and feedback/* entries and asks the LLM (via agent/contradiction-sweep.md)
  to flag remaining pairs. Refuses to supersede a user-correction with a
  non-correction. Application logic is extracted to
  ApplyContradictionSweepResultAsync for direct unit testing.
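
The post-LLM guard might look something like the Python sketch below. The shape of the sweep result (a list of `(loser_id, winner_id)` pairs) and the dictionary-based entries are assumptions made for illustration; the actual directive output format is not shown in this PR.

```python
def is_user_correction(entry: dict) -> bool:
    # Same identification rule as the hot path: category prefix or tag.
    return (entry["category"].startswith("feedback/from-user/")
            or "correction" in entry["tags"])


def apply_sweep_result(entries: dict, flagged_pairs: list) -> list:
    """Apply LLM-flagged (loser_id, winner_id) pairs, enforcing the guard.

    Returns the pairs that were actually applied.
    """
    applied = []
    for loser_id, winner_id in flagged_pairs:
        loser, winner = entries.get(loser_id), entries.get(winner_id)
        if loser is None or winner is None:
            continue  # the model referenced an id that is not in the corpus
        if is_user_correction(loser) and not is_user_correction(winner):
            continue  # never supersede a user correction with a non-correction
        loser["superseded_by"] = winner_id
        applied.append((loser_id, winner_id))
    return applied
```

Running the guard after the model responds, rather than relying on the prompt alone, means the protection holds even when the model's judgment is wrong.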

Tests: 21 new (detector unit, file store exclusion + round-trip, end-to-end
acceptance for issue #347 criteria 1-3, dream sweep). Full suite: 1,728
passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real-world test of #347 surfaced that MemoryTools.SaveMemory's LLM
extraction pass freely rewrites the caller-supplied category. A save
with category=feedback/from-agent/status-reports landed under
agent-knowledge/status-reports, so the contradiction detector — gated
on category.StartsWith("feedback/") — never ran.

Fix: when the caller supplies a scoped category (feedback/* or
claim/capability/*), bypass extraction and persist the content verbatim
under the supplied category and tags. Scoped categories are contracts,
not hints. The contradiction detector still runs, so opposing-directive
saves now correctly supersede prior entries.
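
In outline, the bypass amounts to a prefix check ahead of the extraction step. The Python sketch below uses hypothetical names (`save_memory`, `extract`) and models the LLM extraction pass as a plain callable; only the two reserved prefixes come from this message.

```python
from typing import Callable, Iterable, Optional

# Reserved prefixes named in the commit message.
SCOPED_PREFIXES = ("feedback/", "claim/capability/")


def is_scoped(category: Optional[str]) -> bool:
    return category is not None and category.startswith(SCOPED_PREFIXES)


def save_memory(content: str, category: Optional[str], tags: Iterable[str],
                extract: Callable[[str, Optional[str], list], dict]) -> dict:
    """Return the entry that would be persisted."""
    tag_list = list(tags)
    if is_scoped(category):
        # Scoped categories are contracts: persist verbatim, skip extraction.
        return {"content": content, "category": category, "tags": tag_list}
    # Non-scoped saves keep the existing extraction behaviour.
    return extract(content, category, tag_list)
```

Because the category survives untouched, a downstream check such as `category.startswith("feedback/")` sees exactly what the caller supplied.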

Tool description updated to surface the reserved-category contract to
the LLM. Non-scoped saves keep the existing extraction behaviour
unchanged (regression test added).

Tests: 4 new MemoryToolsTests covering bypass, category preservation,
supersession wiring, and the non-scoped regression. Full suite: 1,732
passed, 0 failed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real-world test surfaced that "report" vs "reports" with a trailing
clause ("they should be concise without one") yielded a Jaccard overlap
of 0.33 — below the original 0.4 threshold — so the detector returned
no contradiction even though the directives clearly oppose each other.

Two narrow adjustments:
- TokenizeNonStopwords strips a trailing 's' from tokens of length 4+
  before hashing, collapsing report/reports, list/lists, etc. Naive
  but adequate for a rule-subject overlap signal; over-merging only
  improves recall on a comparison that is symmetric.
- MinFeedbackOverlap drops from 0.4 to 0.3 to absorb trailing rationale
  clauses that dilute the union without contributing to intersection.

Existing tests still pass — they used identical phrasings on either
side so already had Jaccard well above either threshold. Added a
realistic-phrasing regression test mirroring the deployed prompt.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The "skip when more than one candidate matches" rule was overly defensive.
Every entry in `contradicted` already shares the inverse valence of the
incoming entry — that's how it got into the candidate list — so multiple
matches are not ambiguous. They just mean the same affirmative rule was
saved more than once before a single negative reversal arrived.

Real-world test surfaced this: a leftover stale "Always include TL;DR"
entry plus a fresh "Always include TL;DR" matched a single "Never include
TL;DR" save. The detector deferred to the dream sweep instead of doing
the obvious thing — supersede both prior entries.

The change supersedes all candidates instead of skipping. The user-
correction protection at the top of the resolver still trumps everything,
so a correction among the candidates still wins. Genuine ambiguity (rule
subjects that don't actually overlap) is filtered out earlier by the
Jaccard + category-leaf gates.

Tests: replaced the now-obsolete "ambiguous defers to sweep" test with an
"all same-valence matches are superseded" test, and added a regression
test confirming user-correction protection still wins when one of the
multiple candidates is a correction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@rockfordlhotka rockfordlhotka merged commit 2d316fd into main May 8, 2026
2 checks passed
@rockfordlhotka rockfordlhotka deleted the self-repair-phase-3-contradictions branch May 8, 2026 23:18