Skip to content

feat(memory): reconcile stale exact-lock updates#2335

Open
huangruiteng wants to merge 26 commits into
volcengine:mainfrom
huangruiteng:feat/memory-file-lock-patch-rewrite-pr1
Open

feat(memory): reconcile stale exact-lock updates#2335
huangruiteng wants to merge 26 commits into
volcengine:mainfrom
huangruiteng:feat/memory-file-lock-patch-rewrite-pr1

Conversation

@huangruiteng
Copy link
Copy Markdown
Contributor

@huangruiteng huangruiteng commented May 30, 2026

Summary

This PR adds an opt-in exact file-lock apply path for concurrent OpenViking Memory V2 writes. The validated PR-1 scope is agent memory extraction for trajectories and experiences.

The goal is to let memory extraction run concurrently without relying on broad schema tree locks, while keeping write correctness at apply time:

  • lock the exact target file before apply
  • read the latest file under that lock
  • merge against the latest content using the read-time base value
  • rewrite stale string SEARCH/REPLACE patches when configured
  • synthesize stale string replacements when configured
  • fail fast, or record an explicit OCC skip, instead of silently producing a partial corpus

The new behavior is disabled by default behind:

  • memory.memory_apply_exact_file_lock_enabled
  • memory.stale_patch_rewrite_enabled

What Changed

Runtime memory apply:

  • Adds runtime base envelopes: StrPatchWithBase and ReplaceValueWithBase.
  • Applies memory upserts under exact file locks, then re-reads the latest target file before merge.
  • Rewrites stale StrPatch SEARCH/REPLACE patches against the latest field value.
  • Synthesizes stale string replace proposals against the latest field value.
  • Treats legacy plain-string patch in the agent exact path as a base-aware full replacement, not as a substring patch.
  • Records per-field apply_trace rows with URI, field, merge op, input shape, wrapper shape, stale/rewrite status, and applied/failed/skipped status.
  • Handles stale deletes and create-vs-existing races with explicit skip states instead of resurrecting or overwriting files.

Scope gates and validation plumbing:

  • Keeps exact apply scoped to agent trajectories / experiences.
  • Keeps standard long-term user memory, tools, skills, and session_skills on the existing tree-lock path.
  • Adds read-only memory graph health inspection for link/backlink consistency, source-link coverage, schema-heading checks, and lightweight source-density / near-duplicate diagnostics.
  • Extends TAU corpus/eval runner gates with commit concurrency, cache identity checks, concrete scoped search/read checks, graph health, retrieval trace coverage, completed scoreboard checks, and corpus provenance.

Scope

Included:

  • Same-URI patch/update/delete correctness under exact file locks.
  • Stale StrPatch rewrite for string SEARCH/REPLACE patches.
  • Stale string replacement synthesis.
  • Agent-only validation config for trajectories / experiences.
  • Deterministic transcript-local tool/skill SUM counter deltas as supporting cleanup.
  • Per-phase memory_diff_<phase>.json archives with apply traces.
  • Graph/schema/retrieval/eval gates needed to validate the write primitive.

Not included:

  • Exact apply for standard long-term user memory.
  • Exact apply for tool/skill memory or session_skills.
  • Non-string replacement synthesis.
  • Cross-URI create-new admission or semantic duplicate consolidation.
  • Session skill asset lifecycle validation.
  • Failure-experience retrieval or prompt changes.
  • A TAU reward uplift claim.

Validation

Current pushed head: 20ca50ed.

Local targeted validation:

OPENVIKING_CONFIG_FILE=/private/tmp/pr2335_pr1_pytest_ov.conf \
  .venv/bin/python -m pytest -q -o addopts='' -m 'not integration' \
  tests/session/memory/test_memory_updater.py \
  tests/session/memory/test_merge_ops.py \
  tests/session/memory/test_compressor_v2.py \
  tests/unit/session/test_compressor_v2_schema_gate.py \
  tests/unit/tool_skill/test_tool_skill_utils.py \
  tests/benchmark/tau2/test_run_memory_v2_eval_gates.py \
  tests/benchmark/tau2/test_analyze_memory_v2_corpus_quality.py \
  tests/client/test_rebuild_clients.py

Result:

  • 166 passed, 1 deselected, 6 warnings
  • git diff --check: passed

Client regeneration check:

.venv/bin/python -m pytest -q -o addopts='' tests/client/test_rebuild_clients.py

Result:

  • 16 passed, 5 warnings

GitHub checks on 20ca50ed:

  • check-deps: pass
  • lint / lint: pass
  • API & CLI Integration Tests (ubuntu-24.04): pass
  • pr_review: pass
  • build: skipped

Corpus Build Speed

The clean review-facing denominator is exact C4 vs safe serial tree C1. A same-concurrency tree C4 run was attempted but failed the strict write gate before producing valid manifests, so this PR does not claim exact C4 vs tree C4 speedup.

Domain Exact C4 wall Safe tree C1 wall Exact / tree
retail 3595.885s 13096.090s 0.275x
airline 584.458s 1545.286s 0.378x
total 4189.586s 14642.055s 0.286x

Both valid runs used the same cached train results and produced valid retail + airline corpus manifests. The comparison is conservative: it measures the PR-1 concurrent exact path against the safe serial tree-lock baseline, not a lock-manager microbenchmark.

Full-Train Corpus Health

All four full-train corpora passed strict_corpus_gate with claim_valid=true.

Hard gates were clean in all four corpora:

  • source-linkless experiences: 0
  • broken endpoints: 0
  • missing backlinks: 0
  • missing forward links: 0
  • parse errors: 0
  • missing required headings: 0
  • scoped search/read probes: non-empty
  • session_skills_extracted_total: 0

Corpus shape:

Corpus Sessions committed / skipped Memories extracted Experiences Trajectories Source avg Single-source
exact retail C4 63 / 11 691 78 180 5.90 29.49%
tree retail C1 63 / 11 670 62 174 7.11 33.87%
exact airline C4 26 / 4 136 25 45 3.44 36.00%
tree airline C1 26 / 4 133 32 44 2.75 50.00%

Quality boundary:

  • Exact retail creates more experience cards than tree and has more name-similar pairs.
  • Tree tends to fold later sessions into broader cards; exact tends to preserve narrower sibling cards.
  • This is a semantic consolidation boundary for a follow-up admission/consolidation PR, not a graph/schema correctness failure in this PR.

Full Held-Out Eval Gate

Protocol:

  • retail + airline
  • full held-out test split
  • 4 repeats
  • 8/8 cells completed
  • 240 simulations
  • exact and tree summaries had valid corpus/effect evidence
  • retrieval traces were non-zero and all injected reads were non-empty
Mode Retail reward / DB Airline reward / DB Task-weighted reward Task-weighted DB Retrieval aggregate
exact 0.81875 / 0.81875 0.78750 / 0.81250 0.80833 0.81667 513 events / 1026 injected / 1026 non-empty reads
tree 0.84375 / 0.84375 0.75000 / 0.76250 0.81250 0.81667 482 events / 964 injected / 964 non-empty reads

Same-protocol exact vs tree:

  • task-weighted reward delta: -0.0042
  • task-weighted DB delta: 0.0

This supports no catastrophic corpus, retrieval, or scoreboard regression for the exact path. It does not claim reward uplift or semantic relevance for every retrieved memory.

Apply Trace / Stale Handling

Full-train archive traces:

Mode Apply trace rows Applied Skipped stale-deleted Stale detected / rewrite attempted
exact C4 2062 2056 6 87 / 87
tree C1 1992 1992 0 0 / 0

The exact path records stale and delete conflicts explicitly. These rows are not hidden as successful writes.

Claim Boundary

This PR is a common server-side write-safety primitive for agent memory. It validates exact file-lock apply, stale rewrite/synthesis, graph/schema health, retrieval plumbing, and non-catastrophic full-eval regression for agent trajectories / experiences.

It does not claim:

  • exact-file-lock coverage for user memory, tools, skills, or session_skills
  • cross-URI create-new duplicate resolution
  • semantic applicability parity
  • TAU reward uplift
  • same-concurrency tree C4 speedup
  • production default enablement

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 30, 2026

PR Reviewer Guide 🔍

(Review updated until commit 20ca50e)

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 85
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: feat: exact file-lock write safety for memory updates

Relevant files:

  • tests/session/memory/test_memory_updater.py
  • openviking/session/memory/memory_updater.py
  • openviking/session/memory/merge_op/patch.py
  • openviking/session/memory/merge_op/replace.py
  • openviking/session/memory/merge_op/base.py

Sub-PR theme: feat: memory graph health inspection endpoint

Relevant files:

  • openviking/session/memory/graph_health.py
  • openviking/server/routers/stats.py
  • tests/server/test_api_stats_memory_graph.py

Sub-PR theme: feat: schema quality gates for corpus

Relevant files:

  • openviking/session/memory/schema_quality.py
  • tests/unit/session/test_compressor_v2_schema_gate.py

⚡ No major issues detected

@github-actions
Copy link
Copy Markdown

PR Code Suggestions ✨

No code suggestions found for the PR.

@huangruiteng huangruiteng changed the title feat(memory): add exact-lock stale patch rewrite feat(memory): reconcile stale exact-lock updates May 31, 2026
@huangruiteng
Copy link
Copy Markdown
Contributor Author

Landing update for PR-1 (2026-06-02): refreshed the PR body for head 217fba14, explicitly kept PR-2 admission/consolidation out of scope, and added the local targeted non-integration result (150 passed, 1 deselected). The earlier Qodo duplicate _text_digest note is addressed by 42da5956; the StructuredLLM resilience/cost topic remains bounded as follow-up telemetry in the PR body. A same-concurrency tree C4 wall-time rerun is still running separately, so this PR does not claim a finalized full-train wall-time ratio.

@huangruiteng huangruiteng marked this pull request as ready for review June 2, 2026 07:16
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

Persistent review updated to latest commit 217fba1

@huangruiteng huangruiteng marked this pull request as draft June 2, 2026 07:19
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

PR Code Suggestions ✨

No code suggestions found for the PR.

@huangruiteng
Copy link
Copy Markdown
Contributor Author

Draft-state correction: I moved #2335 back to draft because review should include a rigorous speed + quality analysis, not just code/CI readiness. Current quality gates and full r4 non-regression evidence are already summarized in the PR body; the missing piece is the same-concurrency tree C4 wall-time baseline. A fresh tree C4 run is still in progress, so the PR should stay draft until the final speed table / claim boundary is updated.

…-patch-rewrite-pr1

# Conflicts:
#	tests/client/test_rebuild_clients.py
@huangruiteng huangruiteng marked this pull request as ready for review June 2, 2026 12:21
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

Persistent review updated to latest commit 20ca50e

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

PR Code Suggestions ✨

No code suggestions found for the PR.

@huangruiteng
Copy link
Copy Markdown
Contributor Author

Follow-up to my earlier draft-state note: the final review-facing speed denominator and quality/health/eval evidence are now in the PR body, head 20ca50ed checks are green, and #2335 is marked ready for review. The scope remains PR-1 only; PR-2 admission/consolidation stays out of this upstream PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

1 participant