Skip to content

Feature/copilot memory plugin plan#1923

Closed
jwayong wants to merge 32 commits intovolcengine:mainfrom
jwayong:feature/copilot-memory-plugin-plan
Closed

Feature/copilot memory plugin plan#1923
jwayong wants to merge 32 commits intovolcengine:mainfrom
jwayong:feature/copilot-memory-plugin-plan

Conversation

@jwayong
Copy link
Copy Markdown

@jwayong jwayong commented May 8, 2026

Description

Related Issue

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Refactoring (no functional changes)
  • Performance improvement
  • Test update

Changes Made

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Screenshots (if applicable)

Additional Notes

jwayong and others added 30 commits May 7, 2026 22:37
Sets up the workspace root + three packages (shared / vscode-extension /
cli-plugin) with TypeScript + Vitest. Each package has a stub source file
and smoke test wired to the shared package, so consumers fail at the
package boundary rather than via missing types.

`npm run typecheck` and `npm test` from examples/copilot/ exercise all
three workspaces.

Refs #3
Typed PluginConfig + loadConfig({agentIdDefault, hostOverrides}) +
isPluginEnabled. Resolution chain hostOverrides > env > ovcli.conf >
ov.conf > defaults preserves the Claude Code plugin's config contract so
one config file drives all three plugins.

ov.conf gains a new `copilot` block for tuning fields; the legacy
`claude_code` block remains a fall-through. envBool/num/str helpers live
in util/env.ts so other shared modules can reuse them.

27 unit tests cover precedence, copilot-vs-claude_code blocks,
bypass-glob expansion (CSV vs array vs hostOverrides), enable/disable
matrix, and the clamps/floors.

Refs #4
…daction

Append-only JSONL logger for the Copilot plugins. No-op when cfg.debug is
false so it's safe to wire into hot paths. Redacts apiKey / bearer /
token / secret / password / authorization values at any depth so nothing
sensitive lands on disk. Cycle-safe via a WeakSet visited tracker —
self-referential fields emit "[CIRCULAR]" instead of overflowing the
stack. Rotates at 10 MB by renaming to <path>.1 (overwriting any prior
backup).

Path resolution comes from cfg.debugLogPath, which is already
env-/host-overridable via OPENVIKING_DEBUG_LOG per #4.

12 unit tests cover disabled mode, JSONL line shape, child scope,
redaction (top-level / nested / array / case-insensitive), rotation, and
the cycle / BigInt fail-safes.

Refs #14
spawnDetached(opts) wraps child_process.spawn with detached:true, stdio
["pipe","ignore","ignore"], env merged on top of process.env, and
child.unref() so the parent doesn't wait. Worker errors hit the
debug-log via the optional logger; the parent never sees them. Sync
spawn failures (rare on POSIX) come back as {detached:false, error}.

runWriteTask picks the async or sync path based on cfg.writePathAsync,
falls back to the syncHandler when no asyncSpawn factory is provided or
the spawn returns detached:false. Always resolves — sync-handler errors
are caught and logged so the host hot path stays safe.

The shared package can't know where each host's worker entrypoint lives,
so the asyncSpawn(payload) factory is supplied per-host (the VS Code
extension and CLI plugin will provide their own worker scripts in later
phases). Naming + behaviour parity with the CC plugin's
scripts/lib/async-writer.mjs is intentional.

8 unit tests cover: stdin payload delivery to a real worker fixture, env
passthrough, nonexistent-command tolerance, sync handler invocation,
sync handler error swallow, async path returning in <150ms while worker
delays 200ms, spawn-failure fallback to sync, and async=true with no
asyncSpawn falling back to sync.

Refs #13
deriveSessionId(host, hostSessionId) returns
cp-<sha256(host + ':' + hostSessionId)>. The cp- prefix lets the
OpenViking server distinguish Copilot sessions from Claude Code (cc-)
sessions by URI alone, and using host as part of the digest input
guarantees the VS Code extension and the CLI plugin never collide on a
single OpenViking session even when they share an upstream session id.

Pure + deterministic: no clock, no env, no fs. The ':' separator
prevents collisions like ("ab","c") vs ("a","bc"). Hosts that don't
expose a stable session id (some CLI invocations) feed in a digest of
cwd + start-time as hostSessionId — the shared package doesn't dictate
how that's computed.

Eleven unit tests cover format (cp- prefix + 64 hex), determinism,
host- and hostSessionId-sensitivity, the empty-id edge case, the
':' separator collision-prevention, and three pinned SHA-256 vectors so
any accidental algorithm/prefix/separator change breaks the test loudly.

Refs #12
Typed wrapper over the OpenViking REST API the CC plugin's hooks
already exercise in production:
  GET    /health
  POST   /api/v1/search/find
  POST   /api/v1/sessions/{id}/messages   (looped per turn)
  POST   /api/v1/sessions/{id}/commit
  GET    /api/v1/sessions/{id}/context?token_budget=N

Headers (Authorization + X-OpenViking-{Account,User,Agent}) are sent
conditionally on the corresponding cfg field being non-empty, matching
ov-session.mjs's behaviour so the same ovcli.conf drives both plugins
identically.

Bypass at the client layer: cfg.bypassSession or a glob match in
cfg.bypassSessionPatterns against bypassContext.cwd /
bypassContext.hostSessionId short-circuits every method with a
synthetic ok result — [] for recall, null for archive overview,
{skipped:N} for writes — so the host hot path stays cheap and silent
for scratch sessions.

Errors are normalised: HTTP non-2xx, server-side body.status==='error',
and thrown fetch errors all collapse to {ok:false, error:{message,
status?}}. fetchArchiveOverview maps 404 to ok:true with null overview
since "session not yet on the server" is the resume-prime norm rather
than an error.

Timeout via AbortController, default cfg.timeoutMs with a per-call
override on recall. fetchImpl is injectable for tests; in production it
defaults to global fetch (Node 22+).

20 unit tests cover header injection (with + without auth), wire shape
of each endpoint, bucket flattening on recall, multi-turn append with
mid-batch failure, commit force flag, 404→null on archive, URL
encoding, error-mapping (HTTP / wrapped status / thrown), abort-on-
timeout via fake timers, bypass via session/cwd/host-id, and the
no-fetch construction guard.

Refs #5
Port of stripInjectedBlocks + sanitize from
examples/claude-code-memory-plugin/scripts/auto-capture.mjs, with two
additions: a new <copilot-context> marker (symmetric with
<openviking-context> for any host that surfaces a Copilot-side block)
and a public INJECTED_BLOCK_PATTERNS catalogue so Phase 0 spike
findings can append host-specific markers without touching the
sanitiser's call sites.

Two entry points by design:
- stripInjectedBlocks preserves whitespace (newlines, code fences) so
  output is safe to push back into OV as-is.
- sanitize additionally collapses whitespace; only suitable for
  classification (trigger detection, capture-or-skip), never for
  storage.

Six patterns covered: <relevant-memories>, <openviking-context>,
<copilot-context>, <system-reminder>, ^[Subagent Context]$ lines, and
NUL bytes.

19 unit tests cover marker removal (one per kind, multiple-per-line,
multi-line, all-mixed), whitespace preservation, idempotency, strict
sanitize collapsing, and a self-referential pollution scenario where a
recall block from turn N is embedded in turn N+1 — the stripped output
must contain zero marker bytes while the user's actual content
survives intact.

Refs #9
1:1 port of the ranking logic in
examples/claude-code-memory-plugin/scripts/auto-recall.mjs (originally
lifted from openclaw-plugin/memory-ranking.ts). Boost magnitudes,
stopword list, and tokenisation regex are preserved verbatim so the
Copilot plugins behave identically against the same fixtures.

Pipeline (rankRecallHits): filter by clampScore >= scoreThreshold →
sort desc by rankItem(profile) → dedupeItems → truncate to recallLimit.
Sort is V8-stable so ties preserve input order, which is the contract.

Boosts (rankItem):
- leaf +0.12 for level=2 or *.md URIs
- event +0.10 when query is temporal AND item is in events/
- preference +0.08 when query is preference AND item is preferences/
- lexical overlap up to +0.20, computed against the URI + abstract
  text, capped at the first 8 query tokens

Dedupe rule: events/cases dedupe by URI (so distinct events with
matching abstracts don't collapse); everything else dedupes by abstract
content with the URI as a fallback key.

recallTokenBudget is intentionally NOT enforced here — that lives in
recall/format.ts (#7), since budget logic needs per-item content
resolution (HTTP fetch for level=2 items) which doesn't fit a pure
function. Documented in the file header.

estimateTokens (chars/4) is exported from this module so the formatter
and ranker speak the same units.

27 unit tests cover clampScore, estimateTokens, buildQueryProfile,
lexicalOverlapBoost (with the 8-token slice cap), rankItem (each boost
path verified independently), isEventOrCaseItem + dedupeItems
(event-by-URI vs default-by-abstract), and the rankRecallHits pipeline
(threshold filter, limit truncation, query-aware sort, stable ordering
on tied ranks, dedupe-after-sort keeps highest-ranked, empty + negative
recallLimit edge cases).

Refs #6
formatRecallBlock(items, opts) emits the exact block shape the CC
plugin's auto-recall.mjs produces, so anything downstream that already
parses the CC output works against the Copilot output too — including
capture/sanitize.ts's stripper (round-trip verified by a sanitize-
based test).

Token budget: front-of-list items render with full content until the
budget is exhausted; subsequent items degrade to URI-only hints
instead of being dropped (preserves a useful pointer even when
content can't fit). The first content line always lands even if it
alone overflows the budget — recall returning one very long memory
is still better than an empty block. Mirrors openclaw spec §6.2.

Content resolution chain (resolveItemContent):
  - preferAbstract=true and abstract present → abstract
  - level=2 + fetchContent provided → fetchContent(uri); null/throw
    falls back to abstract → URI
  - otherwise → abstract → URI

maxContentChars caps each line; over-cap content is sliced + "..."
suffixed. Score rendered as whole-number percent. Type label comes
from item.type (set by OVClient when it flattens buckets); falls
back to "item" when missing.

Returns {block, contentCount, hintCount, budgetUsed} so the host can
log telemetry (mirrors the CC plugin's "injection_built" log line).
Empty input returns {block: null, ...} so the caller can skip
injection cleanly.

Also: small OVClient enhancement (still issue #5's contract — adding
a feature, not changing behaviour). flattenRecallBuckets now stamps
the singularised bucket name onto each hit as `type` ("memories" →
"memory") so the formatter can label items without the host
stitching the source label back. Server-set `type` fields are
preserved.

20 unit tests for formatRecallBlock cover block shape (open/close/
header verbatim, line format, score rounding, type fallback,
sanitize round-trip), token budget (URI-hint degradation, first-
item-always-included, budgetUsed accounting, tokenBudget=0 edge
case), content resolution (preferAbstract honoured, fetchContent
invoked, null/throw fallbacks, missing-abstract → URI, non-level-2
ignores fetchContent), and recallMaxContentChars truncation. Plus
2 OVClient tests for the type-tagging behaviour.

Refs #7
canonicaliseTranscript(turns, opts) is the shared core: sanitize each
turn via stripInjectedBlocks → drop empty → drop assistant turns when
captureAssistantTurns=false → drop turns whose sanitized text exceeds
captureMaxLength. Preserves input order, never mutates input.

captureMaxLength is intentionally a *rejection threshold*, not a
truncation cap, mirroring the CC plugin's auto-capture.mjs:shouldCapture
semantic. Tool I/O inlining can balloon a turn easily, and storing a
half-truncated message is worse than skipping it. The commit-queue (#10)
decides whether to retry skipped turns elsewhere.

Two host adapters:
- fromVSCodeChatHistory uses duck-typed structural interfaces
  (VSCodeChatRequestTurnLike, VSCodeChatResponseTurnLike) so the
  shared package never imports `vscode`. Discriminates user vs
  assistant by `prompt: string` vs `response: string`. Hosts
  pre-flatten the real ChatResponseTurn.response parts array to a
  string before handing the turn off. Unknown turn shapes silently
  skipped (forward-compatible against future VS Code API additions).
- fromCaptureToolArgs handles the upcoming CLI MCP openviking_capture
  tool's `{user, assistant?}` payload — produces one or two
  CanonicalTurnInputs and defers to the core.

21 unit tests cover sanitisation (injected-block strip, multi-marker
mix, whitespace preservation), empty handling (block-only, whitespace-
only, undefined-text defensive), captureAssistantTurns on/off,
captureMaxLength rejection semantics (drops overlong, keeps at-cap,
0 disables), input order + non-mutation, VS Code adapter
discrimination + forward-compatible skip + filter + strip, and CLI
adapter (user-only, paired, empty-assistant fallback, strip).

Refs #11
Per-session commit queue ties together OVClient (#5), async-writer
(#13), estimateTokens (#6), and the debug logger (#14). enqueue(turns)
appends to OV synchronously (so the server knows about the turns
before the next recall), accumulates a chars/4 token counter, and
when the counter crosses commitTokenThreshold dispatches a commit —
detached via runWriteTask when async=true + asyncSpawn provided,
awaited inline otherwise. flush() is the explicit force-commit for
SessionEnd / SubagentStop / PreCompact paths.

Failure modes:
- appendTurns failure → tokens NOT accumulated, no commit triggered
  (turns aren't on the server, so committing would archive nothing)
- commit failure → caught + logged, never thrown to host hot path;
  counter still resets eagerly so subsequent enqueues start fresh
  (data is on the server and the next commit catches it)

Double-commit guard: an in-flight dispatch sets flushInFlight which
short-circuits subsequent triggers until it resolves. Detached
dispatches resolve almost instantly; the guard mostly protects
against truly racing callers.

Queue takes client: CommitClient = Pick<OVClient, "appendTurns" |
"commit"> so tests can use a tiny mock without constructing a full
OVClient (no fetch stub needed).

14 unit tests cover append + token accumulation (empty no-op,
sub-threshold no commit, multi-enqueue accumulation, single-enqueue
threshold cross, exactly-at-threshold dispatch via >= comparison),
append-failure short-circuit, flush below threshold + with zero
pending, async-vs-sync dispatch (sync awaits 30ms inline, async with
asyncSpawn returns in <150ms while mock delays commit 200ms, async
without asyncSpawn falls back to inline per runWriteTask contract),
failure tolerance (commit fail doesn't throw + counter still
resets), and the double-commit guard via a re-entrant flush() during
an in-flight commit.

Refs #10
RecallCache amortises duplicate recall round-trips within a single
turn. In VS Code, the @openviking participant and the
openviking_recall LM tool can both fire on the same user prompt;
without the cache that's two HTTP calls to OV for an identical
(query, sessionId) pair. Default TTL 5s, default 64 entries.

Key shape: (query, sessionId, scope?) joined by Unit-Separator so
component fields with delimiters can't collide. JS Map preserves
insertion order, so LRU is implemented by delete + re-set on access
(promotes MRU end); eviction picks the first iterator entry. Expired
entries are deleted on read so size doesn't leak.

getOrFetch is a deliberate cache-miss-equals-no-cache path: it calls
the supplied fetch exactly once on miss and returns its result
verbatim. Successful results land in the cache; errors pass through
unchanged so the next turn can retry. Cache miss is therefore
bit-identical to wiring fetch directly — only side effect on a miss
is one extra cache.set after resolve.

ttlMs=0 disables caching entirely so hosts can flip the cache off
without rewriting call sites. Clock is injectable for deterministic
time tests. The TTL is fixed at write time — a hit promotes MRU
position but does NOT extend TTL (test pins this so future
optimisations don't silently change the contract).

17 unit tests cover defaults exposure, hit/miss + key discrimination
+ re-set TTL refresh, TTL expiry behaviour + hit-doesn't-extend
+ ttlMs=0 disable, LRU eviction (over-cap + hit-promotes-past-LRU
+ maxEntries=1), getOrFetch (no-fetch on hit, exactly-once on miss,
error not cached, miss-bit-identical-to-direct-fetch), and clear().

Refs #8

Phase 1 shared package is now feature-complete. Twelve modules:
config, env utils, debug logger, async-writer, session id, OVClient,
sanitize, rank, format, transcript, commit-queue, cache.
Real VS Code manifest with chatParticipants (openviking.memory),
languageModelTools (empty — populated by #22 + #26),
configuration (8 openviking.* properties; full ~20 in #19), and an
esbuild → vsce pipeline that produces openviking-copilot.vsix from
src/extension.ts. The bundle is CJS so VS Code's loader works
cleanly, with `vscode` marked external. The extension package no
longer carries `type: module` since the bundled output is CJS.

Two-file activation split:
- extension-core.ts (vscode-free) — buildActivationHandle reads
  PluginConfig with agentIdDefault: copilot-vscode, builds
  OVClient + DebugLogger, exposes a queue registry. Returns null
  when isPluginEnabled() is false so disabled installs cost
  nothing. registerCommitQueue is idempotent. runDeactivate
  parallel-flushes every registered queue with try/catch around
  each so one failing flush can't block the others.
- extension.ts (the vscode adapter) — thin: imports vscode, maps
  workspace settings to Partial<PluginConfig> overrides, hands off
  to extension-core, registers runDeactivate as a Disposable.

Splitting the core out of the adapter is the standard pattern for
testable VS Code extensions: extension-core.test.ts runs in plain
Vitest, no @vscode/test-electron needed for the activation logic
itself.

Tests: 13 new extension-core tests cover gating (disabled by
default, enabledOverride forces, env force-enable), config flow
(hostOverrides win, env precedence, agent-id default, logger +
client wired), and queue lifecycle (idempotent register, parallel
flush on deactivate, no-op for null handle, throwing flush
doesn't block siblings, empty registry completes cleanly).
Crucially, beforeEach points OPENVIKING_CONFIG_FILE /
OPENVIKING_CLI_CONFIG_FILE at /tmp paths that don't exist so
tests are deterministic regardless of the developer's real
~/.openviking/ovcli.conf.

`npm run package` produces a 9.05 KB .vsix containing the manifest
+ a 22.96 KB CJS bundle. .vscodeignore strips test/source/config
files from the artefact.

Refs #15
…ommands

Splits the chat participant across two files following the
extension-core pattern from #15:
- participant-core.ts (vscode-free) — buildRecallContext (recall +
  rank + format with cache short-circuit), runStore (enqueue + force
  flush), runForget (uri validation + DELETE). All return user-facing
  message strings; no throws.
- participant.ts — vscode adapter. registerOpenVikingParticipant
  derives the OV session id from the first workspace folder via
  deriveSessionId("copilot-vscode", ...), constructs ParticipantState
  with a per-session CommitQueue and RecallCache, registers the queue
  with the activation handle so it gets flushed on deactivate.

Manifest declares /recall, /store, /forget on the openviking.memory
participant. handleRequest dispatches on request.command:
  - /recall — buildRecallContext, render block in a code fence (or
    "_No relevant memories found._" when empty)
  - /store — runStore (rejects empty input, force-flushes after
    enqueue so the memory archives immediately)
  - /forget — runForget (rejects non-viking:// URIs without a network
    call, surfaces server errors as ⚠️)
  - default — buildRecallContext, push the block as the first stream
    chunk, delegate to request.model.sendRequest with the recalled
    context as a leading user message, pipe LM tokens into the
    stream, then captureTurn (sanitise via canonicaliseTranscript,
    enqueue against the per-session CommitQueue) once the stream
    completes. autoCapture=false short-circuits the capture path.

OVClient gains forget(uri, {recursive?}) — small enhancement to #5
that issues DELETE /api/v1/fs?uri=...&recursive=... with the same
bypass + tenant-header behaviour as the rest of the client. The
internal fetchJSON method-type widens to include "DELETE".

extension.ts wires registerOpenVikingParticipant after activation;
the participant disposable lands in context.subscriptions so VS Code
disposes it cleanly.

Tests: 14 new participant-core tests (recall short-circuits, hits
render, scoreThreshold drops below-floor hits, second call hits the
cache, transport error → empty result, store rejects empty +
appends + force-flushes + reports failure, forget validates uri +
issues DELETE + surfaces error) + 3 new OVClient.forget tests
(URL-encodes uri, appends &recursive=true, bypass short-circuit).
228/228 passing across all workspaces. Bundle still builds cleanly
to dist/extension.cjs at 45.5 KB.

Refs #16
Pulls the canonicalise + enqueue dance out of participant.ts into a
dedicated capture/on-response.ts module. captureChatTurn(opts) is the
single entry point that takes raw user/assistant text + the
participant's cfg/queue/logger, runs through canonicaliseTranscript
(sanitise → filter assistant if disabled → cap-overlong), and feeds
the surviving turns to queue.enqueue. Returns {enqueued, skipped,
triggeredCommit, pendingAfter} for telemetry; always resolves.

VS Code's current chat-extension API does not expose a global
"any participant produced a response" event — the participant's
request handler IS the subscription point. Phase 3 (#25) will plug
additional sources (e.g. default-chat events, contingent on Phase 0)
into this same entry point. Documented in the file header so the
abstraction's intent is obvious to whoever picks up #25.

Bypass is transparent: OVClient short-circuits at appendTurns +
commit when cfg.bypassSession is true or any bypassSessionPatterns
match. Tests confirm capturing in a bypassed session reports success
to the caller while issuing zero HTTP calls.

Async-detach: with cfg.writePathAsync=true + an asyncSpawn factory
on the CommitQueue, the actual commit RTT happens in a detached
worker — verified by a test that wraps client.commit in a forever-
blocking promise and asserts the total elapsed stays under 150ms.

8 new tests across 4 groups: gating (autoCapture=false /
empty-after-canonical → no enqueue / no network), happy paths
(both-roles enqueued, user-only mode drops assistant, strip-injected-
blocks pollution-test on BOTH user AND assistant text), bypass
transparency (bypassSession + bypassSessionPatterns), and the
async-detached path.

participant.ts now delegates captureTurn → captureChatTurn — no
behavioural change, just consolidation. Bundle still builds clean
(46.4 KB).

Refs #17
Wire OpenViking's HTTP MCP endpoint (`/mcp`) into Copilot Chat via
VS Code's runtime mcpServerDefinitionProviders API:

- package.json contributes mcpServerDefinitionProviders[id=openviking]
- src/mcp/manifest.ts is the pure builder: takes a resolved
  PluginConfig and emits {name, uri, headers}. Authorization +
  X-OpenViking-{Account,User,Agent} headers attached only when their
  cfg fields are non-empty, exactly matching OVClient.buildHeaders so
  MCP and REST traffic land with the same identity on the server.
  baseUrl trailing slashes are stripped before appending /mcp.
- src/mcp/register.ts is the vscode adapter:
  registerOpenVikingMcpProvider(handle) defensively feature-detects
  both vscode.lm.registerMcpServerDefinitionProvider and
  vscode.McpHttpServerDefinition. When either is missing on an older
  VS Code build, logs `mcp_provider_unavailable` and returns a no-op
  disposable so the extension still loads cleanly. The provider re-
  reads cfg every call so workspace-settings changes flow through on
  the next invocation.

Design choice (documented in the file header): runtime provider over
static .mcp.json. The connection details live in PluginConfig which
the extension already resolves with the env > host > ovcli.conf >
ov.conf chain — the dynamic provider injects *resolved* values so we
never need `${VAR}` substitution and never write the apiKey to a
JSON file on disk. This sidesteps the "header substitution syntax"
question Phase 0 was meant to spike for the static-file path.

extension.ts now registers the MCP provider after the participant;
the disposable lands in context.subscriptions for clean teardown.

10 unit tests cover MCP_PROVIDER_ID parity with the manifest entry,
name + uri shape (default + trailing-slash variants),
local-only-mode (empty headers), and remote / multi-tenant mode
(Authorization, all 4 tenant headers, only-populated-fields, header
shape mirrors OVClient).

246/246 tests passing across all workspaces. Bundle 47.9 KB.

Refs #18
Expand the VS Code surface from 8 hand-maintained settings to all 25
PluginConfig fields per PLAN.md §8.2, plus a SecretStorage-backed
`OpenViking: Set API Key` command so users never need to inline the
apiKey into settings.json.

Single source-of-truth: settings-schema.ts exports an
OPENVIKING_SETTINGS array of typed descriptors {key, type, default,
cfgField, secret?, enumValues?}. The manifest's
contributes.configuration block is hand-authored to match (so VS
Code gets exactly the JSON-schema fragments it needs); a
drift-detection test compares the two and breaks the build if
either side gets out of sync. Adding a new setting now means
touching schema.ts + the manifest — extension.ts's
readWorkspaceOverrides automatically picks it up by iterating the
descriptor list.

Set-API-Key flow split (matches the extension-core pattern):
- commands-core.ts (vscode-free) — runSetApiKeyCommand(secrets,
  input, opts?). password:true prompt, non-empty validate, trim,
  store, info message. Returns {saved, reason?: 'cancelled' | 'empty'}
  so callers can branch + tests can assert.
- commands.ts (vscode adapter) — wires vscode.window.showInputBox +
  vscode.window.showInformationMessage + context.secrets into the
  duck-typed SecretStorageLike + InputProvider shapes.

extension.ts becomes async (returns Promise<void>):
1. Register commands FIRST so the user can reach Set API Key even
   when the plugin is currently disabled.
2. Await context.secrets.get(SECRETS_API_KEY) and layer it above
   settings.json's apiKey value (SecretStorage > settings >
   env > ovcli.conf > ov.conf > defaults).
3. readWorkspaceOverrides drives off OPENVIKING_SETTINGS — no more
   hand-maintained mapping; the loop dispatches on descriptor.type
   (string / boolean / number / string-array / enum) and writes
   into PluginConfig fields by descriptor.cfgField.

Manifest:
- All 25 properties with markdownDescription. apiKey carries an
  explicit ⚠️ warning + nudge toward the SecretStorage command.
  Numeric properties get min/max bounds matching the floors in
  config.ts. captureMode is enum-restricted to semantic | keyword.
- New `commands` contribution declares the Set API Key command
  with category: "OpenViking" so it groups in the command palette.

17 new tests:
- settings-schema.test.ts (10): catalogue shape (25 entries, prefix
  invariant, secret/enum singletons), exported constants, drift
  detection (each schema entry → manifest property with matching
  type, default match, no orphan manifest properties), apiKey
  warning text, Set-API-Key command declared in manifest.
- commands-core.test.ts (7): happy path (password:true / store /
  trim / custom secretKey / validate function), cancel
  (undefined → reason:cancelled, no store, no info), empty
  (whitespace / "" → reason:empty).

263/263 passing across all workspaces. Bundle 52.88 KB, vsix 18.15 KB.

Refs #19
Real npm manifest with:
- bin: openviking-copilot-mcp → ./dist/mcp-server.js
- dependencies: @openviking/copilot-shared (workspace) +
  @modelcontextprotocol/sdk@^1.21.0 (used in #21)
- esbuild bundle script (scripts/build.mjs) reads
  package.json#version and injects it as the __OV_CLI_VERSION__
  define so --version reports the published value without runtime
  fs reads. Single ESM file out, source shebang preserved
  (banner removed mid-flight after the first build duplicated
  the shebang and broke Node's loader).
- prepack runs build, so `npm pack` always produces a fresh
  artefact.

cli.ts is the testable runMain(argv, opts) — vscode-free, takes
injectable stdout/stderr/loadConfig/isPluginEnabled. Flags:
  --help / -h     usage to stdout, exit 0
  --version / -v  semver to stdout, exit 0
  --check         loadConfig({agentIdDefault:'copilot-cli'}),
                  redacted summary to stdout (apiKey shown as
                  '<set, N chars>', never the value), exit 0/3
                  reflecting isPluginEnabled
  default         stub-to-stderr pointing at #21 where the real
                  MCP server bootstrap lands, exit 0
  unknown arg     error-to-stderr + exit 2

mcp-server.ts is a thin shebang-bearing shim that calls
runMain(process.argv.slice(2)) and exits with the returned code.
Top-level fatal handler logs stack to stderr.

10 unit tests cover --help / -h, --version / -v, --check (calls
loadConfig with the right agentIdDefault, summary content,
apiKey redaction with regex match on '<set, N chars>',
exit-code reflects enabled state), default invocation
(stub-to-stderr + exit 0), and unknown-arg handling.

End-to-end verified: npm run build → 13.2 kB ESM bundle, runs
cleanly under node, --check reads my real ~/.openviking/ov.conf
and prints the resolved baseUrl + agentId. npm pack →
openviking-copilot-cli-memory-0.0.0.tgz (4.4 kB, 2 files).

Old smoke.test.ts removed (replaced by cli.test.ts which exercises
the real entry point). .gitignore picks up *.tgz.

Refs #20
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
…-server

feat(copilot-plugin): add CLI MCP server tools
* feat(copilot-plugin): add recall tools

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

* fix(copilot-plugin): type recall config

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

---------

Co-authored-by: Januar <januar@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Januar <januar@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Januar <januar@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Optional degraded-fidelity capture path for the GitHub Copilot CLI.
The openviking_capture MCP tool from #26 stays the primary capture
mechanism; this wrapper closes one specific gap: captures the model
recorded mid-session that didn't cross commitTokenThreshold to trigger
an automatic commit.

Three pieces:

1. cli.ts gains `--commit-flush --session=<id>`. Loads cfg, builds an
   OVClient, force-commits the given session id. Bypass is honoured
   automatically because OVClient.commit short-circuits internally.
   Maps OVResult to exit codes (0=ok, 1=transport error with HTTP
   status if present, 2=missing/empty --session). commitFlush is
   injectable for tests so they don't hit a real server. The argv
   parser learns --key=value form alongside the existing --flag form.

2. server.ts wires OPENVIKING_CLI_SESSION_ID into runStdioMcpServer:
   when set, the MCP server uses it as defaultSessionId so the
   in-process openviking_capture calls and the wrapper's post-exit
   --commit-flush both target the same OV session.

3. wrapper/copilot.sh is a sourceable bash/zsh function. Per
   invocation it derives `cp-<uuid>` (uuidgen) or `cp-$$-$(date +%s)`
   (fallback), exports OPENVIKING_CLI_SESSION_ID, runs `command
   copilot "$@"`, then on exit fires `openviking-copilot-mcp
   --commit-flush --session=<id>`. Preserves the user's exit code
   regardless of what the post-exit call does.

   Configuration env vars:
   - OPENVIKING_BYPASS_SESSION=1   skip the wrapper entirely (no env
                                    var set, no post-exit call)
   - OPENVIKING_WRAPPER_QUIET=1    swallow commit-flush stderr too
   - OPENVIKING_DEBUG=1            wrapper + MCP-server hook log
                                    lands in ~/.openviking/logs/
   - OPENVIKING_CLI_SESSION_ID    (the wrapper sets it)

Why "degraded fidelity": the wrapper does NOT see the user's prompts
or the assistant's responses. Capture itself still requires the model
to call openviking_capture; without that, the OV session has nothing
pending and the post-exit commit archives nothing. The wrapper's
value is closing the model-called-capture-but-didn't-trigger-commit
edge case. README spells this out with a diagram and a "when NOT to
use" section so users have the right expectations.

Tests:
- 6 cli.test.ts cases for --commit-flush (missing session, success,
  transport error, HTTP status formatting, whitespace trim, empty
  after trim)
- 1 stdio integration in server.test.ts: spawns the real bin with
  OPENVIKING_CLI_SESSION_ID=cp-wrapper-coord-test +
  OPENVIKING_BYPASS_SESSION=true, calls openviking_capture without
  a sessionId, asserts the response's sessionId is the wrapper
  value
- Manual end-to-end shell smoke: wrapper exports env var, runs
  fake-copilot (exit 42), preserves the 42 exit code, then calls
  openviking-copilot-mcp --commit-flush --session=cp-<uuid>. Bypass
  mode skips entirely with empty env var and no post-exit call.

299/299 tests passing across all workspaces; typecheck clean.

Refs #27
Per PLAN.md §9.5. All three .github/workflows/copilot-*.yml files
gate on `paths: examples/copilot/**` so unrelated changes skip them
cleanly:

- copilot-shared.yml runs on ubuntu-latest only (pure-TS doesn't
  benefit from a matrix). Builds shared, typechecks, runs the new
  `test:coverage` script which exits non-zero when v8 line coverage
  on packages/shared/src drops below 80%. Uploads the coverage
  report as an artefact.
- copilot-vscode.yml runs on ubuntu/macos/windows × Node 22 with
  fail-fast:false. Typecheck + Vitest unit suite + esbuild bundle.
  Linux job additionally runs `vsce package` as a smoke for the
  publishable .vsix.
- copilot-cli.yml runs on the same matrix. Builds the bin first so
  the stdio integration test in server.test.ts can spawn it. After
  the suite passes, runs `--help` and `--version` against the built
  bundle as a real-process smoke. Linux job runs `npm pack` and
  uploads the tarball. Adds workflow_dispatch + nightly schedule so
  a future real-Copilot-CLI fixture can plug into the same job
  without changing trigger config.

Supporting changes:
- packages/shared/vitest.config.ts gains v8 coverage config with
  thresholds.lines:80 (current measurement: 91.85% — comfortably
  clear). Includes lcov reporter so coverage tooling like Codecov
  can pick it up later.
- packages/shared/package.json gains a `test:coverage` script.
- root package.json adds @vitest/coverage-v8 as a dev dep.
- cli-plugin/scripts/build.mjs guards chmodSync behind
  `process.platform !== 'win32'` + try/catch so Windows CI doesn't
  fail with EPERM. The chmod is a no-op on Windows anyway since
  NTFS doesn't track POSIX mode bits.

Local validation: 299/299 tests green, typecheck clean, coverage
gate passes at 91.85% lines, exit code 0.

Acceptance status:
- [x] Coverage gate ≥80% lines on packages/shared/src/
- [x] Workflows skip when no relevant files changed (paths filter)
- [⏳] All three pass on a no-op PR — verifiable on next GitHub
      Actions run after this commit lands

Refs #30
…gnostics

Two host-free diagnostic runners in @openviking/copilot-shared,
mirroring the CC plugin's scripts/debug-recall.mjs +
scripts/debug-capture.mjs. Plus two new flags on the CLI bin
(`--debug-recall=<query>` / `--debug-capture=<path>`) that wire the
runners into runMain so users can invoke them via the existing
`openviking-copilot-mcp` install without a separate executable.

debug/recall-debugger.ts (runDebugRecall):
  - Sections: config snapshot (apiKey redacted to <set, N chars>),
    health check, query profile (tokens + temporal/preference
    flags), recall request body, raw flat-hit count, ranked list
    with base/rank/boost per hit, final <openviking-context>
    block, telemetry (contentCount/hintCount/budgetUsed/bytes).
  - Health failure → exitCode 1 but continues; recall failure →
    exitCode 2 + aborts after that point so the user sees the OV
    response shape that broke things.
  - Always resolves; never throws.

debug/capture-debugger.ts (runDebugCapture):
  - Loads JSON transcript ([{role, text}, ...]) with shape
    validation (rejects non-array, non-object items, missing
    role/text, unsupported roles).
  - Per-turn KEEP/DROP report with explicit reason — one of
    empty-after-sanitise / filtered-assistant / too-long /
    bad-shape — plus raw/sanitised/trimmed lengths so it's
    obvious which gate triggered.
  - Final OVTurn[] payload with first-200-char previews per turn.
  - Commit-queue projection: estimated tokens vs
    commitTokenThreshold with explicit YES/no would-trigger
    verdict.
  - Footer notes when bypassSession or autoCapture=false is set,
    so users immediately understand why the diagnostic differs
    from production behaviour.
  - File-not-found / invalid-JSON / not-an-array failures map to
    exitCode 2 with a helpful stderr message.

Both runners take injectable client/readFile/write so tests stay
fast and host-free. Returns {exitCode, output} so the bin can map
straight to a process exit code.

CLI integration in cli.ts:
  - argv parser learns --debug-recall=<value> and
    --debug-capture=<value> alongside the existing --session=
    pattern (key=value form).
  - Empty-after-trim arg rejected with exit 2.
  - Default runners construct OVClient + createDebugLogger and
    hand off; tests inject vi.fn() runners so no real network or
    file I/O happens.

19 new tests:
  - debug-recall.test.ts (7): happy path emits all sections,
    apiKey redaction (no leak + regex match on <set, N chars>),
    write-sink streaming, unhealthy server continues with exit 1,
    recall failure aborts with exit 2, empty-after-threshold msg.
  - debug-capture.test.ts (12): happy path, threshold YES/no
    flag, all four drop reasons, three file-error paths, bypass
    + autoCapture footer notes.
  - cli.test.ts (+6): both flag rejections (no value / empty
    value), runner invoked with trimmed args, output piped
    through, exit code propagated.

End-to-end smoke verified:
  - --debug-capture=transcript.json prints a clean report against
    a real file with an injected <openviking-context> block —
    KEEP/KEEP per-turn, cleaned payload preview, "would trigger
    commit: no" verdict.
  - --debug-recall="..." in bypass mode prints config + bypassed
    health + empty ranked list + empty final block.

Tests: 324/324 passing across all workspaces (223 shared + 63
vscode-extension + 38 cli-plugin). Shared coverage stays at
93.19% lines, well clear of the 80% gate (debug modules
themselves are at 97.86%).

Refs #29
Co-authored-by: Januar <januar@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Januar <januar@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
jwayong and others added 2 commits May 8, 2026 22:25
Co-authored-by: Januar <januar@gmail.com>
Co-authored-by: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 85
🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ No major issues detected

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 8, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Replace blocking sync file read with async read in async function

Replace blocking sync file read with async file read in the async runDebugCapture
function to avoid blocking the event loop. Update the readFile dependency to return
a promise and use fs/promises.readFile as the default implementation.

examples/copilot/packages/shared/src/debug/capture-debugger.ts [21-239]

-import { readFileSync } from "node:fs";
+import { readFile } from "node:fs/promises";
 // ...
 export interface DebugCaptureDeps {
   cfg: PluginConfig;
-  /** Inject for tests; defaults to fs.readFileSync at runtime. */
-  readFile?: (path: string) => string;
+  /** Inject for tests; defaults to fs.promises.readFile at runtime. */
+  readFile?: (path: string) => Promise<string>;
   /** Buffered output sink. Defaults to a string accumulator. */
   write?: (chunk: string) => void;
 }
 // ...
 try {
   const reader = deps.readFile ?? defaultReadFile;
-  body = reader(args.path);
+  body = await reader(args.path);
 } catch (err) {
   writeLn(`  ERROR: ${err instanceof Error ? err.message : String(err)}`);
   return { exitCode: 2, output: lines.join("") };
 }
 // ...
-function defaultReadFile(path: string): string {
-  return readFileSync(path, "utf-8");
+async function defaultReadFile(path: string): Promise<string> {
+  return readFile(path, "utf-8");
 }
Suggestion importance[1-10]: 5

__

Why: Replacing the blocking synchronous file read with an async read in the async runDebugCapture function avoids blocking the event loop, improving the function's behavior in async contexts. The suggestion correctly updates the readFile dependency interface to return a promise, uses fs/promises.readFile as the default implementation, and adjusts the call site to await the result.

Low

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

1 participant