Session content search and secret scanning: storage, API, CLI by wesm · Pull Request #534 · kenn-io/agentsview

wesm · 2026-05-22T19:06:20Z

Adds content search across sessions and a pipeline that detects secrets and
stores the resulting findings. This PR covers the engine, storage, HTTP API,
and CLI. The web UI for both features is deferred to a follow-up PR.

Content search

SearchContent queries message text, tool input, and tool result content in
three modes: substring, regex (with a literal prefilter), and FTS5.
Results return snippets snapped to rune boundaries; secrets detected in a
snippet are masked by default.
One-shot/automated sessions are excluded from content search by default.
CLI: agentsview session search <pattern> with --regex, --fts,
--in messages,tool_input,tool_result, and date/project/agent filters.
HTTP: GET /api/v1/search/content.
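
To illustrate what "snapped to rune boundaries" means for a snippet, here is a minimal Go sketch. The function name `snapSnippet` and the choice to widen the window outward (rather than shrink it) are assumptions for illustration, not the code in this PR.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// snapSnippet is a hypothetical helper: it returns text[start:end] after
// moving both byte offsets onto UTF-8 rune boundaries, so the snippet never
// begins or ends in the middle of a multi-byte character.
// Assumption: offsets are widened outward; a real implementation may shrink.
func snapSnippet(text string, start, end int) string {
	if start < 0 {
		start = 0
	}
	if end > len(text) {
		end = len(text)
	}
	if start >= end {
		return ""
	}
	// Walk start backward until it points at the first byte of a rune.
	for start > 0 && !utf8.RuneStart(text[start]) {
		start--
	}
	// Walk end forward until it points at a rune start or the end of text.
	for end < len(text) && !utf8.RuneStart(text[end]) {
		end++
	}
	return text[start:end]
}

func main() {
	// Offsets 3 and 11 both fall inside two-byte characters.
	fmt.Println(snapSnippet("naïve café", 3, 11))
}
```

Without the snap, slicing at raw byte offsets could emit invalid UTF-8 at either edge of the snippet.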

Secret detection (`internal/secrets`)

A pure detector (no DB, no IO) with two confidence tiers:

Definite (well-anchored vendor formats): AWS, Anthropic, GitHub, Slack,
Stripe, and Google API keys, plus PEM private-key blocks.
Candidate (false-positive-prone heuristics): basic-auth URLs, JWTs, and
high-entropy assignments.

Scan reports findings, Redact masks them in arbitrary text, and Verify
re-checks a finding's coordinates against the current source.
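
The Scan / Redact / Verify split can be sketched roughly as follows. This is an illustration only: the type names, the rule names, the `[REDACTED]` mask, and the two regular expressions are assumptions, not the rules shipped in `internal/secrets`. It keeps the "pure" property by touching no database or file.

```go
package main

import (
	"fmt"
	"regexp"
)

// Confidence mirrors the two tiers described above (names are hypothetical).
type Confidence string

const (
	Definite  Confidence = "definite"
	Candidate Confidence = "candidate"
)

// Finding records which rule matched and where, as byte offsets.
type Finding struct {
	Rule       string
	Confidence Confidence
	Start, End int
}

type rule struct {
	name string
	conf Confidence
	re   *regexp.Regexp
}

// Illustrative rules only: one anchored vendor format, one looser heuristic.
var rules = []rule{
	{"aws-access-key-id", Definite, regexp.MustCompile(`\bAKIA[0-9A-Z]{16}\b`)},
	{"basic-auth-url", Candidate, regexp.MustCompile(`https?://[^\s:/@]+:[^\s:/@]+@[^\s/]+`)},
}

// Scan reports every rule match in text.
func Scan(text string) []Finding {
	var out []Finding
	for _, r := range rules {
		for _, loc := range r.re.FindAllStringIndex(text, -1) {
			out = append(out, Finding{Rule: r.name, Confidence: r.conf, Start: loc[0], End: loc[1]})
		}
	}
	return out
}

// Redact masks every rule match in arbitrary text.
func Redact(text string) string {
	for _, r := range rules {
		text = r.re.ReplaceAllString(text, "[REDACTED]")
	}
	return text
}

// Verify re-checks that the bytes at the finding's coordinates still match
// the rule that produced it, for example after the source has changed.
func Verify(text string, f Finding) bool {
	if f.Start < 0 || f.End > len(text) || f.Start >= f.End {
		return false
	}
	for _, r := range rules {
		if r.name == f.Rule {
			loc := r.re.FindStringIndex(text[f.Start:f.End])
			return loc != nil && loc[0] == 0 && loc[1] == f.End-f.Start
		}
	}
	return false
}

func main() {
	// AKIAIOSFODNN7EXAMPLE is the placeholder key used in AWS documentation.
	text := "export AWS_KEY=AKIAIOSFODNN7EXAMPLE"
	for _, f := range Scan(text) {
		fmt.Println(f.Rule, f.Confidence, f.Start, f.End, Verify(text, f))
	}
	fmt.Println(Redact(text))
}
```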

Secret findings: storage and scanning

New secret_findings table plus secret_leak_count and
secrets_rules_version columns on sessions. secret_leak_count counts
definite findings only.
Findings are scanned and persisted on each sync write and carried forward
across a full-resync orphan copy (the database is a persistent archive).
agentsview secrets scan [--backfill] runs a resumable scan over the
archive; agentsview secrets list lists findings, redacted by default.
--reveal prints unredacted values only when the server is bound to
localhost, warns on stderr, never logs or stores revealed values, and
re-reads the source by stored coordinates and re-validates before printing.
agentsview session list --has-secret filters to sessions with definite
findings; secret_leak_count is surfaced in session get and health output.
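
As a rough sketch of the localhost gate on --reveal, the check below decides whether a bind address is loopback-only. The function name `isLoopbackBind` and the `host:port` input format are assumptions; the PR's actual check may differ.

```go
package main

import (
	"fmt"
	"net"
)

// isLoopbackBind is a hypothetical helper that reports whether a listen
// address such as "127.0.0.1:8080" is reachable only from the local machine.
// An empty host (":8080") binds all interfaces, so it is treated as not local.
func isLoopbackBind(addr string) bool {
	host, _, err := net.SplitHostPort(addr)
	if err != nil {
		return false // malformed address: fail closed
	}
	if host == "localhost" {
		return true
	}
	ip := net.ParseIP(host)
	return ip != nil && ip.IsLoopback()
}

func main() {
	for _, addr := range []string{"127.0.0.1:8080", "[::1]:8080", "0.0.0.0:8080", ":8080"} {
		fmt.Println(addr, isLoopbackBind(addr))
	}
}
```

Failing closed on anything that does not parse keeps the unredacted path unavailable whenever the bind address is ambiguous.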

Inline sync scans the definite rules only and stamps a definite-only ruleset
version, keeping the candidate rules (which dominate both CPU and false
positives) out of the sync hot path. An explicit secrets scan runs the full
ruleset and adds candidate findings; secrets scan --backfill re-scans
sessions that received only the inline scan. secrets list defaults to
definite findings, with candidates opt-in via --confidence.
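
The interaction between the inline scan's version stamp and --backfill can be sketched as below. The version strings, the `Session` type, and the function names are all hypothetical; the only idea taken from the description is that a session stamped with a definite-only ruleset version still needs a full-ruleset pass.

```go
package main

import "fmt"

// Hypothetical version stamps: inline sync writes the definite-only value,
// an explicit full scan writes the full value.
const (
	definiteOnlyVersion = "definite-1"
	fullVersion         = "full-1"
)

// Session carries the stamp stored in the secrets_rules_version column.
type Session struct {
	ID           string
	RulesVersion string
}

// needsBackfill reports whether a session has not yet been scanned with the
// full ruleset (never scanned, or scanned inline with definite rules only).
func needsBackfill(s Session) bool {
	return s.RulesVersion != fullVersion
}

// backfillQueue returns the IDs a backfill pass would re-scan, in input order.
func backfillQueue(sessions []Session) []string {
	var ids []string
	for _, s := range sessions {
		if needsBackfill(s) {
			ids = append(ids, s.ID)
		}
	}
	return ids
}

func main() {
	sessions := []Session{
		{ID: "a", RulesVersion: definiteOnlyVersion},
		{ID: "b", RulesVersion: fullVersion},
		{ID: "c"},
	}
	fmt.Println(backfillQueue(sessions))
}
```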

PostgreSQL

The optional read-only PostgreSQL mirror reaches parity with SQLite for these
features: schema for findings and session columns, push of secret_leak_count
and findings, the has-secret filter, findings list and source lookup, and
content search (substring and regex).

Migration

Schema changes are additive (new table, new columns); existing rows are
preserved. The data-version bump triggers a non-destructive full resync that
carries findings forward.

Not in this PR

The Svelte frontend for content search and secret findings is deferred to a
follow-up PR. This change is storage, API, and CLI only.

roborev-ci · 2026-05-22T19:09:58Z

roborev: Combined Review (`8bd10e5`)

Review blocked: one reviewer could not access the diff, and no substantive code findings were reported.