PostgreSQL dialect refactor - PR3 of 4 - Functional store layer by webgress · Pull Request #328 · kenn-io/msgvault

webgress · 2026-05-20T23:58:39Z

Summary

Makes the PostgreSQL backend functional through the dialect-abstracted store layer.
Builds on PR1 (#276, dialect extraction) and PR2 (#298, PG scaffold).

[... PR description here ...]

Test plan

make test (SQLite, all packages pass)
MSGVAULT_TEST_DB=postgres://... make test (PostgreSQL, all packages pass)

Deferred to PR4

FTS weight parity (SQLite FTS5 vs PG ts_rank with setweight)
Deletion execution E2E on PG
Attachment storage E2E on PG
Vector pgvector backend (parallel to sqlitevec)
CI lane for MSGVAULT_TEST_DB=postgres

Makes the PostgreSQL backend functional end-to-end through the store layer. After this commit, a PostgreSQL connection can initialize the schema, insert rows, run FTS queries, and serve the TUI / HTTP / MCP aggregate paths. SQLite path is unchanged; all existing tests pass. Builds on PR1 (kenn-io#276, Dialect interface) and PR2 (kenn-io#298, PostgreSQLDialect scaffold + pgx wiring). PR4 will follow up with FTS weighting parity, end-to-end deletion/attachment coverage, and a PG CI lane. Dialect interface - Add `LegacyColumnMigrations()` — moves the hardcoded ALTER TABLE migration loop out of InitSchema. SQLite returns the full list; PostgreSQL returns nil (schema_pg.sql ships complete). - Add `DatabaseSize()` — SQLite uses os.Stat on the file path, PostgreSQL uses pg_database_size(current_database()). Fixes GetStats reporting 0 when the DSN is a URL, not a path. Schema - internal/store/schema_pg.sql: full PostgreSQL-native DDL. BIGINT GENERATED ALWAYS AS IDENTITY, TIMESTAMPTZ, BYTEA, JSONB, tsvector column + GIN index for FTS, ON DELETE CASCADE relations. Store layer (Rebind + RETURNING) - All store-layer queries flow through loggedDB / loggedTx, which apply dialect.Rebind() to every statement. Call sites emit portable `?` placeholders. - LastInsertId() replaced with RETURNING id at every insert site in messages.go and sync.go (pgx has no LastInsertId). - collection.go, sync.go: remaining raw `INSERT OR IGNORE` statements now go through dialect.InsertOrIgnore(...). - store.go InitSchema uses dialect.LegacyColumnMigrations() instead of a hardcoded SQLite migration loop. GetStats uses dialect.DatabaseSize(). - store.go: add `Store.IsPostgreSQL()` so callers can pick the engine without depending on internal/query. FTS backfill - dialect_pg.go: FTSBackfillBatchSQL uses LEFT JOIN message_bodies so messages without a body row are still indexed (header-only FTS). Query engine - internal/query/dialect.go (new): small query-layer Dialect interface for the SQL-generation differences that surface in aggregate/search paths — Rebind, TimeTruncExpression (strftime vs to_char), FTSSearchExpression (FTS5 MATCH vs tsvector @@), HasFTSTableSQL, FTSJoin, BuildFTSTerm, SanitizeFTSQuery. - internal/query/postgres.go: replaces the ErrNotImplemented-everywhere scaffold with a thin constructor that builds a dialect-parameterized SQLiteEngine; NewEngine(db, isPostgres) factory picks the right one. ErrNotImplemented kept as a sentinel for engines/methods a backend can't satisfy (used by isEngineUnsupported in the API handler). - internal/query/sqlite.go: SQLiteEngine takes a Dialect field; all QueryContext / QueryRowContext go through dialect-aware wrappers. Aggregate helpers (aggDimensionForView, buildFilterJoinsAndConditions) take a Dialect argument so they can emit the right SQL on either backend. Engine factory wired - cmd/msgvault/cmd/*: every `query.NewSQLiteEngine(s.DB())` call site now uses `query.NewEngine(s.DB(), s.IsPostgreSQL())`. Docs - docs/PG_STATUS.md: mark PR3 blockers resolved, document what works end-to-end now, scope remaining PR4 work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit 54b3f04)

collection.go was added after PR2 and used res.LastInsertId(), which pgx does not support. Switch to the same RETURNING id pattern used in messages.go and sync.go so the call works on both backends. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit c534f1f)

Both tables were added to schema.sql after PR2 landed but were never ported to schema_pg.sql. Their absence caused 31 PG test failures across cmd, dedup, fbmessenger, and store packages (all surfacing as 'relation does not exist'). Column types match the SQLite definitions: source_id is BIGINT (PG identity column type), TEXT for strings, TIMESTAMPTZ for timestamps. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit 7e2d4f6)

SQLite permits referring to a SELECT-list alias inside HAVING; PG only permits aliases in ORDER BY. The 'HAVING cnt > 1' clause caused 14 PG test failures across dedup with 'column "cnt" does not exist'. Switch to 'HAVING COUNT(*) > 1' which is portable. The SELECT-list alias 'AS cnt' is still scanned by name, so the receiving column remains unchanged. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit f6d7e07)

PostgreSQL has no implicit cast between BOOLEAN and INTEGER, so 'COALESCE(m.is_from_me, 0)' and 'CAST(EXISTS(...) AS INTEGER)' both failed with type errors on PG. SQLite stores booleans as 0/1 internally and accepts those forms. Replace both with portable 'CASE WHEN ... THEN 1 ELSE 0 END' which returns an integer on both backends and keeps the Go scan side (scanning into int, comparing == 1) unchanged. Resolves the 4 PG COALESCE/operator type errors in GetDuplicateGroupMessages and GetAllRawMIMECandidates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit 6451b16)

BEGIN EXCLUSIVE is SQLite-only syntax. PostgreSQL needs an explicit LOCK TABLE on the table the protected operations touch. Both RemoveSourceSerialized and WithExclusiveLock want the same invariant: block concurrent StartSync (INSERT INTO sync_runs) while letting readers proceed. - SQLite implementation: a single 'BEGIN EXCLUSIVE'. WAL mode keeps readers unblocked. - PostgreSQL implementation: 'BEGIN' + 'LOCK TABLE sync_runs IN EXCLUSIVE MODE'. EXCLUSIVE conflicts with the ROW EXCLUSIVE lock INSERT acquires, so StartSync blocks; ACCESS SHARE (reads) is still permitted. Resolves the 3 'syntax error at or near "EXCLUSIVE"' PG test failures in TestStore_RemoveSourceSerialized_*. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit 91c6e7b)

The sender-rehydration QueryRow in importer.go went through st.DB().QueryRow() — the raw *sql.DB, which bypasses loggedDB's auto-Rebind. With ? placeholders this works on SQLite but PostgreSQL errors at parse time. Wrap the SQL in st.Rebind(...) so the same statement runs on either backend. This is a latent production bug that the existing tests don't currently exercise (the rehydration branch is only taken when a prior import left sender data); fixing it preemptively. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit a68c4cb)

The PostgreSQL InsertOrIgnore detected "complete statement" by checking for a trailing ")" — meant to distinguish VALUES-form INSERTs from the prefix form used by insertInChunks. But INSERT ... SELECT statements fall through as well (they end with FROM/WHERE, not ")"), so the conflict clause was never appended. This caused EnsureDefaultCollection, MergeDuplicates label-union, and MergeDuplicates raw-MIME backfill to fail with unique-constraint violations on PG. Switch the prefix detection to "ends with VALUES" (the actual distinguishing feature of the chunked-prefix form). Everything else, including INSERT ... SELECT, now correctly gets ON CONFLICT DO NOTHING. Add a regression test case for the INSERT ... SELECT path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit 16eb96f)

Sweeps internal/store tests so every package runs green against both backends: - Wrap all raw-? queries on st.DB() / f.Store.DB() with st.Rebind() (sync_test, dedup_test, dedup_delete_test, messages_test, sources_test, account_identities_test, storetest fixture's GetMessageBody). - Drop the LastInsertId path in messages_test in favor of INSERT ... RETURNING id (already what production code does). - Skip FTS5-vtable tests on PG: TestStore_UpsertFTS, BackfillFTS, NeedsFTSBackfill, RebuildFTS_* (4 cases). PG uses a tsvector column exercised separately via FTSSearchClause / FTSBackfillBatchSQL. - Skip TestStore_BackfillRFC822IDs_DoesNotOvercountRolledBackBatch on PG — it uses SQLite-specific CREATE TRIGGER ... NEW.* / RAISE(FAIL,...) syntax. - Skip TestScanSyncRun_ZeroTime / TestScanSource_ZeroTime on PG — they assert go-sqlite3 driver normalization of invalid DATETIME strings to zero time; PG TIMESTAMPTZ rejects the string outright. - Gate the messages_fts existence check in TestStore_RemoveSource with IsPostgreSQL() so PG doesn't query the SQLite vtable. No production code changed by this commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit 5391001)

Sweep test code so the remaining four PG-failing packages run green: internal/fbmessenger: - Wrap all raw-? queries on st.DB() with st.Rebind() across importer_test.go (NonTextMessageBodies, LabelTaxonomy, SelfParticipantSeeded, SenderIDPreservedOnReimport, ReimportRepairsConversationParticipant). - Replace 'is_from_me = 1' / '= 0' literal comparisons with 'IS TRUE' / 'IS NOT TRUE' (portable on SQLite ≥3.23 and PG). - Skip the two FTS5-MATCH tests on PG (MojibakeFTSIndexed, ReactionsDualPath); the dialect-level tsvector behavior is covered separately by FTSSearchClause. internal/dedup: - Rebind assertSoftDeleted helper's '? param. cmd/msgvault/cmd: - Skip all four UTF-8 repair tests on PG: they either toggle 'PRAGMA foreign_keys = OFF' (SQLite-only) or insert raw invalid UTF-8 bytes into TEXT columns (PG rejects with invalid_text_representation). The TestRepairEncoding_NoScanErrors happy-path test (no invalid bytes) stays portable and runs on PG. No production code changed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> (cherry picked from commit dd54561)

PostgreSQLDialect.LegacyColumnMigrations() previously returned nil on the assumption that schema_pg.sql is always shipped complete. That's true for fresh installs but leaves existing PG databases unable to pick up new columns added to schema_pg.sql over time — the SQLite ALTER TABLE migration loop in InitSchema iterates the dialect's list, so an empty list means no upgrade path on PG. Mirror the SQLite list, translating types to their PG spellings: INTEGER (id ref) → BIGINT INTEGER (counter) → INTEGER TEXT → TEXT DATETIME → TIMESTAMPTZ JSON → JSONB Each ALTER uses `ADD COLUMN IF NOT EXISTS` (supported by PG ≥ 9.6) so the statements are idempotent on their own; IsDuplicateColumnError stays as a safety net. Only columns that exist in both schemas are ported — no webgress-only tables are referenced. Update the Dialect interface comment to reflect that both backends now return the same logical list, and add a row to PG_STATUS.md's "Resolved in PR3" table.

roborev-ci · 2026-05-21T00:02:42Z

roborev: Combined Review (`c8db376`)

Summary verdict: PostgreSQL support has runtime-breaking query issues that need fixing before merge.

High

internal/query/sqlite.go:154
buildAggregateSQL emits FROM ( ... ) without an alias. PostgreSQL requires every derived table in FROM to have an alias, so PostgreSQL aggregate paths will fail at runtime.
Fix: Add an alias after the aggregate subquery, for example ) AS agg, before appending the sort clause.
internal/query/sqlite.go:785
Several PostgreSQL-wired query-engine paths still call e.db.QueryContext / QueryRowContext directly with raw ? placeholders, including message detail helpers and label fetching for list/search results. These bypass dialect rebinding and will fail on PostgreSQL once those paths return rows.
Fix: Route all remaining query-engine SQL with placeholders through e.queryContext / e.queryRowContext.

Medium

internal/query/sqlite.go:187
Query-layer filters still compare boolean columns with integer literals, for example has_attachments = 1. PostgreSQL defines these columns as BOOLEAN, so attachment-only aggregate/list/search/stat filters will error instead of matching rows.
Fix: Use dialect-aware boolean predicates or portable boolean SQL such as has_attachments IS TRUE.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

- sync.go timestamp scans (Issue 1): scanSource and scanSyncRun scanned timestamp columns into sql.NullString and parsed the resulting text through parseDBTime. pgx/v5/stdlib decodes DATE/TIMESTAMP/TIMESTAMPTZ as time.Time at the driver level and rejects conversion to *string, so the Scan call itself failed on PG. Switched to sql.NullTime / time.Time scans — go-sqlite3 also accepts those destinations and parses its stored formats internally, so a single typed path works for both backends. Required-field NULL violations still surface with field-name context via requireNullTime. Removed the now-dead parseNullTime / parseRequiredTime helpers. - api.go timestamp scans (Issue 2): GetMessage and scanMessageRows had the same broken sql.NullString → parseSQLiteTime pattern as sync.go. Switched to sql.NullTime scans for sent_at and deleted_from_source_at. - has_attachments = 1 (Issue 3): PostgreSQL rejects integer comparisons against boolean columns. Added BoolTrueExpr(col) to both Dialect interfaces; SQLite emits "col = 1" (INTEGER 0/1), PG emits the bare column name. Updated all five sites: four in internal/query/sqlite.go (optsToFilterConditions, buildFilterJoinsAndConditions, GetTotalStats, buildSearchQueryParts) and one in internal/store/api.go (SearchMessagesQuery). - TextEngine exposed on PG (Issue 4): internal/query/sqlite_text.go implements TextEngine with FTS5 MATCH and strftime() — SQLite-only. NewEngine returned *SQLiteEngine, which satisfied TextEngine for the PG path too, so TUI type assertions silently succeeded and sent SQLite SQL to PostgreSQL. Introduced an unexported pgEngine wrapper that embeds the Engine *interface* (so TextEngine methods are not promoted) and changed NewEngine/NewPostgreSQLEngine to return Engine. Type-assertion to TextEngine now cleanly fails on PG. - buildFTSExpression FTS5 syntax sent to plainto_tsquery (Issue 5): buildFTSExpression produced "term1" AND "term2" and passed it to plainto_tsquery, which treats the whole string as a literal phrase. Added Dialect.BuildFTSArg(terms) — SQLite returns FTS5 form, PG returns a plain space-separated string (plainto_tsquery handles AND natively). Removed the now-dead buildFTSExpression helper.

roborev-ci · 2026-05-21T14:42:49Z

roborev: Combined Review (`700055a`)

High: PostgreSQL read paths still execute SQLite-style placeholders directly, and Medium: PostgreSQL FTS query generation is inconsistent with its sanitizer.

High

internal/query/sqlite.go:681
NewPostgreSQLEngine routes PostgreSQL through SQLiteEngine, but message-detail paths still call e.db.QueryRowContext / e.db.QueryContext directly with ? placeholders. This affects GetMessage, GetMessageBySourceID, raw-body fallback, participants, labels, attachments, and batched label fetches, so these PostgreSQL queries can fail instead of being rebound to $N.

Fix: Route all SQL execution in the dialect-parameterized engine through e.queryContext / e.queryRowContext, including getMessageByQuery, extractBodyFromRaw, fetchParticipants, fetchLabels, fetchAttachments, and fetchLabelsForMessages. Add PostgreSQL coverage for message detail and list/search result label population.

Medium

internal/query/dialect.go
PostgreSQLQueryDialect.FTSSearchExpression uses plainto_tsquery(), but SanitizeFTSQuery returns formatted tsquery syntax such as foo:* & bar:*. Passing that into plainto_tsquery strips or literalizes the :* and & operators, so prefix/operator search semantics are lost.

Fix: Use to_tsquery('simple', ?) in the PostgreSQL FTS expression.
internal/query/dialect.go (tsqueryEscape)
tsqueryEscape strips whitespace entirely, so a multi-word search like "hello world" becomes helloworld:*, which is not a meaningful match for phrase or token search.

Fix: Preserve or normalize whitespace, then have BuildFTSTerm split multi-word input into tokens and join them with &, or <-> for exact phrase behavior.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

- shared.go helpers used unbound ? placeholders (Findings 6+7): fetchLabelsForMessageList, fetchMessageLabelsDetail, fetchParticipantsShared, fetchAttachmentsShared, extractBodyFromRawShared, getMessageRawShared, and getMessageByQueryShared all built SQL with ? and dispatched directly to *sql.DB. On PG (pgx/v5/stdlib) ? is invalid, so every one of these failed at the driver. Threaded an explicit rebindFunc through each helper; SQLiteEngine callers pass dialect.Rebind, DuckDB callers pass noopRebind (DuckDB and the SQLite mirror both accept ? natively). All sub-calls inside getMessageByQueryShared forwarded. - BeginExclusive only locked sync_runs (Finding 8): SQLite's BEGIN EXCLUSIVE blocks all writers database-wide; RemoveSourceSerialized depends on that to safely cascade-delete a source. The PG implementation locked only sync_runs, leaving a concurrent sync free to keep inserting messages, attachments, labels, recipients, and participants throughout the deletion. Expanded LOCK TABLE to cover the full set of tables a sync writes. - isSQLiteError conflict check in CreateCollection (Finding 9): duplicate-name detection looked for the SQLite "UNIQUE constraint failed" substring, which never matches PG's SQLSTATE 23505. Routed through s.dialect.IsConflictError, which already handles both backends.

roborev-ci · 2026-05-21T15:44:36Z

roborev: Combined Review (`fc01acb`)

Verdict: No Medium, High, or Critical issues were found across the reviews.

All agents reported the code as clean.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-05-21T21:39:28Z

roborev: Combined Review (`fc01acb`)

Verdict: No Medium, High, or Critical issues were reported.

All agents found no actionable issues for this PR.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

- SQLite-only cache pipeline ran on PG DSNs (Bug A): cacheNeedsBuild and buildCache feed sql.Open("sqlite3", dbPath) and dispatch ? placeholders to the raw db.DB(); both fail on PostgreSQL. Guarded the cache block in tui.go and serve.go so PG goes straight to query.NewEngine(db, true). cacheNeedsBuild and the build-cache cobra command now short-circuit when dbPath is a postgres:// URL. Exported store.IsPostgresURL so cmd-side helpers can detect PG DSNs without first opening a Store. - BeginExclusive lock list still incomplete (Bug B): expanded LOCK TABLE to cover every table the sync path writes — sync_runs, sources, conversations, conversation_participants, messages, message_recipients, message_labels, message_bodies, message_raw, attachments, labels, participants, participant_identifiers, reactions — matching the SQLite BEGIN EXCLUSIVE semantics RemoveSourceSerialized depends on. - PST resume validated only by path/inode (Bug C): added ArchiveID to pstCheckpoint and persisted it through savePstCheckpoint. On resume, if the saved fingerprint disagrees with the current PST's fingerprint the importer restarts from FolderIndex=0/MsgIndex=0 with a warning, instead of resuming at an offset that points into different bytes after the file was replaced in place. - PG FTS semantics diverged across code paths (Bug D): the store's BuildFTSArg + FTSSearchClause used plainto_tsquery (no prefix match), while the query engine's PostgreSQLQueryDialect.BuildFTSTerm used to_tsquery with ":*" prefix terms. Searching "invo" in the API path missed "invoice" hits the engine path returned. Aligned the store dialect on to_tsquery and ":*" terms; added a local pgTsqueryEscape helper that mirrors the query-package helper.

roborev-ci · 2026-05-22T04:35:43Z

roborev: Combined Review (`eced034`)

Review verdict: one Medium issue found; no High or Critical findings.

Medium

internal/query/dialect.go:210, internal/store/dialect_pg.go:61
The tsqueryEscape / pgTsqueryEscape logic strips whitespace unconditionally. Multi-word parsed search terms, such as "invoice final" in q.TextTerms, can become a single token like invoicefinal:*, causing PostgreSQL full-text search to miss expected matches.

Suggested fix: split on whitespace, escape each token independently, and join tokens with &, or <-> if exact phrase matching is intended, before applying the :* prefix.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-05-22T04:36:21Z

roborev: Combined Review (`eced034`)

Mixed verdict: one Medium issue should be addressed before merge; no Critical or High findings were reported.

Medium

internal/query/dialect.go:210 (tsqueryEscape) and internal/store/dialect_pg.go:61 (pgTsqueryEscape): The PostgreSQL tsquery escape helpers strip whitespace unconditionally. Multi-word parsed terms, such as a quoted phrase like "invoice final" in q.TextTerms, become a single token (invoicefinal:*) and can fail to match the tsvector index. Split on whitespace, escape each token separately, and join with & or <-> for exact phrase matching before applying prefix matching.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

Two leftover divergences from the prior PG FTS fix (Bug D): - store.SQLiteDialect.BuildFTSArg produced "term1" AND "term2" without a trailing "*", while query.SQLiteQueryDialect.BuildFTSTerm produced "term"* (prefix-match) space-joined. So on SQLite, the API search path missed prefix-match hits that the engine deep-search path returned — searching "invo" matched "invoice" only via the engine. Aligned BuildFTSArg on the same shape: quote-escape, strip embedded "*" to protect the operator, suffix with "*", space-join (FTS5 treats space as implicit AND). - query.PostgreSQLQueryDialect.FTSSearchExpression still returned plainto_tsquery — same shape as the bug Finding 5 / Bug D fixed on the store side. The method currently has no callers, but it sits next to BuildFTSTerm/SanitizeFTSQuery (both to_tsquery + :*) and is a trap for the next person who wires it. Switched to to_tsquery so all three helpers in the package emit compatible WHERE/argument shapes.

X'00' is SQLite hex-literal syntax that PostgreSQL doesn't parse, and PG TEXT rejects embedded NUL bytes outright, so the phase-1 rename step in EnsureLabelsBatch failed on PG before it could place the temporary name. Replaced the SQL-side X'00' construction with a sentinel built in Go and bound as a parameter: "\x01__msgvault_pending_rename__<id>". The leading SOH (U+0001) is a control character Gmail's UI does not permit in label names, so collision with any real label in the same source remains impossible — phase 2 overwrites the temp name within the same transaction, so the sentinel is never observable to readers.

Audit pass found four narrow-scope gaps. Tests are clean (SQLite-only paths open a file directly and never run against PG; PG-aware tests use Rebind; FTS5 vtable tests are SkipIfPostgres-guarded). - Schema parity: schema.sql had two indexes that schema_pg.sql lacked. Added idx_attachments_storage_path and idx_collection_sources_source_id so PG matches the SQLite catalogue. Without them, attachment storage- path lookups and per-source collection membership reads do full scans on PG. - verify: runIntegrityCheck unconditionally ran "PRAGMA integrity_check" which PG cannot parse. Returns (nil, nil) on PG and the verify cobra command skips the printout there. PG corruption checks live in external tooling (pg_amcheck) and aren't this CLI's job. - deduplicate: backupDatabase used SQLite's VACUUM INTO. Refuse with a clear message on PG that points the user at pg_dump and --no-backup, instead of failing later with a confusing syntax error.

roborev-ci · 2026-05-22T05:36:35Z

roborev: Combined Review (`19eaf27`)

Summary verdict: One Medium issue found; no Critical or High findings.

Medium

internal/store/api.go (SearchMessagesQuery) and internal/store/dialect_pg.go (BuildFTSArg)
- If q.TextTerms contains only punctuation or metacharacters stripped by pgTsqueryEscape, BuildFTSArg returns an empty string. Passing that to to_tsquery('simple', ?) causes PostgreSQL to error with “text-search query doesn't contain lexemes,” failing the search request.
- Suggested fix: check ftsExpr == "" before appending FTS conditions, or handle the empty fallback inside BuildFTSArg / FTSSearchClause, matching the query package behavior that emits FALSE for empty FTS terms.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

External audit flagged nine call sites that dispatched ? placeholders straight to s.DB() without going through the dialect's Rebind layer. On PostgreSQL (pgx/v5/stdlib) ? is not a valid placeholder, so every one of these fails at the driver. - internal/whatsapp/importer.go: rebind the existence check, the attachment-metadata UPDATE, lookupMessageByKeyID, and setReplyTo. The "clear attachment flags" UPDATE additionally hard-coded has_attachments = 0, which PG's BOOLEAN column rejects; switched to has_attachments = ? bound to a Go bool, matching the pattern already in internal/sync/sync.go and internal/importer/ingest.go so the driver emits the right literal for each backend. - internal/imessage/client.go: rebind the group-chat title COUNT and SELECT in buildGroupTitle and the UPDATE that stores the generated title. The literal '?' inside the SELECT's COALESCE is inside single quotes, so Rebind (which tracks quote state) leaves it intact. - internal/sync/sync.go and internal/importer/ingest.go: rebind the "recount stored attachments" SELECT and the corrective UPDATE. These were the canonical example of the Go-bool pattern; they just weren't rebinding their own SQL. No dialect branching needed — Rebind is a no-op on SQLite and only rewrites ? to $N on PG. go clean -testcache && make test green.

External review flagged that BuildFTSArg on PG returns "" when every input term reduces to nothing usable under tsquery escaping (e.g. "!!!", "---", ""). Passing that empty string to to_tsquery('simple', $1) raises "text-search query doesn't contain lexemes". SQLite has the analogous problem: FTS5 MATCH on a phrase that tokenizes to nothing is also a syntax error, and the previous SQLite BuildFTSArg happily produced "\"!!!\"*" → driver error. Fix is a two-part dialect-agnostic pattern matching the query package's (expr="FALSE", arg="") fallback in BuildFTSTerm: - SQLiteDialect.BuildFTSArg now drops terms that contain no Unicode letter or digit (default FTS5 tokenizer emits nothing for them). Returns "" if all terms drop, mirroring the PG implementation. Added hasFTSToken helper to make the rule explicit. - SearchMessagesQuery checks if BuildFTSArg returned "" and, if so, appends a "FALSE" condition instead of dispatching the dialect's FTSSearchClause. Zero rows come back from either backend without ever feeding the FTS function a malformed argument. Added TestSearchMessagesQuery_TokenlessTextTerms covering "!!!", "---", "", and a mix — runs under both SQLite and PostgreSQL via storetest.New (which uses testutil.NewTestStore). The test also runs a real-term baseline search so a regression that silently dropped the FTS predicate (returning everything) would fail visibly.

roborev-ci · 2026-05-22T06:39:53Z

roborev: Combined Review (`82c0648`)

High-level verdict: not clean; two actionable findings remain, with one high-risk SQLite regression.

High

internal/store/api.go:116: GetMessage and scanMessageRows now scan COALESCE(m.sent_at, m.received_at, m.internal_date) into sql.NullTime. On SQLite, this expression may not retain a declared DATETIME type, so the driver can return a string that sql.NullTime cannot scan. This risks breaking default SQLite reads/searches that previously parsed these values from sql.NullString.

Suggested fix: Keep expression timestamps scanned as sql.NullString on SQLite, or add a helper that accepts both time.Time and string values before assigning to the API model.

Medium

internal/store/api.go:279: SearchMessages still passes the raw user query string into FTSSearchClause, but PostgreSQL now uses to_tsquery, which expects a tsquery-formatted argument like foo:* & bar:*. Multi-word or punctuation-heavy queries can error and fall back to LIKE-only search, losing body FTS results and prefix-match parity.

Suggested fix: Build the FTS argument through the dialect before binding in SearchMessages, or keep a separate PostgreSQL clause for raw user text using plainto_tsquery / websearch_to_tsquery.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

… fixes External review batch. BLOCKER kenn-io#9 — verify silently skipped its integrity check on PG. - Updated cmd/msgvault/cmd/verify.go to print "Skipping database integrity check (PostgreSQL — use pg_amcheck out-of-band)." instead of silently swallowing the step. Rewrote the cobra Long description so both the SQLite branch ("runs PRAGMA integrity_check") and the PG branch ("prints a notice…") are visible to --help readers. Kept the IsPostgreSQL() guard inside runIntegrityCheck as belt-and-braces for any future caller. IMPORTANT kenn-io#4 — legacy SearchMessages bypassed BuildFTSArg. - internal/store/api.go SearchMessages bound the raw user query string straight into FTSSearchClause's placeholder. On PG that fed to_tsquery un-escaped input; multi-word or punctuated queries (the norm) errored at the parser. On SQLite raw FTS5 metacharacters in user input would reach MATCH the same way. Now SearchMessages splits on whitespace and delegates to SearchMessagesQuery so the dialect's BuildFTSArg sanitizes per backend and the just-landed FALSE-fallback handles tokenless inputs uniformly. Whitespace-only input short-circuits to zero hits before invoking the FTS pipeline. NIT kenn-io#1, kenn-io#14 — dialect comments drifted from current behavior. - internal/store/dialect.go BuildFTSArg interface comment now describes the actual prefix-match output shape on both backends and the empty-fallback contract. - internal/store/dialect_pg.go BuildFTSArg comment trimmed of the past-tense plainto_tsquery history; now describes only the current to_tsquery shape and the empty-fallback rule. Tests - TestSearchMessages_LegacyRawString covers multi-word, single-word, pure punctuation, pure dashes, whitespace-only, and mixed punctuation inputs to the legacy entrypoint. Runs on both SQLite and PostgreSQL via storetest.New / testutil.NewTestStore.

External review caught three SQL shapes in internal/query/sqlite.go (which backs both SQLite and PostgreSQL via the QueryDialect interface) that PG rejects at parse time, breaking TUI startup and API aggregate / deletion-staging endpoints when the store is PG. HIGH kenn-io#1 — buildAggregateSQL emitted `FROM ( <inner SELECT> )` with no alias on the outer derived table. PG requires every derived table in FROM to be aliased ("syntax error at or near ')'"). Added `AS agg`; SQLite tolerates the explicit alias unchanged. HIGH kenn-io#2 — GetGmailIDsByFilter used SELECT DISTINCT … ORDER BY m.sent_at DESC, m.id DESC. PG rejects DISTINCT when the ORDER BY expressions are not in the SELECT list, and the multiplying joins to message_recipients / message_labels were why DISTINCT was there in the first place. Converted every multiplicative filter (sender, sender name, recipient, recipient name, domain, label) into an EXISTS subquery — per the SQL guidance in CLAUDE.md — so the join graph no longer multiplies, DISTINCT is unnecessary, and the ORDER BY stands on its own. The sources scope (1:1 with messages) stays as a JOIN. MEDIUM kenn-io#3 — ListMessages's ORDER BY referenced `m.size_estimate` and `m.subject` raw while the SELECT exposed them through `COALESCE(m.size_estimate, 0)` and `COALESCE(m.subject, '')`. Under SELECT DISTINCT, PG insists the ORDER BY expressions textually match something in the SELECT list. Switched the size and subject sorts to the COALESCE expressions; the date sort already matched. Test - internal/query/pg_compat_test.go exercises Aggregate (ViewSenders), GetGmailIDsByFilter with a Label filter (the previously multiplying join, now EXISTS) and verifies no duplicate IDs come back, and ListMessages with every supported sort field. Uses testutil.NewTestStore, so MSGVAULT_TEST_DB=postgres://… runs the same suite against PG and catches regressions of these three bug shapes.

roborev-ci · 2026-05-22T07:39:20Z

roborev: Combined Review (`3d48ca7`)

Review blocked for one agent due to diff access; no code issues were reported by agents that reviewed the diff.

High

Environment/Tooling
- Problem: Gemini could not review because the diff file /tmp/roborev-snapshot-393231413/roborev-snapshot-content.diff resolved outside the allowed workspace directories.
- Fix: Move the diff file into an allowed workspace directory or include the diff content directly in the prompt.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

roborev-ci · 2026-05-22T07:40:03Z

roborev: Combined Review (`3d48ca7`)

No actionable code findings were reported; security review found no exploitable issues.

The only high-severity item was a tooling/sandbox access failure from one agent, not a defect in the PR code, so it is omitted from the PR findings.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

webgress and others added 11 commits May 20, 2026 23:21

webgress added 3 commits May 22, 2026 04:42

webgress added 2 commits May 22, 2026 06:28

webgress added 2 commits May 22, 2026 06:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PostgreSQL dialect refactor - PR3 of 4 - Functional store layer#328

PostgreSQL dialect refactor - PR3 of 4 - Functional store layer#328
webgress wants to merge 21 commits into
kenn-io:mainfrom
webgress:pr3-upstream

webgress commented May 20, 2026

Uh oh!

roborev-ci Bot commented May 21, 2026

Uh oh!

roborev-ci Bot commented May 21, 2026

Uh oh!

roborev-ci Bot commented May 21, 2026

Uh oh!

roborev-ci Bot commented May 21, 2026

Uh oh!

roborev-ci Bot commented May 22, 2026

Uh oh!

roborev-ci Bot commented May 22, 2026

Uh oh!

roborev-ci Bot commented May 22, 2026

Uh oh!

roborev-ci Bot commented May 22, 2026

Uh oh!

roborev-ci Bot commented May 22, 2026

Uh oh!

roborev-ci Bot commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

webgress commented May 20, 2026

Summary

Test plan

Deferred to PR4

Uh oh!

roborev-ci Bot commented May 21, 2026

roborev: Combined Review (c8db376)

High

Medium

Uh oh!

roborev-ci Bot commented May 21, 2026

roborev: Combined Review (700055a)

High

Medium

Uh oh!

roborev-ci Bot commented May 21, 2026

roborev: Combined Review (fc01acb)

Uh oh!

roborev-ci Bot commented May 21, 2026

roborev: Combined Review (fc01acb)

Uh oh!

roborev-ci Bot commented May 22, 2026

roborev: Combined Review (eced034)

Medium

Uh oh!

roborev-ci Bot commented May 22, 2026

roborev: Combined Review (eced034)

Medium

Uh oh!

roborev-ci Bot commented May 22, 2026

roborev: Combined Review (19eaf27)

Medium

Uh oh!

roborev-ci Bot commented May 22, 2026

roborev: Combined Review (82c0648)

High

Medium

Uh oh!

roborev-ci Bot commented May 22, 2026

roborev: Combined Review (3d48ca7)

High

Uh oh!

roborev-ci Bot commented May 22, 2026

roborev: Combined Review (3d48ca7)

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

roborev: Combined Review (`c8db376`)

roborev: Combined Review (`700055a`)

roborev: Combined Review (`fc01acb`)

roborev: Combined Review (`fc01acb`)

roborev: Combined Review (`eced034`)

roborev: Combined Review (`eced034`)

roborev: Combined Review (`19eaf27`)

roborev: Combined Review (`82c0648`)

roborev: Combined Review (`3d48ca7`)

roborev: Combined Review (`3d48ca7`)