store: weight SQLite FTS5 bm25 to mirror PostgreSQL setweight ranking #337

Merged
wesm merged 1 commit into kenn-io:main from webgress:pr4c-upstream on May 25, 2026

Conversation

@webgress
Contributor

This is PR4c, split out of what was originally planned as four PRs. The larger main PR4, which adds PostgreSQL capability, is waiting on PR3 to merge.
This one is smaller and independent.

Summary

  • SQLiteDialect.FTSSearchClause now orders results by bm25(messages_fts, 1.0, 10.0, 1.0, 4.0, 1.0, 1.0), so subject matches outrank
    sender matches, which in turn outrank body/recipient matches. This mirrors what PostgreSQL already does with setweight 'A'=1.0 / 'B'=0.4 / 'D'=0.1.
  • Adds two cross-backend tests: TestFTSRankWeightsAcrossBackends (single-token rank attribution) and TestFTSRankParityFixture
    (multi-query fixture suite that logs ordering per backend).
  • Updates docs/PG_STATUS.md to mark FTS rank ordering as partially resolved (intra-class tie-breaking still differs because bm25 and
    ts_rank are different scorer functions).
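
As a quick sanity check on the ratio claim in the first bullet, here is a minimal Go sketch that normalizes the SQLite bm25 weights and compares them with the PostgreSQL setweight values quoted above. The helper names normalize and sameRatios are hypothetical and are not part of msgvault.

```go
package main

import (
	"fmt"
	"math"
)

// normalize divides every weight by the largest one, so the top class maps to 1.0.
// Hypothetical helper for illustration only.
func normalize(weights []float64) []float64 {
	top := 0.0
	for _, w := range weights {
		if w > top {
			top = w
		}
	}
	out := make([]float64, len(weights))
	if top == 0 {
		return out
	}
	for i, w := range weights {
		out[i] = w / top
	}
	return out
}

// sameRatios reports whether two weight lists agree element-wise within eps.
func sameRatios(a, b []float64, eps float64) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if math.Abs(a[i]-b[i]) > eps {
			return false
		}
	}
	return true
}

func main() {
	sqlite := []float64{10.0, 4.0, 1.0} // subject, sender, body weights passed to bm25 in this PR
	pg := []float64{1.0, 0.4, 0.1}      // setweight 'A', 'B', 'D' values quoted in this PR

	fmt.Println(normalize(sqlite))
	fmt.Println(sameRatios(normalize(sqlite), pg, 1e-9))
}
```

This only checks that the class weights are proportional. It says nothing about how bm25 and ts_rank score documents within one class, which is the part that still differs.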

@roborev-ci

roborev-ci Bot commented May 24, 2026

roborev: Combined Review (47fcbf8)

High: PR has a build-breaking import path issue in new tests.

High

  • internal/store/fts_rank_parity_test.go L8-L9 and internal/store/fts_rank_test.go L7-L8
    The new test files import packages using go.kenn.io/msgvault, but this repository’s go.mod module path is github.com/wesm/msgvault. This will cause build/test failures in the current repo.

    Fix: Update the imports in both test files to use github.com/wesm/msgvault.

No Medium or Critical findings were reported.


Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@webgress
Contributor Author

roborev: Combined Review (47fcbf8)

High: PR has a build-breaking import path issue in new tests.

High

  • internal/store/fts_rank_parity_test.go L8-L9 and internal/store/fts_rank_test.go L7-L8
    The new test files import packages using go.kenn.io/msgvault, but this repository’s go.mod module path is github.com/wesm/msgvault. This will cause build/test failures in the current repo.
    Fix: Update the imports in both test files to use github.com/wesm/msgvault.

No Medium or Critical findings were reported.

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

This "high" finding is a false positive — it claims go.mod is
github.com/wesm/msgvault, but PR #336 (commit eabce62, merged
2026-05-22) renamed the module to go.kenn.io/msgvault. This PR's
base (a3e6038) is one commit after that rename; new test files
match the current module path and pass in CI.

The failing test check is a pre-existing govulncheck failure on
golang.org/x/net v0.54.0 (5 CVEs fixed in v0.55.0). The same check
is red on main since 2026-05-22 — not caused by this PR. PR #328
already bumps to v0.55.0; this check goes green here automatically
once #328 lands.

SearchMessages previously ranked SQLite results with the default bm25
(every column equal) while PostgreSQL applies setweight 'A' to subject
and 'B' to sender. The two backends returned the same rows in different
orders.

SQLiteDialect.FTSSearchClause now orders by
  bm25(messages_fts, 1.0, 10.0, 1.0, 4.0, 1.0, 1.0)
positional over every declared FTS5 column (the leading slot is the
UNINDEXED message_id). The 10:4:1 ratio across subject/sender/body
matches PG's 'A'=1.0 / 'B'=0.4 / 'D'=0.1, so subject-only matches
outrank sender-only matches, which outrank body-only matches on both
backends. bm25 and ts_rank remain different scoring functions, so
intra-class tie-breaking can still diverge — documented as a known
difference in PG_STATUS.md.
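
To make the positional weighting concrete, the following is a rough Go sketch of how such an ORDER BY expression could be assembled from a per-column weight list. The function bm25Expr is a hypothetical illustration, not the actual SQLiteDialect.FTSSearchClause code. Only the slots for message_id, subject, and sender are identified in the description above; the remaining slots are labeled generically.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// bm25Expr renders a bm25() call with one weight per declared FTS5 column,
// in declaration order. Hypothetical helper for illustration only.
func bm25Expr(table string, weights []float64) string {
	parts := make([]string, 0, len(weights)+1)
	parts = append(parts, table)
	for _, w := range weights {
		// One decimal place reproduces the "1.0", "10.0", "4.0" literals.
		parts = append(parts, strconv.FormatFloat(w, 'f', 1, 64))
	}
	return "bm25(" + strings.Join(parts, ", ") + ")"
}

func main() {
	weights := []float64{
		1.0,  // slot 1: message_id (UNINDEXED, so this is only a positional placeholder)
		10.0, // slot 2: subject
		1.0,  // slot 3: remaining column (name not given in this PR)
		4.0,  // slot 4: sender
		1.0,  // slot 5: remaining column (name not given in this PR)
		1.0,  // slot 6: remaining column (name not given in this PR)
	}
	// FTS5 bm25() assigns numerically lower scores to better matches,
	// so a plain ascending ORDER BY lists the best match first.
	fmt.Println("ORDER BY " + bm25Expr("messages_fts", weights))
}
```

Keeping the weights in a slice that mirrors the column declaration order makes it harder to misalign a weight with its column if the FTS table definition changes.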

Adds TestFTSRankWeightsAcrossBackends, a cross-backend test that seeds
the search token in exactly one FTS column per row and asserts the
subject > sender > body ordering on whichever backend NewTestStore
resolves to.
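
The shape of that assertion can be sketched with a toy model in Go. This is not the real test and does not call bm25 or ts_rank: it assumes each row's score is simply the weight of the one column holding the token, and the row type, classWeight table, and rankRows function are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// row models a seeded message whose search token appears in exactly one column.
type row struct {
	id     int
	column string // the only column that contains the token
}

// classWeight is a stand-in for the backend scorer: one weight per column class,
// using the 10:4:1 ratio from this PR. Real bm25/ts_rank scores are more involved.
var classWeight = map[string]float64{
	"subject": 10.0,
	"sender":  4.0,
	"body":    1.0,
}

// rankRows returns the matching column of each row, best match first.
func rankRows(rows []row) []string {
	sorted := append([]row(nil), rows...) // copy so the caller's slice is untouched
	sort.SliceStable(sorted, func(i, j int) bool {
		return classWeight[sorted[i].column] > classWeight[sorted[j].column]
	})
	out := make([]string, len(sorted))
	for i, r := range sorted {
		out[i] = r.column
	}
	return out
}

func main() {
	// Seed order is deliberately not the expected rank order.
	rows := []row{{1, "body"}, {2, "subject"}, {3, "sender"}}
	fmt.Println(rankRows(rows))
}
```

Seeding one token per column keeps the expected order independent of document length and term frequency, which is why the ordering can be asserted on either backend.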

Additional commits squashed:
* store: add multi-query FTS rank parity fixture test
* store: adapt FTS rank parity test import path for upstream module path

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@wesm wesm force-pushed the pr4c-upstream branch from 47fcbf8 to 265a6e7 on May 25, 2026 02:25

@roborev-ci

roborev-ci Bot commented May 25, 2026

roborev: Combined Review (265a6e7)

PR needs a small test import fix before merge.

Medium

  • internal/store/fts_rank_test.go:6
    New test imports use go.kenn.io/msgvault/..., but this repo’s go.mod and existing imports use github.com/wesm/msgvault/.... This can break local test compilation or make the new tests inconsistent with the module path.
    Fix the new imports in fts_rank_test.go and fts_rank_parity_test.go to use:
    • github.com/wesm/msgvault/internal/store
    • github.com/wesm/msgvault/internal/testutil

Synthesized from 3 reviews (agents: codex, gemini | types: default, security)

@wesm
Member

wesm commented May 25, 2026

false positive

@wesm wesm merged commit 7600457 into kenn-io:main May 25, 2026
7 checks passed

2 participants