
perf: replace window-function retrieval with UNION ALL + per-bank HNSW indexes#541

Merged
nicoloboschi merged 5 commits into main from fix/hnsw-union-all-retrieval
Mar 11, 2026

Conversation

@nicoloboschi
Collaborator

Problem

retrieve_semantic_bm25_combined() used ROW_NUMBER() OVER (PARTITION BY fact_type ORDER BY embedding <=> $1) to take the top-N results per fact type. This forced a full sequential scan: pgvector can satisfy a plain ORDER BY embedding <=> $1 LIMIT n with an HNSW index scan, but not when a window function partitions on the ordering column.

Closes #539.

Solution

1. UNION ALL retrieval (retrieval.py)

Replace the window-function CTE with a UNION ALL of per-fact_type subqueries. Each arm has its own ORDER BY embedding <=> $1 LIMIT n, which gives the query planner a shape it can satisfy with an HNSW index scan. Semantic arms over-fetch 5× (min 100) to compensate for HNSW approximation loss; results are trimmed to limit in Python. A single connection and round-trip per recall is preserved, unlike the parallel-connections approach in PR #540.
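A minimal sketch of the query shape, assuming hypothetical table/column names and parameter positions (`$1` = query embedding, `$2` = bank id); the actual retrieval.py implementation may differ:

```python
# Hypothetical sketch: build one UNION ALL query whose arms each carry their
# own ORDER BY ... LIMIT, so each arm can use a partial HNSW index scan.

FACT_TYPES = ("world", "experience", "observation")

def build_union_all_query(limit: int) -> tuple[str, int]:
    """Return (sql, fetch_n). Semantic arms over-fetch 5x (min 100) to
    compensate for HNSW approximation loss; callers trim in Python."""
    fetch_n = max(limit * 5, 100)
    arms = [
        f"""(SELECT id, fact_type, embedding <=> $1 AS distance
             FROM memory_units
             WHERE bank_id = $2 AND fact_type = '{ft}'
             ORDER BY embedding <=> $1
             LIMIT {fetch_n})"""
        for ft in FACT_TYPES
    ]
    return " UNION ALL ".join(arms), fetch_n

def trim_per_type(rows, limit: int):
    """Keep only the top-`limit` rows per fact_type (each arm is pre-sorted)."""
    kept, counts = [], {}
    for row in rows:
        ft = row["fact_type"]
        if counts.get(ft, 0) < limit:
            kept.append(row)
            counts[ft] = counts.get(ft, 0) + 1
    return kept
```

The single statement keeps the one-connection, one-round-trip property while still presenting the planner with three independently index-scannable arms.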

2. hnsw.ef_search=200 at pool init (memory_engine.py)

Set once per connection via asyncpg.create_pool(init=...), so there is no per-query SET/RESET overhead. The setting is ignored gracefully if the installed pgvector doesn't support it.

3. Per-(bank, fact_type) partial HNSW indexes (bank_utils.py, migrations)

Two root causes prevented the planner from using existing indexes:

  • fact_type-only partial indexes: the idx_memory_units_bank_id B-tree always wins when bank_id is in the WHERE clause.
  • Global HNSW index (idx_memory_units_embedding): competes with per-fact_type partials for larger partitions (world, observation) and wins.

Fix: create per-(bank_id, fact_type) partial HNSW indexes (both predicates in the WHERE clause) and drop the global index. Index naming uses a stable internal_id UUID UNIQUE column added to the banks table.
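A sketch of the DDL shape, with the abbreviation scheme and 16-char uid suffix inferred from the index names quoted in the test plan (the real bank_utils.py may differ):

```python
# Hypothetical per-(bank, fact_type) partial HNSW index DDL. Both predicates
# sit in the WHERE clause, so the planner prefers this partial index over the
# bank_id B-tree for ORDER BY embedding <=> $1 queries.
import uuid

ABBREV = {"world": "worl", "experience": "expr", "observation": "obsv"}

def index_name(internal_id: uuid.UUID, fact_type: str) -> str:
    """Stable name derived from the bank's internal_id UUID."""
    return f"idx_mu_emb_{ABBREV[fact_type]}_{internal_id.hex[:16]}"

def create_index_sql(bank_id: str, internal_id: uuid.UUID, fact_type: str) -> str:
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name(internal_id, fact_type)} "
        f"ON memory_units USING hnsw (embedding vector_cosine_ops) "
        f"WHERE bank_id = '{bank_id}' AND fact_type = '{fact_type}'"
    )
```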

Lifecycle:

  • Created at bank-creation time (create_bank_hnsw_indexes)
  • Dropped on bank deletion (drop_bank_hnsw_indexes, combined with DELETE ... RETURNING to avoid an extra round-trip)
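The deletion path can be sketched as follows (function and column names assumed): one DELETE ... RETURNING statement removes the bank and hands back its internal_id, so the index drops need no separate lookup round-trip.

```python
# Hypothetical sketch of drop_bank_hnsw_indexes + delete_bank.
async def drop_bank_hnsw_indexes(conn, internal_id) -> None:
    for abbrev in ("worl", "expr", "obsv"):
        await conn.execute(
            f"DROP INDEX IF EXISTS idx_mu_emb_{abbrev}_{internal_id.hex[:16]}"
        )

async def delete_bank(conn, bank_id: str) -> None:
    # DELETE and fetch internal_id in one round-trip.
    row = await conn.fetchrow(
        "DELETE FROM banks WHERE id = $1 RETURNING internal_id", bank_id
    )
    if row is not None:  # bank existed; drop its per-(bank, fact_type) indexes
        await drop_bank_hnsw_indexes(conn, row["internal_id"])
```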

4. Migrations

  • a3b4c5d6e7f8: interim fact_type-only partial HNSW indexes
  • d5e6f7a8b9c0: adds internal_id UUID UNIQUE to banks, replaces fact_type-only indexes with per-(bank, fact_type) partials, drops global HNSW index, backfills indexes for existing banks

Test plan

  • Run cd hindsight-api && uv run pytest tests/ -v — all tests pass
  • Run migration on a test DB: uv run hindsight-admin run-db-migration
  • Verify with EXPLAIN (ANALYZE, BUFFERS) on a bank with 20K+ rows that the planner selects Index Scan using idx_mu_emb_worl_<uid> (not Seq Scan) for each UNION ALL arm
  • Run recall performance benchmark: ./scripts/benchmarks/run-retain-perf.sh and confirm latency improvement vs baseline
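The EXPLAIN step above can be scripted with a small (hypothetical) helper that scans the plan text and confirms every UNION ALL arm used a per-bank HNSW index, with no sequential scan:

```python
# Hypothetical plan check: each of the `arms` UNION ALL branches should show
# an Index Scan on an idx_mu_emb_* partial HNSW index, and no Seq Scan.
def plan_uses_per_bank_hnsw(plan_text: str, arms: int = 3) -> bool:
    hnsw_scans = plan_text.count("Index Scan using idx_mu_emb_")
    return hnsw_scans >= arms and "Seq Scan on memory_units" not in plan_text
```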

perf: replace window-function retrieval with UNION ALL + per-bank HNSW indexes

The previous retrieve_semantic_bm25_combined() used ROW_NUMBER() OVER (PARTITION
BY fact_type ...) which forced a full sequential scan — pgvector cannot use HNSW
indexes when a window function partitions on the same column as the ORDER BY.

Changes:
- retrieval.py: rewrite to UNION ALL of per-fact_type subqueries; each arm has
  its own ORDER BY embedding <=> $1 LIMIT n, enabling partial HNSW index scans.
  Semantic arms over-fetch 5x (min 100) for HNSW approximation; trimmed in Python.
- memory_engine.py: set hnsw.ef_search=200 at pool init (persistent per-connection,
  no per-query SET/RESET overhead).
- bank_utils.py: add create_bank_hnsw_indexes / drop_bank_hnsw_indexes for
  per-(bank_id, fact_type) partial HNSW index lifecycle management.
- fact_storage.py / bank_utils.py: create per-bank indexes on fresh bank insert.
- memory_engine.py delete_bank: drop per-bank indexes via DELETE...RETURNING to
  avoid a separate round-trip.
- Migration a3b4c5d6e7f8: add interim fact_type-only partial indexes.
- Migration d5e6f7a8b9c0: add internal_id UUID UNIQUE to banks, replace
  fact_type-only indexes with per-(bank, fact_type) partial HNSW indexes, drop
  the global idx_memory_units_embedding that competed with them.

Why per-(bank, fact_type) not just per-fact_type:
The idx_memory_units_bank_id B-tree index always wins over fact_type-only partial
indexes when bank_id appears in the WHERE clause. Including bank_id in the partial
index predicate removes the B-tree from consideration and lets the planner choose
HNSW. The global HNSW index must also be dropped to avoid competing for the larger
fact_type partitions (world, observation).
Instead of relying on DEFAULT gen_random_uuid() and RETURNING internal_id,
generate the UUID in application code before the INSERT. This means we
always know the value upfront and can call create_bank_hnsw_indexes
immediately without needing a DB round-trip to retrieve the assigned ID.
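The insert path described above can be sketched as follows (names assumed; the index-DDL body is elided here):

```python
# Hypothetical sketch: generate internal_id in application code so the
# per-bank index DDL can run immediately, with no RETURNING round-trip to
# learn a DB-assigned default.
import uuid

async def create_bank_hnsw_indexes(conn, internal_id) -> None:
    pass  # per-(bank, fact_type) partial HNSW DDL elided; see bank_utils.py

async def create_bank(conn, bank_id: str, name: str) -> uuid.UUID:
    internal_id = uuid.uuid4()  # known upfront, not DEFAULT gen_random_uuid()
    await conn.execute(
        "INSERT INTO banks (id, name, internal_id) VALUES ($1, $2, $3)",
        bank_id, name, internal_id,
    )
    await create_bank_hnsw_indexes(conn, internal_id)  # uses the known UUID
    return internal_id
```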

Also adds tests for HNSW index lifecycle and retrieve_semantic_bm25_combined.
Migration fixes:
- Add text() wrappers for raw SQL in d5e6f7a8b9c0 (SQLAlchemy 2.0 compat)
- Drop stale fact_type-only partial indexes (idx_mu_emb_world/observation/experience)
  that may exist from prior migrations on the same DB

migrations.py fix:
- Skip global HNSW index creation when per-bank partial HNSW indexes already
  exist on memory_units (idx_mu_emb_* pattern). Without this, the post-migration
  vector index check detects no %embedding% named index and recreates the global
  idx_memory_units_embedding, which defeats the per-bank index strategy.
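The guard can be sketched as a pure predicate over the existing index names (function name assumed):

```python
# Hypothetical sketch of the migrations.py guard: only recreate the global
# HNSW index when no per-bank partial indexes (idx_mu_emb_* pattern) already
# cover memory_units.
def should_create_global_hnsw_index(existing_index_names) -> bool:
    per_bank = any(n.startswith("idx_mu_emb_") for n in existing_index_names)
    return not per_bank
```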

Verified with EXPLAIN ANALYZE on 66K-row bank: all three fact_type arms use
their per-bank HNSW index scan (idx_mu_emb_worl/expr/obsv_<uid16>).
nicoloboschi merged commit 43b3efc into main on Mar 11, 2026
35 of 37 checks passed

Development

Successfully merging this pull request may close these issues.

perf: window function in retrieve_semantic_bm25_combined() prevents HNSW index use