
perf: replace window-function retrieval with UNION ALL + per-bank HNSW indexes#541

Merged
nicoloboschi merged 5 commits into main from fix/hnsw-union-all-retrieval
Mar 11, 2026

Conversation

@nicoloboschi
Collaborator

Problem

retrieve_semantic_bm25_combined() used ROW_NUMBER() OVER (PARTITION BY fact_type ORDER BY embedding <=> $1) to take the top-N results per fact type. This forced a full sequential scan: pgvector can satisfy a plain ORDER BY embedding <=> $1 LIMIT n with an HNSW index scan, but not when a window function partitions on the ordering column.

Closes #539.

Solution

1. UNION ALL retrieval (retrieval.py)

Replace the window-function CTE with a UNION ALL of per-fact_type subqueries. Each arm has its own ORDER BY embedding <=> $1 LIMIT n, which gives the query planner a shape it can satisfy with an HNSW index scan. Semantic arms over-fetch 5× (min 100) to compensate for HNSW approximation loss; results are trimmed to limit in Python. A single connection and round-trip per recall is preserved, unlike the parallel-connections approach in PR #540.
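A minimal sketch of the query shape, assuming hypothetical table/column names and parameter positions (`$1` = query embedding, `$2` = bank id); the actual retrieval.py implementation may differ:

```python
# Hypothetical sketch: build one UNION ALL query whose arms each carry their
# own ORDER BY ... LIMIT, so each arm can use a partial HNSW index scan.

FACT_TYPES = ("world", "experience", "observation")

def build_union_all_query(limit: int) -> tuple[str, int]:
    """Return (sql, fetch_n). Semantic arms over-fetch 5x (min 100) to
    compensate for HNSW approximation loss; callers trim in Python."""
    fetch_n = max(limit * 5, 100)
    arms = [
        f"""(SELECT id, fact_type, embedding <=> $1 AS distance
             FROM memory_units
             WHERE bank_id = $2 AND fact_type = '{ft}'
             ORDER BY embedding <=> $1
             LIMIT {fetch_n})"""
        for ft in FACT_TYPES
    ]
    return " UNION ALL ".join(arms), fetch_n

def trim_per_type(rows, limit: int):
    """Keep only the top-`limit` rows per fact_type (each arm is pre-sorted)."""
    kept, counts = [], {}
    for row in rows:
        ft = row["fact_type"]
        if counts.get(ft, 0) < limit:
            kept.append(row)
            counts[ft] = counts.get(ft, 0) + 1
    return kept
```

The single statement keeps the one-connection, one-round-trip property while still presenting the planner with three independently index-scannable arms.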

2. hnsw.ef_search=200 at pool init (memory_engine.py)

Set once per connection via asyncpg.create_pool(init=...), so there is no per-query SET/RESET overhead. The setting is ignored gracefully if the installed pgvector doesn't support it.

3. Per-(bank, fact_type) partial HNSW indexes (bank_utils.py, migrations)

Two root causes prevented the planner from using existing indexes:

  • fact_type-only partial indexes: the idx_memory_units_bank_id B-tree always wins when bank_id is in the WHERE clause.
  • Global HNSW index (idx_memory_units_embedding): competes with per-fact_type partials for larger partitions (world, observation) and wins.

Fix: create per-(bank_id, fact_type) partial HNSW indexes (both predicates in the WHERE clause) and drop the global index. Index naming uses a stable internal_id UUID UNIQUE column added to the banks table.
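A sketch of the DDL shape, with the abbreviation scheme and 16-char uid suffix inferred from the index names quoted in the test plan (the real bank_utils.py may differ):

```python
# Hypothetical per-(bank, fact_type) partial HNSW index DDL. Both predicates
# sit in the WHERE clause, so the planner prefers this partial index over the
# bank_id B-tree for ORDER BY embedding <=> $1 queries.
import uuid

ABBREV = {"world": "worl", "experience": "expr", "observation": "obsv"}

def index_name(internal_id: uuid.UUID, fact_type: str) -> str:
    """Stable name derived from the bank's internal_id UUID."""
    return f"idx_mu_emb_{ABBREV[fact_type]}_{internal_id.hex[:16]}"

def create_index_sql(bank_id: str, internal_id: uuid.UUID, fact_type: str) -> str:
    return (
        f"CREATE INDEX IF NOT EXISTS {index_name(internal_id, fact_type)} "
        f"ON memory_units USING hnsw (embedding vector_cosine_ops) "
        f"WHERE bank_id = '{bank_id}' AND fact_type = '{fact_type}'"
    )
```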

Lifecycle:

  • Created at bank-creation time (create_bank_hnsw_indexes)
  • Dropped on bank deletion (drop_bank_hnsw_indexes, combined with DELETE ... RETURNING to avoid an extra round-trip)
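The deletion path can be sketched as follows (function and column names assumed): one DELETE ... RETURNING statement removes the bank and hands back its internal_id, so the index drops need no separate lookup round-trip.

```python
# Hypothetical sketch of drop_bank_hnsw_indexes + delete_bank.
async def drop_bank_hnsw_indexes(conn, internal_id) -> None:
    for abbrev in ("worl", "expr", "obsv"):
        await conn.execute(
            f"DROP INDEX IF EXISTS idx_mu_emb_{abbrev}_{internal_id.hex[:16]}"
        )

async def delete_bank(conn, bank_id: str) -> None:
    # DELETE and fetch internal_id in one round-trip.
    row = await conn.fetchrow(
        "DELETE FROM banks WHERE id = $1 RETURNING internal_id", bank_id
    )
    if row is not None:  # bank existed; drop its per-(bank, fact_type) indexes
        await drop_bank_hnsw_indexes(conn, row["internal_id"])
```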

4. Migrations

  • a3b4c5d6e7f8: interim fact_type-only partial HNSW indexes
  • d5e6f7a8b9c0: adds internal_id UUID UNIQUE to banks, replaces fact_type-only indexes with per-(bank, fact_type) partials, drops global HNSW index, backfills indexes for existing banks

Test plan

  • Run cd hindsight-api && uv run pytest tests/ -v — all tests pass
  • Run migration on a test DB: uv run hindsight-admin run-db-migration
  • Verify with EXPLAIN (ANALYZE, BUFFERS) on a bank with 20K+ rows that the planner selects Index Scan using idx_mu_emb_worl_<uid> (not Seq Scan) for each UNION ALL arm
  • Run recall performance benchmark: ./scripts/benchmarks/run-retain-perf.sh and confirm latency improvement vs baseline
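The EXPLAIN step above can be scripted with a small (hypothetical) helper that scans the plan text and confirms every UNION ALL arm used a per-bank HNSW index, with no sequential scan:

```python
# Hypothetical plan check: each of the `arms` UNION ALL branches should show
# an Index Scan on an idx_mu_emb_* partial HNSW index, and no Seq Scan.
def plan_uses_per_bank_hnsw(plan_text: str, arms: int = 3) -> bool:
    hnsw_scans = plan_text.count("Index Scan using idx_mu_emb_")
    return hnsw_scans >= arms and "Seq Scan on memory_units" not in plan_text
```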

perf: replace window-function retrieval with UNION ALL + per-bank HNSW indexes

The previous retrieve_semantic_bm25_combined() used ROW_NUMBER() OVER (PARTITION
BY fact_type ...) which forced a full sequential scan — pgvector cannot use HNSW
indexes when a window function partitions on the same column as the ORDER BY.

Changes:
- retrieval.py: rewrite to UNION ALL of per-fact_type subqueries; each arm has
  its own ORDER BY embedding <=> $1 LIMIT n, enabling partial HNSW index scans.
  Semantic arms over-fetch 5x (min 100) for HNSW approximation; trimmed in Python.
- memory_engine.py: set hnsw.ef_search=200 at pool init (persistent per-connection,
  no per-query SET/RESET overhead).
- bank_utils.py: add create_bank_hnsw_indexes / drop_bank_hnsw_indexes for
  per-(bank_id, fact_type) partial HNSW index lifecycle management.
- fact_storage.py / bank_utils.py: create per-bank indexes on fresh bank insert.
- memory_engine.py delete_bank: drop per-bank indexes via DELETE...RETURNING to
  avoid a separate round-trip.
- Migration a3b4c5d6e7f8: add interim fact_type-only partial indexes.
- Migration d5e6f7a8b9c0: add internal_id UUID UNIQUE to banks, replace
  fact_type-only indexes with per-(bank, fact_type) partial HNSW indexes, drop
  the global idx_memory_units_embedding that competed with them.

Why per-(bank, fact_type) not just per-fact_type:
The idx_memory_units_bank_id B-tree index always wins over fact_type-only partial
indexes when bank_id appears in the WHERE clause. Including bank_id in the partial
index predicate removes the B-tree from consideration and lets the planner choose
HNSW. The global HNSW index must also be dropped to avoid competing for the larger
fact_type partitions (world, observation).
Instead of relying on DEFAULT gen_random_uuid() and RETURNING internal_id,
generate the UUID in application code before the INSERT. This means we
always know the value upfront and can call create_bank_hnsw_indexes
immediately without needing a DB round-trip to retrieve the assigned ID.
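The insert path described above can be sketched as follows (names assumed; the index-DDL body is elided here):

```python
# Hypothetical sketch: generate internal_id in application code so the
# per-bank index DDL can run immediately, with no RETURNING round-trip to
# learn a DB-assigned default.
import uuid

async def create_bank_hnsw_indexes(conn, internal_id) -> None:
    pass  # per-(bank, fact_type) partial HNSW DDL elided; see bank_utils.py

async def create_bank(conn, bank_id: str, name: str) -> uuid.UUID:
    internal_id = uuid.uuid4()  # known upfront, not DEFAULT gen_random_uuid()
    await conn.execute(
        "INSERT INTO banks (id, name, internal_id) VALUES ($1, $2, $3)",
        bank_id, name, internal_id,
    )
    await create_bank_hnsw_indexes(conn, internal_id)  # uses the known UUID
    return internal_id
```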

Also adds tests for HNSW index lifecycle and retrieve_semantic_bm25_combined.
Migration fixes:
- Add text() wrappers for raw SQL in d5e6f7a8b9c0 (SQLAlchemy 2.0 compat)
- Drop stale fact_type-only partial indexes (idx_mu_emb_world/observation/experience)
  that may exist from prior migrations on the same DB

migrations.py fix:
- Skip global HNSW index creation when per-bank partial HNSW indexes already
  exist on memory_units (idx_mu_emb_* pattern). Without this, the post-migration
  vector index check detects no %embedding% named index and recreates the global
  idx_memory_units_embedding, which defeats the per-bank index strategy.
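The guard can be sketched as a pure predicate over the existing index names (function name assumed):

```python
# Hypothetical sketch of the migrations.py guard: only recreate the global
# HNSW index when no per-bank partial indexes (idx_mu_emb_* pattern) already
# cover memory_units.
def should_create_global_hnsw_index(existing_index_names) -> bool:
    per_bank = any(n.startswith("idx_mu_emb_") for n in existing_index_names)
    return not per_bank
```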

Verified with EXPLAIN ANALYZE on 66K-row bank: all three fact_type arms use
their per-bank HNSW index scan (idx_mu_emb_worl/expr/obsv_<uid16>).
nicoloboschi merged commit 43b3efc into main on Mar 11, 2026
35 of 37 checks passed

Development

Successfully merging this pull request may close these issues.

perf: window function in retrieve_semantic_bm25_combined() prevents HNSW index use