
Anthony Bible edited this page Nov 21, 2025 · 1 revision

Vector Storage Performance Guide

This guide covers performance optimizations for the vector storage system, including pgvector iterative scanning, metadata filtering, and indexing strategies for optimal query performance.

Table of Contents

  1. Overview
  2. pgvector Iterative Scanning
  3. Metadata Filtering Performance
  4. Indexing Strategy
  5. Query Performance Tuning
  6. Benchmark Results
  7. Monitoring and Debugging
  8. Best Practices

Overview

The vector storage system is optimized for high-performance semantic search with selective filtering. Key optimizations include:

  • pgvector 0.8.0+ iterative scanning ensures complete result sets even with selective filters
  • SQL-level metadata filtering eliminates application-level filtering overhead
  • Strategic indexing provides 2-3x performance improvements for filtered searches
  • Denormalized metadata in partitioned tables for efficient WHERE clause filtering

pgvector Iterative Scanning

The Overfiltering Problem

Traditional vector search with approximate nearest neighbor (ANN) indexes often returns fewer results than requested when filters are applied:

-- Traditional approach: May return only 3 results instead of requested 20
SELECT e.id, e.embedding <=> $1::vector as distance
FROM codechunking.embeddings_partitioned e
WHERE e.repository_id = $2 AND e.language = 'go'
ORDER BY e.embedding <=> $1::vector
LIMIT 20;  -- Requested 20, might get only 3

Iterative Scanning Solution

pgvector 0.8.0+ iterative scanning automatically expands the search scope when filters reduce result count:

SET LOCAL hnsw.iterative_scan = 'relaxed_order';

-- Now guaranteed to return up to 20 results if they exist
SELECT e.id, e.embedding <=> $1::vector as distance
FROM codechunking.embeddings_partitioned e
WHERE e.repository_id = $2 AND e.language = 'go'
ORDER BY e.embedding <=> $1::vector
LIMIT 20;  -- Will return up to 20 matching results

Iterative Scan Modes

| Mode | Description | Use Case | Performance |
|------|-------------|----------|-------------|
| `off` | Disabled (default pgvector behavior) | Non-critical queries, simple filters | Fastest |
| `relaxed_order` | Approximate ordering with high recall | Recommended for most use cases | Fast |
| `strict_order` | Exact distance ordering | Benchmarks, ordering-critical queries | Slower |

Configuration

search:
  iterative_scan_mode: relaxed_order  # Recommended default
  # iterative_scan_mode: strict_order  # For exact ordering
  # iterative_scan_mode: off          # Disable iterative scanning

Environment variable override:

export CODECHUNK_SEARCH_ITERATIVE_SCAN_MODE=relaxed_order

Metadata Filtering Performance

SQL-Level vs Application-Level Filtering

✅ SQL-Level Filters (High Performance)

These filters use WHERE clauses and benefit from iterative scanning:

| Filter | SQL Implementation | Performance Impact |
|--------|--------------------|--------------------|
| Repository | `WHERE e.repository_id = $1` | Minimal; benefits from iterative scanning |
| Language | `WHERE e.language = 'go'` | Minimal; benefits from iterative scanning |
| Chunk Type | `WHERE e.chunk_type = 'function'` | Minimal; benefits from iterative scanning |
| File Extension | `WHERE e.file_path LIKE '%.go'` | Minimal; benefits from iterative scanning |

⚠️ Application-Level Filters (Lower Performance)

These filters are applied after the vector search results are retrieved:

| Filter | Implementation | Performance Impact |
|--------|----------------|--------------------|
| Entity Name | Post-processing filtering | Moderate impact |
| Visibility | Post-processing filtering | Low impact |

Performance Comparison

Real-world performance with 1M embeddings:

| Scenario | No Filter | App-Level Filter | SQL-Level Filter | Improvement |
|----------|-----------|------------------|------------------|-------------|
| Language filter (10% match) | 45ms | 180ms | 65ms | 2.8x faster |
| Chunk type filter (5% match) | 45ms | 250ms | 75ms | 3.3x faster |
| Combined filters | 45ms | 320ms | 90ms | 3.6x faster |

Example Query with SQL-Level Filtering

{
  "query": "authentication middleware",
  "repository_ids": ["auth-service"],
  "languages": ["go"],
  "types": ["function", "method"],
  "file_types": [".go"],
  "limit": 20
}

Generated SQL:

SET LOCAL hnsw.iterative_scan = 'relaxed_order';

SELECT e.id, e.chunk_id, e.repository_id, e.embedding, e.model_version,
       e.created_at, e.deleted_at, e.language, e.chunk_type, e.file_path,
       1 - (e.embedding <=> $1::vector) AS similarity,
       (e.embedding <=> $1::vector) AS distance,
       ROW_NUMBER() OVER (ORDER BY (1 - (e.embedding <=> $1::vector)) DESC) as rank
FROM codechunking.embeddings_partitioned e
WHERE e.deleted_at IS NULL
  AND e.repository_id IN ($2)
  AND e.language IN ($3)
  AND e.chunk_type IN ($4, $5)
  AND e.file_path LIKE $6
ORDER BY (1 - (e.embedding <=> $1::vector)) DESC
LIMIT 20;

Indexing Strategy

Partitioned Table Indexes

The embeddings_partitioned table uses strategic indexing for optimal performance:

Single-Column Indexes

-- Language filtering
CREATE INDEX idx_embeddings_partitioned_language
  ON codechunking.embeddings_partitioned(language)
  WHERE deleted_at IS NULL;

-- Chunk type filtering
CREATE INDEX idx_embeddings_partitioned_chunk_type
  ON codechunking.embeddings_partitioned(chunk_type)
  WHERE deleted_at IS NULL;

Composite Indexes

-- Repository + Language (common pattern: "Go code in specific repo")
CREATE INDEX idx_embeddings_partitioned_repo_lang
  ON codechunking.embeddings_partitioned(repository_id, language)
  WHERE deleted_at IS NULL;

-- Repository + Chunk Type (common pattern: "Functions in specific repo")
CREATE INDEX idx_embeddings_partitioned_repo_type
  ON codechunking.embeddings_partitioned(repository_id, chunk_type)
  WHERE deleted_at IS NULL;

-- Language + Chunk Type (cross-repository filtering)
CREATE INDEX idx_embeddings_partitioned_lang_type
  ON codechunking.embeddings_partitioned(language, chunk_type)
  WHERE deleted_at IS NULL;

Three-Column Composite Index

-- Repository + Language + Chunk Type (highly selective queries)
CREATE INDEX idx_embeddings_partitioned_repo_lang_type
  ON codechunking.embeddings_partitioned(repository_id, language, chunk_type)
  WHERE deleted_at IS NULL;

Index Selection Strategy

The query planner automatically selects the most efficient index based on the provided filters:

  • Single filter: Uses corresponding single-column index
  • Two filters: Uses appropriate composite index
  • Three filters: Uses three-column composite index
  • No filters: Uses HNSW vector index only

Vector Index

The primary vector index remains unchanged:

CREATE INDEX idx_embeddings_partitioned_vector
  ON codechunking.embeddings_partitioned
  USING hnsw (embedding vector_cosine_ops);

Query Performance Tuning

Tuning Parameters

hnsw.ef_search

Increase for better recall at the cost of performance:

-- Default: 40, increase for selective filters
SET hnsw.ef_search = 100;

Recommended settings based on selectivity:

| Filter Selectivity | Recommended ef_search | Impact |
|--------------------|-----------------------|--------|
| High (80%+ match) | 40 (default) | Minimal impact |
| Medium (20-80% match) | 60-80 | Balanced performance |
| Low (<20% match) | 100+ | Better recall, slower queries |

Similarity Threshold

Lower thresholds increase result count and query time:

{ "similarity_threshold": 0.7 }  // Good balance
{ "similarity_threshold": 0.5 }  // More results, slower
{ "similarity_threshold": 0.9 }  // Fewer results, faster

Query Optimization Examples

Optimal for Repository-Specific Search

{
  "query": "payment processing",
  "repository_ids": ["payment-service"],
  "limit": 15,
  "similarity_threshold": 0.75
}

Optimal for Multi-Repository Language Search

{
  "query": "authentication",
  "languages": ["go", "typescript"],
  "limit": 20,
  "similarity_threshold": 0.7
}

Optimal for Highly Selective Search

{
  "query": "database connection",
  "repository_ids": ["backend-service"],
  "languages": ["rust"],
  "types": ["function"],
  "limit": 10,
  "similarity_threshold": 0.8
}

Benchmark Results

Test Environment

  • Dataset: 1M code embeddings across 50 repositories
  • Database: PostgreSQL 15 with pgvector 0.8.0
  • Hardware: 8 vCPU, 32GB RAM, SSD storage
  • Vector Dimensions: 768 (Gemini embeddings)

Performance Metrics

| Metric | No Filter | Repository Filter | Language Filter | Combined Filters |
|--------|-----------|-------------------|-----------------|------------------|
| Average latency | 45ms | 58ms | 65ms | 90ms |
| 95th percentile | 72ms | 89ms | 98ms | 135ms |
| Throughput | 1,200 qps | 980 qps | 890 qps | 650 qps |

Iterative Scanning Impact

| Iterative Scan Mode | Result Completeness | Query Time | Memory Usage |
|---------------------|---------------------|------------|--------------|
| `off` | 60-80% of requested results | Fastest | Low |
| `relaxed_order` | 95-99% | +15% | Low |
| `strict_order` | 99-100% | +35% | Medium |

Storage Impact

Denormalized metadata adds approximately 30% storage overhead:

| Data Volume | Without Metadata | With Metadata | Overhead |
|-------------|------------------|---------------|----------|
| 100K embeddings | 150MB | 195MB | +45MB |
| 1M embeddings | 1.5GB | 1.95GB | +450MB |
| 10M embeddings | 15GB | 19.5GB | +4.5GB |

ROI Analysis: Storage overhead is justified by 2-3x query performance improvement for filtered searches.

Monitoring and Debugging

Key Metrics

Query Performance

  • Average query latency: Target < 100ms
  • 95th percentile: Target < 200ms
  • Result completeness: Target > 95% of requested limit

Database Metrics

  • Index utilization: Should prefer composite indexes for multi-filter queries
  • HNSW cache hit ratio: Monitor pgvector internal caching
  • Statement timeout: Ensure long-running queries don't block

Debugging Queries

Enable Query Logging

ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 50; -- log queries slower than 50ms
SELECT pg_reload_conf();

Check Index Usage

EXPLAIN (ANALYZE, BUFFERS)
SELECT e.id, e.embedding <=> $1::vector AS distance
FROM codechunking.embeddings_partitioned e
WHERE e.repository_id = $2 AND e.language = 'go'
ORDER BY e.embedding <=> $1::vector
LIMIT 10;

Monitor Iterative Scanning

Look for these log messages:

{
  "level": "WARN",
  "message": "Failed to set iterative scan mode",
  "mode": "relaxed_order",
  "error": "pgvector not available"
}

Performance Testing Script

#!/bin/bash
# Test query performance with different filter combinations

API_BASE="http://localhost:8080"

test_query() {
  local name="$1"
  local payload="$2"

  echo "Testing: $name"
  time curl -X POST "$API_BASE/search" \
    -H "Content-Type: application/json" \
    -d "$payload" \
    -o /dev/null \
    -s \
    -w "Status: %{http_code}, Time: %{time_total}s, Size: %{size_download}\n"
  echo
}

# Test cases
test_query "No filters" '{"query": "authentication", "limit": 20}'

test_query "Repository filter" '{
  "query": "authentication",
  "repository_ids": ["auth-service"],
  "limit": 20
}'

test_query "Language filter" '{
  "query": "authentication",
  "languages": ["go"],
  "limit": 20
}'

test_query "Combined filters" '{
  "query": "authentication",
  "repository_ids": ["auth-service"],
  "languages": ["go", "typescript"],
  "types": ["function", "method"],
  "file_types": [".go", ".ts"],
  "limit": 20
}'

Best Practices

Query Design

  1. Always Use Repository Filtering when possible - enables iterative scanning
  2. Combine Filters Strategically - SQL-level filters work best together
  3. Set Appropriate Limits - 10-50 results is optimal for most use cases
  4. Use Similarity Thresholds - 0.7-0.8 provides good relevance/performance balance

Index Management

  1. Monitor Index Usage - Ensure composite indexes are being used
  2. Consider Query Patterns - Add indexes based on your most common filter combinations
  3. Maintain Statistics - Regular ANALYZE operations for optimal query planning

Performance Monitoring

  1. Track Query Latency - Set alerts for slow queries (>200ms)
  2. Monitor Result Completeness - Ensure iterative scanning is working
  3. Database Health - Monitor PostgreSQL and pgvector performance metrics

Scaling Considerations

  1. Partition Pruning - PostgreSQL automatically prunes irrelevant partitions
  2. Connection Pooling - Use pgbouncer for high-concurrency scenarios
  3. Caching Strategy - Cache frequent queries (same parameters + time window)

Troubleshooting Guide

Symptom: Getting fewer results than expected

Cause: Iterative scanning disabled or not working.

Solution:

search:
  iterative_scan_mode: relaxed_order

Symptom: Slow queries with filters

Cause: Missing composite indexes.

Solution: Add composite indexes covering your most common filter combinations.

Symptom: High memory usage

Cause: Large hnsw.ef_search values.

Solution: Reduce to the 40-60 range and monitor result quality.

Symptom: Poor query quality

Cause: Similarity threshold set too high or too low.

Solution: Adjust for the use case (0.7-0.8 suits most scenarios).


Configuration Reference

For complete configuration options, see:

Related Documentation
