-
Notifications
You must be signed in to change notification settings - Fork 0
vector storage performance
This guide covers performance optimizations for the vector storage system, including pgvector iterative scanning, metadata filtering, and indexing strategies for optimal query performance.
- Overview
- pgvector Iterative Scanning
- Metadata Filtering Performance
- Indexing Strategy
- Query Performance Tuning
- Benchmark Results
- Monitoring and Debugging
- Best Practices
The vector storage system is optimized for high-performance semantic search with selective filtering. Key optimizations include:
- pgvector 0.8.0+ iterative scanning ensures complete result sets even with selective filters
- SQL-level metadata filtering eliminates application-level filtering overhead
- Strategic indexing provides 2-3x performance improvements for filtered searches
- Denormalized metadata in partitioned tables for efficient WHERE clause filtering
Traditional vector search with approximate nearest neighbor (ANN) indexes often returns fewer results than requested when filters are applied:
-- Traditional approach: May return only 3 results instead of requested 20
SELECT e.id, e.embedding <=> $1::vector as distance
FROM codechunking.embeddings_partitioned e
WHERE e.repository_id = $2 AND e.language = 'go'
ORDER BY e.embedding <=> $1::vector
LIMIT 20; -- Requested 20, might get only 3pgvector 0.8.0+ iterative scanning automatically expands the search scope when filters reduce result count:
SET LOCAL hnsw.iterative_scan = 'relaxed_order';
-- Now guaranteed to return up to 20 results if they exist
SELECT e.id, e.embedding <=> $1::vector as distance
FROM codechunking.embeddings_partitioned e
WHERE e.repository_id = $2 AND e.language = 'go'
ORDER BY e.embedding <=> $1::vector
LIMIT 20; -- Will return up to 20 matching results| Mode | Description | Use Case | Performance |
|---|---|---|---|
off |
Disabled (default pgvector behavior) | Non-critical queries, simple filters | Fastest |
relaxed_order |
Approximate ordering with high recall | Recommended for most use cases | Fast |
strict_order |
Exact distance ordering | Benchmarks, ordering-critical queries | Slower |
search:
iterative_scan_mode: relaxed_order # Recommended default
# iterative_scan_mode: strict_order # For exact ordering
# iterative_scan_mode: off # Disable iterative scanningEnvironment variable override:
export CODECHUNK_SEARCH_ITERATIVE_SCAN_MODE=relaxed_orderThese filters use WHERE clauses and benefit from iterative scanning:
| Filter | SQL Implementation | Performance Impact |
|---|---|---|
| Repository | WHERE e.repository_id = $1 |
Minimal + iterative scanning benefits |
| Language | WHERE e.language = 'go' |
Minimal + iterative scanning benefits |
| Chunk Type | WHERE e.chunk_type = 'function' |
Minimal + iterative scanning benefits |
| File Extension | WHERE e.file_path LIKE '%.go' |
Minimal + iterative scanning benefits |
These filters are applied after the vector search results are retrieved:
| Filter | Implementation | Performance Impact |
|---|---|---|
| Entity Name | Post-processing filtering | Moderate impact |
| Visibility | Post-processing filtering | Low impact |
Real-world performance with 1M embeddings:
| Scenario | No Filter | App-Level Filter | SQL-Level Filter | Improvement |
|---|---|---|---|---|
| 10% match language | 45ms | 180ms | 65ms | 2.8x faster |
| 5% match chunk type | 45ms | 250ms | 75ms | 3.3x faster |
| Combined filters | 45ms | 320ms | 90ms | 3.6x faster |
{
"query": "authentication middleware",
"repository_ids": ["auth-service"],
"languages": ["go"],
"types": ["function", "method"],
"file_types": [".go"],
"limit": 20
}Generated SQL:
SET LOCAL hnsw.iterative_scan = 'relaxed_order';
SELECT e.id, e.chunk_id, e.repository_id, e.embedding, e.model_version,
e.created_at, e.deleted_at, e.language, e.chunk_type, e.file_path,
1 - (e.embedding <=> $1::vector) AS similarity,
(e.embedding <=> $1::vector) AS distance,
ROW_NUMBER() OVER (ORDER BY (1 - (e.embedding <=> $1::vector)) DESC) as rank
FROM codechunking.embeddings_partitioned e
WHERE e.deleted_at IS NULL
AND e.repository_id IN ($2, $3)
AND e.language IN ($4, $5)
AND e.chunk_type IN ($6, $7)
AND (e.file_path LIKE $8 OR e.file_path LIKE $9)
ORDER BY (1 - (e.embedding <=> $1::vector)) DESC
LIMIT 20;The embeddings_partitioned table uses strategic indexing for optimal performance:
-- Language filtering
CREATE INDEX idx_embeddings_partitioned_language
ON codechunking.embeddings_partitioned(language)
WHERE deleted_at IS NULL;
-- Chunk type filtering
CREATE INDEX idx_embeddings_partitioned_chunk_type
ON codechunking.embeddings_partitioned(chunk_type)
WHERE deleted_at IS NULL;-- Repository + Language (common pattern: "Go code in specific repo")
CREATE INDEX idx_embeddings_partitioned_repo_lang
ON codechunking.embeddings_partitioned(repository_id, language)
WHERE deleted_at IS NULL;
-- Repository + Chunk Type (common pattern: "Functions in specific repo")
CREATE INDEX idx_embeddings_partitioned_repo_type
ON codechunking.embeddings_partitioned(repository_id, chunk_type)
WHERE deleted_at IS NULL;
-- Language + Chunk Type (cross-repository filtering)
CREATE INDEX idx_embeddings_partitioned_lang_type
ON codechunking.embeddings_partitioned(language, chunk_type)
WHERE deleted_at IS NULL;-- Repository + Language + Chunk Type (highly selective queries)
CREATE INDEX idx_embeddings_partitioned_repo_lang_type
ON codechunking.embeddings_partitioned(repository_id, language, chunk_type)
WHERE deleted_at IS NULL;The query planner automatically selects the most efficient index based on the provided filters:
- Single filter: Uses corresponding single-column index
- Two filters: Uses appropriate composite index
- Three filters: Uses three-column composite index
- No filters: Uses HNSW vector index only
The primary vector index remains unchanged:
CREATE INDEX idx_embeddings_partitioned_vector
ON codechunking.embeddings_partitioned
USING hnsw (embedding vector_cosine_ops);Increase for better recall at the cost of performance:
-- Default: 40, increase for selective filters
SET hnsw.ef_search = 100;Recommended settings based on selectivity:
| Filter Selectivity | Recommended ef_search | Impact |
|---|---|---|
| High (80%+ match) | 40 (default) | Minimal impact |
| Medium (20-80% match) | 60-80 | Balanced performance |
| Low (20% match) | 100+ | Better recall, slower queries |
Lower thresholds increase result count and query time:
{
"similarity_threshold": 0.7, // Good balance
"similarity_threshold": 0.5, // More results, slower
"similarity_threshold": 0.9 // Fewer results, faster
}{
"query": "payment processing",
"repository_ids": ["payment-service"],
"limit": 15,
"similarity_threshold": 0.75
}{
"query": "authentication",
"languages": ["go", "typescript"],
"limit": 20,
"similarity_threshold": 0.7
}{
"query": "database connection",
"repository_ids": ["backend-service"],
"languages": ["rust"],
"types": ["function"],
"limit": 10,
"similarity_threshold": 0.8
}- Dataset: 1M code embeddings across 50 repositories
- Database: PostgreSQL 15 with pgvector 0.8.0
- Hardware: 8 vCPU, 32GB RAM, SSD storage
- Vector Dimensions: 768 (Gemini embeddings)
| Query Type | No Filter | With Repository Filter | With Language Filter | Combined Filters |
|---|---|---|---|---|
| Average Latency | 45ms | 58ms | 65ms | 90ms |
| 95th Percentile | 72ms | 89ms | 98ms | 135ms |
| Throughput | 1,200 qps | 980 qps | 890 qps | 650 qps |
| Iterative Scan Mode | Result Completeness | Query Time | Memory Usage |
|---|---|---|---|
| off | 60-80% of requested results | Fastest | Low |
| relaxed_order | 95-99% completeness | +15% time | Low |
| strict_order | 99-100% completeness | +35% time | Medium |
Denormalized metadata adds approximately 30% storage overhead:
| Data Volume | Without Metadata | With Metadata | Overhead |
|---|---|---|---|
| 100K embeddings | 150MB | 195MB | +45MB |
| 1M embeddings | 1.5GB | 1.95GB | +450MB |
| 10M embeddings | 15GB | 19.5GB | +4.5GB |
ROI Analysis: Storage overhead is justified by 2-3x query performance improvement for filtered searches.
- Average query latency: Target < 100ms
- 95th percentile: Target < 200ms
- Result completeness: Target > 95% of requested limit
- Index utilization: Should prefer composite indexes for multi-filter queries
- HNSW cache hit ratio: Monitor pgvector internal caching
- Statement timeout: Ensure long-running queries don't block
ALTER SYSTEM SET log_statement = 'all';
ALTER SYSTEM SET log_min_duration_statement = 50; -- Log slow queries
SELECT pg_reload_conf();EXPLAIN (ANALYZE, BUFFERS)
SELECT e.id, e.embedding <=> $1::vector AS distance
FROM codechunking.embeddings_partitioned e
WHERE e.repository_id = $1 AND e.language = 'go'
ORDER BY e.embedding <=> $1::vector
LIMIT 10;Look for these log messages:
{
"level": "WARN",
"message": "Failed to set iterative scan mode",
"mode": "relaxed_order",
"error": "pgvector not available"
}#!/bin/bash
# Test query performance with different filter combinations
API_BASE="http://localhost:8080"
test_query() {
local name="$1"
local payload="$2"
echo "Testing: $name"
time curl -X POST "$API_BASE/search" \
-H "Content-Type: application/json" \
-d "$payload" \
-o /dev/null \
-s \
-w "Status: %{http_code}, Time: %{time_total}s, Size: %{size_download}\n"
echo
}
# Test cases
test_query "No filters" '{"query": "authentication", "limit": 20}'
test_query "Repository filter" '{
"query": "authentication",
"repository_ids": ["auth-service"],
"limit": 20
}'
test_query "Language filter" '{
"query": "authentication",
"languages": ["go"],
"limit": 20
}'
test_query "Combined filters" '{
"query": "authentication",
"repository_ids": ["auth-service"],
"languages": ["go", "typescript"],
"types": ["function", "method"],
"file_types": [".go", ".ts"],
"limit": 20
}'- Always Use Repository Filtering when possible - enables iterative scanning
- Combine Filters Strategically - SQL-level filters work best together
- Set Appropriate Limits - 10-50 results is optimal for most use cases
- Use Similarity Thresholds - 0.7-0.8 provides good relevance/performance balance
- Monitor Index Usage - Ensure composite indexes are being used
- Consider Query Patterns - Add indexes based on your most common filter combinations
-
Maintain Statistics - Regular
ANALYZEoperations for optimal query planning
- Track Query Latency - Set alerts for slow queries (>200ms)
- Monitor Result Completeness - Ensure iterative scanning is working
- Database Health - Monitor PostgreSQL and pgvector performance metrics
- Partition Pruning - PostgreSQL automatically prunes irrelevant partitions
- Connection Pooling - Use pgbouncer for high-concurrency scenarios
- Caching Strategy - Cache frequent queries (same parameters + time window)
Cause: Iterative scanning disabled or not working Solution:
search:
iterative_scan_mode: relaxed_orderCause: Missing composite indexes Solution: Add appropriate composite indexes for common filter combinations
Cause: Large hnsw.ef_search values
Solution: Reduce to 40-60 range and monitor result quality
Cause: Similarity threshold too high or low Solution: Adjust based on use case (0.7-0.8 for most scenarios)
For complete configuration options, see:
- Search API Guide - Complete API reference
- Database Configuration - PostgreSQL setup
- Search Configuration - Search-specific settings
Configuration
- [📖 Configuration Reference](configuration reference) - Complete reference guide
- Configuration
- API Configuration
- Database Configuration
- Gemini Configuration
- Git Configuration
- Logging Configuration
- Middleware Configuration
- NATS Configuration
- Worker Configuration