Batch Processing Guide

Anthony Bible edited this page Nov 29, 2025 · 2 revisions

This guide explains the CodeChunking system's batch processing architecture and embedding generation workflow.

Overview

The CodeChunking system uses a multi-layer batch processing architecture that enables efficient embedding generation at scale through:

  • Three-layer processing architecture - Batch submission, progress tracking, and queue management
  • FIFO queue management - Simple first-in-first-out processing for consistent ordering
  • Rate-limit aware submission - Exponential backoff with global rate limit detection
  • Token counting - Pre-flight token counting with progressive chunk saving
  • Automatic fallback - Sequential processing fallback when batch operations fail

Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────┐
│   Job Processor │───▶│  Token Counter   │───▶│  Processing Mode    │
│                 │    │                  │    │     Decision        │
│ - Receives job  │    │ - Count tokens   │    │                     │
│ - Load chunks   │    │ - Progressive    │    │ chunks >= threshold │
│                 │    │   save (50/batch)│    │   → Batch Mode      │
└─────────────────┘    └──────────────────┘    │ chunks < threshold  │
                                               │   → Sequential Mode │
                                               └─────────────────────┘
                                                        │
                       ┌────────────────────────────────┼────────────────────────────────┐
                       │                                │                                │
                       ▼                                ▼                                │
            ┌──────────────────┐              ┌──────────────────┐                       │
            │   Batch Mode     │              │ Sequential Mode  │                       │
            │                  │              │                  │                       │
            │ - Queue requests │              │ - One-by-one     │                       │
            │ - Pre-save chunks│              │ - Immediate save │                       │
            │ - Submit batch   │              │ - Direct API     │                       │
            └────────┬─────────┘              └──────────────────┘                       │
                     │                                                                   │
                     ▼                                                                   │
            ┌──────────────────┐    ┌──────────────────┐    ┌──────────────────┐        │
            │  Batch Submitter │───▶│   Gemini API     │───▶│  Batch Poller    │        │
            │                  │    │                  │    │                  │        │
            │ - Rate limiting  │    │ - Files API      │    │ - Poll status    │        │
            │ - Exp. backoff   │    │ - Batches API    │    │ - Fetch results  │        │
            │ - Retry logic    │    │                  │    │ - Update chunks  │        │
            └──────────────────┘    └──────────────────┘    └──────────────────┘        │
                     │                                               │                   │
                     │              ┌──────────────────┐              │                   │
                     └─────────────▶│  Batch Progress  │◀─────────────┘                   │
                                    │    Tracking      │                                  │
                                    │                  │                                  │
                                    │ - Job status     │                                  │
                                    │ - Retry state    │                                  │
                                    │ - File URI cache │                                  │
                                    └──────────────────┘                                  │
                                             │                                            │
                                             │ (on failure + fallback enabled)            │
                                             └────────────────────────────────────────────┘

Three-Layer Architecture

Layer 1: Batch Submission (Gemini Batches API)

Location: internal/adapter/outbound/gemini/batch_embedding_client.go

The batch submission layer interfaces directly with Google's Gemini Batches API:

  1. File Upload: Chunks are converted to JSONL format and uploaded via Google Files API
  2. Job Creation: Batch job submitted to Gemini with reference to uploaded file
  3. Request ID Encoding: UUIDs encoded as chunk_<uuid_without_hyphens> for roundtrip mapping
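
The request-ID roundtrip described in step 3 can be sketched as follows. Function names here are illustrative, not the actual API of `batch_embedding_client.go`; only the `chunk_<uuid_without_hyphens>` format comes from this guide.

```go
package main

import (
	"fmt"
	"strings"
)

// encodeChunkRequestID maps a chunk UUID to a request ID of the form
// chunk_<uuid_without_hyphens>, so batch results can be mapped back to chunks.
func encodeChunkRequestID(chunkUUID string) string {
	return "chunk_" + strings.ReplaceAll(chunkUUID, "-", "")
}

// decodeChunkRequestID restores the canonical 8-4-4-4-12 hyphenated UUID.
func decodeChunkRequestID(requestID string) string {
	hex := strings.TrimPrefix(requestID, "chunk_")
	return fmt.Sprintf("%s-%s-%s-%s-%s",
		hex[0:8], hex[8:12], hex[12:16], hex[16:20], hex[20:32])
}

func main() {
	id := encodeChunkRequestID("123e4567-e89b-12d3-a456-426614174000")
	fmt.Println(id)                       // chunk_123e4567e89b12d3a456426614174000
	fmt.Println(decodeChunkRequestID(id)) // 123e4567-e89b-12d3-a456-426614174000
}
```

Stripping hyphens keeps the ID in a simple alphanumeric form while remaining reversible, since a UUID's hyphen positions are fixed.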

Key Constraints:

  • Maximum 500 chunks per API batch (hard limit in Gemini API)
  • File URI persisted for idempotent retry on transient failures
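
Because of the 500-chunk ceiling, larger jobs must be partitioned before upload. A minimal sketch of that partitioning (`splitBatches` is a hypothetical helper, not the actual implementation):

```go
package main

import "fmt"

// maxBatchSize mirrors the 500-chunk hard limit described above.
const maxBatchSize = 500

// splitBatches partitions chunk IDs into API-sized batches,
// preserving the original order.
func splitBatches(chunkIDs []string, size int) [][]string {
	var batches [][]string
	for start := 0; start < len(chunkIDs); start += size {
		end := start + size
		if end > len(chunkIDs) {
			end = len(chunkIDs)
		}
		batches = append(batches, chunkIDs[start:end])
	}
	return batches
}

func main() {
	ids := make([]string, 1200)
	batches := splitBatches(ids, maxBatchSize)
	fmt.Println(len(batches), len(batches[2])) // 3 200
}
```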

Layer 2: Progress Tracking

Location: internal/domain/entity/batch_job_progress.go, internal/application/worker/batch_poller.go

Tracks individual batch jobs through their complete lifecycle:

┌───────────────────┐
│ pending_submission│ ◀─────────────────────────────┐
└─────────┬─────────┘                               │
          │ Submitter attempts                      │
          ▼                                         │
┌───────────────────┐                               │
│    processing     │                               │
└─────────┬─────────┘                               │
          │ Poller checks status                    │
          ├──────────────────┐                      │
          ▼                  ▼                      │
┌───────────────┐   ┌─────────────────┐             │
│   completed   │   │ retry_scheduled │─────────────┘
└───────────────┘   └─────────────────┘   (backoff expires)
                            │
                            ▼ (max retries exceeded)
                    ┌───────────────┐
                    │    failed     │
                    └───────────────┘

Batch Job States:

| State | Description |
|-------|-------------|
| `pending_submission` | Ready for first submission attempt |
| `processing` | Submitted to Gemini, awaiting results |
| `completed` | Successfully processed |
| `retry_scheduled` | Scheduled for retry with backoff |
| `failed` | Permanent failure (max retries exceeded) |
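
The lifecycle above can be read as a small state machine. One way to make the legal transitions explicit (an illustrative sketch, not the actual entity in `batch_job_progress.go`):

```go
package main

import "fmt"

// validTransitions encodes the lifecycle diagram above: each state maps
// to the states it may legally move to.
var validTransitions = map[string][]string{
	"pending_submission": {"processing"},
	"processing":         {"completed", "retry_scheduled"},
	"retry_scheduled":    {"pending_submission", "failed"},
	// "completed" and "failed" are terminal.
}

// canTransition reports whether moving between two states is legal.
func canTransition(from, to string) bool {
	for _, next := range validTransitions[from] {
		if next == to {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(canTransition("processing", "retry_scheduled")) // true
	fmt.Println(canTransition("completed", "processing"))       // false
}
```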

Layer 3: Queue Management

Location: internal/adapter/outbound/queue/batch_queue_manager.go

Manages queuing before batch submission using a simple FIFO (first-in-first-out) queue. Requests are processed in the order they are received, ensuring predictable and fair ordering.
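
The FIFO guarantee can be sketched in a few lines; the real manager in `batch_queue_manager.go` adds locking, queue limits, and statistics on top of this ordering:

```go
package main

import "fmt"

// embeddingQueue is a minimal FIFO sketch: requests are dequeued in
// exactly the order they were enqueued.
type embeddingQueue struct {
	items []string // request IDs, oldest first
}

// Enqueue appends a request to the tail of the queue.
func (q *embeddingQueue) Enqueue(requestID string) {
	q.items = append(q.items, requestID)
}

// Dequeue removes and returns the oldest request, if any.
func (q *embeddingQueue) Dequeue() (string, bool) {
	if len(q.items) == 0 {
		return "", false
	}
	head := q.items[0]
	q.items = q.items[1:]
	return head, true
}

func main() {
	q := &embeddingQueue{}
	q.Enqueue("req-1")
	q.Enqueue("req-2")
	first, _ := q.Dequeue()
	fmt.Println(first) // req-1
}
```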

Queue Statistics and Monitoring

The queue manager tracks queue statistics:

// From batch_queue_manager.go QueueStats
type QueueStats struct {
    QueueSize      int  // Current queue depth
    TotalProcessed int  // Total requests processed
    TotalBatches   int  // Total batches created
    TotalErrors    int  // Total processing errors
    // ... other metrics
}

Submitting Embedding Requests

When submitting embedding requests, create an EmbeddingRequest:

// timePtr is a small helper returning a pointer to t, so the
// optional Deadline field can be set inline.
func timePtr(t time.Time) *time.Time { return &t }

request := &outbound.EmbeddingRequest{
    RequestID: uuid.New().String(),
    Text:      chunkContent,
    Deadline:  timePtr(time.Now().Add(5 * time.Second)),  // Optional deadline
    Metadata: map[string]interface{}{
        "repository_id": repoID,
        "chunk_id":      chunkID,
    },
}

Processing Modes

Mode Decision

The system chooses between batch and sequential processing based on chunk count:

if len(chunks) >= batchConfig.ThresholdChunks && batchProcessingEnabled {
    // Use batch processing
    processBatchEmbeddings(chunks)
} else {
    // Use sequential processing
    processSequentialEmbeddings(chunks)
}

Sequential Mode

When used:

  • Chunk count below threshold
  • Batch processing disabled
  • Fallback from failed batch operations

Characteristics:

  • Process chunks one-by-one
  • Each embedding saved immediately to database
  • Higher API cost per chunk
  • Simple error handling
  • Lower memory usage

Batch Mode

When used:

  • Chunk count meets or exceeds threshold
  • Batch processing enabled
  • Queue manager available

Characteristics:

  • Chunks pre-saved to database before API submission
  • Single API call for multiple chunks
  • Rate-limit aware submission with exponential backoff
  • Asynchronous polling for job completion
  • Lower API cost per chunk

Test Mode (Development)

Configuration: use_test_embeddings: true

Behavior:

  • Generates synthetic 768-dimensional vectors locally
  • Hash-based deterministic vectors (same content = same vector)
  • No API calls - completely offline
  • Fast iteration for development and testing
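
One way such deterministic vectors can be produced is to hash the content and derive every dimension from the digest; the sketch below illustrates the idea, though the actual generator may differ:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// testEmbedding derives a synthetic 768-dimensional vector from a SHA-256
// digest of the content, so identical content always yields an identical
// vector and no API call is needed. (Illustrative only.)
func testEmbedding(content string) []float32 {
	digest := sha256.Sum256([]byte(content))
	vec := make([]float32, 768)
	for i := range vec {
		// Mix the dimension index into digest bytes so dimensions differ.
		word := binary.LittleEndian.Uint32(digest[i%28 : i%28+4])
		vec[i] = float32((word+uint32(i))%1000)/1000.0 - 0.5 // in [-0.5, 0.5)
	}
	return vec
}

func main() {
	a := testEmbedding("func main() {}")
	b := testEmbedding("func main() {}")
	fmt.Println(len(a), a[0] == b[0]) // 768 true
}
```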

Token Counting

Token counting is a pre-flight operation that runs before embedding generation:

Token Counting Modes

| Mode | Behavior | Use Case |
|------|----------|----------|
| `all` | Count tokens for all chunks | Production (accurate billing) |
| `sample` | Count a configured percentage of chunks | Large repositories (estimation) |
| `on_demand` | Skip counting | Performance-critical paths |
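
`sample` mode trades accuracy for speed: count tokens for a fraction of chunks, then extrapolate to the full set. A rough sketch, where `countTokens` stands in for the real per-chunk counter:

```go
package main

import "fmt"

// estimateTotalTokens counts tokens for the first samplePercent of chunks
// and scales the result to the whole set. (Illustrative helper.)
func estimateTotalTokens(chunks []string, samplePercent int, countTokens func(string) int) int {
	sampleSize := len(chunks) * samplePercent / 100
	if sampleSize == 0 {
		sampleSize = 1 // always count at least one chunk
	}
	sampled := 0
	for _, c := range chunks[:sampleSize] {
		sampled += countTokens(c)
	}
	return sampled * len(chunks) / sampleSize
}

func main() {
	chunks := make([]string, 200)
	// Pretend every chunk is 100 tokens.
	total := estimateTotalTokens(chunks, 10, func(string) int { return 100 })
	fmt.Println(total) // 20000
}
```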

Progressive Saving

Token counts are saved progressively in batches of 50 chunks to:

  • Reduce memory pressure
  • Enable progress tracking
  • Allow recovery on interruption
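
The progressive-save pattern can be sketched as below: flush every 50 chunks instead of holding results for the whole repository, so an interruption loses at most one partial batch. `saveBatch` stands in for the real persistence call.

```go
package main

import "fmt"

// saveProgressively flushes token counts in fixed-size batches and
// returns how many flushes occurred. (Illustrative helper.)
func saveProgressively(counts []int, batchSize int, saveBatch func([]int)) int {
	flushes := 0
	for start := 0; start < len(counts); start += batchSize {
		end := start + batchSize
		if end > len(counts) {
			end = len(counts)
		}
		saveBatch(counts[start:end])
		flushes++
	}
	return flushes
}

func main() {
	counts := make([]int, 120)
	saved := 0
	flushes := saveProgressively(counts, 50, func(b []int) { saved += len(b) })
	fmt.Println(flushes, saved) // 3 120
}
```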

Configuration

token_counting:
  enabled: true
  mode: "all"              # "all", "sample", or "on_demand"
  sample_percent: 10       # For "sample" mode
  max_tokens_per_chunk: 8192   # Gemini embedding model limit

Rate Limit Handling

Submission Retry Logic

The batch submitter handles rate limits in four steps:

  1. Detection: Identifies rate limit errors from Gemini API responses
  2. Global Backoff: Applies backoff to all pending submissions
  3. Exponential Backoff: Increases delay on consecutive failures
  4. Idempotent Retry: Reuses uploaded file URI to prevent duplicate uploads

Backoff Configuration

batch_processing:
  submission_initial_backoff: 1m   # Initial backoff on rate limit
  submission_max_backoff: 30m      # Maximum backoff duration
  max_submission_attempts: 10      # Max retries before permanent failure

Backoff Progression

Attempt 1: Fail → Wait 1m
Attempt 2: Fail → Wait 2m
Attempt 3: Fail → Wait 4m
Attempt 4: Fail → Wait 8m
...
Attempt N: Fail → Wait min(2^(N-1) minutes, 30m)

Configuration Reference

Base Configuration (config.yaml)

batch_processing:
  enabled: true                    # Enable batch processing
  threshold_chunks: 50             # Min chunks to trigger batch mode
  max_batch_size: 500              # Max chunks per Gemini API batch
  fallback_to_sequential: true     # Fallback on batch failures
  use_test_embeddings: false       # Use synthetic vectors (dev only)

  # Retry configuration
  initial_backoff: 30s
  max_backoff: 300s
  max_retries: 3

  # Submitter configuration
  submitter_poll_interval: 5s
  max_concurrent_submissions: 1
  submission_initial_backoff: 1m
  submission_max_backoff: 30m
  max_submission_attempts: 10

  # Poller configuration
  poller_interval: 30s
  max_concurrent_polls: 5

  # Queue limits
  queue_limits:
    max_queue_size: 10000
    max_wait_time: 30m

  # Token counting
  token_counting:
    enabled: true
    mode: "all"
    sample_percent: 10
    max_tokens_per_chunk: 8192

Development Overrides (config.dev.yaml)

batch_processing:
  threshold_chunks: 10             # Lower threshold for testing
  max_batch_size: 300              # Smaller batches
  fallback_to_sequential: false    # Test batch mode without fallback
  initial_backoff: 5s              # Shorter backoffs
  max_backoff: 60s
  max_retries: 2

Production Overrides (config.prod.yaml)

batch_processing:
  threshold_chunks: 100            # Higher threshold for efficiency
  max_batch_size: 1000             # Larger batches
  fallback_to_sequential: true     # Ensure reliability
  initial_backoff: 1m              # Longer backoffs
  max_backoff: 10m
  max_retries: 5
  poller_interval: 60s             # Less frequent polling
  max_concurrent_polls: 10         # More parallel polls

  queue_limits:
    max_queue_size: 50000          # Larger queue
    max_wait_time: 60m

Monitoring and Observability

Key Metrics

The system emits OpenTelemetry metrics for monitoring:

Throughput Metrics

  • batch_chunks_processed - Total chunks processed
  • batch_jobs_submitted - Batch jobs submitted to Gemini
  • batch_jobs_completed - Successfully completed jobs

Latency Metrics

  • batch_submission_latency - Time from queue to submission
  • batch_processing_latency - Gemini processing time
  • batch_end_to_end_latency - Total processing time

Error Metrics

  • batch_submission_failures - Failed submissions
  • batch_rate_limit_hits - Rate limit encounters
  • batch_fallback_count - Sequential fallback invocations

Log Messages

Test Mode Active:

"Processing batch embedding results (TEST MODE)"
"Using test embeddings for development"

Production Mode Active:

"Processing batch embedding results (PRODUCTION MODE)"
"chunk_count": 8269

Fallback Triggered:

"Batch processing failed, falling back to sequential"

Troubleshooting

Common Issues

Batch Mode Not Activating

Symptoms: Always using sequential processing

Check:

  1. batch_processing.enabled is true
  2. Chunk count meets threshold_chunks
  3. Queue manager initialized (check logs for "batch queue manager initialized")

Rate Limit Errors

Symptoms: Submissions failing with rate limit errors

Solutions:

  1. Reduce max_concurrent_submissions to 1
  2. Increase submission_initial_backoff
  3. Check Gemini API quota limits

High Latency

Symptoms: Slow embedding generation

Solutions:

  1. Increase max_concurrent_polls for faster status checking
  2. Reduce poller_interval for more frequent checks
  3. Consider reducing max_batch_size for faster individual batches

Queue Buildup

Symptoms: Requests waiting too long in queue

Solutions:

  1. Increase max_concurrent_submissions
  2. Check for rate limit backoff (review logs)
  3. Increase worker concurrency

Debugging Commands

# Check current queue status
codechunking worker --debug --log-level debug

# Monitor batch metrics
curl -s http://localhost:8080/metrics | grep batch_

# View worker logs with batch info
make logs-worker | grep -E "(batch|embedding)"

Integration Examples

Repository Indexing

job := &EmbeddingJob{
    Chunks:       repoChunks,
    MaxBatchSize: 200,  // Balanced for throughput
}

Bulk Migration

job := &EmbeddingJob{
    Chunks:       migrationChunks,
    MaxBatchSize: 500,  // Maximum efficiency
}

Performance Characteristics

Throughput vs Latency Trade-offs

| Strategy | Throughput | Latency | Cost Efficiency |
|----------|------------|---------|-----------------|
| Small Batches | Low | Minimal | Low |
| Medium Batches | Medium | Low | Medium |
| Large Batches | High | Medium | High |
| Maximum Batches | Maximum | High | Maximum |

Cost Optimization

  • Batch Mode: Reduces API calls significantly compared to sequential
  • Token Counting: Enables accurate cost prediction and billing
  • Rate Limit Awareness: Avoids expensive retry storms

Verifying Batch Processing

To confirm batch processing is working correctly:

  1. Check Configuration:

    • use_test_embeddings: false (for production)
    • batch_processing.enabled: true
    • CODECHUNK_GEMINI_API_KEY environment variable set
  2. Monitor Logs:

    • Look for "Processing batch embedding results (PRODUCTION MODE)"
    • Verify chunk counts in batch submissions
  3. Review Metrics:

    • batch_jobs_submitted should increase
    • batch_fallback_count should remain low

See BATCH_JOB_STATUS.md for additional troubleshooting and monitoring guidance.
