Conversation
…tch on failover Two related bugs in FailoverEmbedder when the primary embedding server is down: 1. Once initServers() marked all servers unhealthy, the checked flag prevented re-probing on subsequent Embed() calls — even after the server came back up. Fix: re-probe when active < 0 and a 30-second cooldown has elapsed. Also reset active = -1 on failover exhaustion so the re-probe path triggers. 2. ModelName() and Dimensions() returned stale server-0 values before the first Embed() call. When failover later selected a different server with different dimensions, the Indexer/Store was already created with wrong dimensions, causing sqlite-vec errors. Fix: trigger eager initServers() in ModelName() and Dimensions() so they reflect the server that will actually handle embeddings. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughAdded cooldown-based periodic health re-probing to FailoverEmbedder. When no active server exists, Dimensions(), ModelName(), and Embed() can trigger a re-probe after a 30s cooldown. Failover exhaustion now sets active index to -1 and initialization records the last probe time. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/embedder/failover.go (1)
74-88: Consider extracting duplicated init/reprobe logic.The
needsInit/needsReprobepattern is repeated inDimensions(),ModelName(), andEmbed(). This is minor but could be consolidated into a helper method called under the lock.♻️ Optional refactor to reduce duplication
+// maybeReprobe checks if initialization or re-probing is needed and performs it. +// Must be called with f.mu held. +func (f *FailoverEmbedder) maybeReprobe(log bool) { + needsInit := !f.checked || f.serversChanged() + needsReprobe := f.active < 0 && time.Since(f.lastProbeTime) >= reprobeInterval + if needsInit || needsReprobe { + if log && needsReprobe && f.logger != nil { + f.logger.Info("re-probing embedding servers after cooldown") + } + f.initServers() + f.checked = true + } +} + func (f *FailoverEmbedder) Dimensions() int { f.mu.Lock() - needsInit := !f.checked || f.serversChanged() - needsReprobe := f.active < 0 && time.Since(f.lastProbeTime) >= reprobeInterval - if needsInit || needsReprobe { - f.initServers() - f.checked = true - } + f.maybeReprobe(false) idx := f.active f.mu.Unlock()Then apply similarly to
ModelName()andEmbed()(withf.maybeReprobe(true)forEmbedto enable logging).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/embedder/failover.go` around lines 74 - 88, Extract the duplicated init/reprobe logic in FailoverEmbedder into a new helper method (e.g., maybeReprobe) that runs under f.mu lock, checks the same conditions (uses f.checked, f.serversChanged(), f.active, time.Since(f.lastProbeTime) vs reprobeInterval), calls f.initServers() and sets f.checked as needed, and returns the current f.active index; update Dimensions() and ModelName() to call this helper, and update Embed() to call maybeReprobe(true) (or pass a boolean to enable logging) so the logging-enabled reprobe path is used there.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@internal/embedder/failover.go`:
- Around line 74-88: Extract the duplicated init/reprobe logic in
FailoverEmbedder into a new helper method (e.g., maybeReprobe) that runs under
f.mu lock, checks the same conditions (uses f.checked, f.serversChanged(),
f.active, time.Since(f.lastProbeTime) vs reprobeInterval), calls f.initServers()
and sets f.checked as needed, and returns the current f.active index; update
Dimensions() and ModelName() to call this helper, and update Embed() to call
maybeReprobe(true) (or pass a boolean to enable logging) so the logging-enabled
reprobe path is used there.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: fc56bf9d-e88d-4674-9b26-3c62d899b670
📒 Files selected for processing (2)
internal/embedder/failover.gointernal/embedder/failover_test.go
…eprobe logic Dimensions(), ModelName(), and Embed() all had identical init/reprobe blocks. Extract into maybeReprobe(log bool) called under the existing mutex, with the log parameter controlling reprobe diagnostics in Embed(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
initServers()found all embedding servers down,FailoverEmbedderwas permanently stuck returning errors — even after the server came back up. Now re-probes every 30 seconds when no healthy server is available.ModelName()andDimensions()returned stale server-0 values before the firstEmbed()call. When failover selected a different server (e.g. Ollama 768-dim instead of LM Studio 3584-dim), the Indexer was created with wrong dimensions, causing sqlite-vec errors. Now both methods eagerly init to return correct values.activewasn't reset to-1, preventing the re-probe path from triggering on subsequent calls.Test plan
TestFailover_ReprobesAfterCooldown— all servers down, one recovers, re-probe after 30s cooldown succeedsTestFailover_ReprobesAfterFailoverExhaustion— servers healthy on probe but 500 on embed, exhaustion resets active, re-probe worksTestFailover_ModelNameTriggersInit—ModelName()eagerly inits and returns healthy server's modelTestFailover_DimensionsTriggersInit—Dimensions()eagerly inits and returns healthy server's dimsTestFailover_DimensionsReflectActiveto match new eager init behavior🤖 Generated with Claude Code
Summary by CodeRabbit
Bug Fixes
Improvements
Tests