fix(bench): sample symbols in embedding benchmark to prevent CI timeouts#927
fix(bench): sample symbols in embedding benchmark to prevent CI timeouts#927carlos-alm merged 3 commits intomainfrom
Conversation
The codebase grew from ~1095 to 7128 symbols, causing every model worker to exceed the 30-min timeout during the search phase. Add deterministic sampling (seeded PRNG, cap 1500 symbols) so runtime stays bounded as the repo grows. Embedding still covers all DB symbols; only the search evaluation is sampled.
|
Claude finished @carlos-alm's task —— View job PR Review: Embedding Benchmark Timeout FixTodo List
Root Cause AnalysisProblem Being Solved: Legitimate scaling issue — codebase growth from ~1,095 to 7,128 symbols caused all embedding model workers to exceed the 30-min timeout during sequential search evaluation. This is not hiding symptoms but addressing genuine resource constraints. Why This Approach:
Risk Assessment: LOW — This fixes a real scalability bottleneck without compromising measurement quality. Backlog Compliance
Critical ConcernsNone. This is a well-executed fix for a legitimate scaling problem. Final Recommendation
Minor suggestions for future:
|
Greptile SummaryThis PR fixes a CI timeout regression in the embedding benchmark by adding deterministic symbol sampling (mulberry32 PRNG, Fisher-Yates shuffle, seed=42, capped at 1,500) that applies only to the search-recall evaluation loop, leaving the Confidence Score: 5/5Safe to merge — the sampling logic is correct and bounded, the timeout math is consistent, and the one open concern (ambiguous The PRNG (mulberry32) and Fisher-Yates shuffle are implemented correctly; the input order is stable (ORDER BY file, line) making the sample reproducible; scripts/embedding-benchmark.ts — the output JSON Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Worker process start] --> B[loadSymbols\nORDER BY file, line]
B --> C{symbols.length\n> MAX_SYMBOLS\n1500?}
C -- No --> D[Use all symbols]
C -- Yes --> E[seededShuffle seed=42\nFisher-Yates + mulberry32]
E --> F[.slice 0, 1500]
F --> D
D --> G[buildEmbeddings\nall DB symbols via SQLite]
G --> H[Search recall loop\nover sampled 1500 symbols]
H --> I[Compute hits@1/3/5/10]
I --> J[stdout: JSON\nsymbols: 1500, result]
J --> K[Parent collects\nsymbolCount = last data.symbols]
K --> L[Output JSON\nsymbols: 1500 sampled count]
Reviews (2): Last reviewed commit: "fix(bench): correct stale timing estimat..." | Re-trigger Greptile |
| * Cap symbol count so CI stays under the per-model timeout. | ||
| * At ~1500 symbols on a CPU-only runner, embed + search takes ~12-15 min | ||
| * per model — well within the 45-min timeout with headroom to spare. | ||
| */ | ||
| const MAX_SYMBOLS = 1500; |
There was a problem hiding this comment.
Stale timing estimates in JSDoc
The comment says "embed + search takes ~12-15 min per model" and "well within the 45-min timeout", but TIMEOUT_MS is 1_800_000 ms = 30 min, not 45 min. The TIMEOUT_MS comment also separately estimates embed at ~18 min and search at ~5 min (~23 min total), which contradicts the 12-15 min claim here. Because buildEmbeddings runs over all DB symbols (~7,128), not just the sampled 1,500, the 23-min estimate is more plausible.
| * Cap symbol count so CI stays under the per-model timeout. | |
| * At ~1500 symbols on a CPU-only runner, embed + search takes ~12-15 min | |
| * per model — well within the 45-min timeout with headroom to spare. | |
| */ | |
| const MAX_SYMBOLS = 1500; | |
| /** | |
| * Cap symbol count so CI stays under the per-model timeout. | |
| * At ~1500 symbols on a CPU-only runner, search evaluation takes ~5 min; | |
| * embedding all DB symbols takes ~18 min — ~23 min total, within the 30-min timeout. | |
| */ |
There was a problem hiding this comment.
Fixed in e7b1008 — updated the JSDoc to match the actual TIMEOUT_MS (30 min) and the realistic timing breakdown (~18 min embed + ~5 min search = ~23 min total). The stale "12-15 min" and "45-min timeout" references are gone.
There was a problem hiding this comment.
Fixed in e7b1008 — updated the JSDoc to match the actual TIMEOUT_MS (30 min) and the realistic timing breakdown (~18 min embed + ~5 min search = ~23 min total). The stale "12-15 min" and "45-min timeout" references are gone.
Codegraph Impact Analysis1 functions changed → 1 callers affected across 1 files
|
Summary
Test plan
benchmark.ymlviaworkflow_dispatchwithversion: devand verify the embedding-benchmark job completes successfullySampling 1500 of N symbols (deterministic seed=42)EMBEDDING-BENCHMARKS.mdis updated with valid results via auto-PR