feat: Genkit AIProvider adapter with per-tool strict + startup cache warmup#63
feat: Genkit AIProvider adapter with per-tool strict + startup cache warmup#63Anthony-Bible wants to merge 2 commits into
Conversation
|
Claude finished @Anthony-Bible's task in 3m 41s —— View job PR Review: feat: Genkit AIProvider adapter with per-tool strict + startup cache warmup
SummaryThis is a well-motivated PR with solid architecture — the hexagonal design is respected, the readiness gating is clean, and the empirical cache-timing data is a nice touch. Several issues need attention before merging, ranging from a correctness bug to supply-chain hygiene. Issues FoundBug — Tool name collision in
|
|
Claude finished @Anthony-Bible's task in 6m 8s —— View job PR Review: feat/genkit-adapter-with-warmup
SummaryWell-motivated PR with solid overall design. The architecture choices (Genkit as a side-by-side adapter, There is one critical bug (data race in CriticalData race:
SignificantSilent name-collision overwrite in When two tools have different original names that sanitize to the same key (e.g.,
An error from
Both sites manually replicate Minor
What's working well
|
| refs = append(refs, existing) | ||
| continue | ||
| } | ||
| } |
There was a problem hiding this comment.
Bug: TOCTOU race on tool name collision
There's a subtle race where two distinct tool names that sanitize to the same 64-char prefix (or the same already-invalid characters) will silently collide:
- Tool
foo_bar_bazandfoo_bar_baxboth sanitize to identical name → second overridestoolNameMap[sanitized]to its original, butalreadyRegistered = trueso it reuses the first tool's schema from the genkit registry.
Additionally, there's a TOCTOU window for the first registration: two concurrent goroutines can both see alreadyRegistered = false before either writes the map entry, then both call defineToolSafe. The second caller gets a panic-recovered error propagated to the user, rather than gracefully reusing the already-registered tool.
A safer pattern would be to check the genkit registry directly (before the map) and treat any successful LookupTool as authoritative, or to use a sync.Once per tool name:
// Instead of the current lock-check-unlock-lookuptool pattern:
a.nameMapMu.Lock()
_, alreadyRegistered := a.toolNameMap[sanitized]
a.toolNameMap[sanitized] = t.Name
a.nameMapMu.Unlock()
if alreadyRegistered {
if existing := genkit.LookupTool(a.g, sanitized); existing != nil {
refs = append(refs, existing)
continue
}
// fall through to re-register if LookupTool returned nil
}Consider also logging a warning when two different original names collapse to the same sanitized name, rather than silently using the wrong schema.
| } | ||
|
|
||
| content := contentBuilder.String() | ||
| if content == "" && len(thinkingBlocks) > 0 { |
There was a problem hiding this comment.
Spurious content fallback when AI returns only tool calls
When the model responds with only tool requests and no text (the common agentic loop case), content is empty and falls through to:
content = "[No content received from AI]"This string ends up in the conversation history as an assistant message, which could confuse the model on subsequent turns or appear in debug logs as an error signal. The AnthropicAdapter presumably returns an empty string or a different placeholder in this case.
Consider allowing empty content when len(entityToolCalls) > 0, and only using the fallback when truly nothing was in the response:
if content == "" && len(thinkingBlocks) > 0 {
content = "[AI internal reasoning completed]"
}
// Don't inject a fake content string when the response contains tool calls —
// that's a normal agentic turn with no text.
if content == "" && len(entityToolCalls) == 0 {
content = "[No content received from AI]"
}| } | ||
| } | ||
|
|
||
| // toolResultOutput packages a tool-call string result into a JSON-friendly |
There was a problem hiding this comment.
Tool result wrapping may differ from AnthropicAdapter's wire format
Results are wrapped in {"result": "..."} / {"error": "..."}, but the AnthropicAdapter presumably sends the raw string content back as a tool result. If the model has been trained or prompted to expect raw strings, wrapping them in a JSON object map could cause unexpected parsing behavior or confuse the AI's tool-result interpretation.
It's worth verifying that the Genkit Anthropic plugin correctly unwraps this before sending to the API, or that the model handles the wrapped form the same way.
| // | ||
| // TODO(refactor): extract to a shared helper consumed by both adapters. | ||
| // Duplicated for now to avoid touching anthropic_adapter.go in this PR. | ||
| func (a *GenkitAdapter) getSystemPrompt(ctx context.Context, opts port.AIRequestOptions) (string, error) { |
There was a problem hiding this comment.
Duplicated system prompt logic
The TODO comment here is correct — getSystemPrompt, buildPlanModePrompt, and buildBasePromptWithAgents are all duplicated from AnthropicAdapter. Beyond the maintenance overhead of keeping these in sync, there's a behavioral risk: if a bug is fixed in one adapter's prompt logic, the other silently diverges.
Consider extracting a shared systemPromptBuilder helper in the ai package (not in domain — it's adapter-specific), so both adapters call the same code. This is low-risk refactor with high return.
| // in-flight warmup HTTP request finish cleanly under its own 3-minute | ||
| // timeout rather than tearing it midway. | ||
| webhookAdapter.SetWarmupRequired(true) | ||
| go runWarmup(context.Background(), container, webhookAdapter, container.Logger()) |
There was a problem hiding this comment.
Warmup runs unconditionally for all providers
runWarmup always fires regardless of which AGENT_AI_PROVIDER is selected. If AnthropicAdapter is in use (the default), or if a user is pointing ANTHROPIC_BASE_URL at a local stub that doesn't support all the tool schemas, the warmup call will still block /ready for up to 3 minutes on failure.
Also: context.Background() is intentionally used to survive Ctrl+C (well documented in the PR), but it means if the user kills the process quickly, the goroutine outlives the server and calls MarkWarm() on a now-orphaned struct. This is harmless (setting an atomic bool), but slightly surprising.
Consider gating the warmup on a config flag (e.g., AGENT_WARMUP_ENABLED=true defaulting to false), or only enabling it when AGENT_AI_PROVIDER=genkit. This would prevent unnecessary startup delays in development or testing.
| // ToolExecutor and converts each entity.Tool to the port.ToolParam shape | ||
| // used on the wire. Mirrors the conversion in ConversationService so the | ||
| // schema fingerprint is identical to what real requests send. | ||
| func ToolParamsFromExecutor(exec port.ToolExecutor) ([]port.ToolParam, error) { |
There was a problem hiding this comment.
Duplicated conversion logic
The comment says this mirrors ConversationService's conversion. This is a maintenance hazard: if ConversationService.prepareAIRequest adds new fields to the ToolParam conversion (e.g., a future InputSchemaVersion field), ToolParamsFromExecutor will silently produce a different schema fingerprint, breaking the warmup's cache-hit guarantee.
Since the PR's warmup correctness depends on the fingerprint being identical to what real requests send, this duplication is load-bearing. A shared conversion helper (or exposing the conversion through the service itself) would be safer.
| toolchain go1.24.11 | ||
| go 1.25.0 | ||
|
|
||
| require ( |
There was a problem hiding this comment.
Forked dependency via replace directive
The replace directive pins to github.com/Anthony-Bible/genkit (a personal fork) because upstream firebase/genkit doesn't expose WithStrict(bool) per-tool. While the PR explains the two bugs fixed in the fork, this creates a supply chain dependency that:
- Won't receive upstream security patches automatically
- May become stale if the fork isn't updated when genkit ships breaking changes
- Blocks upgrading other dependencies that transitively depend on genkit's stable API
The PR description mentions upstreaming these fixes. Before merging, it's worth opening or tracking upstream PRs, and setting a concrete timeline for removing the replace directive once they're accepted. If upstreaming is blocked, consider vendoring (go mod vendor) so the fork is audited and version-pinned in the repo itself.
| // registerTools, which is idempotent and synchronized. Sharing keeps | ||
| // sanitized→original lookups consistent across clones. | ||
| toolNameMap: a.toolNameMap, | ||
| } |
There was a problem hiding this comment.
Data race: shared map, independent mutexes
toolNameMap is shared by reference, but nameMapMu is a value field — the clone gets a fresh zero-value sync.RWMutex, not a copy of the original's. Both the original and every clone will call registerTools (which always writes to the shared map even on repeated calls — see line 364), and they'll be guarded by different mutexes. The Go race detector will flag this.
Fix: use a pointer mutex shared across clones, or switch to sync.Map:
type GenkitAdapter struct {
// ...
nameMapMu *sync.RWMutex // shared across clones
toolNameMap map[string]string
}and in Clone():
return &GenkitAdapter{
// ...
nameMapMu: a.nameMapMu, // share the same mutex
toolNameMap: a.toolNameMap,
}Or, simpler: replace nameMapMu + toolNameMap with a sync.Map — it's designed for exactly this "written once, read many" access pattern.
| a.nameMapMu.Lock() | ||
| _, alreadyRegistered := a.toolNameMap[sanitized] | ||
| a.toolNameMap[sanitized] = t.Name | ||
| a.nameMapMu.Unlock() |
There was a problem hiding this comment.
Always writes to toolNameMap, even when alreadyRegistered
Even on repeated calls (subsequent requests, clone calls), this unconditionally writes a.toolNameMap[sanitized] = t.Name. Combined with the mutex-sharing issue in Clone(), this is the write that causes the race.
There's also a silent name-collision bug here: if two tools have different original names but sanitize to the same key (e.g., "weird.tool" and "weird_tool" both become "weird_tool"), the second write silently overwrites the first mapping. The first tool's calls will then be mapped back to the second tool's name. Consider adding a collision check:
if existing, ok := a.toolNameMap[sanitized]; ok && existing != t.Name {
a.logger.Warn("tool name collision after sanitization — two tools share the same sanitized name",
"sanitized", sanitized, "first", existing, "second", t.Name)
}
a.toolNameMap[sanitized] = t.Name| if a.subagentManager == nil { | ||
| return basePrompt, nil | ||
| } | ||
| agents, err := a.subagentManager.ListAgents(ctx) |
There was a problem hiding this comment.
Agent listing error fails the entire AI request
If ListAgents fails (transient network error, missing directory, etc.), the error propagates all the way up through getSystemPrompt → buildGenerateOptions → SendMessage, causing the user's AI call to fail with "failed to list agents." This is surprising behavior.
Consider degrading gracefully instead, matching the pattern used elsewhere:
agents, err := a.subagentManager.ListAgents(ctx)
if err != nil {
a.logger.Warn("failed to list agents for system prompt; proceeding without agent list", "error", err)
return basePrompt, nil
}The AnthropicAdapter almost certainly has the same pattern — worth checking for consistency. (Related: the TODO at line 637 calls this out as needing a shared helper.)
| // ToolExecutor and converts each entity.Tool to the port.ToolParam shape | ||
| // used on the wire. Mirrors the conversion in ConversationService so the | ||
| // schema fingerprint is identical to what real requests send. | ||
| func ToolParamsFromExecutor(exec port.ToolExecutor) ([]port.ToolParam, error) { |
There was a problem hiding this comment.
Duplicated conversion logic — fragile invariant
ToolParamsFromExecutor manually replicates the entity.Tool → port.ToolParam conversion that ConversationService.PrepareConversationRequest does at line 310-318 of conversation_service.go. The comment says "Mirrors the conversion… so the schema fingerprint is identical", but there's nothing enforcing this at compile time.
If someone adds a new field to ToolParam (e.g., CacheControl) and wires it in ConversationService, the warmup fingerprint silently diverges — defeating the purpose of the warmup.
Consider extracting the conversion to a shared helper in the port package, e.g.:
// EntityToolToParam converts a domain tool to its wire representation.
// Used by both ConversationService and the warmup path to guarantee
// the same schema fingerprint on every request.
func EntityToolToParam(t entity.Tool) ToolParam {
return ToolParam{
Name: t.Name,
Description: t.Description,
InputSchema: t.InputSchema,
Strict: t.Strict,
}
}Both sites can then call port.EntityToolToParam(t) and stay in sync automatically.
| // getSystemPrompt mirrors AnthropicAdapter.getSystemPrompt's priority order. | ||
| // | ||
| // TODO(refactor): extract to a shared helper consumed by both adapters. | ||
| // Duplicated for now to avoid touching anthropic_adapter.go in this PR. |
There was a problem hiding this comment.
TODO left in production code
This is the right call for scope-bounding the PR, but leaving a TODO in production code creates drift risk — the duplication isn't obvious from the diff. Recommend opening a follow-up issue and linking it from the comment so it gets tracked:
// TODO(#NNN): extract to a shared helper consumed by both adapters.
// Duplicated for now to avoid touching anthropic_adapter.go in this PR.Otherwise "avoid touching X in this PR" tends to stay that way indefinitely.
| // Returns nil on success, or a wrapping error if the provider call fails. | ||
| // The caller is expected to log+continue rather than fail startup: a cold | ||
| // cache is a latency penalty, not a correctness problem. | ||
| func WarmCache( |
There was a problem hiding this comment.
No unit tests for WarmCache or ToolParamsFromExecutor
warmup.go is the only new file without test coverage. Both functions are straightforward to test with a mock AIProvider and ToolExecutor:
WarmCachewith nil provider → errorWarmCachewith zero tools → skips callWarmCachewith a failing provider → wraps errorToolParamsFromExecutorwith nil executor → errorToolParamsFromExecutorwith a real executor → convertsStrictfield correctly
The Strict field propagation is especially worth a test since it's the core correctness property of this feature — a regression there would be silent.
| // in-flight warmup HTTP request finish cleanly under its own 3-minute | ||
| // timeout rather than tearing it midway. | ||
| webhookAdapter.SetWarmupRequired(true) | ||
| go runWarmup(context.Background(), container, webhookAdapter, container.Logger()) |
There was a problem hiding this comment.
Warmup goroutine lifetime vs. k8s termination grace period
The warmup goroutine uses context.Background() intentionally (documented in the PR), so a Ctrl+C during boot doesn't abort the in-flight Anthropic request mid-stream. That's a good design for dev/manual use.
In k8s, the sequence is: SIGTERM → terminationGracePeriodSeconds (typically 30s) → SIGKILL. If warmup is still running when SIGKILL fires (which it will be if the grammar compilation takes 30–100s), the goroutine is killed mid-request anyway. The practical effect is the same as if you'd used the cancellable context — but the rationale ("let the request finish cleanly") doesn't hold under SIGKILL.
Worth documenting this nuance in the comment, and potentially setting terminationGracePeriodSeconds in the k8s manifests to something > WarmCacheTimeout (3 minutes) if clean warmup completion is actually desired during rolling deploys.
| golang.org/x/text v0.35.0 // indirect | ||
| ) | ||
|
|
||
| replace github.com/firebase/genkit/go => github.com/Anthony-Bible/genkit/go v1.7.1-0.20260506115940-43381ac9eb96 |
There was a problem hiding this comment.
Fork via replace directive — build supply-chain risk
Pinning to github.com/Anthony-Bible/genkit via a replace directive is a pragmatic workaround while upstream review is pending, but it creates ongoing risk:
- Build stability: if the fork branch is force-pushed or the repo is renamed/deleted,
go mod downloadwill break for all contributors. - Security: dependencies of a personal fork aren't scanned by Dependabot by default.
- Auditability: reviewers/CI can't easily verify the fork's changes against upstream.
Recommended mitigations:
- Pin to an immutable commit SHA (already done with the long hash) ✓
- Document the upstream PR number in
go.modor a comment so the fork lifecycle is trackable - Set a calendar reminder / issue to re-evaluate once upstream merges
// replace pending upstream PR: https://github.com/firebase/genkit/pull/NNNN
replace github.com/firebase/genkit/go => github.com/Anthony-Bible/genkit/go v1.7.1-0...
|
Claude finished @Anthony-Bible's task in 4m 30s —— View job PR Review: feat/genkit-adapter-with-warmup
What Previous Reviews Got Fixed ✅Two significant items from prior reviews are now resolved:
Issues That RemainBug —
|
…up gating - GenkitAdapter.nameMapMu is now a *sync.RWMutex shared with clones, so concurrent first registration of tools across original+clones serializes on a single mutex instead of per-clone zero-value mutexes racing on the shared toolNameMap. - convertResponse mirrors AnthropicAdapter: when the model returns only tool calls (the common agentic-loop case) fall back to FinishReason rather than injecting "[No content received from AI]" into history. - serve gates SetWarmupRequired/runWarmup on AIProvider == genkit so AnthropicAdapter callers don't pay an unintentional startup API call and /ready delay. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @Anthony-Bible's task in 2m 55s —— View job PR Review: feat/genkit-adapter-with-warmup (round 4)
What Was Fixed in This Round ✅Commit Clone() data race — fixed.
// genkit_adapter.go ≈ anthropic_adapter.go — identical priority order
if content == "" && resp.FinishReason != "" { content = string(resp.FinishReason) } // "tool_use"
if content == "" && len(thinkingBlocks) > 0 { content = "[AI internal reasoning completed]" }
if content == "" { content = "[No content received from AI]" }Tool-only agentic turns now get Warmup fires unconditionally — fixed. One Remaining Bug
|
|
Claude finished @Anthony-Bible's task in 2m 21s —— View job PR Review: feat/genkit-adapter-with-warmup (round 5)
What Was Fixed in This Round ✅Commit
Complete Bug ScorecardAll four previously-open bugs are resolved:
Remaining Low-Risk ItemsThese have been flagged across multiple review rounds. None are blockers for correct behavior, but two have merge implications. No tests for
|
…ction Adds a parallel AIProvider backed by Firebase Genkit Go's Anthropic plugin, selectable via AGENT_AI_PROVIDER (default: anthropic). Per-tool strict-schema validation lets critical tools (bash, edit_file, task, delegate) opt in while others leave strict off to avoid rejected JSON-schema keywords. Extracts a genkitPluginAdapter interface + process-global registry so non-Anthropic Genkit plugins can be added by dropping in a file with RegisterGenkitPlugin. Adds AGENT_GENKIT_PLUGIN config (default: "anthropic"). Primes the AI grammar cache at startup with an async warmup goroutine so /ready returns 503 until the cache is warm — k8s rolling deploys won't route traffic to a cold pod. Warmup failures still flip the gate (cold cache = latency penalty, not a correctness problem). Fixes Clone race on shared toolNameMap (sync.RWMutex), tool-only response fallback, and graceful degradation when ListAgents fails. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
a8da18c to
003481c
Compare
|
Claude finished @Anthony-Bible's task in 1m 57s —— View job PR Review: feat/genkit-adapter-with-warmup (round 6)
SummaryThis commit is a squash of the entire branch. All four bugs fixed across rounds 2–5 are present in this squashed state. No new issues were introduced. Three low-risk items flagged across every previous round are still open — two are trivial one-liners, one (tests) requires ~30 lines of new code. Bug Scorecard — All Fixed ✅
Three Remaining Items (all low-risk, flagged every round)1. No tests for
|
Renames the per-tool strict opt-in to WithStrictSchema to match the updated genkit fork API. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Claude finished @Anthony-Bible's task in 1m 36s —— View job PR Review: feat/genkit-adapter-with-warmup (round 7)
What Changed in This RoundCommit New Minor Issue (introduced by b4b935d)Stale comment at The comment above // using ai.WithStrict(false) (provided by the fork) so opt-out of strictShould read Three Items Still Open (carried from round 6)1. No tests for
|
Summary
AIProviderimplementation backed by Firebase Genkit Go's Anthropic plugin (selectable viaAGENT_AI_PROVIDER=genkit; default remains the existingAnthropicAdapter).bash,edit_file,task, anddelegateopt in. Tools using JSON-schema keywords Anthropic rejects in strict mode (maxItems,minimum, etc.) leave strict off./readyreturns503 {\"status\":\"warming up\"}until the warmup completes so k8s rolling deploys don't route traffic to a cold pod.Motivation
Strict-mode tool schemas trigger Anthropic's grammar compilation on first use (observed ~30-100s in our tests, then ~2-3s once the 24h regional cache is warm). Without amortization, every pod restart's first real webhook pays full cold-cache cost.
Empirical proof the cache works once tool order is deterministic:
Implementation notes
github.com/Anthony-Bible/genkitviareplacedirective — the fork addsWithStrict(bool)ToolOption and round-trips the bit through action metadata soLookupToolpreserves it. Two upstream bugs fixed in the fork during this work: (1)strict:falsewas serialized explicitly when omitted; (2)LookupTooldropped the strict field.ListToolsnow sorts by name so the tool-array fingerprint stays stable across requests — required for Anthropic's cache key.context.Background()(not the cancellable cmd ctx) so a Ctrl+C during boot lets the in-flight request finish under its own 3-minute timeout rather than tearing it midway.Test plan
🤖 Generated with Claude Code