Open
Description
Summary
The Pi Coding Agent provider should return execution metrics (tokenUsage, costUsd, durationMs) in its ProviderResponse so they can be used by evaluators and included in evaluation results.
Current State
The provider already captures some of this data but doesn't surface it:
- usage data is extracted from Pi's agent_end event and stored in metadata.usage on output messages
- Execution duration is tracked for logging but not returned
- Cost is not calculated
Proposed Changes
In packages/core/src/evaluation/providers/pi-coding-agent.ts:
- Parse token usage from agent_end event
    // In extractOutputMessages or new function
    const usage = agentEndEvent.usage; // { input_tokens, output_tokens, ... }
    const tokenUsage = {
      input: usage.input_tokens,
      output: usage.output_tokens,
      cached: usage.cached_tokens, // if available
    };
- Calculate duration
    const startTime = Date.now();
    // ... execute Pi ...
    const durationMs = Date.now() - startTime;
- Return metrics in ProviderResponse
    return {
      raw: { ... },
      outputMessages,
      tokenUsage,
      durationMs,
      // costUsd: optional, requires pricing info
    };
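The three steps above could fit together roughly as follows. This is a minimal sketch, not the provider's actual code: `PiUsage`, `extractTokenUsage`, `invokeWithMetrics`, and the `runPi` callback are hypothetical names, and the usage payload is assumed to carry only the fields named in this issue, all treated as optional.

```typescript
// Assumed shape of the usage payload on Pi's agent_end event. Only the field
// names mentioned in this issue are used; the real payload may differ.
interface PiUsage {
  input_tokens?: number;
  output_tokens?: number;
  cached_tokens?: number;
}

interface TokenUsage {
  input: number;
  output: number;
  cached?: number;
}

// Hypothetical helper: returns undefined when no usable usage data is present,
// so callers can omit tokenUsage instead of reporting misleading zeros.
function extractTokenUsage(usage: PiUsage | undefined): TokenUsage | undefined {
  if (!usage) return undefined;
  const { input_tokens, output_tokens, cached_tokens } = usage;
  if (typeof input_tokens !== "number" || typeof output_tokens !== "number") {
    return undefined;
  }
  const tokenUsage: TokenUsage = { input: input_tokens, output: output_tokens };
  // cached is only set when the event actually reported it.
  if (typeof cached_tokens === "number") tokenUsage.cached = cached_tokens;
  return tokenUsage;
}

// Hypothetical wrapper: `runPi` stands in for the provider's existing
// execution step and is assumed to resolve with messages plus raw usage.
async function invokeWithMetrics(
  runPi: () => Promise<{ outputMessages: unknown[]; usage?: PiUsage }>,
) {
  const startTime = Date.now();
  const result = await runPi();
  const durationMs = Date.now() - startTime;
  return {
    outputMessages: result.outputMessages,
    tokenUsage: extractTokenUsage(result.usage),
    durationMs,
  };
}
```

Returning `undefined` for missing usage is one possible design choice; it keeps "no data" distinguishable from "zero tokens" in evaluation results.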
Files to Update
packages/core/src/evaluation/providers/pi-coding-agent.ts
Acceptance Criteria
- tokenUsage extracted from Pi's usage data and returned in response
- durationMs calculated from wall-clock execution time
- Metrics appear in evaluation results when using Pi provider
- Unit tests for metric extraction
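Testing the durationMs criterion is awkward if the code reads Date.now() directly, because real elapsed time is not deterministic. One possible approach, sketched below, is to make the clock injectable. The `timed` and `fakeClock` helpers are hypothetical and not part of the existing provider, and no particular test framework is assumed.

```typescript
// Hypothetical helper: runs `fn` and reports elapsed wall-clock milliseconds.
// `now` defaults to Date.now and can be swapped for a fake clock in tests.
async function timed<T>(
  fn: () => Promise<T>,
  now: () => number = Date.now,
): Promise<{ value: T; durationMs: number }> {
  const start = now();
  const value = await fn();
  return { value, durationMs: now() - start };
}

// Fake clock for tests: returns the queued readings in order and throws
// once they run out, so an unexpected extra clock read fails loudly.
function fakeClock(readings: number[]): () => number {
  let index = 0;
  return () => {
    const reading = readings[index++];
    if (reading === undefined) throw new Error("fake clock exhausted");
    return reading;
  };
}

// Example check: the clock reads 1000 before and 1250 after the call,
// so the reported duration is exactly 250 ms.
async function exampleDurationCheck(): Promise<void> {
  const { value, durationMs } = await timed(
    async () => "ok",
    fakeClock([1_000, 1_250]),
  );
  if (value !== "ok" || durationMs !== 250) {
    throw new Error(`unexpected result: ${value}, ${durationMs}`);
  }
}
```

In production code the second argument would simply be omitted, so the helper falls back to the real clock.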
Labels
enhancement, good-first-issue