Summary
The Codex provider should return execution metrics (`tokenUsage`, `costUsd`, `durationMs`) in its `ProviderResponse` so that evaluators can use them and they appear in evaluation results.
Current State
We need to investigate what metrics the Codex CLI reports in its JSON/JSONL output. The provider will likely need to:
- Parse usage/token data from Codex output events
- Calculate execution duration
- Optionally calculate cost if pricing info is available
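Whatever event carries the usage data, the provider first has to turn captured stdout into individual events. The sketch below splits the output into one event object per line, assuming each non-empty line is a standalone JSON object. It skips malformed lines instead of throwing, on the untested assumption that non-JSON log lines could be interleaved with the JSON output. `CodexEvent` and `parseCodexJsonl` are hypothetical names, not existing code in the repository.

```typescript
// Hypothetical event shape: each event is assumed to carry at least a string `type`.
type CodexEvent = { type: string; [key: string]: unknown };

// Split captured stdout into JSON events, one per line.
// Blank lines, non-JSON lines, and JSON values without a string `type` are skipped.
function parseCodexJsonl(stdout: string): CodexEvent[] {
  const events: CodexEvent[] = [];
  for (const line of stdout.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "") continue;
    let parsed: unknown;
    try {
      parsed = JSON.parse(trimmed);
    } catch {
      continue; // not JSON (e.g. a stray log line), so ignore it
    }
    if (
      typeof parsed === "object" &&
      parsed !== null &&
      typeof (parsed as { type?: unknown }).type === "string"
    ) {
      events.push(parsed as CodexEvent);
    }
  }
  return events;
}
```

Skipping lines rather than failing keeps metric extraction best-effort, so a single odd line cannot discard an otherwise valid run.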
Proposed Changes
In `packages/core/src/evaluation/providers/codex.ts` (or equivalent):
- Parse token usage from Codex events
  - Identify which Codex event contains usage data
  - Extract input/output token counts
- Calculate duration
  ```typescript
  const startTime = Date.now();
  // ... execute Codex ...
  const durationMs = Date.now() - startTime;
  ```
- Return metrics in `ProviderResponse`
  ```typescript
  return {
    raw: { ... },
    outputMessages,
    tokenUsage,
    durationMs,
  };
  ```
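The three steps above could be combined roughly as follows. This is a sketch under explicit assumptions. The usage-bearing event is assumed to have type `turn.completed` with a `usage` object holding `input_tokens` and `output_tokens`, which is a placeholder until the investigation confirms what Codex CLI actually emits. The `TokenUsage` field names (`input`, `output`) and the helpers `extractTokenUsage` and `timed` are hypothetical and should be aligned with the real `ProviderResponse` type.

```typescript
// Hypothetical token-usage shape returned to evaluators.
interface TokenUsage {
  input: number;
  output: number;
}

type CodexEvent = { type: string; [key: string]: unknown };

// ASSUMPTION: placeholder event type; replace once the real Codex event is confirmed.
const USAGE_EVENT_TYPE = "turn.completed";

// Sum input/output tokens across all usage events (a run may contain several turns).
// Returns undefined when no usage event is present, so the field can be omitted.
function extractTokenUsage(events: CodexEvent[]): TokenUsage | undefined {
  let found = false;
  const total: TokenUsage = { input: 0, output: 0 };
  for (const event of events) {
    if (event.type !== USAGE_EVENT_TYPE) continue;
    const usage = event["usage"];
    if (typeof usage !== "object" || usage === null) continue;
    // ASSUMPTION: snake_case field names inside `usage`.
    const { input_tokens, output_tokens } = usage as {
      input_tokens?: unknown;
      output_tokens?: unknown;
    };
    if (typeof input_tokens === "number") total.input += input_tokens;
    if (typeof output_tokens === "number") total.output += output_tokens;
    found = true;
  }
  return found ? total : undefined;
}

// Measure the wall-clock duration of an async call, such as running the Codex process.
// If `fn` rejects, the error propagates and no duration is returned.
async function timed<T>(fn: () => Promise<T>): Promise<{ result: T; durationMs: number }> {
  const startTime = Date.now();
  const result = await fn();
  return { result, durationMs: Date.now() - startTime };
}
```

The resulting `tokenUsage` and `durationMs` values can go straight into the `return { raw, outputMessages, tokenUsage, durationMs }` object from the last step. Returning `undefined` instead of zeros keeps "no usage reported" distinguishable from "zero tokens used".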
Investigation Needed
- What format does Codex CLI use for usage/metrics in its output?
- Is there an equivalent to Pi's `agent_end` event with a usage summary?
- Does Codex report cost information?
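If Codex turns out not to report cost, `costUsd` could be derived from token counts and a caller-supplied price table. The sketch below is only an illustration: the `ModelPricing` shape and its per-million-token rates are hypothetical inputs, not published Codex or OpenAI prices. It returns `undefined` when either usage or pricing is missing, matching the "optionally calculate cost if pricing info is available" item.

```typescript
// Hypothetical pricing input, in USD per one million tokens.
interface ModelPricing {
  inputPerMillionUsd: number;
  outputPerMillionUsd: number;
}

// Hypothetical token-usage shape (input/output token counts).
interface TokenUsage {
  input: number;
  output: number;
}

// Estimate cost from token counts. Returns undefined when usage or pricing is
// unavailable so the provider can leave `costUsd` off the response.
function estimateCostUsd(
  usage: TokenUsage | undefined,
  pricing: ModelPricing | undefined,
): number | undefined {
  if (!usage || !pricing) return undefined;
  return (
    (usage.input * pricing.inputPerMillionUsd +
      usage.output * pricing.outputPerMillionUsd) /
    1_000_000
  );
}
```

Keeping pricing as an explicit argument avoids hard-coding rates in the provider, since model prices change independently of the CLI.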
Files to Update
- Codex provider file in `packages/core/src/evaluation/providers/`
Acceptance Criteria
- `tokenUsage` extracted from Codex output and returned in the response
- `durationMs` calculated from wall-clock execution time
- Metrics appear in evaluation results when using the Codex provider
- Unit tests for metric extraction
Labels
enhancement, needs-investigation