Summary
The Codex provider should return execution metrics (tokenUsage, costUsd, durationMs) in its ProviderResponse so they can be used by evaluators and included in evaluation results.
Current State
We need to investigate which metrics the Codex CLI emits in its JSON/JSONL output. The provider likely needs to:
- Parse usage/token data from Codex output events
- Calculate execution duration
- Optionally calculate cost if pricing info is available
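The three steps above could be sketched roughly as follows. This is a minimal sketch, not the actual Codex schema: the field names `usage`, `input_tokens`, and `output_tokens`, the choice to sum usage across events, and the `pricing` parameter are all assumptions that the investigation would need to confirm or replace.

```typescript
// Hypothetical usage shape; the real Codex event schema is still to be confirmed.
interface TokenUsage {
  input: number;
  output: number;
}

// Scans JSONL output and sums every `usage` object it finds.
// ASSUMPTION: events carry `usage.input_tokens` / `usage.output_tokens`
// and the counts are per-event rather than cumulative.
function extractTokenUsage(jsonl: string): TokenUsage | undefined {
  const total: TokenUsage = { input: 0, output: 0 };
  let found = false;
  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed) continue;
    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // ignore lines that are not valid JSON
    }
    if (typeof event !== "object" || event === null) continue;
    const usage = (event as Record<string, unknown>).usage;
    if (typeof usage !== "object" || usage === null) continue;
    const u = usage as Record<string, unknown>;
    total.input += typeof u.input_tokens === "number" ? u.input_tokens : 0;
    total.output += typeof u.output_tokens === "number" ? u.output_tokens : 0;
    found = true;
  }
  // Return undefined rather than zeros when no usage data was seen.
  return found ? total : undefined;
}

// Optional cost estimate. ASSUMPTION: the caller supplies per-million-token
// prices; no real pricing data is implied here.
function estimateCostUsd(
  usage: TokenUsage,
  pricing: { inputPerMillionUsd: number; outputPerMillionUsd: number },
): number {
  return (
    (usage.input * pricing.inputPerMillionUsd +
      usage.output * pricing.outputPerMillionUsd) /
    1_000_000
  );
}
```

Returning `undefined` when no usage event is found keeps "no data" distinguishable from "zero tokens", which may matter to evaluators that aggregate metrics.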
Proposed Changes
In packages/core/src/evaluation/providers/codex.ts (or equivalent):
- Parse token usage from Codex events
  - Identify which Codex event contains usage data
  - Extract input/output token counts
- Calculate duration
    const startTime = Date.now();
    // ... execute Codex ...
    const durationMs = Date.now() - startTime;
- Return metrics in ProviderResponse
    return {
      raw: { ... },
      outputMessages,
      tokenUsage,
      durationMs,
    };
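One way the duration and return steps might fit together is a small timing wrapper around the Codex invocation. This is only a sketch: `ResponseSketch` is a local stand-in for the real ProviderResponse type, and `timed`, `invokeWithMetrics`, and the `runCodex` callback are hypothetical names, not existing functions in the codebase.

```typescript
// Local stand-in for illustration; the real ProviderResponse lives in the codebase.
interface ResponseSketch {
  raw: unknown;
  outputMessages: unknown[];
  tokenUsage?: { input: number; output: number };
  durationMs: number;
}

// Runs an async operation and reports its wall-clock duration in milliseconds.
async function timed<T>(
  run: () => Promise<T>,
): Promise<{ result: T; durationMs: number }> {
  const startTime = Date.now();
  const result = await run();
  return { result, durationMs: Date.now() - startTime };
}

// HYPOTHETICAL: `runCodex` stands in for however the provider launches the CLI
// and collects its output.
async function invokeWithMetrics(
  runCodex: () => Promise<Omit<ResponseSketch, "durationMs">>,
): Promise<ResponseSketch> {
  const { result, durationMs } = await timed(runCodex);
  // Merge the measured duration into whatever the run produced.
  return { ...result, durationMs };
}
```

`Date.now()` matches the snippet in the issue; `performance.now()` would be an alternative that is not affected by system clock adjustments.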
Investigation Needed
- agent_end event with usage summary?
Files to Update
- Codex provider file in packages/core/src/evaluation/providers/
Acceptance Criteria
- tokenUsage extracted from Codex output and returned in response
- durationMs calculated from wall-clock execution time
Labels
enhancement, needs-investigation