feat: add execution metrics support to Codex provider #99

@christso

Summary

The Codex provider should return execution metrics (tokenUsage, costUsd, durationMs) in its ProviderResponse so they can be used by evaluators and included in evaluation results.

Current State

It is not yet known what metrics data the Codex CLI emits in its JSON/JSONL output; this needs investigation. The provider likely needs to:

  • Parse usage/token data from Codex output events
  • Calculate execution duration
  • Optionally calculate cost if pricing info is available
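
As a rough illustration of the parsing step, here is a minimal sketch that scans JSONL output for usage data. Everything about the event shape is an assumption until the investigation below is done: the `usage` field and its `input_tokens` / `output_tokens` keys are placeholders, `extractTokenUsage` and the `TokenUsage` shape are hypothetical, and the sketch assumes each event reports per-turn counts (if Codex reports cumulative totals, the last event should be taken instead of a sum).

```typescript
// Hypothetical shape; the real tokenUsage type in ProviderResponse may differ.
interface TokenUsage {
  input: number;
  output: number;
}

// Scans JSONL output line by line and sums every `usage` object it finds.
// ASSUMPTION: usage lives at `event.usage` with `input_tokens` / `output_tokens`.
// These names are placeholders, not confirmed Codex CLI fields.
function extractTokenUsage(jsonl: string): TokenUsage | undefined {
  let found = false;
  const total: TokenUsage = { input: 0, output: 0 };

  for (const line of jsonl.split("\n")) {
    const trimmed = line.trim();
    if (trimmed === "") continue;

    let event: unknown;
    try {
      event = JSON.parse(trimmed);
    } catch {
      continue; // Skip lines that are not valid JSON (e.g. plain log output).
    }

    if (typeof event !== "object" || event === null) continue;
    const usage = (event as Record<string, unknown>)["usage"];
    if (typeof usage !== "object" || usage === null) continue;

    const fields = usage as Record<string, unknown>;
    const input = fields["input_tokens"];
    const output = fields["output_tokens"];
    total.input += typeof input === "number" ? input : 0;
    total.output += typeof output === "number" ? output : 0;
    found = true;
  }

  // Return undefined rather than zeros so callers can tell "no data" from "zero tokens".
  return found ? total : undefined;
}
```

Returning `undefined` when no usage event is seen keeps missing data distinguishable from a genuine zero count in evaluation results.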

Proposed Changes

In packages/core/src/evaluation/providers/codex.ts (or equivalent):

  1. Parse token usage from Codex events

    • Identify which Codex event contains usage data
    • Extract input/output token counts
  2. Calculate duration

    const startTime = Date.now();
    // ... execute Codex ...
    const durationMs = Date.now() - startTime;
  3. Return metrics in ProviderResponse

    return {
      raw: { ... },
      outputMessages,
      tokenUsage,
      durationMs,
    };
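
To make steps 2 and 3 easy to cover with unit tests, the clock can be injected. This is a sketch only: `startTimer`, `buildMetrics`, and the `MetricsFields` / `TokenUsage` shapes are hypothetical stand-ins, not the actual `ProviderResponse` definition.

```typescript
// Hypothetical stand-ins; the real ProviderResponse type lives in the codebase.
interface TokenUsage {
  input: number;
  output: number;
}

interface MetricsFields {
  tokenUsage?: TokenUsage;
  durationMs: number;
}

// Starts a timer and returns a function reporting elapsed wall-clock milliseconds.
// The clock is injectable so tests can supply a deterministic one.
function startTimer(now: () => number = () => Date.now()): () => number {
  const startTime = now();
  return () => now() - startTime;
}

// Builds the metrics portion of a response, omitting tokenUsage when none was found.
function buildMetrics(elapsedMs: number, tokenUsage?: TokenUsage): MetricsFields {
  return {
    ...(tokenUsage !== undefined ? { tokenUsage } : {}),
    durationMs: elapsedMs,
  };
}
```

In the provider, this could look like calling `const elapsed = startTimer()` before executing Codex and spreading `buildMetrics(elapsed(), tokenUsage)` into the returned object alongside `raw` and `outputMessages`.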

Investigation Needed

  • What format does Codex CLI use for usage/metrics in its output?
  • Is there an equivalent to Pi's agent_end event with usage summary?
  • Does Codex report cost information?

Files to Update

  • Codex provider file in packages/core/src/evaluation/providers/

Acceptance Criteria

  • tokenUsage extracted from Codex output and returned in response
  • durationMs calculated from wall-clock execution time
  • Metrics appear in evaluation results when using Codex provider
  • Unit tests for metric extraction

Labels

enhancement, needs-investigation
