Skip to content

Distributed cost reporting#967

Merged
lchoquel merged 29 commits into
devfrom
fix/For-API-update
Jun 7, 2026
Merged

Distributed cost reporting#967
lchoquel merged 29 commits into
devfrom
fix/For-API-update

Conversation

@lchoquel

@lchoquel lchoquel commented Jun 6, 2026

Copy link
Copy Markdown
Member

What & why

This branch turns cost reporting from a leaky per-run in-process buffer into an event-sourced artifact on the run result, and in doing so fixes the UsageRegistry success-path leak and wires up distributed (cross-worker) cost reporting — they were two facets of one lifecycle (the leak is the missing close; distributed reporting is the missing replay-populate of the same per-run buffer).

The full as-built record, phase by phase with cold-start notes, is in TODOS.md — read it as the guide to this branch. The locked design is wip/registry/registry-lifecycle-synthesis.md.

The five load-bearing changes

  1. Leak fixed by removal. The UsageRegistry (and its open_registry/close_registry lifecycle, inject_tokens_usages, generate_report, the live-add trio) is gone. It was opened in pipeline_run_setup but closed only on the failure path, so every successful run leaked its registry — and in a long-lived process (the Pipelex API) reusing a pipeline_run_id could collide on the orphan. With nothing to open, the leak is structurally impossible.

  2. Usage rides on PipeOutput. New tokens_usages: list[AnyTokensUsage] | None + usage_assembly_error: str | None, mirroring graph_spec / graph_assembly_error. Usage is assembled from the trace-event stream at end of run, in both DIRECT and TEMPORAL modes — so a cross-worker run carries its true total cost back on the result, exposed automatically wherever a PipeOutput is returned (including the Pipelex API response).

  3. --costs / is_generate_costs (default on) decouples cost collection from --graph. Both ride the shared event-log transport: --graph gates graph events + GraphSpecAssembler; --costs gates UsageReportEvents + UsageAggregator. --cost-report is removed (folded into --costs; no back-compat, per repo policy).

  4. Event-sourced renderer. The only cost renderer now reads pipe_output.tokens_usages via CostRegistry.generate_report(tokens_usages=...) (main CLI through reporting/cost_report_renderer.py). The Temporal act_assemble_graph activity is renamed act_assemble_tracing and assembles both graph and usage from a single event read. is_log_costs_to_console now defaults true (CLI shows the table by default when --costs is on); per-inference-job console logging is removed (report renders once at end of run). Dry runs are suppressed (zero tokens + zero cost); free/zero-price model runs (e.g. Ollama) still report token usage with a zero total.

  5. --mock-inference. A LIVE run that fakes AI calls at the inference leaf with reportable non-zero synthetic usage — operators dispatch real activities (exercising the true distributed path) but no tokens are billed. It's the cheap, deterministic way to validate distributed cost reporting (and the temporal-e2e-validate skill's new Tier 8b consumes it). Mutually exclusive with --dry-run. Covers the LLM leaf; image-gen/extract/search under --mock-inference fail loud with MockInferenceUnsupportedError rather than silently spending.

Breaking changes (no back-compat, per repo policy)

  • --cost-report/--no-cost-report removed from run pipe|method|bundle → use --costs (default on).
  • ReportingProtocol / ReportingNoOp no longer expose open_registry / close_registry / generate_report / inject_tokens_usages; the UsageRegistry model is gone. Embedders that rendered via get_report_delegate().generate_report() must render from the returned pipe_output.tokens_usages.
  • is_log_costs_to_console default flipped false → true (in pipelex.toml, .pipelex/pipelex.toml, and the kit/configs client template).

Testing

make agent-check (ruff/plxt clean, pyright 0/0, mypy clean) and full make agent-test green. Distributed cost reporting is validated without spend: test_split_worker_usage.py (read-back aggregation), test_mock_inference_temporal.py (--mock-inference cross-process), test_direct_tracing_assembly.py / test_cost_report_rendering.py (DIRECT). The temporal-e2e-validate skill's Tier 8b is the manual 3-process counterpart.

Cross-repo (deferred to release pin-bump — not in this PR)

cocode and pipelex-api pin a released pipelex (registry still present there). They don't break on this branch; their migration (swe_cmd.pyCostRegistry.generate_report(tokens_usages=...), plus API leak verification) lands when the pin bumps to a build carrying these changes. See TODOS Phase 3/4 cross-repo boxes.

Deferred non-goals

Consolidated in wip/registry/deferred-followups.md (D5 costs-only tracer skip, cost-per-node correlation, the GraphContext.from_execution_config factory, T5, T3 request-scoped tracing, --mock-inference per-operator coverage) and the two open design decisions in wip/registry/cost-report-deferred-decisions.md.

🤖 Generated with Claude Code


Summary by cubic

Rebuilt cost reporting as an event‑sourced artifact on PipeOutput, fixing the success‑path UsageRegistry leak and enabling cross‑worker cost aggregation. Hardening: trace reads now catch botocore.ClientError and botocore.BotoCoreError, and both event‑log emission and cost‑report rendering failures are best‑effort so successful runs don’t fail; obsolete registry docs were removed and the leak fix is documented.

  • New Features

    • Usage rides on pipe_output.tokens_usages (+ usage_assembly_error), assembled from trace events in DIRECT and TEMPORAL; multi‑worker runs aggregate into one report.
    • --costs/--no-costs (default on) decouples usage from --graph; costs‑only skips graph assembly and heavy payload serialization.
    • Unified renderer reads tokens_usages; console prints once at end when enabled, suppressed for dry runs (0 tokens) and when usage assembly/aggregation fails. CSV/summary errors are logged and skipped.
    • Agent CLI attaches a best‑effort cost_report JSON (omitted for dry runs, --no-costs, API‑runner path, or if local summary aggregation fails). Free/zero‑price models report tokens with a 0 total.
    • Temporal switched to act_assemble_tracing to assemble graph and costs from a single read; unexpected errors propagate so WfPipeRun records them on *_assembly_error.
    • --mock-inference: LLM leaf returns synthetic, non‑zero usage without provider calls; non‑LLM leaves raise MockInferenceUnsupportedError. Mocked objects are re‑validated; failures raise MockInferenceObjectFidelityError.
    • Event‑log writes are best‑effort: emit failures are downgraded to warnings so runs don’t fail.
  • Migration

    • Replace --cost-report with --costs (default on). Config adds is_generate_usage (was is_generate_costs). Console default is_log_costs_to_console = true.
    • GraphContext renamed to TraceContext; JobMetadata.graph_contextJobMetadata.trace_context. The shared context carries both graph and usage streams; costs‑only still mints node IDs but does not assemble a GraphSpec.
    • Stop using get_report_delegate().open/close_registry and .generate_report(). Render from pipe_output.tokens_usages via CostRegistry.generate_report(...) or render_run_cost_report(...).
    • Use PipelineExecutionConfig.with_execution_overrides(...) (replaces with_graph_config_overrides(...)).
    • Temporal: act_assemble_graph renamed to act_assemble_tracing.
    • PipelexRunner.__init__ is now keyword‑only; pass all constructor args by name.

Written for commit 67fcf0f. Summary will update on new commits.

Review in cubic

lchoquel and others added 16 commits June 6, 2026 14:29
…to prevent collisions

fix(log_dispatch): append traceback to console message when include_exception is True

test(bundle_validator): ensure consecutive sweeps use distinct registry IDs and do not collide

test(bundle_validator): verify registry is opened once and closed in finally for successful runs

test(log_dispatch): add tests for exception logging behavior in console messages

docs(init): add non-interactive `pipelex init --yes` mode for CI compatibility

docs(registry-leak): document `UsageRegistry` leak on success path of full pipeline run
- Introduced `--costs` option in CLI commands (`run_bundle_cmd`, `run_method_cmd`, `run_pipe_cmd`) to enable or disable cost reporting.
- Updated execution configuration to include `is_generate_costs` flag for controlling cost reporting behavior.
- Enhanced `GraphContext` to manage emission of usage events independently from graph events.
- Modified pipeline run setup to initialize tracing context based on graph and cost reporting flags.
- Implemented tests to validate the behavior of cost reporting and event emission gates in various scenarios.
- Updated configuration files to reflect new cost reporting settings.
…nsure correctness across contexts

docs: update references to with_execution_overrides in execution graph tracing documentation

chore(TODOS): document implementation plan and review findings for Phase 1 emit decoupling
- Replace `act_assemble_graph` with `act_assemble_tracing` to unify tracing assembly logic.
- Remove the `act_assemble_graph` activity and its associated tests.
- Introduce `act_assemble_tracing` to handle both graph specification and usage aggregation from trace events.
- Update `pipeline_run_setup` to correctly propagate emit flags to the new tracing activity.
- Modify `WfPipeRun` and `WfPipeRouter` to accommodate changes in tracing assembly logic.
- Add integration tests for tracing assembly scenarios, ensuring correct population of `tokens_usages` and `graph_spec`.
- Implement unit tests for `TracingAssembly` to validate behavior under various conditions, including error handling.
… commands

- Introduced cost reporting in the agent CLI, allowing for structured cost summaries in JSON output.
- Removed deprecated cost report flags from CLI commands and replaced them with a unified cost reporting mechanism.
- Updated the CostRegistry to aggregate costs by model and compute total costs from token usages.
- Implemented a new cost report renderer to handle console and CSV outputs based on configuration settings.
- Added integration tests to ensure correct cost report generation for both non-zero and zero-cost runs.
- Updated configuration files to enable cost logging to console by default.
- Refactored existing code to streamline cost reporting logic and improve maintainability.
- Updated the cost reporting logic to ensure that runs using free models with real tokens still generate a cost report, reflecting a total cost of 0.
- Enhanced the `CostRegistry` to aggregate costs and determine reportability based on token usage and cost, ensuring that dry runs (zero tokens and zero cost) are the only cases that suppress reports.
- Modified the `render_run_cost_report` function to check for reportable usage before generating output.
- Added tests to verify that free models are reported correctly and that dry runs do not produce reports.
- Documented the changes in the relevant markdown files to clarify the behavior of cost reporting in different scenarios.
- Removed unnecessary assertions and code related to orphan registries in ReportingManager tests.
- Updated tests to reflect the removal of per-run usage registries, ensuring that reporting functions correctly without accumulating orphan registries.
- Simplified test setup for JobMetadata and workflow run IDs, eliminating the need for explicit registry opening and closing.
- Adjusted assertions in tests to verify correct behavior of reporting events and usage without relying on registry state.
- Enhanced clarity in test descriptions and streamlined the overall test structure for better maintainability.
…inference` CLI option for enhanced cost reporting
…reporting

- Introduced dry_mock.py to handle synthetic inference jobs for both --dry-run and --mock-inference modes.
- Added mock_llm_gen_text, mock_llm_gen_object, and mock_llm_gen_object_list functions to generate mock outputs and report usage.
- Updated llm_generate.py to route calls to mock functions based on job_metadata.is_mock_inference flag.
- Enhanced JobMetadata to include is_mock_inference attribute for tracking mock inference state.
- Modified pipeline execution setup to propagate is_mock_inference flag through job metadata.
- Created integration tests for --mock-inference in both direct and temporal contexts to ensure correct behavior and reporting.
- Added unit tests for dry_mock functions to validate synthetic job reporting and usage aggregation.
- Updated CLI commands to declare --mock-inference option and enforce mutual exclusivity with --dry-run.
…tions

- Introduced `MockInferenceUnsupportedError` to prevent real provider calls under `--mock-inference` for image generation, document extraction, and web search.
- Updated relevant leaf functions (`img_gen_single_image`, `img_gen_image_list`, `extract_gen_pages`, and `PipeSearch`) to raise the new error when `is_mock_inference` is true.
- Added integration and unit tests to ensure the guard functions correctly and prevents unnecessary spending.
- Documented the new error in the errors index and created a dedicated page for `MockInferenceUnsupportedError`.
- Updated TODOs and follow-up documentation to reflect the changes and future work on full leaf coverage.
…rence fidelity

- Added `validate_run_flag_combination` function to enforce legal combinations of `--dry-run`, `--mock-inference`, and `--mock-inputs` flags across `pipe`, `method`, and `bundle` commands.
- Refactored command implementations in `bundle_cmd.py`, `method_cmd.py`, and `pipe_cmd.py` to utilize the new validation function, removing redundant checks.
- Introduced `_revalidate_against_object_class` function in `content_generator.py` to ensure fidelity of mock inference objects against their original classes, raising `MockInferenceObjectFidelityError` for validation failures.
- Updated `ContentGeneratorInWorkflow` to include fidelity checks for mock inference objects.
- Created integration tests for mock inference fidelity and unit tests for run flag combination validation.
- Added detailed error handling and messaging for mock inference fidelity issues.
… docs

Phase 6 — temporal-e2e-validate skill: add Tier 8b (cross-worker cost
report assembly). Uses split router+runner workers + --mock-inference so it
is free and deterministic — asserts the submitter renders a single
non-suppressed cost report from runner-side usage, with a --no-costs negative
arm and an opt-in live arm. Updates the skill description/triggers, the
master results table, and reconciles the Tier 8 note to point at
--mock-inference as the cheap deterministic path.

Phase 7 — docs/changelog/wrap:
- CHANGELOG [Unreleased]: --costs/--no-costs (default on), --cost-report
  removal, distributed cost reporting, tokens_usages on PipeOutput,
  --mock-inference, agent-CLI cost_report JSON, is_log_costs_to_console
  default flip, dry-run suppression, act_assemble_tracing rename, and the
  UsageRegistry success-path leak fix.
- wip docs flipped to as-built: distributed-execution T2 -> FIXED,
  P0.1 + P1 -> DONE; registry README -> IMPLEMENTED. New
  wip/registry/deferred-followups.md consolidates the deferred non-goals.
- TODOS.md now reads as the as-built PR-review guide to the branch.

Markdown-only; make agent-check green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@greptile-apps

greptile-apps Bot commented Jun 6, 2026

Copy link
Copy Markdown

Greptile Summary

This PR moves cost reporting onto the run result and makes it work across distributed execution. It changes:

  • Adds event-sourced usage aggregation onto PipeOutput.tokens_usages.
  • Replaces the in-process usage registry with trace-event replay for cost reports.
  • Adds the default-on --costs path, independent from graph generation.
  • Adds --mock-inference for no-spend validation of distributed usage reporting.
  • Renames the shared graph/usage plumbing to TraceContext and is_generate_usage.

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.

Reviews (11): Last reviewed commit: "refactor(tracing): rename GraphContext →..." | Re-trigger Greptile

Comment thread pipelex/temporal/tprl_pipe/act_assemble_tracing.py Outdated

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 51c5c85606

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread pipelex/reporting/reporting_manager.py Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 124 files

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.

Re-trigger cubic

Comment thread pipelex/temporal/tprl_pipe/act_assemble_tracing.py Outdated
Comment thread pipelex/cli/agent_cli/commands/run/pipe_cmd.py
Comment thread pipelex/cli/agent_cli/commands/run/_run_core.py Outdated
Comment thread pipelex/pipeline/execution_seams.py Outdated
Three real findings fixed; two design tradeoffs documented as deferred.

- act_assemble_tracing no longer swallows unexpected errors (greptile + cubic
  P1). Expected best-effort failures are still caught inside assemble_tracing
  and returned on the *_assembly_error fields; a genuinely unexpected error now
  propagates so the activity fails and WfPipeRun's existing `except ActivityError`
  branch records it on usage_assembly_error / graph_assembly_error — symmetric
  with DIRECT mode, instead of the workflow silently succeeding with no cost
  report and no diagnostic. Pinned by a new unit test.
- Agent CLI local cost-summary block is now wrapped to mirror the main CLI's
  render_run_cost_report guard (cubic P2): CostRegistry.build_cost_summary can
  raise CostRegistryError (a PipelexError); catch + log so a reporting-side
  error never fails an otherwise-successful run.
- execution_seams.prepare_pipe_job docstring corrected (cubic P3): --mock-inference
  fakes the LLM leaf only; non-LLM leaves raise MockInferenceUnsupportedError.

Deferred (documented, not patched):
- Agent `--costs` is a silent no-op in API mode → #6b in
  wip/registry/cost-report-deferred-decisions.md (facet of #6; needs the
  tri-state param first).
- Activity usage write can mutate a workflow-registered buffer in a single-process
  Temporal deployment → pre-existing, T3 territory; documented in
  wip/registry/deferred-followups.md (split router+runner topology doesn't hit it).

make agent-check + full make agent-test green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lchoquel

lchoquel commented Jun 6, 2026

Copy link
Copy Markdown
Member Author

Review round 1 addressed (commit 51b7e3f)

Thanks for the reviews. Triage:

Fixed

  • act_assemble_tracing swallowed unexpected errors (greptile + cubic, P1): the activity no longer degrades unexpected errors to an empty result — they propagate so the activity fails and WfPipeRun's existing except ActivityError branch records them on usage_assembly_error / graph_assembly_error, symmetric with DIRECT mode. Expected best-effort failures stay caught inside assemble_tracing (no retry storm). New pinning test: test_act_assemble_tracing_error_propagation.py.
  • Agent local cost summary unguarded (cubic, P2): wrapped to mirror the main CLI's render_run_cost_report guard — a CostRegistryError no longer fails an otherwise-successful run.
  • execution_seams docstring overstated --mock-inference scope (cubic, P3): corrected to "LLM leaf only; non-LLM leaves raise MockInferenceUnsupportedError".

Deferred (documented design decisions, not silent bugs)

  • --costs no-op in agent API mode (cubic, P2 + greptile summary): facet of the existing Feature/pipelex init base config #6 — the agent costs param is hard-default True, so "reject as unsupported" needs the tri-state param first. Documented as #6b in wip/registry/cost-report-deferred-decisions.md.
  • Activity usage write mutating a workflow-registered buffer (codex, P1): pre-existing (the fast path predates this PR), only affects single-process Temporal deployments (the split router+runner topology takes the per-process fallback), and is T3-refactor territory. Documented in wip/registry/deferred-followups.md.

make agent-check + full make agent-test green.

@greptileai @cubic-dev-ai please re-review.

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 6, 2026

Copy link
Copy Markdown

Review round 1 addressed (commit 51b7e3f)

Thanks for the reviews. Triage:

Fixed
...

@lchoquel I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 125 files

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.

Re-trigger cubic

Comment thread pipelex/reporting/cost_report_renderer.py Outdated
Comment thread pipelex/cli/agent_cli/CLAUDE.md Outdated
…nd 2)

The agent CLI's CLAUDE.md overstated when `cost_report` appears — round-1's
build_cost_summary guard means it can now be absent on a real run too. Doc now
states it is best-effort and absent for: dry runs, --no-costs, the API-runner
path, or when cost-summary aggregation fails (caught + skipped). (cubic P3)

The other round-2 finding (UnicodeEncodeError in CSV write, cubic P2) is a false
positive: save_to_csv writes encoding="utf-8", which encodes every valid Python
str, so UnicodeEncodeError cannot occur — adding it would be a speculative catch.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lchoquel

lchoquel commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

Review round 2 addressed (commit 73ac388)

  • Doc overstated cost_report availability (cubic, P3): fixed. The agent CLI CLAUDE.md now says the cost_report is best-effort and absent for dry runs, --no-costs, the API-runner path, or when cost-summary aggregation fails (caught + skipped) — consumers treat it as optional. This became inaccurate after round-1's summary guard.
  • UnicodeEncodeError in CSV write (cubic, P2): not a real path. save_to_csv writes encoding="utf-8", which encodes every valid Python str, so UnicodeEncodeError cannot occur for cost records — adding it would be a speculative catch (our standards forbid catching exceptions that can't be raised). Declined with rationale on the thread.

CI green on the prior commit; this change is doc-only.

@greptileai @cubic-dev-ai please re-review.

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review round 2 addressed (commit 73ac388)

  • Doc overstated cost_report availability (cubic, P3): fixed. The agent CLI CLAUDE.md now says the cost_report is best-effort and absent for dry runs, --no-costs, the API-runner path, or when cost-summary aggregation fails (caught + skipped) — consumers treat it as optional. This became inaccurate after round-1's summary guard.
  • UnicodeEncodeError in CSV write (cubic, P2): not a real path. save_to_csv writes encoding="utf-8", which encodes every valid Python str, so UnicodeEncodeError cannot occur for cost records — adding it would be a speculative catch (our standards forbid catching exceptions that can't be raised). Declined with rationale on the thread.

...

@lchoquel I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 125 files

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.

Re-trigger cubic

Comment thread CHANGELOG.md Outdated
…ound 3)

CHANGELOG's agent cost_report entry read as "always attached"; it is
best-effort/optional (absent for dry runs, --no-costs, API-runner path, or an
aggregation failure). Matches the agent_cli CLAUDE.md correction from round 2.
(cubic P2)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lchoquel

lchoquel commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

Review round 3 addressed (commit 72692b0)

  • CHANGELOG overstated agent cost_report availability (cubic, P2): fixed — the entry now marks it best-effort/optional (absent for dry runs, --no-costs, API-runner path, or aggregation failure), matching the CLAUDE.md correction from round 2. Doc-only.

@greptileai @cubic-dev-ai please re-review.

- Cost-report guard now also catches UnicodeEncodeError (cubic P2). The guard's
  contract is "cost reporting never fails a successful run"; it already caught
  OSError (a CSV-write failure). The CSV is written UTF-8, which can raise
  UnicodeEncodeError on a lone-surrogate string — same write-failure category,
  so it belongs in the same catch. (Reverses the round-2 decline: the earlier
  "impossible with UTF-8" rationale was imprecise.) Pinned by a parametrized
  test (UnicodeEncodeError / CostRegistryError / OSError all degrade to a warning).
- execution_seams docstring: "LLM cogt leaf" -> "LLM inference-leaf" (cubic P3).
  `cogt` is the package name, not a typo, but the jargon read as one; reworded
  for clarity.
- reporting-config.md: console cost-table is printed only when the run had
  reportable usage — suppressed for dry runs and on an assembly/aggregation
  failure (cubic P3).

Deferred (documented): --no-costs no-op in agent API mode (cubic P2) is #6b in
wip/registry/cost-report-deferred-decisions.md.

make agent-check + targeted reporting/cost tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lchoquel

lchoquel commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

Review round 4 addressed (commit 0ddd65c)

  • Cost-report guard missed UnicodeEncodeError (cubic P2): fixed — added it to the guard. Reversing my round-2 decline: UTF-8 can raise it on a lone surrogate, and this guard already catches OSError (a CSV-write failure), so the same write-failure category belongs there. New parametrized test pins it.
  • cogt reads as a typo (cubic P3): fixed — reworded to "LLM inference-leaf" (cogt is the package name, not a typo, but dropped the jargon for clarity).
  • reporting-config.md overstated console table (cubic P3): fixed — now notes it prints only when the run had reportable usage.
  • --no-costs no-op in agent API mode (cubic P2): deferred — same root as Feature/pipelex init base config #6/#6b (documented); needs the tri-state refactor + a product call, not a one-off patch.

make agent-check + targeted reporting/cost tests green.

@greptileai @cubic-dev-ai please re-review.

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review round 4 addressed (commit 0ddd65c)

  • Cost-report guard missed UnicodeEncodeError (cubic P2): fixed — added it to the guard. Reversing my round-2 decline: UTF-8 can raise it on a lone surrogate, and this guard already catches OSError (a CSV-write failure), so the same write-failure category belongs there. New parametrized test pins it.
  • cogt reads as a typo (cubic P3): fixed — reworded to "LLM inference-leaf" (cogt is the package name, not a typo, but dropped the jargon for clarity).
  • reporting-config.md overstated console table (cubic P3): fixed — now notes it prints only when the run had reportable usage.
    ...

@lchoquel I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 issues found across 125 files

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.
You've manually re-run cubic several times on this PR. Each manual re-review checks the full PR again and counts toward your usage quota. To preserve your usage limits, we recommend letting cubic automatically review new commits.

Re-trigger cubic

Comment thread pipelex/cli/agent_cli/commands/run/method_cmd.py
Comment thread pipelex/pipeline/runner.py
Comment thread pipelex/reporting/cost_report_renderer.py Outdated
- PipelexRunner.__init__ is now keyword-only (cubic P2). is_mock_inference was
  added mid-signature; making the constructor keyword-only (all call sites
  already pass keywords) removes the positional-shift footgun for good and aligns
  with the codebase's keyword-only-args direction.
- cost_report_renderer module docstring corrected (cubic P3): only the main CLI
  calls render_run_cost_report; the agent CLI builds a machine-readable
  cost_report via CostRegistry.build_cost_summary instead (no Rich table).

Deferred (documented): --no-costs no-op in agent API mode, re-flagged on
method_cmd — #6b in wip/registry/cost-report-deferred-decisions.md.

make agent-check + pipeline integration tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lchoquel

lchoquel commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

Review round 5 addressed (commit 0f92c37)

  • is_mock_inference added mid-signature (cubic P2): fixedPipelexRunner.__init__ is now keyword-only (*). All call sites already pass keywords, so non-breaking internally, and it removes the positional-shift footgun for good (aligns with the keyword-only-args direction).
  • renderer module docstring claimed both CLIs call it (cubic P3): fixed — only the main CLI calls render_run_cost_report; the agent CLI uses CostRegistry.build_cost_summary for its JSON envelope.
  • --no-costs API no-op (cubic P2, re-flagged on method_cmd): deferred — #6b, needs the tri-state refactor + product call.

make agent-check + pipeline integration tests green.

@greptileai @cubic-dev-ai please re-review.

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review round 5 addressed (commit 0f92c37)

  • is_mock_inference added mid-signature (cubic P2): fixedPipelexRunner.__init__ is now keyword-only (*). All call sites already pass keywords, so non-breaking internally, and it removes the positional-shift footgun for good (aligns with the keyword-only-args direction).
  • renderer module docstring claimed both CLIs call it (cubic P3): fixed — only the main CLI calls render_run_cost_report; the agent CLI uses CostRegistry.build_cost_summary for its JSON envelope.
  • --no-costs API no-op (cubic P2, re-flagged on method_cmd): deferred — #6b, needs the tri-state refactor + product call.
    ...

@lchoquel I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 125 files

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.
You've manually re-run cubic several times on this PR. Each manual re-review checks the full PR again and counts toward your usage quota. To preserve your usage limits, we recommend letting cubic automatically review new commits.

Re-trigger cubic

Comment thread pipelex/cli/agent_cli/commands/run/method_cmd.py
@lchoquel

lchoquel commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

Review converged (commit 0f92c37)

Round 6 surfaced only a re-flag of the already-documented-deferred #6b (--no-costs no-op in agent API mode) — no new actionable finding. All review threads are resolved and CI is green across Tests (py3.10–3.14), Typecheck, Lint, Greptile, and cubic.

Summary of what was addressed across rounds:

  • Real fixes: act_assemble_tracing no longer swallows unexpected errors (P1, + pinning test); agent cost-summary guarded; cost-report guard also catches UnicodeEncodeError; PipelexRunner.__init__ made keyword-only; several doc-accuracy corrections (cost_report best-effort, renderer scope, mock-inference scope).
  • Deferred design decision (documented): Feature/pipelex init base config #6/#6b — the agent CLI's --costs/--no-costs semantics across the local vs remote API runner. Needs the tri-state-param refactor + a product call on API-response cost surfacing; tracked in wip/registry/cost-report-deferred-decisions.md.

Proceeding to final pre-merge review.

…/review)

Final pre-merge /review (Claude + Codex adversarial passes) found one real
best-effort violation: assemble_tracing's read-error catch omitted botocore
ClientError, so in DIRECT mode with the DynamoDB backend a transient DDB error
(throttle/auth/network) during tracing assembly would fail an otherwise-successful
run — the sibling reporting_manager fallback already handles it. Added the
conditional botocore catch (mirrors reporting_manager) so the read degrades to an
*_assembly_error note. Pinned by test_read_botocore_clienterror_is_caught
(pytest.importorskip-guarded).

Flagged + deferred (pre-existing, separate concerns; documented in wip/registry/):
- Cost report ignores reasoning/audio/prediction token costs (under-reports total)
  → deferred-followups.md (cost-model change, not this PR's scope).
- DIRECT vs TEMPORAL object re-validation use different model_dump modes
  → deferred-followups.md.
- Agent API-mode flag handling is inconsistent (--graph/--costs silently ignored
  while --dry-run/--mock-inputs are rejected) → broadened #6b in
  cost-report-deferred-decisions.md (holistic fix, not a partial --no-costs patch).

make agent-check + targeted pipe_run/reporting/pipeline tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lchoquel

lchoquel commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

Final pre-merge review (gstack /review) — commit 474b88e

Ran independent Claude + Codex adversarial passes over the full diff.

Fixed (1 real best-effort violation):

  • assemble_tracing caught OSError/PipelexError but not botocore ClientError — in DIRECT mode with the DynamoDB backend a transient DDB error (throttle/auth/network) during tracing assembly would fail an otherwise-successful run. Added the conditional botocore catch (mirrors the reporting_manager fallback) so it degrades to a *_assembly_error note. Pinned by a new importorskip-guarded test.

Flagged + deferred (pre-existing, separate concerns — documented in wip/registry/, not expanded into this PR):

  • Cost report ignores reasoning/audio/prediction token costs (under-reports total) — pre-existing cost-model gap.
  • DIRECT vs TEMPORAL object re-validation use different model_dump modes.
  • Agent API-mode flag handling is inconsistent (--graph/--costs silently ignored while --dry-run/--mock-inputs are rejected) — the holistic framing of #6b; a partial --no-costs patch would worsen it.

Confirmed fine: exactly-once usage emit (fast-path XOR fallback), wf_pipe_router clears its context in finally, the log_dispatch traceback change is a genuine latent-bug fix, console-table default flip is intended (D6), mock free-model usage stays reportable.

make agent-check + targeted suites green. This was the final review pass.

@lchoquel

lchoquel commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

Final commit 474b88ec adds one small defensive fix from the /review pass (botocore ClientError in assemble_tracing's best-effort read catch) plus deferred-item docs. CI is green across all Python versions. @greptileai @cubic-dev-ai one final re-review on this commit, please.

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Final commit 474b88ec adds one small defensive fix from the /review pass (botocore ClientError in assemble_tracing's best-effort read catch) plus deferred-item docs. CI is green across all Python versions. @greptileai @cubic-dev-ai one final re-review on this commit, please.

@lchoquel I have started the AI code review. It will take a few minutes to complete.

Comment thread pipelex/pipe_run/tracing_assembly.py
Comment thread pipelex/pipe_run/tracing_assembly.py Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 issue found across 125 files

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.
You've manually re-run cubic several times on this PR. Each manual re-review checks the full PR again and counts toward your usage quota. To preserve your usage limits, we recommend letting cubic automatically review new commits.

Re-trigger cubic

Comment thread pipelex/pipe_run/tracing_assembly.py
… (PR #967 review round 7)

Greptile + cubic correctly flagged that the round-6 /review botocore fix was
incomplete: it caught ClientError (service-side throttle/auth) but not BotoCoreError
(transport/credential/timeout — EndpointConnectionError, ReadTimeoutError,
NoCredentialsError, ...), which is a SEPARATE base class, not a ClientError subclass.
A DynamoDB transport failure would still escape the best-effort read/emit catch and
fail an otherwise-successful run. Now both sibling classes are caught in both
assemble_tracing (read) and reporting_manager's runner fallback (emit) — matching
s3_storage_provider, which already catches the pair. botocore is untyped so the
imported classes are Unknown; cast to type[BaseException] keeps the except-tuple
(and the bound exception var) cleanly typed. Test extended to cover both
ClientError and a BotoCoreError subclass (EndpointConnectionError).

make agent-check (pyright 0/0, mypy 2026 files) + tracing/reporting tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@lchoquel

lchoquel commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

Review round 7 addressed (commit 5372b23)

Both bots correctly caught that my round-6 /review fix was incomplete: it caught botocore ClientError but not BotoCoreError (a sibling base class covering transport/credential/timeout — EndpointConnectionError, ReadTimeoutError, NoCredentialsError). Now both sibling classes are caught in both assemble_tracing (read) and reporting_manager's runner fallback (emit), matching s3_storage_provider which already catches the pair. Test extended to cover a BotoCoreError subclass.

make agent-check (pyright 0/0, mypy 2026 files) + tracing/reporting tests green. Genuinely useful catch — thanks.

@greptileai @cubic-dev-ai final re-review on this commit, please.

@cubic-dev-ai

cubic-dev-ai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review round 7 addressed (commit 5372b23)

Both bots correctly caught that my round-6 /review fix was incomplete: it caught botocore ClientError but not BotoCoreError (a sibling base class covering transport/credential/timeout — EndpointConnectionError, ReadTimeoutError, NoCredentialsError). Now both sibling classes are caught in both assemble_tracing (read) and reporting_manager's runner fallback (emit), matching s3_storage_provider which already catches the pair. Test extended to cover a BotoCoreError subclass.

make agent-check (pyright 0/0, mypy 2026 files) + tracing/reporting tests green. Genuinely useful catch — thanks.
...

@lchoquel I have started the AI code review. It will take a few minutes to complete.

@cubic-dev-ai cubic-dev-ai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 125 files

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.
On a pro plan you can use ultrareview for larger PRs.
You've manually re-run cubic several times on this PR. Each manual re-review checks the full PR again and counts toward your usage quota. To preserve your usage limits, we recommend letting cubic automatically review new commits.

Re-trigger cubic

@lchoquel

lchoquel commented Jun 7, 2026

Copy link
Copy Markdown
Member Author

✅ Review complete — PR ready

Round 8 (commit 5372b23a) came back clean: no new findings from Greptile or cubic, all checks green, 0 unresolved threads, merge state CLEAN.

Across 8 review rounds + the gstack /review pass, every real finding was fixed:

  • act_assemble_tracing no longer swallows unexpected errors (P1) — WfPipeRun's ActivityError branch now records them
  • Agent local cost-summary guarded against a reporting-side failure failing the run
  • Cost-report guard catches UnicodeEncodeError + botocore ClientError and BotoCoreError (read + emit paths)
  • PipelexRunner.__init__ made keyword-only
  • Several doc-accuracy corrections (best-effort cost_report, renderer scope, mock-inference scope)

Design tradeoffs were deferred with documentation rather than reflexively patched (in wip/registry/): the agent CLI's API-mode flag handling (#6/#6b), pre-existing cost-category completeness (reasoning/audio/prediction), DIRECT vs TEMPORAL model_dump divergence, the R2 retry double-count, and the T3 request-scoped-tracing refactor.

make agent-check + full make agent-test green. Leaving the merge to a maintainer.

@lchoquel lchoquel changed the title fix: UsageRegistry leak fix + distributed (event-sourced) cost reporting Distributed cost reporting Jun 7, 2026
lchoquel and others added 5 commits June 7, 2026 11:09
…_costs → is_generate_usage (#968)

The per-execution tracing context was born graph-only, but distributed cost
reporting now rides usage/cost events over the same object — so the name
under-described it. Rename it to the shared trace transport it actually is, and
align the internal usage gate to the stream it gates.

- `GraphContext` → `TraceContext`; file `graph_context.py` → `trace_context.py`;
  attribute `graph_context` → `trace_context` everywhere (incl. `JobMetadata`).
  Re-document the object as one node tree with two streams (graph + usage).
- Gate flag `is_generate_costs` → `is_generate_usage` on `PipelineExecutionConfig`
  (+ all three `pipelex.toml` files) and the override param
  `with_execution_overrides(generate_costs=…)` → `generate_usage=…`.
- Cost-domain renderer left cost-named on purpose: `render_run_cost_report`
  keeps `is_generate_costs`, fed the usage gate at the seam ("if usage was
  generated, render the cost view"). Public `--costs` CLI flag unchanged.

Deferred: the optional `graph_id` → `run_id` nit (ripples into the
deliberately graph-named `GraphTracerManager` plumbing). No behavior change.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- Deleted the `registry-lifecycle-synthesis.md`, `registry-success-path-leak-assessment.md`, `registry-success-path-leak-brief.md`, `registry-success-path-leak-execution-contexts.md`, and `usage-reporting-without-cost.md` files as they are no longer relevant to the current implementation and design direction.
- The `UsageRegistry` leak was confirmed, where successful runs left registries open, leading to memory leaks in long-lived servers. The design now ensures that the registry is closed properly on success paths.
- The synthesis document now encapsulates the lifecycle of the `UsageRegistry`, addressing both the leak and the cost reporting mechanisms in a unified manner.
- The capability of usage reporting has been clarified, distinguishing between usage data and cost projections, with a focus on making usage a first-class output.
@lchoquel lchoquel merged commit 8fe1eda into dev Jun 7, 2026
25 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 7, 2026
@lchoquel lchoquel deleted the fix/For-API-update branch June 7, 2026 14:28
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant