Distributed cost reporting by lchoquel · Pull Request #967 · Pipelex/pipelex

lchoquel · 2026-06-06T23:26:48Z

What & why

This branch turns cost reporting from a leaky per-run in-process buffer into an event-sourced artifact on the run result, and in doing so fixes the UsageRegistry success-path leak and wires up distributed (cross-worker) cost reporting — they were two facets of one lifecycle (the leak is the missing close; distributed reporting is the missing replay-populate of the same per-run buffer).

The full as-built record, phase by phase with cold-start notes, is in TODOS.md — read it as the guide to this branch. The locked design is wip/registry/registry-lifecycle-synthesis.md.

The five load-bearing changes

Leak fixed by removal. The UsageRegistry (and its open_registry/close_registry lifecycle, inject_tokens_usages, generate_report, the live-add trio) is gone. It was opened in pipeline_run_setup but closed only on the failure path, so every successful run leaked its registry — and in a long-lived process (the Pipelex API) reusing a pipeline_run_id could collide on the orphan. With nothing to open, the leak is structurally impossible.
Usage rides on PipeOutput. New tokens_usages: list[AnyTokensUsage] | None + usage_assembly_error: str | None, mirroring graph_spec / graph_assembly_error. Usage is assembled from the trace-event stream at end of run, in both DIRECT and TEMPORAL modes — so a cross-worker run carries its true total cost back on the result, exposed automatically wherever a PipeOutput is returned (including the Pipelex API response).
--costs / is_generate_costs (default on) decouples cost collection from --graph. Both ride the shared event-log transport: --graph gates graph events + GraphSpecAssembler; --costs gates UsageReportEvents + UsageAggregator. --cost-report is removed (folded into --costs; no back-compat, per repo policy).
Event-sourced renderer. The only cost renderer now reads pipe_output.tokens_usages via CostRegistry.generate_report(tokens_usages=...) (main CLI through reporting/cost_report_renderer.py). The Temporal act_assemble_graph activity is renamed act_assemble_tracing and assembles both graph and usage from a single event read. is_log_costs_to_console now defaults true (CLI shows the table by default when --costs is on); per-inference-job console logging is removed (report renders once at end of run). Dry runs are suppressed (zero tokens + zero cost); free/zero-price model runs (e.g. Ollama) still report token usage with a zero total.
--mock-inference. A LIVE run that fakes AI calls at the inference leaf with reportable non-zero synthetic usage — operators dispatch real activities (exercising the true distributed path) but no tokens are billed. It's the cheap, deterministic way to validate distributed cost reporting (and the temporal-e2e-validate skill's new Tier 8b consumes it). Mutually exclusive with --dry-run. Covers the LLM leaf; image-gen/extract/search under --mock-inference fail loud with MockInferenceUnsupportedError rather than silently spending.

Breaking changes (no back-compat, per repo policy)

--cost-report/--no-cost-report removed from run pipe|method|bundle → use --costs (default on).
ReportingProtocol / ReportingNoOp no longer expose open_registry / close_registry / generate_report / inject_tokens_usages; the UsageRegistry model is gone. Embedders that rendered via get_report_delegate().generate_report() must render from the returned pipe_output.tokens_usages.
is_log_costs_to_console default flipped false → true (in pipelex.toml, .pipelex/pipelex.toml, and the kit/configs client template).

Testing

make agent-check (ruff/plxt clean, pyright 0/0, mypy clean) and full make agent-test green. Distributed cost reporting is validated without spend: test_split_worker_usage.py (read-back aggregation), test_mock_inference_temporal.py (--mock-inference cross-process), test_direct_tracing_assembly.py / test_cost_report_rendering.py (DIRECT). The temporal-e2e-validate skill's Tier 8b is the manual 3-process counterpart.

Cross-repo (deferred to release pin-bump — not in this PR)

cocode and pipelex-api pin a released pipelex (registry still present there). They don't break on this branch; their migration (swe_cmd.py → CostRegistry.generate_report(tokens_usages=...), plus API leak verification) lands when the pin bumps to a build carrying these changes. See TODOS Phase 3/4 cross-repo boxes.

Deferred non-goals

Consolidated in wip/registry/deferred-followups.md (D5 costs-only tracer skip, cost-per-node correlation, the GraphContext.from_execution_config factory, T5, T3 request-scoped tracing, --mock-inference per-operator coverage) and the two open design decisions in wip/registry/cost-report-deferred-decisions.md.

🤖 Generated with Claude Code

Summary by cubic

Rebuilt cost reporting as an event‑sourced artifact on PipeOutput, fixing the success‑path UsageRegistry leak and enabling cross‑worker cost aggregation. Hardening: trace reads now catch botocore.ClientError and botocore.BotoCoreError, and both event‑log emission and cost‑report rendering failures are best‑effort so successful runs don’t fail; obsolete registry docs were removed and the leak fix is documented.

New Features
- Usage rides on pipe_output.tokens_usages (+ usage_assembly_error), assembled from trace events in DIRECT and TEMPORAL; multi‑worker runs aggregate into one report.
- --costs/--no-costs (default on) decouples usage from --graph; costs‑only skips graph assembly and heavy payload serialization.
- Unified renderer reads tokens_usages; console prints once at end when enabled, suppressed for dry runs (0 tokens) and when usage assembly/aggregation fails. CSV/summary errors are logged and skipped.
- Agent CLI attaches a best‑effort cost_report JSON (omitted for dry runs, --no-costs, API‑runner path, or if local summary aggregation fails). Free/zero‑price models report tokens with a 0 total.
- Temporal switched to act_assemble_tracing to assemble graph and costs from a single read; unexpected errors propagate so WfPipeRun records them on *_assembly_error.
- --mock-inference: LLM leaf returns synthetic, non‑zero usage without provider calls; non‑LLM leaves raise MockInferenceUnsupportedError. Mocked objects are re‑validated; failures raise MockInferenceObjectFidelityError.
- Event‑log writes are best‑effort: emit failures are downgraded to warnings so runs don’t fail.
Migration
- Replace --cost-report with --costs (default on). Config adds is_generate_usage (was is_generate_costs). Console default is_log_costs_to_console = true.
- GraphContext renamed to TraceContext; JobMetadata.graph_context → JobMetadata.trace_context. The shared context carries both graph and usage streams; costs‑only still mints node IDs but does not assemble a GraphSpec.
- Stop using get_report_delegate().open/close_registry and .generate_report(). Render from pipe_output.tokens_usages via CostRegistry.generate_report(...) or render_run_cost_report(...).
- Use PipelineExecutionConfig.with_execution_overrides(...) (replaces with_graph_config_overrides(...)).
- Temporal: act_assemble_graph renamed to act_assemble_tracing.
- PipelexRunner.__init__ is now keyword‑only; pass all constructor args by name.

^{Written for commit 67fcf0f. Summary will update on new commits.}

…to prevent collisions fix(log_dispatch): append traceback to console message when include_exception is True test(bundle_validator): ensure consecutive sweeps use distinct registry IDs and do not collide test(bundle_validator): verify registry is opened once and closed in finally for successful runs test(log_dispatch): add tests for exception logging behavior in console messages docs(init): add non-interactive `pipelex init --yes` mode for CI compatibility docs(registry-leak): document `UsageRegistry` leak on success path of full pipeline run

- Introduced `--costs` option in CLI commands (`run_bundle_cmd`, `run_method_cmd`, `run_pipe_cmd`) to enable or disable cost reporting. - Updated execution configuration to include `is_generate_costs` flag for controlling cost reporting behavior. - Enhanced `GraphContext` to manage emission of usage events independently from graph events. - Modified pipeline run setup to initialize tracing context based on graph and cost reporting flags. - Implemented tests to validate the behavior of cost reporting and event emission gates in various scenarios. - Updated configuration files to reflect new cost reporting settings.

…nsure correctness across contexts docs: update references to with_execution_overrides in execution graph tracing documentation chore(TODOS): document implementation plan and review findings for Phase 1 emit decoupling

…vents

- Replace `act_assemble_graph` with `act_assemble_tracing` to unify tracing assembly logic. - Remove the `act_assemble_graph` activity and its associated tests. - Introduce `act_assemble_tracing` to handle both graph specification and usage aggregation from trace events. - Update `pipeline_run_setup` to correctly propagate emit flags to the new tracing activity. - Modify `WfPipeRun` and `WfPipeRouter` to accommodate changes in tracing assembly logic. - Add integration tests for tracing assembly scenarios, ensuring correct population of `tokens_usages` and `graph_spec`. - Implement unit tests for `TracingAssembly` to validate behavior under various conditions, including error handling.

…tests for WfPipeRun behavior

… commands - Introduced cost reporting in the agent CLI, allowing for structured cost summaries in JSON output. - Removed deprecated cost report flags from CLI commands and replaced them with a unified cost reporting mechanism. - Updated the CostRegistry to aggregate costs by model and compute total costs from token usages. - Implemented a new cost report renderer to handle console and CSV outputs based on configuration settings. - Added integration tests to ensure correct cost report generation for both non-zero and zero-cost runs. - Updated configuration files to enable cost logging to console by default. - Refactored existing code to streamline cost reporting logic and improve maintainability.

- Updated the cost reporting logic to ensure that runs using free models with real tokens still generate a cost report, reflecting a total cost of 0. - Enhanced the `CostRegistry` to aggregate costs and determine reportability based on token usage and cost, ensuring that dry runs (zero tokens and zero cost) are the only cases that suppress reports. - Modified the `render_run_cost_report` function to check for reportable usage before generating output. - Added tests to verify that free models are reported correctly and that dry runs do not produce reports. - Documented the changes in the relevant markdown files to clarify the behavior of cost reporting in different scenarios.

- Removed unnecessary assertions and code related to orphan registries in ReportingManager tests. - Updated tests to reflect the removal of per-run usage registries, ensuring that reporting functions correctly without accumulating orphan registries. - Simplified test setup for JobMetadata and workflow run IDs, eliminating the need for explicit registry opening and closing. - Adjusted assertions in tests to verify correct behavior of reporting events and usage without relying on registry state. - Enhanced clarity in test descriptions and streamlined the overall test structure for better maintainability.

…inference` CLI option for enhanced cost reporting

…reporting - Introduced dry_mock.py to handle synthetic inference jobs for both --dry-run and --mock-inference modes. - Added mock_llm_gen_text, mock_llm_gen_object, and mock_llm_gen_object_list functions to generate mock outputs and report usage. - Updated llm_generate.py to route calls to mock functions based on job_metadata.is_mock_inference flag. - Enhanced JobMetadata to include is_mock_inference attribute for tracking mock inference state. - Modified pipeline execution setup to propagate is_mock_inference flag through job metadata. - Created integration tests for --mock-inference in both direct and temporal contexts to ensure correct behavior and reporting. - Added unit tests for dry_mock functions to validate synthetic job reporting and usage aggregation. - Updated CLI commands to declare --mock-inference option and enforce mutual exclusivity with --dry-run.

…tions - Introduced `MockInferenceUnsupportedError` to prevent real provider calls under `--mock-inference` for image generation, document extraction, and web search. - Updated relevant leaf functions (`img_gen_single_image`, `img_gen_image_list`, `extract_gen_pages`, and `PipeSearch`) to raise the new error when `is_mock_inference` is true. - Added integration and unit tests to ensure the guard functions correctly and prevents unnecessary spending. - Documented the new error in the errors index and created a dedicated page for `MockInferenceUnsupportedError`. - Updated TODOs and follow-up documentation to reflect the changes and future work on full leaf coverage.

…rence fidelity - Added `validate_run_flag_combination` function to enforce legal combinations of `--dry-run`, `--mock-inference`, and `--mock-inputs` flags across `pipe`, `method`, and `bundle` commands. - Refactored command implementations in `bundle_cmd.py`, `method_cmd.py`, and `pipe_cmd.py` to utilize the new validation function, removing redundant checks. - Introduced `_revalidate_against_object_class` function in `content_generator.py` to ensure fidelity of mock inference objects against their original classes, raising `MockInferenceObjectFidelityError` for validation failures. - Updated `ContentGeneratorInWorkflow` to include fidelity checks for mock inference objects. - Created integration tests for mock inference fidelity and unit tests for run flag combination validation. - Added detailed error handling and messaging for mock inference fidelity issues.

… docs Phase 6 — temporal-e2e-validate skill: add Tier 8b (cross-worker cost report assembly). Uses split router+runner workers + --mock-inference so it is free and deterministic — asserts the submitter renders a single non-suppressed cost report from runner-side usage, with a --no-costs negative arm and an opt-in live arm. Updates the skill description/triggers, the master results table, and reconciles the Tier 8 note to point at --mock-inference as the cheap deterministic path. Phase 7 — docs/changelog/wrap: - CHANGELOG [Unreleased]: --costs/--no-costs (default on), --cost-report removal, distributed cost reporting, tokens_usages on PipeOutput, --mock-inference, agent-CLI cost_report JSON, is_log_costs_to_console default flip, dry-run suppression, act_assemble_tracing rename, and the UsageRegistry success-path leak fix. - wip docs flipped to as-built: distributed-execution T2 -> FIXED, P0.1 + P1 -> DONE; registry README -> IMPLEMENTED. New wip/registry/deferred-followups.md consolidates the deferred non-goals. - TODOS.md now reads as the as-built PR-review guide to the branch. Markdown-only; make agent-check green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-06-06T23:29:17Z

Greptile Summary

This PR moves cost reporting onto the run result and makes it work across distributed execution. It changes:

Adds event-sourced usage aggregation onto PipeOutput.tokens_usages.
Replaces the in-process usage registry with trace-event replay for cost reports.
Adds the default-on --costs path, independent from graph generation.
Adds --mock-inference for no-spend validation of distributed usage reporting.
Renames the shared graph/usage plumbing to TraceContext and is_generate_usage.

Confidence Score: 5/5

This looks safe to merge.

No blocking issues found in the changed code.

_{Reviews (11): Last reviewed commit: "refactor(tracing): rename GraphContext →..." | Re-trigger Greptile}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 51c5c85606

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

cubic-dev-ai

4 issues found across 124 files

_{Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.

On a pro plan you can use ultrareview for larger PRs.

Re-trigger cubic}

Three real findings fixed; two design tradeoffs documented as deferred. - act_assemble_tracing no longer swallows unexpected errors (greptile + cubic P1). Expected best-effort failures are still caught inside assemble_tracing and returned on the *_assembly_error fields; a genuinely unexpected error now propagates so the activity fails and WfPipeRun's existing `except ActivityError` branch records it on usage_assembly_error / graph_assembly_error — symmetric with DIRECT mode, instead of the workflow silently succeeding with no cost report and no diagnostic. Pinned by a new unit test. - Agent CLI local cost-summary block is now wrapped to mirror the main CLI's render_run_cost_report guard (cubic P2): CostRegistry.build_cost_summary can raise CostRegistryError (a PipelexError); catch + log so a reporting-side error never fails an otherwise-successful run. - execution_seams.prepare_pipe_job docstring corrected (cubic P3): --mock-inference fakes the LLM leaf only; non-LLM leaves raise MockInferenceUnsupportedError. Deferred (documented, not patched): - Agent `--costs` is a silent no-op in API mode → #6b in wip/registry/cost-report-deferred-decisions.md (facet of #6; needs the tri-state param first). - Activity usage write can mutate a workflow-registered buffer in a single-process Temporal deployment → pre-existing, T3 territory; documented in wip/registry/deferred-followups.md (split router+runner topology doesn't hit it). make agent-check + full make agent-test green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lchoquel · 2026-06-06T23:57:28Z

Review round 1 addressed (commit `51b7e3f`)

Thanks for the reviews. Triage:

Fixed

act_assemble_tracing swallowed unexpected errors (greptile + cubic, P1): the activity no longer degrades unexpected errors to an empty result — they propagate so the activity fails and WfPipeRun's existing except ActivityError branch records them on usage_assembly_error / graph_assembly_error, symmetric with DIRECT mode. Expected best-effort failures stay caught inside assemble_tracing (no retry storm). New pinning test: test_act_assemble_tracing_error_propagation.py.
Agent local cost summary unguarded (cubic, P2): wrapped to mirror the main CLI's render_run_cost_report guard — a CostRegistryError no longer fails an otherwise-successful run.
execution_seams docstring overstated --mock-inference scope (cubic, P3): corrected to "LLM leaf only; non-LLM leaves raise MockInferenceUnsupportedError".

Deferred (documented design decisions, not silent bugs)

--costs no-op in agent API mode (cubic, P2 + greptile summary): facet of the existing Feature/pipelex init base config #6 — the agent costs param is hard-default True, so "reject as unsupported" needs the tri-state param first. Documented as #6b in wip/registry/cost-report-deferred-decisions.md.
Activity usage write mutating a workflow-registered buffer (codex, P1): pre-existing (the fast path predates this PR), only affects single-process Temporal deployments (the split router+runner topology takes the per-process fallback), and is T3-refactor territory. Documented in wip/registry/deferred-followups.md.

make agent-check + full make agent-test green.

@greptileai @cubic-dev-ai please re-review.

cubic-dev-ai · 2026-06-06T23:57:35Z

Review round 1 addressed (commit 51b7e3f)

Thanks for the reviews. Triage:

Fixed
...

@lchoquel I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

2 issues found across 125 files

_{Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.

On a pro plan you can use ultrareview for larger PRs.

Re-trigger cubic}

…nd 2) The agent CLI's CLAUDE.md overstated when `cost_report` appears — round-1's build_cost_summary guard means it can now be absent on a real run too. Doc now states it is best-effort and absent for: dry runs, --no-costs, the API-runner path, or when cost-summary aggregation fails (caught + skipped). (cubic P3) The other round-2 finding (UnicodeEncodeError in CSV write, cubic P2) is a false positive: save_to_csv writes encoding="utf-8", which encodes every valid Python str, so UnicodeEncodeError cannot occur — adding it would be a speculative catch. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lchoquel · 2026-06-07T00:09:26Z

Review round 2 addressed (commit `73ac388`)

Doc overstated cost_report availability (cubic, P3): fixed. The agent CLI CLAUDE.md now says the cost_report is best-effort and absent for dry runs, --no-costs, the API-runner path, or when cost-summary aggregation fails (caught + skipped) — consumers treat it as optional. This became inaccurate after round-1's summary guard.
UnicodeEncodeError in CSV write (cubic, P2): not a real path. save_to_csv writes encoding="utf-8", which encodes every valid Python str, so UnicodeEncodeError cannot occur for cost records — adding it would be a speculative catch (our standards forbid catching exceptions that can't be raised). Declined with rationale on the thread.

CI green on the prior commit; this change is doc-only.

@greptileai @cubic-dev-ai please re-review.

cubic-dev-ai · 2026-06-07T00:09:36Z

Review round 2 addressed (commit 73ac388)

Doc overstated cost_report availability (cubic, P3): fixed. The agent CLI CLAUDE.md now says the cost_report is best-effort and absent for dry runs, --no-costs, the API-runner path, or when cost-summary aggregation fails (caught + skipped) — consumers treat it as optional. This became inaccurate after round-1's summary guard.

UnicodeEncodeError in CSV write (cubic, P2): not a real path. save_to_csv writes encoding="utf-8", which encodes every valid Python str, so UnicodeEncodeError cannot occur for cost records — adding it would be a speculative catch (our standards forbid catching exceptions that can't be raised). Declined with rationale on the thread.

...

@lchoquel I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

1 issue found across 125 files

_{Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.

On a pro plan you can use ultrareview for larger PRs.

Re-trigger cubic}

…ound 3) CHANGELOG's agent cost_report entry read as "always attached"; it is best-effort/optional (absent for dry runs, --no-costs, API-runner path, or an aggregation failure). Matches the agent_cli CLAUDE.md correction from round 2. (cubic P2) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lchoquel · 2026-06-07T00:21:30Z

Review round 3 addressed (commit `72692b0`)

CHANGELOG overstated agent cost_report availability (cubic, P2): fixed — the entry now marks it best-effort/optional (absent for dry runs, --no-costs, API-runner path, or aggregation failure), matching the CLAUDE.md correction from round 2. Doc-only.

@greptileai @cubic-dev-ai please re-review.

- Cost-report guard now also catches UnicodeEncodeError (cubic P2). The guard's contract is "cost reporting never fails a successful run"; it already caught OSError (a CSV-write failure). The CSV is written UTF-8, which can raise UnicodeEncodeError on a lone-surrogate string — same write-failure category, so it belongs in the same catch. (Reverses the round-2 decline: the earlier "impossible with UTF-8" rationale was imprecise.) Pinned by a parametrized test (UnicodeEncodeError / CostRegistryError / OSError all degrade to a warning). - execution_seams docstring: "LLM cogt leaf" -> "LLM inference-leaf" (cubic P3). `cogt` is the package name, not a typo, but the jargon read as one; reworded for clarity. - reporting-config.md: console cost-table is printed only when the run had reportable usage — suppressed for dry runs and on an assembly/aggregation failure (cubic P3). Deferred (documented): --no-costs no-op in agent API mode (cubic P2) is #6b in wip/registry/cost-report-deferred-decisions.md. make agent-check + targeted reporting/cost tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lchoquel · 2026-06-07T01:10:40Z

Review round 4 addressed (commit `0ddd65c`)

Cost-report guard missed UnicodeEncodeError (cubic P2): fixed — added it to the guard. Reversing my round-2 decline: UTF-8 can raise it on a lone surrogate, and this guard already catches OSError (a CSV-write failure), so the same write-failure category belongs there. New parametrized test pins it.
cogt reads as a typo (cubic P3): fixed — reworded to "LLM inference-leaf" (cogt is the package name, not a typo, but dropped the jargon for clarity).
reporting-config.md overstated console table (cubic P3): fixed — now notes it prints only when the run had reportable usage.
--no-costs no-op in agent API mode (cubic P2): deferred — same root as Feature/pipelex init base config #6/#6b (documented); needs the tri-state refactor + a product call, not a one-off patch.

make agent-check + targeted reporting/cost tests green.

@greptileai @cubic-dev-ai please re-review.

cubic-dev-ai · 2026-06-07T01:10:46Z

Review round 4 addressed (commit 0ddd65c)

Cost-report guard missed UnicodeEncodeError (cubic P2): fixed — added it to the guard. Reversing my round-2 decline: UTF-8 can raise it on a lone surrogate, and this guard already catches OSError (a CSV-write failure), so the same write-failure category belongs there. New parametrized test pins it.

cogt reads as a typo (cubic P3): fixed — reworded to "LLM inference-leaf" (cogt is the package name, not a typo, but dropped the jargon for clarity).

reporting-config.md overstated console table (cubic P3): fixed — now notes it prints only when the run had reportable usage.
...

@lchoquel I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

3 issues found across 125 files

_{Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.

On a pro plan you can use ultrareview for larger PRs.
You've manually re-run cubic several times on this PR. Each manual re-review checks the full PR again and counts toward your usage quota. To preserve your usage limits, we recommend letting cubic automatically review new commits.

Re-trigger cubic}

- PipelexRunner.__init__ is now keyword-only (cubic P2). is_mock_inference was added mid-signature; making the constructor keyword-only (all call sites already pass keywords) removes the positional-shift footgun for good and aligns with the codebase's keyword-only-args direction. - cost_report_renderer module docstring corrected (cubic P3): only the main CLI calls render_run_cost_report; the agent CLI builds a machine-readable cost_report via CostRegistry.build_cost_summary instead (no Rich table). Deferred (documented): --no-costs no-op in agent API mode, re-flagged on method_cmd — #6b in wip/registry/cost-report-deferred-decisions.md. make agent-check + pipeline integration tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lchoquel · 2026-06-07T01:31:13Z

Review round 5 addressed (commit `0f92c37`)

is_mock_inference added mid-signature (cubic P2): fixed — PipelexRunner.__init__ is now keyword-only (*). All call sites already pass keywords, so non-breaking internally, and it removes the positional-shift footgun for good (aligns with the keyword-only-args direction).
renderer module docstring claimed both CLIs call it (cubic P3): fixed — only the main CLI calls render_run_cost_report; the agent CLI uses CostRegistry.build_cost_summary for its JSON envelope.
--no-costs API no-op (cubic P2, re-flagged on method_cmd): deferred — #6b, needs the tri-state refactor + product call.

make agent-check + pipeline integration tests green.

@greptileai @cubic-dev-ai please re-review.

cubic-dev-ai · 2026-06-07T01:31:20Z

Review round 5 addressed (commit 0f92c37)

is_mock_inference added mid-signature (cubic P2): fixed — PipelexRunner.__init__ is now keyword-only (*). All call sites already pass keywords, so non-breaking internally, and it removes the positional-shift footgun for good (aligns with the keyword-only-args direction).

renderer module docstring claimed both CLIs call it (cubic P3): fixed — only the main CLI calls render_run_cost_report; the agent CLI uses CostRegistry.build_cost_summary for its JSON envelope.

--no-costs API no-op (cubic P2, re-flagged on method_cmd): deferred — #6b, needs the tri-state refactor + product call.
...

@lchoquel I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

1 issue found across 125 files

_{Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.

On a pro plan you can use ultrareview for larger PRs.
You've manually re-run cubic several times on this PR. Each manual re-review checks the full PR again and counts toward your usage quota. To preserve your usage limits, we recommend letting cubic automatically review new commits.

Re-trigger cubic}

lchoquel · 2026-06-07T01:44:06Z

Review converged (commit `0f92c37`)

Round 6 surfaced only a re-flag of the already-documented-deferred #6b (--no-costs no-op in agent API mode) — no new actionable finding. All review threads are resolved and CI is green across Tests (py3.10–3.14), Typecheck, Lint, Greptile, and cubic.

Summary of what was addressed across rounds:

Real fixes: act_assemble_tracing no longer swallows unexpected errors (P1, + pinning test); agent cost-summary guarded; cost-report guard also catches UnicodeEncodeError; PipelexRunner.__init__ made keyword-only; several doc-accuracy corrections (cost_report best-effort, renderer scope, mock-inference scope).
Deferred design decision (documented): Feature/pipelex init base config #6/#6b — the agent CLI's --costs/--no-costs semantics across the local vs remote API runner. Needs the tri-state-param refactor + a product call on API-response cost surfacing; tracked in wip/registry/cost-report-deferred-decisions.md.

Proceeding to final pre-merge review.

…/review) Final pre-merge /review (Claude + Codex adversarial passes) found one real best-effort violation: assemble_tracing's read-error catch omitted botocore ClientError, so in DIRECT mode with the DynamoDB backend a transient DDB error (throttle/auth/network) during tracing assembly would fail an otherwise-successful run — the sibling reporting_manager fallback already handles it. Added the conditional botocore catch (mirrors reporting_manager) so the read degrades to an *_assembly_error note. Pinned by test_read_botocore_clienterror_is_caught (pytest.importorskip-guarded). Flagged + deferred (pre-existing, separate concerns; documented in wip/registry/): - Cost report ignores reasoning/audio/prediction token costs (under-reports total) → deferred-followups.md (cost-model change, not this PR's scope). - DIRECT vs TEMPORAL object re-validation use different model_dump modes → deferred-followups.md. - Agent API-mode flag handling is inconsistent (--graph/--costs silently ignored while --dry-run/--mock-inputs are rejected) → broadened #6b in cost-report-deferred-decisions.md (holistic fix, not a partial --no-costs patch). make agent-check + targeted pipe_run/reporting/pipeline tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lchoquel · 2026-06-07T02:09:09Z

Final pre-merge review (gstack /review) — commit `474b88e`

Ran independent Claude + Codex adversarial passes over the full diff.

Fixed (1 real best-effort violation):

assemble_tracing caught OSError/PipelexError but not botocore ClientError — in DIRECT mode with the DynamoDB backend a transient DDB error (throttle/auth/network) during tracing assembly would fail an otherwise-successful run. Added the conditional botocore catch (mirrors the reporting_manager fallback) so it degrades to a *_assembly_error note. Pinned by a new importorskip-guarded test.

Flagged + deferred (pre-existing, separate concerns — documented in wip/registry/, not expanded into this PR):

Cost report ignores reasoning/audio/prediction token costs (under-reports total) — pre-existing cost-model gap.
DIRECT vs TEMPORAL object re-validation use different model_dump modes.
Agent API-mode flag handling is inconsistent (--graph/--costs silently ignored while --dry-run/--mock-inputs are rejected) — the holistic framing of #6b; a partial --no-costs patch would worsen it.

Confirmed fine: exactly-once usage emit (fast-path XOR fallback), wf_pipe_router clears its context in finally, the log_dispatch traceback change is a genuine latent-bug fix, console-table default flip is intended (D6), mock free-model usage stays reportable.

make agent-check + targeted suites green. This was the final review pass.

lchoquel · 2026-06-07T02:53:01Z

Final commit 474b88ec adds one small defensive fix from the /review pass (botocore ClientError in assemble_tracing's best-effort read catch) plus deferred-item docs. CI is green across all Python versions. @greptileai @cubic-dev-ai one final re-review on this commit, please.

cubic-dev-ai · 2026-06-07T02:53:08Z

Final commit 474b88ec adds one small defensive fix from the /review pass (botocore ClientError in assemble_tracing's best-effort read catch) plus deferred-item docs. CI is green across all Python versions. @greptileai @cubic-dev-ai one final re-review on this commit, please.

@lchoquel I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

1 issue found across 125 files

_{Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.

On a pro plan you can use ultrareview for larger PRs.
You've manually re-run cubic several times on this PR. Each manual re-review checks the full PR again and counts toward your usage quota. To preserve your usage limits, we recommend letting cubic automatically review new commits.

Re-trigger cubic}

… (PR #967 review round 7) Greptile + cubic correctly flagged that the round-6 /review botocore fix was incomplete: it caught ClientError (service-side throttle/auth) but not BotoCoreError (transport/credential/timeout — EndpointConnectionError, ReadTimeoutError, NoCredentialsError, ...), which is a SEPARATE base class, not a ClientError subclass. A DynamoDB transport failure would still escape the best-effort read/emit catch and fail an otherwise-successful run. Now both sibling classes are caught in both assemble_tracing (read) and reporting_manager's runner fallback (emit) — matching s3_storage_provider, which already catches the pair. botocore is untyped so the imported classes are Unknown; cast to type[BaseException] keeps the except-tuple (and the bound exception var) cleanly typed. Test extended to cover both ClientError and a BotoCoreError subclass (EndpointConnectionError). make agent-check (pyright 0/0, mypy 2026 files) + tracing/reporting tests green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

lchoquel · 2026-06-07T03:09:17Z

Review round 7 addressed (commit `5372b23`)

Both bots correctly caught that my round-6 /review fix was incomplete: it caught botocore ClientError but not BotoCoreError (a sibling base class covering transport/credential/timeout — EndpointConnectionError, ReadTimeoutError, NoCredentialsError). Now both sibling classes are caught in both assemble_tracing (read) and reporting_manager's runner fallback (emit), matching s3_storage_provider which already catches the pair. Test extended to cover a BotoCoreError subclass.

make agent-check (pyright 0/0, mypy 2026 files) + tracing/reporting tests green. Genuinely useful catch — thanks.

@greptileai @cubic-dev-ai final re-review on this commit, please.

cubic-dev-ai · 2026-06-07T03:09:25Z

Review round 7 addressed (commit 5372b23)

Both bots correctly caught that my round-6 /review fix was incomplete: it caught botocore ClientError but not BotoCoreError (a sibling base class covering transport/credential/timeout — EndpointConnectionError, ReadTimeoutError, NoCredentialsError). Now both sibling classes are caught in both assemble_tracing (read) and reporting_manager's runner fallback (emit), matching s3_storage_provider which already catches the pair. Test extended to cover a BotoCoreError subclass.

make agent-check (pyright 0/0, mypy 2026 files) + tracing/reporting tests green. Genuinely useful catch — thanks.
...

@lchoquel I have started the AI code review. It will take a few minutes to complete.

cubic-dev-ai

No issues found across 125 files

_{Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.

On a pro plan you can use ultrareview for larger PRs.
You've manually re-run cubic several times on this PR. Each manual re-review checks the full PR again and counts toward your usage quota. To preserve your usage limits, we recommend letting cubic automatically review new commits.

Re-trigger cubic}

lchoquel · 2026-06-07T03:25:46Z

✅ Review complete — PR ready

Round 8 (commit 5372b23a) came back clean: no new findings from Greptile or cubic, all checks green, 0 unresolved threads, merge state CLEAN.

Across 8 review rounds + the gstack /review pass, every real finding was fixed:

act_assemble_tracing no longer swallows unexpected errors (P1) — WfPipeRun's ActivityError branch now records them
Agent local cost-summary guarded against a reporting-side failure failing the run
Cost-report guard catches UnicodeEncodeError + botocore ClientError and BotoCoreError (read + emit paths)
PipelexRunner.__init__ made keyword-only
Several doc-accuracy corrections (best-effort cost_report, renderer scope, mock-inference scope)

Design tradeoffs were deferred with documentation rather than reflexively patched (in wip/registry/): the agent CLI's API-mode flag handling (#6/#6b), pre-existing cost-category completeness (reasoning/audio/prediction), DIRECT vs TEMPORAL model_dump divergence, the R2 retry double-count, and the T3 request-scoped-tracing refactor.

make agent-check + full make agent-test green. Leaving the merge to a maintainer.

… failures

…_costs → is_generate_usage (#968) The per-execution tracing context was born graph-only, but distributed cost reporting now rides usage/cost events over the same object — so the name under-described it. Rename it to the shared trace transport it actually is, and align the internal usage gate to the stream it gates. - `GraphContext` → `TraceContext`; file `graph_context.py` → `trace_context.py`; attribute `graph_context` → `trace_context` everywhere (incl. `JobMetadata`). Re-document the object as one node tree with two streams (graph + usage). - Gate flag `is_generate_costs` → `is_generate_usage` on `PipelineExecutionConfig` (+ all three `pipelex.toml` files) and the override param `with_execution_overrides(generate_costs=…)` → `generate_usage=…`. - Cost-domain renderer left cost-named on purpose: `render_run_cost_report` keeps `is_generate_costs`, fed the usage gate at the seam ("if usage was generated, render the cost view"). Public `--costs` CLI flag unchanged. Deferred: the optional `graph_id` → `run_id` nit (ripples into the deliberately graph-named `GraphTracerManager` plumbing). No behavior change. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

- Deleted the `registry-lifecycle-synthesis.md`, `registry-success-path-leak-assessment.md`, `registry-success-path-leak-brief.md`, `registry-success-path-leak-execution-contexts.md`, and `usage-reporting-without-cost.md` files as they are no longer relevant to the current implementation and design direction. - The `UsageRegistry` leak was confirmed, where successful runs left registries open, leading to memory leaks in long-lived servers. The design now ensures that the registry is closed properly on success paths. - The synthesis document now encapsulates the lifecycle of the `UsageRegistry`, addressing both the leak and the cost reporting mechanisms in a unified manner. - The capability of usage reporting has been clarified, distinguishing between usage data and cost projections, with a focus on making usage a first-class output.

lchoquel and others added 16 commits June 6, 2026 14:29

Plan for cost registry

18c4c00

Plan

96836a3

fix: update terminology for cost reporting to clarify usage tracing e…

dc6fb8c

…vents

fix: enhance cost reporting and tracing assembly error handling; add …

159662d

…tests for WfPipeRun behavior

feat(mock-inference): introduce is_mock_inference flag and `--mock-…

4d6b96a

…inference` CLI option for enhanced cost reporting

greptile-apps Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread pipelex/temporal/tprl_pipe/act_assemble_tracing.py Outdated

chatgpt-codex-connector Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread pipelex/reporting/reporting_manager.py Outdated

cubic-dev-ai Bot reviewed Jun 6, 2026

View reviewed changes

Comment thread pipelex/temporal/tprl_pipe/act_assemble_tracing.py Outdated

Comment thread pipelex/cli/agent_cli/commands/run/pipe_cmd.py

Comment thread pipelex/cli/agent_cli/commands/run/_run_core.py Outdated

Comment thread pipelex/pipeline/execution_seams.py Outdated