Rating v1: turn metered token counts into money (the revenue path) #6

Closed
hhuuggoo wants to merge 2 commits into postgres-drainer from rating-v1


@hhuuggoo hhuuggoo commented Jun 8, 2026

Contributor

What

The rating system turns raw billing_event token counts into money. It is the piece between "we metered tokens" and "a neocloud can send an invoice." Per Wendy's PM assessment, this was the #1 revenue blocker (Phoebe metered but did not bill).

This PR is mechanism only: Hugo sets the actual prices later, as data, not code. The rating system is reversible and buildable now; the prices are the one-way commercial door and are decoupled from it.

Money correctness (treated like auth — go deep, it's expensive to be wrong)

  • Integer micro-USD everywhere, never float. cost_micro_usd int64 (1e-6 USD). 1 Atlas hourly_usage_record unit = 100 micro-USD.
  • The billable-prompt formula (the highest-risk line): vLLM's prompt_tokens is the TOTAL, cached_tokens the cache-hit SUBSET. Each prompt token is charged exactly once: billable_prompt = prompt_tokens - cached_tokens at the prompt rate, cached_tokens at the (discounted) cached rate. The naive prompt*prompt_rate + cached*cached_rate would charge cached tokens twice and overbill every cache hit; this is guarded against and tested by name.
  • Effective-dated prices — an event rates at the price effective at its event_ts; rating "now" never retroactively reprices old traffic.
  • Idempotent — rollups upsert on (auth_id, model, window_start); re-rating a window reconciles, never doubles.
  • Fail loud, never $0 — a model with no price → ErrNoPrice, counted, logged ERROR, dropped from rollups (not silently $0-billed). Rater exits 2 so a CronJob alerts.
  • Unattributable rows are counted, not silently dropped — billing_event rows with NULL auth_id/model are counted + surfaced + trigger exit-2 (a nonzero count means an upstream billing-gate leak). This closed a silent-loss gap found in review.
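
To make the billable-prompt rule concrete, here is a minimal sketch of a pure rating function in integer micro-USD. It is not the code in internal/rating: the `Usage` and `Price` field names, the `ErrBadUsage` guard, and the exact `Rate` signature are assumptions for illustration, and overflow checks are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// Usage mirrors the token counters described above (field names are assumed).
type Usage struct {
	PromptTokens     int64 // TOTAL prompt tokens, cache hits included
	CachedTokens     int64 // cache-hit SUBSET of PromptTokens
	CompletionTokens int64
}

// Price holds per-token rates in integer micro-USD (1e-6 USD).
type Price struct {
	PromptMicroUSD     int64
	CachedMicroUSD     int64
	CompletionMicroUSD int64
}

// ErrBadUsage is a hypothetical error for counters that cannot be rated.
var ErrBadUsage = errors.New("rating: invalid token counts")

// Rate charges every prompt token exactly once: uncached tokens at the
// prompt rate and cached tokens at the cached rate.
func Rate(u Usage, p Price) (int64, error) {
	if u.PromptTokens < 0 || u.CachedTokens < 0 || u.CompletionTokens < 0 ||
		u.CachedTokens > u.PromptTokens {
		return 0, ErrBadUsage
	}
	billablePrompt := u.PromptTokens - u.CachedTokens
	cost := billablePrompt*p.PromptMicroUSD +
		u.CachedTokens*p.CachedMicroUSD +
		u.CompletionTokens*p.CompletionMicroUSD
	return cost, nil
}

func main() {
	// 600 uncached * 3 + 400 cached * 1 + 200 completion * 15
	cost, err := Rate(Usage{1000, 400, 200}, Price{3, 1, 15})
	fmt.Println(cost, err) // 5200 <nil>
}
```

With the same inputs, the naive formula would give 1000*3 + 400*1 + 200*15 = 6400 micro-USD, an overcharge of 1200 micro-USD caused by billing the 400 cached tokens at both rates.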

Schema

  • model_price — effective-dated price book (prompt/cached/completion micro-USD per token). Tables only; no prices in the migration.
  • rated_usage — hourly rollups per (auth_id, model, window_start), unique-keyed for idempotent upsert.
  • migrations/seed_example_prices.sql — clearly NON-BINDING placeholder prices (do not ship to prod).
  • Alembic chained after billing_event (down_revision = b1f0c2d3e4a5), following the drainer's migration-ownership pattern.

Code

  • internal/rating: pure Rate() money core, effective-dated PriceBook, Rater.Run(window), Store (+ Postgres impl).
  • cmd/rater: one-shot batch job (run by a k8s CronJob), --since/--until, default = last complete hour. Exit 0 / 1 (fatal) / 2 (rated but anomaly: unpriced or unattributable).

Base

Targets postgres-drainer (reads its billing_event table). ~36 tests, every money rule named, race/lint/fmt clean, zero new deps.

hhuuggoo and others added 2 commits June 8, 2026 16:41
The revenue path. A batch job (cmd/rater) reads billing_event over a time
window, joins an effective-dated price book, computes cost in integer
micro-USD, and upserts per-(auth_id, model, hour) cost rollups into rated_usage.

Money correctness is the product; the non-negotiable invariants, each tested
by name:

- INTEGER micro-USD (1e-6 USD) everywhere, never float. 1 Atlas
  hourly_usage_record unit (1e-4 USD) == 100 micro-USD; finer base avoids
  rounding tiny per-token prices to zero before the multiply.
- Billable-prompt formula (the highest-risk line): vLLM's prompt_tokens is the
  TOTAL and cached_tokens is the cache-hit SUBSET, so
  cost = (prompt-cached)*prompt_rate + cached*cached_rate + completion*comp_rate.
  Cached tokens are charged ONCE (at the discounted rate), never double-counted.
- Effective-dated prices: an event is rated with the price in effect at its
  event_ts (fallback created_at); rating now never retroactively reprices old
  traffic.
- Idempotent re-runs: rollups upsert ON CONFLICT (auth_id, model, window_start)
  DO UPDATE, recomputing totals from scratch — re-rating a window reconciles,
  never doubles.
- Fail-closed on missing price: a model with no price-book entry at the event's
  time is counted as unpriced and logged loudly, NEVER silently billed $0
  (that is lost revenue). The rater exits 2 so a CronJob can alert.

Schema (migrations + ready-to-copy Alembic, chained after billing_event's
b1f0c2d3e4a5): model_price (effective-dated per-token price book) and
rated_usage (the hourly rollup). Prices are DATA, not code — a clearly-labelled
non-binding seed lives in seed_example_prices.sql; no prices ship in the schema.

No new dependencies. go build/vet/test -race/golangci-lint/gofmt all clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

ReadWindow previously filtered out billing_event rows with NULL auth_id or
model via `AND auth_id IS NOT NULL AND model IS NOT NULL`, with a comment
claiming they were "surfaced elsewhere." That elsewhere did not exist, so such
a row vanished from rating with zero count and zero alert — a silent
revenue-loss / data-loss path, exactly what we fail LOUD on for missing prices.

Make the exclusion loud, mirroring UnpricedEvents:

- store: drop the NULL filter from the WHERE; scan every in-window row with
  sql.NullString and, when auth_id/model is NULL, count it and skip it (atomic
  single-scan count). ReadWindow now returns (events, unattributable, error).
- rater: add Result.UnattributableEvents, HasUnattributable(), and a combined
  HasAnomaly() = HasUnpriced() || HasUnattributable(); thread the count through
  Run(); log it loudly (ERROR) and fold it into the loud summary.
- cmd/rater: exit 2 on HasAnomaly() (was HasUnpriced()); rename exitUnpriced ->
  exitAnomaly and update the exit-code doc. A nonzero unattributable count now
  alerts a CronJob the same as unpriced events.
- replace the stale "surfaced elsewhere" comment with the truth.

This should never be nonzero (the interceptor's fail-closed billing gate rejects
requests missing auth_id before they are metered) — which is precisely why a
nonzero count must be loudly surfaced: it means something upstream is broken and
revenue is leaking.

Tests: TestRater_UnattributableEventsCountedNotSilent (not rated, counted,
triggers anomaly/exit-2) and TestPostgresStore_ReadWindowCountsUnattributable
(NULL rows skipped+counted in the scan). Existing money-rule and ReadWindow
tests updated for the new signature; all pass.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>