Redesign ipniVerifyMs to disambiguate verify outcomes and tighten histogram buckets

## Problem

`ipniVerifyMs` is a single histogram with no outcome dimension. It records duration regardless of result (verified, signal-aborted, monitoring-skipped, error). Dashboards cannot filter to success-only or compute success-rate from it. Bucket boundaries `[10, 50, 100, 500, 1000, 2000, 5000, 10000, 30000, 60000, 120000, 300000]` jump from 60s → 120s → 300s, so `quantileInterpolatedWeighted` linearly extrapolates a deep tail (e.g. p90 ≈ 264s) from a small number of >120s samples.

Two related call paths feed the metric:

1. **Verify path** (`apps/backend/src/ipni/ipni-verification.service.ts:25-142`): polls IPNI indexer, bounded by `IPNI_VERIFICATION_TIMEOUT_MS=60000` via `AbortSignal.timeout`. Observed at `apps/backend/src/deal-addons/strategies/ipni.strategy.ts:406` and `apps/backend/src/retrieval/retrieval.service.ts:427`.
2. **Monitoring fallback** (`apps/backend/src/deal-addons/strategies/ipni.strategy.ts:303-332`): when `monitorPieceStatus` (separate `POLLING_TIMEOUT_MS=120000`) times out, the verify call still runs after — but the wall-time framing differs and the metric cannot distinguish.

## Proposed design (synthesis of cursor + gemini peer reviews)

### A. Add `outcome` label to `ipniVerifyMs`

`outcome ∈ { verified, timeout, error }` — observed for every attempted verification. Lets dashboards filter `{outcome=\"verified\"}` for true latency SLI. Recording timeout durations under `outcome=timeout` exposes potential AbortSignal-leak bugs (durations exceeding the budget by a wide margin).

### B. Add `ipniVerifySkippedTotal` counter

Incremented when monitoring failure prevents a verify attempt. Avoids polluting the histogram with synthetic-zero or \"skipped\" durations.

### C. Tighten histogram buckets

Current: `[10, 50, 100, 500, 1000, 2000, 5000, 10000, 30000, 60000, 120000, 300000]`
Proposed: `[10, 50, 100, 500, 1000, 2000, 5000, 10000, 30000, 60000, 90000, 120000, 180000, 240000, 300000]`

Closes the 60s→120s→300s gap that drives misleading p90 interpolation.

### D. (Optional) Separate `ipniMonitorStatusMs` histogram

For `monitorPieceStatus` Path B latency, if observability into the upstream gate is wanted.

## Files

- `apps/backend/src/metrics-prometheus/metrics-prometheus.module.ts:101-110` (metric definition + buckets)
- `apps/backend/src/metrics-prometheus/check-metrics.service.ts:166-215` (observation site)
- `apps/backend/src/deal-addons/strategies/ipni.strategy.ts:282-410` (Path A/B fork, observation call site for deal flow)
- `apps/backend/src/retrieval/retrieval.service.ts:418-427` (observation call site for retrieval flow)
- `apps/backend/src/ipni/ipni-verification.service.ts:25-142` (verify implementation)

## Migration

- Keep `ipniVerifyMs` name. Add label + buckets in one release.
- Update Better Stack dashboards to filter `outcome=\"verified\"` once label is populated.
- After one retention window (30d), simplify or remove old chart panels.

## Acceptance criteria

- [ ] `ipniVerifyMs` carries `outcome` label `{verified|timeout|error}`.
- [ ] `ipniVerifySkippedTotal` counter exists and increments on monitoring-skipped verifications.
- [ ] Histogram buckets include 90s, 120s, 180s, 240s.
- [ ] Dashboard chart filters by `outcome=\"verified\"` for SLI.
- [ ] No new high-cardinality label combinations introduced (3 outcomes × existing labels only).

## Related

- Dashboard chart query uses cumulative counter — see #482 (should be fixed first to validate metric behavior cleanly).
- Spike-hour observations — see #483 (may indicate a deeper root cause; outcome labels would help diagnose).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Redesign ipniVerifyMs to disambiguate verify outcomes and tighten histogram buckets #484

Problem

Proposed design (synthesis of cursor + gemini peer reviews)

A. Add `outcome` label to `ipniVerifyMs`

B. Add `ipniVerifySkippedTotal` counter

C. Tighten histogram buckets

D. (Optional) Separate `ipniMonitorStatusMs` histogram

Files

Migration

Acceptance criteria

Related

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Redesign ipniVerifyMs to disambiguate verify outcomes and tighten histogram buckets #484

Description

Problem

Proposed design (synthesis of cursor + gemini peer reviews)

A. Add outcome label to ipniVerifyMs

B. Add ipniVerifySkippedTotal counter

C. Tighten histogram buckets

D. (Optional) Separate ipniMonitorStatusMs histogram

Files

Migration

Acceptance criteria

Related

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

A. Add `outcome` label to `ipniVerifyMs`

B. Add `ipniVerifySkippedTotal` counter

D. (Optional) Separate `ipniMonitorStatusMs` histogram