Skip to content

fix(retrieval): surface silent IPNI timeouts in logs#539

Open
SgtPooki wants to merge 1 commit into
feat/retrieval-parallel-transport-counterfrom
feat/retrieval-ipni-observability
Open

fix(retrieval): surface silent IPNI timeouts in logs#539
SgtPooki wants to merge 1 commit into
feat/retrieval-parallel-transport-counterfrom
feat/retrieval-ipni-observability

Conversation

@SgtPooki
Copy link
Copy Markdown
Collaborator

What changed

Two adjacent observability bugs left IPNI failures invisible whenever the outer pg-boss retrieval-job timeout fired:

  1. `IpniVerificationService.verify` check-order. Catch checked `signal?.aborted` before `verificationSignal.aborted`. On a race where both signals had asserted by the time the catch handler ran, `signal.throwIfAborted()` re-threw before the `ipni_verification_timed_out` log could fire. The inner-timeout log path was effectively dead code whenever outer ≤ inner. Inner check now runs first so the log fires whenever the inner timer aborted.

  2. `RetrievalService.verifyIpniForRetrieval` silent abort. The catch's signal-aborted branch returned a silent `failure.timedout` with no event log. Added `retrieval_ipni_verification_timed_out` so the retrieval-side caller records the abort.

Implements option C from #524. Stacks on #538.

Files

  • `apps/backend/src/ipni/ipni-verification.service.ts` — swap catch order so inner-timeout check wins same-tick races.
  • `apps/backend/src/retrieval/retrieval.service.ts` — add `retrieval_ipni_verification_timed_out` log in the abort branch.
  • `apps/backend/src/ipni/ipni-verification.service.spec.ts` — new test verifying `ipni_verification_timed_out` fires when both signals abort in the same tick.
  • `apps/backend/src/retrieval/retrieval.service.spec.ts` — new test verifying `retrieval_ipni_verification_timed_out` fires when outer signal aborts and IPNI verify throws.

How to verify

```
pnpm --filter dealbot-backend typecheck
pnpm --filter dealbot-backend lint
pnpm --filter dealbot-backend test
```

After deploy, staging logs should show `ipni_verification_timed_out` events 1:1 with `retrievalStatus=failure.timedout` deltas for SPs without IPNI advertisements (currently SP 5 Mongo2Stor, SP 17 superusey-calib in staging).

Notes

Base branch is `feat/retrieval-parallel-transport-counter` (#538). Re-target to `main` after that PR merges.

@FilOzzy FilOzzy added this to FOC May 14, 2026
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FOC May 14, 2026
@SgtPooki SgtPooki self-assigned this May 14, 2026
Two adjacent observability bugs left IPNI failures invisible whenever
the outer pg-boss retrieval-job timeout fired:

1. IpniVerificationService.verify checked signal?.aborted before
   verificationSignal.aborted. On a race where both signals were
   asserted by the time the catch handler ran, signal.throwIfAborted()
   re-threw before the ipni_verification_timed_out log could fire.
   Inner check now runs first so the inner timeout log fires whenever
   the inner timeout signal aborted, regardless of outer state.

2. RetrievalService.verifyIpniForRetrieval's catch returned a silent
   failure.timedout when signal?.aborted, with no log. Added
   retrieval_ipni_verification_timed_out so the retrieval-side caller
   records the abort.

Tracks dealbot#524 option C. Pairs with #538 (parallel IPNI + transport).
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Two small observability fixes for IPNI verification timeouts in the retrieval check. The first reorders the catch handler in IpniVerificationService.verify so the inner-timeout log fires even when the outer job signal aborts in the same tick. The second adds a retrieval_ipni_verification_timed_out warn log in RetrievalService.verifyIpniForRetrieval's outer-abort branch, replacing a silent failure.timedout return. Both changes have accompanying unit tests.

Changes:

  • Swap catch-order in ipni-verification.service.ts so timeoutSignal.aborted is checked before the external signal?.aborted rethrow.
  • Emit retrieval_ipni_verification_timed_out log in retrieval.service.ts when the outer signal aborts during IPNI verify.
  • Add unit tests for both code paths.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
apps/backend/src/ipni/ipni-verification.service.ts Reorder catch checks so inner-timeout log wins same-tick races.
apps/backend/src/ipni/ipni-verification.service.spec.ts Test that ipni_verification_timed_out fires when both signals abort in the same tick.
apps/backend/src/retrieval/retrieval.service.ts Add structured warn log on outer-signal abort in verifyIpniForRetrieval.
apps/backend/src/retrieval/retrieval.service.spec.ts Test that retrieval_ipni_verification_timed_out fires when outer signal aborts and IPNI verify throws.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🔎 Awaiting review

Development

Successfully merging this pull request may close these issues.

4 participants