Skip to content

Conversation

@alexluong
Copy link
Collaborator

@alexluong alexluong commented Jan 22, 2026

Moves event fetching from the delivery flow to the retry flow.

What changed

  • Retry scheduler now fetches full event data from logstore before publishing to delivery queue
  • Delivery handler no longer needs logstore dependency or retry detection logic

Why this design

Scheduled retry tasks only store event IDs (not full payloads) to keep queue messages small. Previously, the delivery handler had to detect retries via event.Time.IsZero() and conditionally fetch event data.

Moving this to the retry scheduler is cleaner because:

  1. The retry flow inherently knows it's handling retries - no heuristic detection needed
  2. Fails earlier - missing events are caught before publishing to delivery queue
  3. Removes logstore dependency from delivery handler - delivery handler always receives complete data and no longer needs to know about or access logstore. It simply processes what it's given, regardless of message origin.

Both entry points now consistently provide full event data to the delivery queue:

Flow Event data fetched at
Initial delivery Already has full event
Scheduled retry Retry scheduler (before publish)
Manual retry API handler (before publish)

alexluong and others added 4 commits January 22, 2026 14:20
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@vercel
Copy link

vercel bot commented Jan 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Review Updated (UTC)
outpost-docs Ready Ready Preview, Comment Jan 22, 2026 2:30pm
outpost-website Ready Ready Preview, Comment Jan 22, 2026 2:30pm

Request Review

@alexluong alexluong changed the base branch from main to model-delivery-event January 22, 2026 08:29
Extends TestRetryDelivery to verify that the manual retry API publishes
a DeliveryTask with complete event data to deliveryMQ. This ensures
the deliverymq handler receives full event data for manual retries,
consistent with the scheduled retry flow which fetches event data
in the retry scheduler.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
alexluong and others added 5 commits January 22, 2026 20:43
Add unit and e2e tests verifying that retries are not lost when the
retry scheduler queries logstore before the event has been persisted.

Test scenario:
1. Initial delivery fails, retry is scheduled
2. Retry scheduler queries logstore for event data
3. Event is not yet persisted (logmq batching delay)
4. Retry should remain in queue and be reprocessed later

Tests added:
- TestRetryScheduler_RaceCondition_EventNotYetPersisted (unit test)
- TestE2E_Regression_RetryRaceCondition (e2e test)

Also adds:
- RetryVisibilityTimeoutSeconds config option
- WithRetryVisibilityTimeout scheduler option
- mockDelayedEventGetter for simulating delayed persistence

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Return error instead of nil so the retry message stays in queue and
will be reprocessed after the visibility timeout.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants