Skip to content

Conversation

@NagyZoltanPeter
Copy link
Contributor

@NagyZoltanPeter NagyZoltanPeter commented Oct 8, 2025

Summary

Fix a flaky reconciliation test in tests/waku_store_sync/test_protocol.nim that intermittently failed on macOS due to timestamp drift between message creation and the reconciliation sync window.

Root cause

  • The test created message timestamps using getNowInNanosecondTime() while the reconciliation time window was computed separately via calculateTimeRange().
  • On macOS, slower scheduling meant those separate "now" calls diverged enough to misalign the intended sync window and the generated message timestamps.
  • Result: messages expected to be “in-window” were sometimes treated as “out-window,” causing assertions like:
    • remoteNeeds.len == 0 while diffInWin == 18

Solution

  • Anchor all test time calculations to a single fixed baseTime captured once at the start of the test.
  • Define the intended sync window as [baseTime - 1s, baseTime].
  • Align the reconciliation’s effective “now” with baseTime by passing an explicit offset:
    • offset = actualNow - baseTime
    • Because calculateTimeRange(now, jitter) uses now = getNow() - jitter, setting jitter = actualNow - baseTime yields now_effective = baseTime.
  • Generate messages deterministically:
    • 18 in-window messages strictly inside the window with 50 ms margins from both ends.
    • 20 out-window messages placed well before the sync window to avoid overlap.
  • Use consistent signed int64 arithmetic for timestamp math.

Implementation details

  • File changed: tests/waku_store_sync/test_protocol.nim
    • Capture baseTime once and compute syncWindowStart/End from it.
    • Create in-window and out-window message timestamps relative to baseTime with clear buffers.
    • Compute timeOffset = actualNow - baseTime and pass it to storeSynchronization(offset = timeOffset) so the window is [baseTime - 1s, baseTime].
    • Ensure all timestamp operations use int64 to avoid signed/unsigned pitfalls.

Verification

  • macOS: make test tests/waku_store_sync/test_protocol.nim passes locally.
    • Reconciliation correctly detects 18 in-window messages and ignores the 20 out-window messages.
  • Recommendation: run CI matrix (macOS + Ubuntu) to validate cross-platform stability.

Scope / Risk

  • Test-only change; no production reconciliation logic altered.
  • Eliminates timing race by ensuring a deterministic relationship between message timestamps and the sync window.

Notes

  • Consider extracting test time helpers for future tests that depend on stable time ranges.

…g timestamps to a fixed baseTime and aligning sync window via offset

Root cause:
- Separate calls to getNowInNanosecondTime() for test message timestamps and sync window boundaries led to mismatched ranges on slower macOS runs.
- This caused in-window messages to be miscategorized as out-window, producing assertion failures (e.g., remoteNeeds.len==0 while diffInWin==18).

Fix:
- Introduce a single fixed baseTime at test start and construct all message timestamps relative to it.
- Define the intended sync window as [baseTime - 1s, baseTime] and compute the reconciliation offset so effective now equals baseTime (offset = actualNow - baseTime).
- Place in-window messages safely inside the window with 50ms margins, and out-window messages well before the window to avoid overlap.
- Use consistent int64 timestamp arithmetic.

Result:
- The test is deterministic and passes reliably on macOS; eliminates the timing race. Recommend running CI matrix (macOS + Ubuntu) to confirm cross-platform stability.
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Fix a flaky reconciliation test on macOS by implementing deterministic timestamp handling to eliminate timing race conditions between message creation and sync window computation.

  • Anchors all time calculations to a single fixed baseTime to prevent timestamp drift
  • Aligns the reconciliation sync window with message timestamps using a calculated offset
  • Replaces relative timestamp calculations with absolute positioning based on the fixed base time

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +545 to +546
let inWinStart = Timestamp(syncWindowStart.int64 + 50_000_000'i64) # 50ms after window starts
let inWinEnd = Timestamp(syncWindowEnd.int64 - 50_000_000'i64) # 50ms before window ends
Copy link

Copilot AI Oct 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic number 50_000_000'i64 (50ms) is repeated. Consider extracting it as a named constant like BUFFER_MARGIN_NS to improve maintainability and make the buffer size configurable.

Copilot uses AI. Check for mistakes.
@github-actions
Copy link

github-actions bot commented Oct 8, 2025

You can find the image built from this PR at

quay.io/wakuorg/nwaku-pr:3619

Built from 4f435a4

@SionoiS
Copy link
Contributor

SionoiS commented Oct 8, 2025

Ah clever! It use the offset as a way to fix the time window.

Copy link
Collaborator

@Ivansete-status Ivansete-status left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks for it! 💯

@NagyZoltanPeter
Copy link
Contributor Author

Seems still not ok. Now ci/test-ubuntu-22.04 failed with this exact test case (While macos ci succeeded):

2025-10-08T16:33:09.8707124Z ===============================================================
2025-10-08T16:33:09.8714059Z   /home/runner/work/nwaku/nwaku/build/all_tests_waku 'Waku Sync: reconciliation::sync 2 nodes, 40 msgs: 18 in-window diff, 20 out-window ignored'
2025-10-08T16:33:09.8714677Z ---------------------------------------------------------------
2025-10-08T16:33:09.8715235Z     /home/runner/work/nwaku/nwaku/tests/waku_store_sync/test_protocol.nim(605, 26): Check failed: remoteNeeds.len == diffInWin
2025-10-08T16:33:09.8715888Z     remoteNeeds.len was 12
2025-10-08T16:33:09.8716090Z     diffInWin was 18
2025-10-08T16:33:09.8716551Z     /home/runner/work/nwaku/nwaku/tests/waku_store_sync/test_protocol.nim(609, 28): Check failed: deliveredHashes in inWinHashes
2025-10-08T16:33:09.8717055Z     deliveredHashes was 0x5862...86a6

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants