fix(driver): patch polling crate for kqueue EVFILT_USER EV_CLEAR bug #931

Closed
paddor wants to merge 7 commits into compio-rs:master from paddor:fix/kqueue-polling-evfilt-user

Conversation

@paddor
Contributor

@paddor paddor commented May 23, 2026

Summary

The polling crate's Poller::notify() on kqueue (macOS/BSD) triggers EVFILT_USER with EV_ADD but omits EV_CLEAR. Since EV_ADD modifies the existing filter, this switches it from edge-triggered (auto-clearing) to level-triggered after the first notification. Subsequent kevent() calls always return the event immediately, but Events::iter() filters it out (NOTIFY_KEY), so callers see empty results despite kevent() returning.
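The flag interaction described above can be sketched with a toy model. This is not real kqueue code: `UserFilter` and its methods are hypothetical names, and the "EV_ADD replaces the stored flags" rule is taken from the summary rather than from kernel sources. The two constants use the values from the BSD `<sys/event.h>` headers.

```rust
// Flag values as in BSD <sys/event.h>; everything else is a toy model.
pub const EV_ADD: u16 = 0x0001;
pub const EV_CLEAR: u16 = 0x0020;

/// Hypothetical stand-in for one registered EVFILT_USER filter.
pub struct UserFilter {
    flags: u16,
    triggered: bool,
}

impl UserFilter {
    /// Initial registration, e.g. with EV_ADD | EV_CLEAR.
    pub fn register(flags: u16) -> Self {
        UserFilter { flags, triggered: false }
    }

    /// Models a trigger change. Per the PR summary, EV_ADD on an existing
    /// filter modifies it, so the stored flags are replaced.
    pub fn trigger(&mut self, change_flags: u16) {
        if (change_flags & EV_ADD) != 0 {
            self.flags = change_flags;
        }
        self.triggered = true;
    }

    /// Models one kevent() wait: reports whether the filter fires and
    /// resets the state only while EV_CLEAR is still set.
    pub fn wait(&mut self) -> bool {
        let fired = self.triggered;
        if fired && (self.flags & EV_CLEAR) != 0 {
            self.triggered = false;
        }
        fired
    }
}
```

In this model, triggering with `EV_ADD` alone leaves the filter firing on every later `wait()`, while triggering with `EV_ADD | EV_CLEAR` makes it fire once and then go quiet, which is the behavior the patch aims to restore.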

Additionally, wait_impl() doesn't reset the notified flag on EINTR retry, suppressing concurrent notify() calls.
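A minimal sketch of why a stale flag suppresses later notifications, assuming `notify()` only reaches the kernel on a false-to-true transition of the flag. `NotifyState`, `on_interrupted`, and the trigger counter are hypothetical names for illustration; they are not the polling crate's actual internals.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

/// Hypothetical model of a poller's notify/wait bookkeeping.
pub struct NotifyState {
    notified: AtomicBool,
    kernel_triggers: AtomicUsize,
}

impl NotifyState {
    pub fn new() -> Self {
        NotifyState {
            notified: AtomicBool::new(false),
            kernel_triggers: AtomicUsize::new(0),
        }
    }

    /// Triggers the (modelled) kernel event only if the flag was not set.
    pub fn notify(&self) {
        let first = self
            .notified
            .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
            .is_ok();
        if first {
            self.kernel_triggers.fetch_add(1, Ordering::SeqCst);
        }
    }

    /// Called when the blocking wait was interrupted (EINTR) and will be
    /// retried. `reset_flag` selects the patched (true) or old (false) path.
    pub fn on_interrupted(&self, reset_flag: bool) {
        if reset_flag {
            self.notified.store(false, Ordering::SeqCst);
        }
    }

    /// Number of times the modelled kernel event was triggered.
    pub fn kernel_triggers(&self) -> usize {
        self.kernel_triggers.load(Ordering::SeqCst)
    }
}
```

Without the reset, a `notify()` issued after the interrupted wait sees the flag still set and returns without triggering anything; with the reset, it triggers again.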

This causes every TCP/IPC test in our downstream project (omq) to time out on macOS. The fix is in the polling crate: paddor/polling@fix/kqueue-evfilt-user-clear.

I will submit a PR to smol-rs/polling as well.

Test plan

  • Added TCP/IPC roundtrip regression tests in compio/tests/kqueue_repro.rs
  • Added waker timing tests in compio-runtime/tests/waker.rs

@github-actions github-actions Bot added bug Something isn't working package: driver Related to compio-driver labels May 23, 2026
polling's notify() on kqueue triggers EVFILT_USER with EV_ADD but
without EV_CLEAR, which switches the filter from edge-triggered to
level-triggered after the first notification. This causes kevent()
to return the notification event on every call, but Events::iter()
filters it out (NOTIFY_KEY), so callers see empty results despite
kevent() returning immediately. The driver misidentifies waker
notifications as timeouts, stalling task wakeups on macOS.
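The misidentification can be illustrated with a small sketch. It assumes the caller treats "the wait returned but no events are visible" as a timeout; `WaitOutcome` and `classify` are hypothetical names, and the `NOTIFY_KEY` value used here is an assumption for the model.

```rust
/// Key reserved for the internal notification event (value assumed here).
pub const NOTIFY_KEY: usize = usize::MAX;

/// What a caller concludes after one wait, in this simplified model.
#[derive(Debug, PartialEq)]
pub enum WaitOutcome {
    /// At least one user-visible event key was delivered.
    Io(Vec<usize>),
    /// Nothing visible came back, which is indistinguishable from a timeout.
    LooksLikeTimeout,
}

/// Filters out the notification key, as the event iterator is described
/// to do, then classifies what remains.
pub fn classify(raw_keys: &[usize]) -> WaitOutcome {
    let visible: Vec<usize> = raw_keys
        .iter()
        .copied()
        .filter(|&key| key != NOTIFY_KEY)
        .collect();
    if visible.is_empty() {
        WaitOutcome::LooksLikeTimeout
    } else {
        WaitOutcome::Io(visible)
    }
}
```

A wait that returns only the notification event therefore looks the same to the caller as one that returned nothing at all.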

The fix is in the polling crate (paddor/polling@fix/kqueue-evfilt-user-clear):
preserve EV_CLEAR in the trigger changelist.

Added waker regression tests.
@paddor paddor force-pushed the fix/kqueue-polling-evfilt-user branch from 27ee4f8 to ae4f4f8 Compare May 23, 2026 15:50
paddor added 6 commits May 23, 2026 20:07
Model the ZMTP handshake pattern: two spawned tasks on one runtime,
each doing write-then-read for multiple rounds. Also add cancel-and-
resubmit variants that race reads against short timeouts, mirroring
the driver's select_biased! behavior.
Two fixes for a race where dropping a completed-but-unpolled read
future loses data already consumed by recv():

1. timeout: use select_biased! instead of select! so the inner
   future is always polled before the timer. When both are ready,
   the inner future wins, preventing the timer from dropping a
   completed read.

2. Submit::drop: skip cancel() when the key already has a result.
   A completed operation's buffer holds consumed data; canceling
   it would overwrite the result and drop the buffer.
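The first fix relies on polling order. As a rough illustration using only the standard library (not the `select_biased!` macro itself), the sketch below always polls the operation before the timer, so a completed operation wins when both are ready. `BiasedTimeout`, `Raced`, and `poll_once` are hypothetical names, not compio's types.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Outcome of racing an operation against a timer.
#[derive(Debug, PartialEq)]
pub enum Raced<T> {
    Completed(T),
    TimedOut,
}

/// Hypothetical biased race: `op` is always polled before `timer`.
pub struct BiasedTimeout<T> {
    pub op: Pin<Box<dyn Future<Output = T>>>,
    pub timer: Pin<Box<dyn Future<Output = ()>>>,
}

impl<T> Future for BiasedTimeout<T> {
    type Output = Raced<T>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        // Operation first: a ready result is never discarded by the timer.
        if let Poll::Ready(value) = self.op.as_mut().poll(cx) {
            return Poll::Ready(Raced::Completed(value));
        }
        // Only consult the timer once the operation is known to be pending.
        if let Poll::Ready(()) = self.timer.as_mut().poll(cx) {
            return Poll::Ready(Raced::TimedOut);
        }
        Poll::Pending
    }
}

/// Waker that does nothing, enough for polling once by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls a future a single time with a no-op waker.
pub fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}
```

An unbiased select may poll either branch first, so with both branches ready it can report a timeout and drop the operation together with the data it already consumed. The second fix (skipping `cancel()` for a key that already has a result) is not modelled here.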
AsyncFd's Read op always goes through the polling driver's Wait path
(never tries the syscall eagerly), unlike TcpStream's Recv op. This
matches omq's actual I/O path.
Matches omq's exact I/O pattern: writes go through compio TcpStream
(Send op with eager pre_submit), reads go through AsyncFd on a dup'd
fd (Read op with always-Wait pre_submit). Both reference the same
kernel fd via separate descriptors.
Uses the same raw fd number for both TcpStream writes and AsyncFd
reads, matching omq's SharedFd clone pattern.
Read op (via AsyncFd) always goes through kqueue (never eager recv).
If cancel+re-add loses the readability event for already-buffered
data, this test hangs.
@paddor
Contributor Author

paddor commented May 24, 2026

Superseded by smol-rs/polling#265

@paddor paddor closed this May 24, 2026