fix(driver): patch polling crate for kqueue EVFILT_USER EV_CLEAR bug #931
Closed
paddor wants to merge 7 commits into
polling's notify() on kqueue triggers EVFILT_USER with EV_ADD but without EV_CLEAR, which switches the filter from edge-triggered to level-triggered after the first notification. This causes kevent() to return the notification event on every call, but Events::iter() filters it out (NOTIFY_KEY), so callers see empty results despite kevent() returning immediately. The driver misidentifies waker notifications as timeouts, stalling task wakeups on macOS. The fix is in the polling crate (paddor/polling@fix/kqueue-evfilt-user-clear): preserve EV_CLEAR in the trigger changelist. Added waker regression tests.
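The mechanism described above can be sketched as a toy model. This is not the kernel's or the polling crate's code: `UserFilter`, `trigger`, `poll`, and `spurious_wakeups` are hypothetical names, and the model assumes the filter is first registered with EV_ADD | EV_CLEAR, as the description implies.

```rust
/// Toy model of one kqueue EVFILT_USER filter (hypothetical, not kernel code).
#[derive(Debug)]
struct UserFilter {
    clear: bool,     // whether EV_CLEAR is currently set on the filter
    triggered: bool, // whether a trigger is pending
}

impl UserFilter {
    /// Assumed initial registration: EV_ADD | EV_CLEAR (edge-triggered).
    fn register() -> Self {
        UserFilter { clear: true, triggered: false }
    }

    /// Submit a trigger change. With `ev_add`, the given flags replace the
    /// flags on the existing filter, so omitting `ev_clear` drops EV_CLEAR.
    fn trigger(&mut self, ev_add: bool, ev_clear: bool) {
        if ev_add {
            self.clear = ev_clear;
        }
        self.triggered = true;
    }

    /// One kevent() call: reports whether the filter fired, and resets the
    /// pending state only when EV_CLEAR is set.
    fn poll(&mut self) -> bool {
        let fired = self.triggered;
        if fired && self.clear {
            self.triggered = false;
        }
        fired
    }
}

/// Counts how many of three follow-up polls fire after a single notification.
fn spurious_wakeups(preserve_clear: bool) -> usize {
    let mut filter = UserFilter::register();
    filter.trigger(true, preserve_clear);
    assert!(filter.poll()); // the real notification
    (0..3).filter(|_| filter.poll()).count()
}

fn main() {
    println!("trigger without EV_CLEAR: {} spurious wakeups", spurious_wakeups(false));
    println!("trigger with EV_CLEAR:    {} spurious wakeups", spurious_wakeups(true));
}
```

In this model, a trigger that omits EV_CLEAR makes every later poll fire again, which corresponds to kevent() returning immediately with nothing the caller can see.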
27ee4f8 to ae4f4f8
Model the ZMTP handshake pattern: two spawned tasks on one runtime, each doing write-then-read for multiple rounds. Also add cancel-and-resubmit variants that race reads against short timeouts, mirroring the driver's select_biased! behavior.
Two fixes for a race where dropping a completed-but-unpolled read future loses data already consumed by recv():
1. timeout: use select_biased! instead of select! so the inner future is always polled before the timer. When both are ready, the inner future wins, preventing the timer from dropping a completed read.
2. Submit::drop: skip cancel() when the key already has a result. A completed operation's buffer holds consumed data; canceling it would overwrite the result and drop the buffer.
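The first fix can be illustrated with a dependency-free sketch. It does not use the futures crate's select_biased! macro; instead it hand-rolls the same polling order in a hypothetical `BiasedTimeout` combinator. `Ready`, `NoopWake`, `poll_once`, and `race_both_ready` are likewise illustration-only names, not compio APIs.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for a future that has already completed
/// (a finished read, or a timer that has already expired).
struct Ready<T>(Option<T>);

impl<T: Unpin> Future for Ready<T> {
    type Output = T;
    fn poll(mut self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<T> {
        Poll::Ready(self.0.take().expect("polled after completion"))
    }
}

/// Biased timeout: `inner` is always polled before `timer`.
struct BiasedTimeout<F, T> {
    inner: F,
    timer: T,
}

impl<F, T> Future for BiasedTimeout<F, T>
where
    F: Future + Unpin,
    T: Future<Output = ()> + Unpin,
{
    type Output = Option<F::Output>;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<Self::Output> {
        let this = &mut *self;
        // Inner future first: a completed read is never discarded by the timer.
        if let Poll::Ready(value) = Pin::new(&mut this.inner).poll(cx) {
            return Poll::Ready(Some(value));
        }
        if let Poll::Ready(()) = Pin::new(&mut this.timer).poll(cx) {
            return Poll::Ready(None); // timed out
        }
        Poll::Pending
    }
}

/// Waker that does nothing; enough for polling a future once by hand.
struct NoopWake;
impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

fn poll_once<F: Future + Unpin>(fut: &mut F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    Pin::new(fut).poll(&mut cx)
}

/// Races a completed "read" against an expired "timer".
fn race_both_ready() -> Option<Vec<u8>> {
    let mut fut = BiasedTimeout {
        inner: Ready(Some(b"data".to_vec())),
        timer: Ready(Some(())),
    };
    match poll_once(&mut fut) {
        Poll::Ready(out) => out,
        Poll::Pending => unreachable!("both futures were ready"),
    }
}

fn main() {
    // The read result survives even though the timer is also ready.
    assert_eq!(race_both_ready(), Some(b"data".to_vec()));
}
```

An unbiased select may poll the timer first, in which case the already-completed read would be dropped together with its buffer.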
AsyncFd's Read op always goes through the polling driver's Wait path (never tries the syscall eagerly), unlike TcpStream's Recv op. This matches omq's actual I/O path.
Matches omq's exact I/O pattern: writes go through compio TcpStream (Send op with eager pre_submit), reads go through AsyncFd on a dup'd fd (Read op with always-Wait pre_submit). Both reference the same kernel fd via separate descriptors.
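The descriptor relationship in this test (write through one descriptor, read through a dup of it) can be sketched with the standard library alone. This is a blocking, Unix-only illustration using `UnixStream`, not compio's TcpStream or AsyncFd; `round_trip` is a hypothetical helper name.

```rust
use std::io::{Read, Write};
use std::os::unix::net::UnixStream;

/// Writes through one descriptor and reads the reply through a duplicate of it.
fn round_trip() -> std::io::Result<Vec<u8>> {
    let (mut local, mut peer) = UnixStream::pair()?;

    // try_clone() yields a second, independently owned descriptor that
    // refers to the same underlying socket as `local`.
    let mut local_reader = local.try_clone()?;

    // Write via the original descriptor.
    local.write_all(b"ping")?;

    let mut buf = [0u8; 4];
    peer.read_exact(&mut buf)?;
    peer.write_all(b"pong")?;

    // Read via the duplicated descriptor: the reply arrives on the same socket.
    local_reader.read_exact(&mut buf)?;
    Ok(buf.to_vec())
}

fn main() -> std::io::Result<()> {
    let reply = round_trip()?;
    assert_eq!(reply, b"pong");
    Ok(())
}
```

Because both descriptors share one socket, data queued for it is visible to whichever descriptor reads first, which is why a lost readability event on the read side leaves buffered data stranded.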
Uses the same raw fd number for both TcpStream writes and AsyncFd reads, matching omq's SharedFd clone pattern.
Read op (via AsyncFd) always goes through kqueue (never eager recv). If cancel+re-add loses the readability event for already-buffered data, this test hangs.
Superseded by smol-rs/polling#265
Summary
The polling crate's Poller::notify() on kqueue (macOS/BSD) triggers EVFILT_USER with EV_ADD but omits EV_CLEAR. Since EV_ADD modifies the existing filter, this switches it from edge-triggered (auto-clearing) to level-triggered after the first notification. Subsequent kevent() calls always return the event immediately, but Events::iter() filters it out (NOTIFY_KEY), so callers see empty results despite kevent() returning.
Additionally, wait_impl() doesn't reset the notified flag on EINTR retry, suppressing concurrent notify() calls.
This causes every TCP/IPC test in our downstream project (omq) to time out on macOS. The fix is in the polling crate: paddor/polling@fix/kqueue-evfilt-user-clear.
I will submit a PR to smol-rs/polling as well.
Test plan
compio/tests/kqueue_repro.rs
compio-runtime/tests/waker.rs