Return ErrNoCandidatePairs before ICE connects#918
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes Conn.Write behavior before ICE has selected a candidate pair (and no valid fallback exists) by returning ErrNoCandidatePairs instead of incorrectly returning (0, nil), and updates/extends tests to assert and tolerate this transient pre-connection error.
Changes:
- Update
Conn.Writeto returnErrNoCandidatePairswhen neither a selected pair nor a best valid pair is available. - Add a regression test covering writes immediately after
StartDial. - Update existing tests to explicitly assert/ignore the pre-connection
ErrNoCandidatePairsbehavior where appropriate.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| transport.go | Return ErrNoCandidatePairs when no selected/fallback candidate pair exists during Conn.Write. |
| transport_test.go | Add regression test ensuring early Conn.Write returns ErrNoCandidatePairs and sends 0 bytes. |
| selection_test.go | Update binding-request handler test to assert the explicit no-pair write failure. |
| connectivity_vnet_test.go | Treat ErrNoCandidatePairs as transient during retry-style write loop until connectivity is established. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #918 +/- ##
==========================================
- Coverage 88.56% 88.53% -0.04%
==========================================
Files 45 45
Lines 5886 5886
==========================================
- Hits 5213 5211 -2
- Misses 463 464 +1
- Partials 210 211 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
JoTurk
left a comment
There was a problem hiding this comment.
Hello, I agree with this change, but I'm worried that it might break something at the end user, or at pion/webrtc, so I'm going to soft-block this until i get to manually test pion/webrtc with this change, (feel free to test it instead).
It shouldn't break pion/webrtc because we do handle this error at the mux writer https://github.com/pion/webrtc/blob/9654161f9b698b2d9a2df3dd5a676b20e881a7b3/internal/mux/endpoint.go#L59 but you can't really know for sure sometimes :)
|
@fippo and cc @JoTurk, the mux behavior mentioned above will become a bug after WARP. If happens when
And due to SNAP, we consider data channel to be open immediately and the mutex will swallow the wrote silently because ICE is actually not writable. We will fix the bug for our scenario but if you agree, we are happy to upstream it. (I guess @JoTurk want to be more safe :) ) |
|
@zshang-oai Thank you for the report and fix, I just want to make sure that weren't breaking our normal path in webrtc, our dtls handshake/fsm is fragile and you can never know. We had a similar change few months ago that broke fragmented dtls handshake. I'll try to test this with pion/webrtc soon. If you can test it yourself (or maybe codex) that would be also cool. |
Summary
Return
ErrNoCandidatePairsfromConn.Writewhen ICE has not selected a pair yet and there is no valid fallback pair available.Why
Conn.Writepreviously returned(0, nil)in that case because the task-loop lookup itself succeeded even though no packet could be written. That makes an early caller believe the write was accepted even though zero bytes were sent.Changes
ErrNoCandidatePairswhenConn.Writehas no selected or valid pairStartDialTests
go test -run '^TestStartDialConnWriteBeforeConnectReturnsError$' -count=1 -timeout 120s .go test -run '^TestBindingRequestHandler$' -count=1 -timeout 120s .go test -run '^TestWriteUseValidPair$' -count=1 -timeout 120s .go test -run '^TestConn_Write_RejectsSTUN$' -count=1 -timeout 120s .