Skip to content

Return ErrNoCandidatePairs before ICE connects#918

Open
zshang-oai wants to merge 1 commit into
pion:mainfrom
zshang-oai:codex/err-no-candidate-pairs
Open

Return ErrNoCandidatePairs before ICE connects#918
zshang-oai wants to merge 1 commit into
pion:mainfrom
zshang-oai:codex/err-no-candidate-pairs

Conversation

@zshang-oai
Copy link
Copy Markdown
Contributor

Summary

Return ErrNoCandidatePairs from Conn.Write when ICE has not selected a pair yet and there is no valid fallback pair available.

Why

Conn.Write previously returned (0, nil) in that case because the task-loop lookup itself succeeded even though no packet could be written. That makes an early caller believe the write was accepted even though zero bytes were sent.

Changes

  • return ErrNoCandidatePairs when Conn.Write has no selected or valid pair
  • add a regression test for writes immediately after StartDial
  • update affected retry-style tests to treat the pre-connection error as transient
  • update the binding-request test to assert the explicit no-pair failure

Tests

  • go test -run '^TestStartDialConnWriteBeforeConnectReturnsError$' -count=1 -timeout 120s .
  • go test -run '^TestBindingRequestHandler$' -count=1 -timeout 120s .
  • go test -run '^TestWriteUseValidPair$' -count=1 -timeout 120s .
  • go test -run '^TestConn_Write_RejectsSTUN$' -count=1 -timeout 120s .

@zshang-oai zshang-oai marked this pull request as ready for review May 8, 2026 04:59
@zshang-oai zshang-oai requested review from JoTurk, Copilot and fippo May 8, 2026 04:59
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Conn.Write behavior before ICE has selected a candidate pair (and no valid fallback exists) by returning ErrNoCandidatePairs instead of incorrectly returning (0, nil), and updates/extends tests to assert and tolerate this transient pre-connection error.

Changes:

  • Update Conn.Write to return ErrNoCandidatePairs when neither a selected pair nor a best valid pair is available.
  • Add a regression test covering writes immediately after StartDial.
  • Update existing tests to explicitly assert/ignore the pre-connection ErrNoCandidatePairs behavior where appropriate.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.

File Description
transport.go Return ErrNoCandidatePairs when no selected/fallback candidate pair exists during Conn.Write.
transport_test.go Add regression test ensuring early Conn.Write returns ErrNoCandidatePairs and sends 0 bytes.
selection_test.go Update binding-request handler test to assert the explicit no-pair write failure.
connectivity_vnet_test.go Treat ErrNoCandidatePairs as transient during retry-style write loop until connectivity is established.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 8, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.53%. Comparing base (2f58c00) to head (71316d9).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #918      +/-   ##
==========================================
- Coverage   88.56%   88.53%   -0.04%     
==========================================
  Files          45       45              
  Lines        5886     5886              
==========================================
- Hits         5213     5211       -2     
- Misses        463      464       +1     
- Partials      210      211       +1     
Flag Coverage Δ
go 88.53% <100.00%> (-0.04%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@JoTurk JoTurk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello, I agree with this change, but I'm worried that it might break something at the end user, or at pion/webrtc, so I'm going to soft-block this until i get to manually test pion/webrtc with this change, (feel free to test it instead).
It shouldn't break pion/webrtc because we do handle this error at the mux writer https://github.com/pion/webrtc/blob/9654161f9b698b2d9a2df3dd5a676b20e881a7b3/internal/mux/endpoint.go#L59 but you can't really know for sure sometimes :)

@zshang-oai
Copy link
Copy Markdown
Contributor Author

@fippo and cc @JoTurk, the mux behavior mentioned above will become a bug after WARP.

If happens when

  • DTLS finishes from piggybacked handshake traffic
  • ICE has not yet published a selected pair, or has no best valid pair available to ordinary writes
  • SCTP/SNAP immediately starts writing post-handshake traffic

And due to SNAP, we consider data channel to be open immediately and the mutex will swallow the wrote silently because ICE is actually not writable.

We will fix the bug for our scenario but if you agree, we are happy to upstream it. (I guess @JoTurk want to be more safe :) )

@JoTurk
Copy link
Copy Markdown
Member

JoTurk commented May 12, 2026

@zshang-oai Thank you for the report and fix, I just want to make sure that weren't breaking our normal path in webrtc, our dtls handshake/fsm is fragile and you can never know. We had a similar change few months ago that broke fragmented dtls handshake. I'll try to test this with pion/webrtc soon. If you can test it yourself (or maybe codex) that would be also cool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants