Fix stale CASE exchange blocking new session handshake #16

Open
jonlil wants to merge 2 commits into sysgrok:main from viska-ab:fix/case-stale-exchange

jonlil commented Apr 11, 2026

Problem

When a Matter client restarts and sends a new CASESigma1 reusing the same exchange ID as an in-progress handshake, the old exchange is still awaiting CASESigma3. The new Sigma1 gets routed to the stale exchange, producing:

Invalid opcode: CASESigma1, expected: CASESigma3

The handshake fails and the client must wait through its retry/backoff cycle (typically 30-60s) before succeeding with a different exchange ID.

This is easy to reproduce: restart a Matter controller that connects to an rs-matter server, and the first reconnect attempt always fails.

Fix

In Session::post_recv(), when an existing exchange receives a new session request (CASESigma1 or PBKDFParamRequest), mark the stale exchange as Dropped and fall through to create a fresh one. The new handshake proceeds immediately from scratch.

The change uses the existing MessageMeta::is_new_session() check and Role::set_dropped_state(), so no new types or APIs are needed.
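
The routing decision described above can be sketched roughly as follows. This is a simplified stand-in model, not the actual rs-matter code: the `Opcode`, `State`, `Exchange`, `Routed` and `Session` types, and the signature of `post_recv`, are all assumptions made for illustration. Only the idea of an `is_new_session()` check and a Dropped state comes from the description.

```rust
// Simplified stand-in model; these are NOT the real rs-matter types.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Opcode {
    PbkdfParamRequest,
    CaseSigma1,
    CaseSigma3,
}

impl Opcode {
    // Stand-in for the MessageMeta::is_new_session() check named in the PR.
    fn is_new_session(self) -> bool {
        matches!(self, Opcode::CaseSigma1 | Opcode::PbkdfParamRequest)
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum State {
    Active,
    Dropped,
}

#[derive(Debug, PartialEq)]
struct Exchange {
    id: u16,
    state: State,
}

// Where an incoming message ended up (hypothetical helper type).
#[derive(Debug, PartialEq)]
enum Routed {
    Existing(usize),
    Fresh(usize),
}

struct Session {
    exchanges: Vec<Exchange>,
}

impl Session {
    fn post_recv(&mut self, exch_id: u16, opcode: Opcode) -> Routed {
        // Look for a live exchange with the same exchange ID.
        let live = self
            .exchanges
            .iter()
            .position(|e| e.id == exch_id && e.state == State::Active);

        if let Some(idx) = live {
            if !opcode.is_new_session() {
                // Ordinary follow-up message: keep using the existing exchange.
                return Routed::Existing(idx);
            }
            // A new session request on an existing exchange means the old
            // handshake is stale: mark it Dropped and fall through.
            self.exchanges[idx].state = State::Dropped;
        }

        // Simplification: any message without a live exchange opens a new one.
        self.exchanges.push(Exchange { id: exch_id, state: State::Active });
        Routed::Fresh(self.exchanges.len() - 1)
    }
}
```

In this model, a second `CaseSigma1` on the same exchange ID is routed to a fresh exchange rather than to the one still waiting for `CaseSigma3`, which is the behavior the fix aims for.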

Testing

Tested with a Zigbee-to-Matter bridge (rs-matter server) and a Matter controller client (also rs-matter based). Before the fix, every server restart caused a 30-60s reconnect delay. After the fix, the client reconnects on its first attempt.

jonlil force-pushed the fix/case-stale-exchange branch from 5043e71 to 30715b1 on April 11, 2026 09:58

When a Matter client restarts and sends a new CASESigma1 reusing
the same exchange ID as an in-progress handshake, the old exchange
is still awaiting CASESigma3. The new Sigma1 gets routed to the
stale exchange, producing "Invalid opcode: CASESigma1, expected:
CASESigma3" and the handshake fails.

The client must then wait through its retry/backoff cycle (typically
30-60s) before succeeding on a subsequent attempt with a different
exchange ID.

Fix: in Session::post_recv(), when an existing exchange receives a
new session request (CASESigma1 or PBKDFParamRequest), remove the
stale exchange via remove_exch() and fall through to create a fresh
one. remove_exch() either clears the slot immediately (allowing the
new exchange to reuse it) or marks it as Dropped if MRP operations
are pending.
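
A minimal sketch of the two outcomes of remove_exch() described in this commit message, under stated assumptions: the `ExchangeState` struct, its `mrp_pending` flag, the slot layout and the return convention (true when the slot was freed) are hypothetical and only model the behavior in the text. An empty slot is treated as already cleared here.

```rust
// Hypothetical model of exchange slots; not the real rs-matter data structures.
#[derive(Debug, PartialEq)]
enum Role {
    Active,
    Dropped,
}

#[derive(Debug, PartialEq)]
struct ExchangeState {
    role: Role,
    // Assumed flag: true while an MRP ack or retransmission is outstanding.
    mrp_pending: bool,
}

struct Session {
    slots: Vec<Option<ExchangeState>>,
}

impl Session {
    /// Returns true if the slot is free afterwards, false if the exchange
    /// was only marked Dropped because MRP work is still pending.
    fn remove_exch(&mut self, index: usize) -> bool {
        let slot = &mut self.slots[index];
        let pending = slot.as_ref().map_or(false, |e| e.mrp_pending);

        if pending {
            // Keep the slot occupied until MRP finishes, but flag it.
            if let Some(exch) = slot.as_mut() {
                exch.role = Role::Dropped;
            }
            false
        } else {
            // Nothing outstanding: free the slot so a new exchange can reuse it.
            *slot = None;
            true
        }
    }
}
```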

jonlil force-pushed the fix/case-stale-exchange branch from 30715b1 to 30dd689 on April 11, 2026 10:02

When a stale exchange is evicted by post_recv (new CASESigma1 on an
existing exchange), the slot is set to None. The old Exchange object
still holds a reference to that slot index. When it is eventually
dropped, Drop calls remove_exch on the already-cleared slot, causing
an unwrap panic.

Handle the None case gracefully by returning true (already cleaned up)
instead of panicking.
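
The panic and its fix can be illustrated with a small model, assuming a handle type that removes its slot on Drop. `ExchangeHandle`, the `RefCell`-based slot table and the `u16` slot contents are hypothetical. The point is only that remove_exch() must tolerate a slot that was already set to None.

```rust
use std::cell::RefCell;

// Hypothetical slot table: each slot holds an exchange ID or is empty.
struct Session {
    slots: RefCell<Vec<Option<u16>>>,
}

impl Session {
    /// Returns true once the slot is clean.
    fn remove_exch(&self, index: usize) -> bool {
        let mut slots = self.slots.borrow_mut();
        // An unconditional unwrap() here would panic if the slot had
        // already been cleared by an eviction.
        let Some(_exch_id) = slots[index] else {
            return true; // already cleaned up, nothing left to do
        };
        slots[index] = None;
        true
    }
}

// Stand-in for an Exchange object that remembers its slot index.
struct ExchangeHandle<'a> {
    session: &'a Session,
    index: usize,
}

impl Drop for ExchangeHandle<'_> {
    fn drop(&mut self) {
        // Runs even if the slot was evicted while this handle was alive.
        self.session.remove_exch(self.index);
    }
}
```

With this shape, dropping a handle whose slot was evicted earlier is a no-op rather than a panic.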