Skip to content

Run StreamableHTTP transport tests in process instead of over sockets#2767

Merged
maxisbey merged 2 commits into
mainfrom
maxisbey/deflake-streamable-http-tests
Jun 3, 2026
Merged

Run StreamableHTTP transport tests in process instead of over sockets#2767
maxisbey merged 2 commits into
mainfrom
maxisbey/deflake-streamable-http-tests

Conversation

@maxisbey
Copy link
Copy Markdown
Contributor

@maxisbey maxisbey commented Jun 2, 2026

Final installment of the in-process test migration (#2764, #2765): tests/shared/test_streamable_http.py was the last file spawning uvicorn subprocesses on bind-then-close ports, the pattern that races under pytest-xdist when two workers pick the same ephemeral port. Two of its tests have flaked exactly this way under parallel load (test_server_validates_protocol_version_header, test_streamable_http_client_mcp_headers_override_defaults).

Closes #2704.

Motivation and Context

Same mechanism and fix as the previous two PRs. All four subprocess servers (basic, JSON-response, event-store, context) become in-process apps driven through StreamingASGITransport; with this, no test in the repo binds a port outside the one pre-bound websocket smoke test.

What changed beyond the plumbing swap:

  • The 19 raw-HTTP validation tests were synchronous requests-library tests; they're now anyio + httpx over the bridge with the same methods, headers, bodies, and assertions (including suppressing the library-default Accept header where the old tests did).
  • wait_for_server is deleted from tests/test_helpers.py — zero users remain. run_uvicorn_in_thread stays for tests/shared/test_ws.py.
  • Three # pragma: no cover in src/mcp/server/streamable_http.py are removed (close_standalone_sse_stream, its session-message callback, the JSON-mode Accept rejection): those lines were only ever executed inside the untraced subprocess; the migrated tests now cover them, and keeping the pragmas would fail strict-no-cover.
  • One stale src docstring note is dropped: close_standalone_sse_stream claimed client reconnection for standalone GET streams "is NOT implemented — this is a known gap", but the note was self-contradictory from the day it landed — the same commit (Add SSE polling support (SEP-1699) #1654) implemented the auto-reconnect and the test that proves it.
  • The long_running_with_checkpoints tool and the slow:// resource branch had no callers anywhere; they're deleted, and tool_with_multiple_stream_closes moves out of the shared server into the reworked reconnection test's inline server, which together move the tools/list count assertions from 10 to 8 in five tests. Unreachable handler branches elsewhere become dispatch asserts, since they now fail branch coverage instead of hiding in an untraced subprocess.
  • test_streamable_http_client_resumption's flag-polling loop (while not flag: sleep(0.1), unbounded) is now two anyio.Events awaited under fail_after(5).
  • test_get_sse_stream's 409 assertion is deterministic by construction (the first GET is held open across the second), where the old version's comment admitted it could race.

One test was further reworked rather than carried over: test_streamable_http_multiple_reconnections previously paced its server tool with fixed sleeps against the client's retry interval — a timing margin that could silently merge two close/reconnect cycles on a starved runner and break its exact token count. The tool is now gated on the client-side resumption-token callback observing each cycle's reconnect (the priming event is only sent after the server has re-registered the resumed stream, so the count becomes a consequence of causality rather than margins). No sleep remains in it, its retry interval drops to 50ms, and a genuinely failed reconnect now fails loudly at a fail_after(5) instead of silently merging cycles.

Deliberately retained elsewhere: the remaining genuinely time-based tests (the elapsed >= 0.4 retry-interval assertion, tool sleeps that create disconnect windows for single-cycle reconnect tests) keep their real-clock behaviour and byte-identical assertions — in-process execution strictly tightens their margins versus the subprocess version. A follow-up may apply the same gating pattern to those remaining sleeps; that's kept out of this PR so the harness swap stays auditable.

How Has This Been Tested?

  • ./scripts/test green: 1526 passed, 100% line+branch coverage including tests/, strict-no-cover clean
  • The file 3× solo and together with the four previously-migrated files under pytest-xdist -n 4, 3× — stable; the nine resumption/reconnection tests 5× solo
  • Targeted coverage runs confirm the three de-pragma'd src lines (and both arcs of the branchy one) are executed by this file alone

Breaking Changes

None — test-only, plus pragma/docstring removals in src.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Several tests here overlap with tests/interaction/transports/ coverage (resumption-token replay, stream-close auto-reconnect, priming events, standalone GET delivery); deduplication is inventoried for a follow-up ticket, kept out of scope so this diff remains a pure harness swap.

AI Disclaimer

maxisbey added 2 commits June 2, 2026 20:55
Final installment of the in-process test migration: this was the last
file spawning uvicorn subprocesses on bind-then-close ports with
readiness polling, which races under pytest-xdist when two workers pick
the same ephemeral port. Two tests in this file have flaked exactly
that way under parallel load.

All four subprocess servers (basic, JSON-response, event-store,
context-aware) become in-process apps served through the interaction
suite's StreamingASGITransport, held open by the session manager's
run() context. Raw `requests` calls become httpx calls against the
bridge client; the sync request-validation tests become anyio tests.
The second-GET-409 test now holds the first stream open by
construction, where the subprocess version noted it "might fail if the
first stream fully closed before this runs".

Assertions are unchanged, with documented exceptions now that the
server handlers run as traced in-process code:

- The long_running_with_checkpoints tool and the slow:// resource
  branch had no callers and are removed, so the expected tools/list
  count drops from 10 to 9 in five tests.
- Dead defensive arms become asserts (sampling non-text fallback,
  close_sse_stream truthiness checks, the context server's unknown-tool
  fallthrough and request checks), and the event store's
  replay-from-unknown-event arm becomes a lookup that requires a stored
  event, since unreachable branches now fail branch coverage instead of
  hiding in an untraced subprocess.
- test_client_crash_handled no longer sleeps between crashing clients;
  the bridge drains each client's teardown before the next connects.

Three pragmas in src/mcp/server/streamable_http.py covered only by the
formerly untraced subprocess (close_standalone_sse_stream, its session
message callback, and the JSON-mode Accept rejection) are now executed
by traced tests and removed.

With the last wait_for_server user migrated, the helper is deleted from
tests/test_helpers.py; run_uvicorn_in_thread stays for the websocket
smoke test.
The tool slept a fixed duration after each close_sse_stream() so the
client's retry-interval reconnect could land before the next close. That
is a timing margin, not synchronization: a reconnect delayed past the
margin made the close a silent no-op, two cycles merged, and the exact
resumption-token count failed.

The tool now waits, bounded by fail_after(5), for the client-side
resumption-token callback to observe each cycle's two new tokens (the
checkpoint and the new connection's priming event). The priming event is
sent only after the server has re-registered the resumed stream, so once
the client holds its token the next close is guaranteed to sever a live
connection. The token count becomes a consequence of causality rather
than margins, no sleep remains, the retry interval drops from 500ms to
50ms, and a genuinely failed reconnect now fails loudly at the timeout
instead of silently merging cycles.

The test now defines its own server with the gated tool inline, since
the gate closes over per-test state; the shared server's
tool_with_multiple_stream_closes had no other users and is removed,
which moves the tools/list count assertions from 9 to 8.
@maxisbey maxisbey marked this pull request as ready for review June 3, 2026 11:06
Copy link
Copy Markdown
Contributor

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't find any bugs — the src changes are limited to pragma/docstring removals with no behavior change — but this is a large rewrite of the streamable HTTP test suite (including a redesigned reconnection test), so it's worth a human look before merging.

Extended reasoning...

Overview

This PR is the final installment of the in-process test migration series (#2764, #2765). It rewrites tests/shared/test_streamable_http.py to drive all four test servers (basic, JSON-response, event-store, context) through StreamingASGITransport instead of spawning uvicorn subprocesses on bind-then-close ports. Source changes in src/mcp/server/streamable_http.py are limited to removing three # pragma: no cover markers and one stale docstring note — no behavior change. tests/test_helpers.py drops wait_for_server, which I verified has zero remaining callers; run_uvicorn_in_thread is retained for the websocket test.

Security risks

None of substance. The in-process test fixtures disable DNS-rebinding protection (enable_dns_rebinding_protection=False), but that protection is exercised separately in tests/server/test_streamable_http_security.py, and the change is confined to test fixtures — production defaults are untouched.

Level of scrutiny

The src diff is trivial, but the test rewrite is large (the entire file is restructured), converts 19 synchronous requests-based validation tests to async httpx, deletes two unused server tools (shifting tools/list count assertions from 10 to 8), and substantially redesigns test_streamable_http_multiple_reconnections around milestone events shared between the client-side resumption-token callback and the server tool. That synchronization redesign is clever but nontrivial, and a maintainer should confirm the new tests still exercise the intended behaviors with equivalent strength to the subprocess-based originals.

Other factors

The pattern follows the two already-merged sibling PRs, the author reports a green full run with 100% line+branch coverage and strict-no-cover clean, and the bug hunting system found no issues. The main residual risk is test-fidelity (weaker coverage or subtly changed assertions), which is exactly the kind of judgment call a human reviewer of the previous two PRs is best positioned to make — hence deferring rather than approving despite the absence of findings.

@maxisbey maxisbey merged commit 19fe9fa into main Jun 3, 2026
54 of 56 checks passed
@maxisbey maxisbey deleted the maxisbey/deflake-streamable-http-tests branch June 3, 2026 11:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky streamable-HTTP/SSE tests: TOCTOU port race under pytest -n auto

2 participants