test(scripts): cross-platform MCP handshake tests for first-install (#125) by aeneasr · Pull Request #129 · ory/lumen

aeneasr · 2026-04-14T10:05:30Z

Summary

Adds a deliberately failing end-to-end test suite in scripts/test_run.sh and a new scripts/test_run_windows.ps1 that reproduce the first-install MCP failure from #125 on both POSIX and Windows launchers. Expect CI to go red on ubuntu, macos, and windows — that is the point of this PR.

Why the existing CI suite missed the bug

Two gaps, stacked:

- No end-to-end invocation of the launchers. The previous `stdio early-exit guard tests` block reimplemented the guard condition inline as a local helper and asserted "the guard fires for the stdio arg" as correct behaviour, codifying the bug as the spec. Nothing actually ran `bash run.sh stdio` (let alone `run.bat stdio`) with a missing binary.
- No Windows coverage at all for run.bat's first-install code path. run.bat has the identical fast-fail guard at lines 28–32. #127 (fix(scripts): download binary synchronously in stdio mode on first install) only fixes POSIX; a separate run.bat fix still needs to land, and nothing in CI would have told us that.

What this PR adds

- scripts/testdata/mock_mcp_server/main.go: a tiny pure-Go mock MCP server. It reads one line from stdin and, if that line is a JSON-RPC initialize request, writes a valid MCP initialize response and exits 0. It needs no CGO, so it cross-compiles on all three runners.
- Real MCP handshake in test_run.sh: replaces the old test, which stubbed curl to drop an `exit 0` script in place of the binary. The test now builds the mock, stubs curl in PATH so that it drops the mock binary into place, pipes a real initialize request into `bash run.sh stdio`, and asserts the stdout response contains `"jsonrpc":"2.0"` and `"name":"mock-lumen"`. A pass transitively proves the launcher didn't fast-exit, reached the download path, wrote the artefact where it would exec it, chmod'd it executable, and exec'd it with stdin/stdout inherited correctly.
- scripts/test_run_windows.ps1: same shape, against run.bat, via PowerShell. It uses a curl.bat stub in a directory prepended to PATH; because that directory is searched first, cmd.exe's PATHEXT lookup resolves `curl` to the stub curl.bat before it reaches any curl.exe later on PATH.
- CI matrix: the scripts job is expanded to ubuntu-latest + macos-latest + windows-latest with fail-fast: false, and actions/setup-go is added. A new pwsh step runs the Windows launcher test.

Bugs this kind of test catches that the old one couldn't

Beyond the #125 fast-fail, a stubbed-curl / stubbed-binary test can pass even when the launcher has any of these real bugs:

- Launcher closes stdin or stdout before exec
- Launcher writes the binary to one path but execs from another
- Launcher forgets to chmod +x
- Binary actually runs but doesn't speak MCP (broken flag parsing, crash on startup, etc.)
- CRLF / quoting mishaps in run.bat that prevent the process from ever starting

A real MCP handshake against a real process is the only way to catch these.
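The PR's shell test checks the response with two substring matches. As a sketch of a stricter variant (an alternative, not what the PR does), the Go helper below parses the first stdout line and maps each failure to one of the bug classes listed above. The name `checkHandshake` and the error wording are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// initializeResponse models only the fields the handshake check needs.
type initializeResponse struct {
	JSONRPC string `json:"jsonrpc"`
	Result  *struct {
		ServerInfo struct {
			Name string `json:"name"`
		} `json:"serverInfo"`
	} `json:"result"`
}

// checkHandshake returns nil only if line is a JSON-RPC 2.0 response whose
// result.serverInfo.name equals wantName.
func checkHandshake(line []byte, wantName string) error {
	var resp initializeResponse
	if err := json.Unmarshal(line, &resp); err != nil {
		// Empty or non-JSON stdout: launcher exited early, closed stdout, or printed noise.
		return fmt.Errorf("stdout is not a JSON-RPC message: %w", err)
	}
	if resp.JSONRPC != "2.0" {
		return fmt.Errorf(`want "jsonrpc":"2.0", got %q`, resp.JSONRPC)
	}
	if resp.Result == nil {
		return fmt.Errorf("response has no result (binary may not speak MCP)")
	}
	if got := resp.Result.ServerInfo.Name; got != wantName {
		return fmt.Errorf("serverInfo.name = %q, want %q (wrong binary exec'd?)", got, wantName)
	}
	return nil
}

func main() {
	ok := []byte(`{"jsonrpc":"2.0","id":1,"result":{"serverInfo":{"name":"mock-lumen"}}}`)
	fmt.Println(checkHandshake(ok, "mock-lumen"))                 // <nil>
	fmt.Println(checkHandshake([]byte(""), "mock-lumen") != nil) // true
}
```

Parsing rather than grepping also rejects a JSON-RPC error response, which a `"jsonrpc":"2.0"` substring match alone would accept.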

Red / green

- Against main today → RED on POSIX and Windows (both launchers still contain the stdio fast-fail)
- Against #127 (fix(scripts): download binary synchronously in stdio mode on first install) plus an equivalent run.bat fix → GREEN on all three OSes

Next steps

Merge #127 and add the matching run.bat change to flip both tests green, or fold these commits into that PR as its TDD red-phase.

🤖 Generated with Claude Code

Demonstrates the bug in #125: invoking `run.sh stdio` without a binary (the Claude Code first-install flow) fast-exits instead of downloading. Because Claude Code does not retry failed MCP servers, this leaves the `lumen` tools unavailable for the entire session. The existing `stdio early-exit guard tests` block never executes run.sh; it reimplements the guard condition inline and asserts "the guard fires for stdio" as correct behaviour, codifying the bug. This new integration test stubs `curl` in PATH, creates a temporary plugin root with a manifest but no binary, runs `bash run.sh stdio`, and asserts exit 0 + that the expected binary file is produced. Red now; turns green once the stdio fast-fail is removed (see #127).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The existing stdio first-install integration test (added earlier in this PR) stubbed curl to write `#!/bin/sh; exit 0` as the "downloaded binary" and only checked that the launcher exited 0 with a file on disk. That passes on any launcher that reaches the download step: it can't distinguish a launcher that actually execs a working MCP server from one that swallows stdin/stdout, closes pipes early, or silently drops to the wrong binary path. It also tested only run.sh; run.bat (with the same fast-fail bug) was untested end-to-end.

Replaces the stub with a real mock MCP server (scripts/testdata/mock_mcp_server/main.go: pure Go, no CGO, cross-compiles on all three OSes). The test now:

- Builds the mock for the current OS/arch
- Creates a clean plugin root with a manifest but no bin/
- Stubs curl (POSIX) / curl.bat (Windows) to copy the mock into the path the launcher will exec
- Pipes a real JSON-RPC `initialize` request on stdin
- Asserts the stdout response contains `jsonrpc: "2.0"` AND `serverInfo.name == "mock-lumen"`

A passing test transitively proves the launcher did not fast-exit in stdio mode, reached the download code path, wrote the artefact where it would exec it, set it executable, and exec'd it with stdin/stdout inherited correctly, which is the actual MCP-server-alive contract.

Adds scripts/test_run_windows.ps1 (same shape, against run.bat) and wires it into the CI `scripts` matrix as a new pwsh step. Also expands the matrix to ubuntu-latest + macos-latest + windows-latest and adds actions/setup-go so the mock can be built.

Both the POSIX test and the Windows test are RED against the current launchers because run.sh AND run.bat both still contain the stdio fast-fail guard. Merging #127 plus an equivalent run.bat change will flip them green.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The previous version used `return` inside a try/finally block in a dot-sourced script, which exits the script without running the summary or `exit 1` line, making CI mark a real failure as success. It also relied on Start-Process -RedirectStandardInput, which doesn't reliably propagate the child's exit code back through PowerShell. Rewrites both: uses System.Diagnostics.Process directly for deterministic stdin/stdout/stderr wiring and exit-code reporting, and restructures the script into an if/elseif chain so all paths flow through the final summary + exit logic. The test will now correctly fail the Windows job when run.bat fast-exits in stdio mode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When run.bat hits the stdio fast-fail and runs `exit /b 1`, its stdin pipe is already closed by the time PowerShell tries to write the initialize request. That produces a `pipe is being closed` exception that masks the real diagnostic. Catch it and let the exit-code check produce the proper "#125 — MCP server would be dead for the session" failure message.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
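The same ordering of diagnostics (a failed stdin write is secondary; the exit code explains it) can be sketched in Go with an explicit pipe write, mirroring what the PowerShell test does. This is an illustrative analogue, not the PR's code: `handshakeDiagnosis` and the message text are hypothetical. Writing to a pipe whose reader has exited returns an `EPIPE` error in Go on POSIX (for descriptors other than stdout/stderr the process is not killed by SIGPIPE), and `Cmd.Wait` reports a non-zero exit as `*exec.ExitError`.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

const fastFailMsg = "launcher exited before the handshake (#125: MCP server would be dead for the session)"

// handshakeDiagnosis writes request to the child's stdin pipe explicitly and
// reduces the outcome to one message. A failed write is not reported on its
// own: if the launcher already exited, the pipe is closed and the exit code
// is the more useful diagnostic.
func handshakeDiagnosis(request, name string, args ...string) string {
	cmd := exec.Command(name, args...)
	stdin, err := cmd.StdinPipe()
	if err != nil {
		return "setup failed: " + err.Error()
	}
	if err := cmd.Start(); err != nil {
		return "start failed: " + err.Error()
	}
	// If the child has already exited, this write can fail (EPIPE on POSIX).
	_, writeErr := fmt.Fprintln(stdin, request)
	stdin.Close()

	waitErr := cmd.Wait()
	var exitErr *exec.ExitError
	switch {
	case errors.As(waitErr, &exitErr):
		return fmt.Sprintf("%s: exit code %d", fastFailMsg, exitErr.ExitCode())
	case waitErr != nil:
		return "wait failed: " + waitErr.Error()
	case writeErr != nil:
		return "stdin write failed even though the launcher exited 0: " + writeErr.Error()
	default:
		return "ok"
	}
}

func main() {
	// A child that exits at once, standing in for a fast-failing launcher.
	fmt.Println(handshakeDiagnosis(`{"jsonrpc":"2.0","id":1,"method":"initialize"}`, "sh", "-c", "exit 1"))
}
```

Whether the write races ahead of the child's exit or not, the result is the same exit-code message, which is what makes the diagnostic deterministic.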

aeneasr and others added 2 commits April 14, 2026 12:05

aeneasr changed the title ~~test(scripts): failing integration test for stdio first-install bug (#125)~~ test(scripts): cross-platform MCP handshake tests for first-install (#125) Apr 14, 2026

aeneasr and others added 2 commits April 14, 2026 12:23

aeneasr merged commit 656bb47 into main Apr 14, 2026
6 of 9 checks passed

aeneasr merged 4 commits into main from test/stdio-first-install-red
