Commit 9b9131d

jasonLaster, claude, codex, and juliusmarminge authored
Stabilize runtime orchestration and fix flaky CI tests (pingdotgg#488)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: codex <codex@users.noreply.github.com>
Co-authored-by: Julius Marminge <julius0216@outlook.com>
1 parent afb3ca3 commit 9b9131d

47 files changed, +2255 -624 lines changed

.docs/architecture.md

Lines changed: 125 additions & 3 deletions
@@ -5,17 +5,139 @@ T3 Code runs as a **Node.js WebSocket server** that wraps `codex app-server` (JS
55
```
66
┌─────────────────────────────────┐
77
│ Browser (React + Vite) │
8-
│ Connected via WebSocket │
8+
│ wsTransport (state machine) │
9+
│ Typed push decode at boundary │
910
└──────────┬──────────────────────┘
1011
│ ws://localhost:3773
1112
┌──────────▼──────────────────────┐
1213
│ apps/server (Node.js) │
1314
│ WebSocket + HTTP static server │
14-
│ ProviderManager │
15-
│ CodexAppServerManager │
15+
│ ServerPushBus (ordered pushes) │
16+
│ ServerReadiness (startup gate) │
17+
│ OrchestrationEngine │
18+
│ ProviderService │
19+
│ CheckpointReactor │
20+
│ RuntimeReceiptBus │
1621
└──────────┬──────────────────────┘
1722
│ JSON-RPC over stdio
1823
┌──────────▼──────────────────────┐
1924
│ codex app-server │
2025
└─────────────────────────────────┘
2126
```
27+
28+
## Components
29+
30+
- **Browser app**: The React app renders session state, owns the client-side WebSocket transport, and treats typed push events as the boundary between server runtime details and UI state.
31+
32+
- **Server**: `apps/server` is the main coordinator. It serves the web app, accepts WebSocket requests, waits for startup readiness before welcoming clients, and sends all outbound pushes through a single ordered push path.
33+
34+
- **Provider runtime**: `codex app-server` does the actual provider/session work. The server talks to it over JSON-RPC on stdio and translates those runtime events into the app's orchestration model.
35+
36+
- **Background workers**: Long-running async flows such as runtime ingestion, command reaction, and checkpoint processing run as queue-backed workers. This keeps work ordered, reduces timing races, and gives tests a deterministic way to wait for the system to go idle.
37+
38+
- **Runtime signals**: The server emits lightweight typed receipts when important async milestones finish, such as checkpoint capture, diff finalization, or a turn becoming fully quiescent. Tests and orchestration code wait on these signals instead of polling internal state.
39+
40+
## Event Lifecycle
41+
42+
### Startup and client connect
43+
44+
```mermaid
45+
sequenceDiagram
46+
participant Browser
47+
participant Transport as WsTransport
48+
participant Server as wsServer
49+
participant Layers as serverLayers
50+
participant Ready as ServerReadiness
51+
participant Push as ServerPushBus
52+
53+
Browser->>Transport: Load app and open WebSocket
54+
Transport->>Server: Connect
55+
Server->>Layers: Start runtime services
56+
Server->>Ready: Wait for startup barriers
57+
Ready-->>Server: Ready
58+
Server->>Push: Publish server.welcome
59+
Push-->>Transport: Ordered welcome push
60+
Transport-->>Browser: Hydrate initial state
61+
```
62+
63+
1. The browser boots [`WsTransport`][1] and registers typed listeners in [`wsNativeApi`][2].
64+
2. The server accepts the connection in [`wsServer`][3] and brings up the runtime graph defined in [`serverLayers`][7].
65+
3. [`ServerReadiness`][4] waits until the key startup barriers are complete.
66+
4. Once the server is ready, [`wsServer`][3] sends `server.welcome` from the contracts in [`ws.ts`][6] through [`ServerPushBus`][5].
67+
5. The browser receives that ordered push through [`WsTransport`][1], and [`wsNativeApi`][2] uses it to seed local client state.
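
The welcome gating in steps 3 and 4 can be illustrated with a small Promise-based gate. This is a minimal sketch, not the actual `ServerReadiness` API: the `ReadinessGate` class, its method names, and the three barrier names are hypothetical, and plain Promises stand in for whatever the server uses internally.

```typescript
// Hypothetical barrier names, chosen only for illustration.
type BarrierName = "httpListening" | "pushBus" | "keybindings";

class ReadinessGate {
  private readonly pending = new Set<BarrierName>();
  private readonly waiters: Array<() => void> = [];

  constructor(barriers: readonly BarrierName[]) {
    for (const name of barriers) this.pending.add(name);
  }

  /** Mark one barrier complete; release all waiters once none remain. */
  markReady(name: BarrierName): void {
    this.pending.delete(name);
    if (this.pending.size === 0) {
      for (const release of this.waiters.splice(0)) release();
    }
  }

  isReady(): boolean {
    return this.pending.size === 0;
  }

  /** Resolves once every barrier has been marked ready. */
  awaitReady(): Promise<void> {
    if (this.pending.size === 0) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }
}

async function runReadinessDemo(): Promise<string[]> {
  const log: string[] = [];
  const gate = new ReadinessGate(["httpListening", "pushBus", "keybindings"]);

  // A client connects early; its welcome is deferred until the gate opens.
  const welcomed = gate.awaitReady().then(() => {
    log.push("server.welcome");
  });

  gate.markReady("httpListening");
  gate.markReady("pushBus");
  await Promise.resolve(); // give a wrongly released waiter the chance to run
  log.push(`ready=${gate.isReady()}`);

  gate.markReady("keybindings");
  await welcomed;
  return log;
}
```

In this sketch the welcome is recorded only after the last barrier reports in, which is the ordering property the startup flow relies on.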
68+
69+
### User turn flow
70+
71+
```mermaid
72+
sequenceDiagram
73+
participant Browser
74+
participant Transport as WsTransport
75+
participant Server as wsServer
76+
participant Provider as ProviderService
77+
participant Codex as codex app-server
78+
participant Ingest as ProviderRuntimeIngestion
79+
participant Engine as OrchestrationEngine
80+
participant Push as ServerPushBus
81+
82+
Browser->>Transport: Send user action
83+
Transport->>Server: Typed WebSocket request
84+
Server->>Provider: Route request
85+
Provider->>Codex: JSON-RPC over stdio
86+
Codex-->>Ingest: Provider runtime events
87+
Ingest->>Engine: Normalize into orchestration events
88+
Engine-->>Server: Domain events
89+
Server->>Push: Publish orchestration.domainEvent
90+
Push-->>Browser: Typed push
91+
```
92+
93+
1. A user action in the browser becomes a typed request through [`WsTransport`][1] and the browser API layer in [`nativeApi`][12].
94+
2. [`wsServer`][3] decodes that request using the shared WebSocket contracts in [`ws.ts`][6] and routes it to the right service.
95+
3. [`ProviderService`][8] starts or resumes a session and talks to `codex app-server` over JSON-RPC on stdio.
96+
4. Provider-native events are pulled back into the server by [`ProviderRuntimeIngestion`][9], which converts them into orchestration events.
97+
5. [`OrchestrationEngine`][10] persists those events, updates the read model, and exposes them as domain events.
98+
6. [`wsServer`][3] pushes those updates to the browser through [`ServerPushBus`][5] on channels defined in [`orchestration.ts`][11].
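
Step 4 is a translation boundary: provider-native events go in, provider-neutral orchestration events come out. The sketch below shows that shape only. Every event name and field in it is hypothetical, since neither the Codex payloads nor the orchestration event schema appear in this document.

```typescript
// Hypothetical provider-native events (stand-ins for whatever the adapter receives).
type NativeEvent =
  | { kind: "native/turnStarted"; turn: string }
  | { kind: "native/textDelta"; turn: string; text: string }
  | { kind: "native/turnDone"; turn: string };

// Hypothetical provider-neutral orchestration events.
type OrchestrationEvent =
  | { type: "turn.started"; turnId: string }
  | { type: "message.delta"; turnId: string; delta: string }
  | { type: "turn.completed"; turnId: string };

/** Translate one native event. The switch is exhaustive over NativeEvent. */
function normalizeEvent(event: NativeEvent): OrchestrationEvent {
  switch (event.kind) {
    case "native/turnStarted":
      return { type: "turn.started", turnId: event.turn };
    case "native/textDelta":
      return { type: "message.delta", turnId: event.turn, delta: event.text };
    case "native/turnDone":
      return { type: "turn.completed", turnId: event.turn };
  }
}

const nativeEvents: NativeEvent[] = [
  { kind: "native/turnStarted", turn: "t1" },
  { kind: "native/textDelta", turn: "t1", text: "hello" },
  { kind: "native/turnDone", turn: "t1" },
];

const normalizedEvents = nativeEvents.map(normalizeEvent);
```

Keeping the native names confined to one function like this is what lets everything downstream of ingestion stay provider-neutral.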
99+
100+
### Async completion flow
101+
102+
```mermaid
103+
sequenceDiagram
104+
participant Server as wsServer
105+
participant Worker as Queue-backed workers
106+
participant Cmd as ProviderCommandReactor
107+
participant Checkpoint as CheckpointReactor
108+
participant Receipt as RuntimeReceiptBus
109+
participant Push as ServerPushBus
110+
participant Browser
111+
112+
Server->>Worker: Enqueue follow-up work
113+
Worker->>Cmd: Process provider commands
114+
Worker->>Checkpoint: Process checkpoint tasks
115+
Checkpoint->>Receipt: Publish completion receipt
116+
Cmd-->>Server: Produce orchestration changes
117+
Checkpoint-->>Server: Produce orchestration changes
118+
Server->>Push: Publish resulting state updates
119+
Push-->>Browser: User-visible push
120+
```
121+
122+
1. Some work continues after the initial request returns, especially in [`ProviderRuntimeIngestion`][9], [`ProviderCommandReactor`][13], and [`CheckpointReactor`][14].
123+
2. These flows run as queue-backed workers using [`DrainableWorker`][16], which helps keep side effects ordered and test synchronization deterministic.
124+
3. When a milestone completes, the server emits a typed receipt on [`RuntimeReceiptBus`][15], such as checkpoint completion or turn quiescence.
125+
4. Tests and orchestration code wait on those receipts instead of polling git state, projections, or timers.
126+
5. Any user-visible state changes produced by that async work still go back through [`wsServer`][3] and [`ServerPushBus`][5].
127+
128+
[1]: ../apps/web/src/wsTransport.ts
129+
[2]: ../apps/web/src/wsNativeApi.ts
130+
[3]: ../apps/server/src/wsServer.ts
131+
[4]: ../apps/server/src/wsServer/readiness.ts
132+
[5]: ../apps/server/src/wsServer/pushBus.ts
133+
[6]: ../packages/contracts/src/ws.ts
134+
[7]: ../apps/server/src/serverLayers.ts
135+
[8]: ../apps/server/src/provider/Layers/ProviderService.ts
136+
[9]: ../apps/server/src/orchestration/Layers/ProviderRuntimeIngestion.ts
137+
[10]: ../apps/server/src/orchestration/Layers/OrchestrationEngine.ts
138+
[11]: ../packages/contracts/src/orchestration.ts
139+
[12]: ../apps/web/src/nativeApi.ts
140+
[13]: ../apps/server/src/orchestration/Layers/ProviderCommandReactor.ts
141+
[14]: ../apps/server/src/orchestration/Layers/CheckpointReactor.ts
142+
[15]: ../apps/server/src/orchestration/Layers/RuntimeReceiptBus.ts
143+
[16]: ../packages/shared/src/DrainableWorker.ts

.docs/provider-architecture.md

Lines changed: 17 additions & 1 deletion
@@ -3,7 +3,9 @@
33
The web app communicates with the server via WebSocket using a simple JSON-RPC-style protocol:
44

55
- **Request/Response**: `{ id, method, params }` → `{ id, result }` or `{ id, error }`
6-
- **Push events**: `{ type: "push", channel, data }` for orchestration read-model updates
6+
- **Push events**: typed envelopes with `channel`, `sequence` (monotonic per connection), and channel-specific `data`
7+
8+
Push channels: `server.welcome`, `server.configUpdated`, `terminal.event`, `orchestration.domainEvent`. Payloads are schema-validated at the transport boundary (`wsTransport.ts`). Decode failures produce structured `WsDecodeDiagnostic` with `code`, `reason`, and path info.
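
Boundary validation with structured diagnostics might look roughly like the following. This is a sketch under assumptions: it uses hand-written checks rather than the project's schema library, the `SketchDecodeDiagnostic` shape and its `code` values are hypothetical (the text above only names the fields `code`, `reason`, and path info), and channel-specific payload validation is left out.

```typescript
// Channel names copied from the paragraph above.
const PUSH_CHANNELS = [
  "server.welcome",
  "server.configUpdated",
  "terminal.event",
  "orchestration.domainEvent",
] as const;
type PushChannel = (typeof PUSH_CHANNELS)[number];

interface DecodedPush {
  channel: PushChannel;
  sequence: number;
  data: unknown; // per-channel payload validation is omitted in this sketch
}

// Hypothetical diagnostic shape.
interface SketchDecodeDiagnostic {
  code: "invalid-json" | "invalid-envelope";
  reason: string;
  path: string;
}

type DecodeResult =
  | { ok: true; push: DecodedPush }
  | { ok: false; diagnostic: SketchDecodeDiagnostic };

function invalidEnvelope(path: string, reason: string): DecodeResult {
  return { ok: false, diagnostic: { code: "invalid-envelope", reason, path } };
}

function isPushChannel(value: unknown): value is PushChannel {
  return PUSH_CHANNELS.some((channel) => channel === value);
}

function decodePush(raw: string): DecodeResult {
  // Phase 1: JSON parse.
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch (error) {
    return { ok: false, diagnostic: { code: "invalid-json", reason: String(error), path: "$" } };
  }
  // Phase 2: envelope validation.
  if (typeof parsed !== "object" || parsed === null) {
    return invalidEnvelope("$", "expected an object");
  }
  const { channel, sequence, data } = parsed as {
    channel?: unknown;
    sequence?: unknown;
    data?: unknown;
  };
  if (!isPushChannel(channel)) return invalidEnvelope("$.channel", "unknown channel");
  if (typeof sequence !== "number" || !Number.isInteger(sequence)) {
    return invalidEnvelope("$.sequence", "expected an integer");
  }
  return { ok: true, push: { channel, sequence, data } };
}

const okResult = decodePush('{"channel":"server.welcome","sequence":1,"data":{}}');
const badJson = decodePush("{oops");
const badChannel = decodePush('{"channel":"nope","sequence":1}');
```

Returning a result value instead of throwing keeps decode failures inspectable by the caller, which is the point of a structured diagnostic.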
79

810
Methods mirror the `NativeApi` interface defined in `@t3tools/contracts`:
911

@@ -12,3 +14,17 @@ Methods mirror the `NativeApi` interface defined in `@t3tools/contracts`:
1214
- `shell.openInEditor`, `server.getConfig`
1315

1416
Codex is the only implemented provider. `claudeCode` is reserved in contracts/UI.
17+
18+
## Client transport
19+
20+
`wsTransport.ts` manages connection state: `connecting` → `open` → `reconnecting` → `closed` → `disposed`. Outbound requests are queued while disconnected and flushed on reconnect. Inbound pushes are decoded and validated at the boundary, then cached per channel. Subscribers can opt into `replayLatest` to receive the last push on subscribe.
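
The queueing and replay behavior described above can be sketched without a real socket. Everything here is illustrative: `SketchTransport` and its method names are hypothetical, the `sendRaw` callback stands in for `WebSocket.send`, and decoding is assumed to have happened before `handlePush` is called.

```typescript
type ConnectionState = "connecting" | "open" | "reconnecting" | "closed" | "disposed";
type PushListener = (data: unknown) => void;

class SketchTransport {
  state: ConnectionState = "connecting";
  private readonly sendRaw: (message: string) => void;
  private readonly outbound: string[] = [];
  private readonly latestByChannel = new Map<string, unknown>();
  private readonly listeners = new Map<string, Set<PushListener>>();

  constructor(sendRaw: (message: string) => void) {
    this.sendRaw = sendRaw;
  }

  /** Send immediately when open; otherwise queue until the next open. */
  send(message: string): void {
    if (this.state === "disposed") return;
    if (this.state === "open") this.sendRaw(message);
    else this.outbound.push(message);
  }

  /** Socket opened (first connect or reconnect): flush the queue in order. */
  handleOpen(): void {
    if (this.state === "disposed") return;
    this.state = "open";
    for (const message of this.outbound.splice(0)) this.sendRaw(message);
  }

  handleClose(): void {
    if (this.state !== "disposed") this.state = "reconnecting";
  }

  /** Cache the latest push per channel, then notify subscribers. */
  handlePush(channel: string, data: unknown): void {
    this.latestByChannel.set(channel, data);
    this.listeners.get(channel)?.forEach((listener) => listener(data));
  }

  subscribe(
    channel: string,
    listener: PushListener,
    options: { replayLatest?: boolean } = {},
  ): () => void {
    let set = this.listeners.get(channel);
    if (set === undefined) {
      set = new Set<PushListener>();
      this.listeners.set(channel, set);
    }
    set.add(listener);
    if (options.replayLatest === true && this.latestByChannel.has(channel)) {
      listener(this.latestByChannel.get(channel));
    }
    const subscribed = set;
    return () => {
      subscribed.delete(listener);
    };
  }
}

const wire: string[] = [];
const replayed: unknown[] = [];
const transport = new SketchTransport((message) => wire.push(message));

transport.send("request-1"); // queued: still connecting
transport.handleOpen(); // flushes "request-1"
transport.handleClose(); // socket dropped
transport.send("request-2"); // queued again
transport.handleOpen(); // flushes "request-2"

transport.handlePush("server.welcome", { cwd: "/repo" });
transport.subscribe("server.welcome", (data) => replayed.push(data), { replayLatest: true });
```

A late subscriber with `replayLatest` sees the cached welcome immediately, so it does not matter whether it subscribes before or after the push arrives.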
21+
22+
## Server-side orchestration layers
23+
24+
Provider runtime events flow through queue-based workers:
25+
26+
1. **ProviderRuntimeIngestion** — consumes provider runtime streams, emits orchestration commands
27+
2. **ProviderCommandReactor** — reacts to orchestration intent events, dispatches provider calls
28+
3. **CheckpointReactor** — captures git checkpoints on turn start/complete, publishes runtime receipts
29+
30+
All three use `DrainableWorker` internally and expose `drain()` for deterministic test synchronization.
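
To show why `drain()` makes tests deterministic, here is a Promise-based sketch of the idea. It is not the real `DrainableWorker`, which the plan below describes as a queue-based Effect worker; the class name, constructor, and `enqueue` method are assumptions made for illustration.

```typescript
class SketchDrainableWorker<T> {
  private tail: Promise<void> = Promise.resolve();
  private readonly handler: (item: T) => Promise<void>;

  constructor(handler: (item: T) => Promise<void>) {
    this.handler = handler;
  }

  /** Chain the item behind all earlier work so items run one at a time, in order. */
  enqueue(item: T): void {
    this.tail = this.tail
      .then(() => this.handler(item))
      .catch(() => {
        // In this sketch a failed item does not block later items.
      });
  }

  /** Resolves once everything enqueued so far, including items added meanwhile, is done. */
  async drain(): Promise<void> {
    let observed: Promise<void>;
    do {
      observed = this.tail;
      await observed;
    } while (observed !== this.tail);
  }
}

async function runWorkerDemo(): Promise<number[]> {
  const processed: number[] = [];
  const worker = new SketchDrainableWorker<number>(async (n) => {
    // Later items take less time, so order only holds if work is serialized.
    await new Promise<void>((resolve) => setTimeout(resolve, 10 - n * 3));
    processed.push(n);
  });

  worker.enqueue(1);
  worker.enqueue(2);
  worker.enqueue(3);
  await worker.drain(); // no sleep or retry loop needed in the test
  return processed;
}
```

A test that awaits `drain()` observes the worker at a known idle point instead of guessing how long to sleep.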

.docs/workspace-layout.md

Lines changed: 2 additions & 1 deletion
@@ -3,4 +3,5 @@
33
- `/apps/server`: Node.js WebSocket server. Wraps Codex app-server, serves the built web app, and opens the browser on start.
44
- `/apps/web`: React + Vite UI. Session control, conversation, and provider event rendering. Connects to the server via WebSocket.
55
- `/apps/desktop`: Electron shell. Spawns a desktop-scoped `t3` backend process and loads the shared web app.
6-
- `/packages/contracts`: Shared Zod schemas and TypeScript contracts for provider events, WebSocket protocol, and model/session types.
6+
- `/packages/contracts`: Shared effect/Schema schemas and TypeScript contracts for provider events, WebSocket protocol, and model/session types.
7+
- `/packages/shared`: Shared runtime utilities consumed by both server and web. Uses explicit subpath exports (e.g. `@t3tools/shared/git`, `@t3tools/shared/DrainableWorker`) — no barrel index.

.mise.toml

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
[tools]
2+
node = "24.13.1"
3+
bun = "1.3.9"
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
1+
# Plan: Provider-Neutral Runtime Determinism and Flake Elimination
2+
3+
## Summary
4+
Replace timing-sensitive websocket and orchestration behavior with explicit typed runtime boundaries, ordered push delivery, and server-owned completion receipts. The cutover is broad and single-shot: no compatibility shim, no mixed old/new transport. The design must reduce flakes without baking Codex-specific lifecycle semantics into generic runtime code.
5+
6+
## Implementation Status
7+
8+
All 7 sections are implemented. CI passes (format, lint, typecheck, test, browser test, build). One deferred item remains: the shared `WsTestClient` helper from section 7 — tests use direct transport subscription and receipt-based waits instead.
9+
10+
### New files
11+
12+
| File | Purpose |
13+
|------|---------|
14+
| `packages/shared/src/DrainableWorker.ts` | Queue-based Effect worker with deterministic `drain` signal |
15+
| `packages/shared/src/schemaJson.ts` | Two-phase JSON→Schema decode helpers (`decodeJsonResult`, `formatSchemaError`) |
16+
| `apps/server/src/wsServer/pushBus.ts` | `ServerPushBus` — ordered typed push pipeline with auto-incrementing sequence |
17+
| `apps/server/src/wsServer/readiness.ts` | `ServerReadiness` — Deferred-based barriers for startup sequencing |
18+
| `apps/server/src/orchestration/Services/RuntimeReceiptBus.ts` | Receipt schema union: checkpoint captured, diff finalized, turn quiesced |
19+
| `apps/server/src/orchestration/Layers/RuntimeReceiptBus.ts` | PubSub-backed receipt bus implementation |
20+
| `apps/server/src/watchFileWithStatPolling.ts` | Stat-polling file watcher for containers where `fs.watch` is unreliable |
21+
| `apps/server/vitest.config.ts` | Server-specific test config (timeout bumps) |
22+
| `apps/server/src/wsServer/pushBus.test.ts` | Push bus serialization and welcome-gating tests |
23+
| `packages/shared/src/DrainableWorker.test.ts` | Drainable worker enqueue/drain lifecycle tests |
24+
25+
### Key modifications
26+
27+
| File | Change |
28+
|------|--------|
29+
| `packages/contracts/src/ws.ts` | Channel-indexed `WsPushPayloadByChannel` map, `WsPush` union schema, `WsPushSequence` |
30+
| `apps/server/src/wsServer.ts` | Integrated `ServerPushBus` and `ServerReadiness`; welcome gated on readiness |
31+
| `apps/server/src/keybindings.ts` | Explicit runtime with `start`/`ready`/`snapshot`; dual `fs.watch` + stat-polling watcher |
32+
| `apps/web/src/wsTransport.ts` | Connection state machine (`connecting` → `open` → `reconnecting` → `closed` → `disposed`); two-phase decode at boundary; cached latest push by channel |
33+
| `apps/web/src/wsNativeApi.ts` | Removed decode logic; delegates to pre-validated transport messages |
34+
| `apps/server/src/orchestration/Layers/CheckpointReactor.ts` | Uses `DrainableWorker`; publishes completion receipts |
35+
| `apps/server/src/orchestration/Layers/ProviderCommandReactor.ts` | Uses `DrainableWorker` for command processing |
36+
| `apps/server/src/orchestration/Layers/ProviderRuntimeIngestion.ts` | Uses `DrainableWorker` for event ingestion |
37+
| `apps/server/integration/OrchestrationEngineHarness.integration.ts` | Receipt-based waits replace polling loops |
38+
39+
## Key Changes
40+
### 1. Strengthen the generic boundaries, not the Codex boundary — DONE
41+
- `ProviderRuntimeEvent` remains the canonical provider event contract; `ProviderService` remains the only cross-provider facade.
42+
- Raw Codex payloads and event ordering stay isolated in `CodexAdapter.ts` and `codexAppServerManager.ts`.
43+
- `ProviderKind` was not expanded. The runtime stays provider-neutral by contract.
44+
45+
### 2. Replace loose websocket envelopes with channel-indexed typed pushes — DONE
46+
- `packages/contracts/src/ws.ts` now derives push messages from a `WsPushPayloadByChannel` channel-to-schema map. `WsPush` is a union schema replacing `channel: string` + `data: unknown`.
47+
- Every server push carries `sequence: number`, auto-incremented in `ServerPushBus`.
48+
- `packages/shared/src/schemaJson.ts` provides structured decode diagnostics via `formatSchemaError`.
49+
- `packages/contracts/src/ws.test.ts` covers typed push envelope validation and channel/payload mismatch rejection.
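
At the type level, deriving a push union from a channel-to-payload map can be sketched with a mapped type. The payload fields below are hypothetical placeholders and this uses plain TypeScript types rather than the schema definitions in `ws.ts`; only the channel names come from the docs in this commit.

```typescript
// Hypothetical payload shapes, one per channel.
interface PayloadByChannel {
  "server.welcome": { cwd: string };
  "server.configUpdated": { issues: string[] };
  "terminal.event": { terminalId: string; chunk: string };
  "orchestration.domainEvent": { eventType: string };
}

type Channel = keyof PayloadByChannel;

// One envelope variant per channel, indexed by Channel to form the union.
type Push = {
  [C in Channel]: { channel: C; sequence: number; data: PayloadByChannel[C] };
}[Channel];

/** Switching on `channel` narrows `data` to that channel's payload. */
function describePush(push: Push): string {
  switch (push.channel) {
    case "server.welcome":
      return `welcome from ${push.data.cwd}`;
    case "server.configUpdated":
      return `${push.data.issues.length} config issue(s)`;
    case "terminal.event":
      return `terminal ${push.data.terminalId}: ${push.data.chunk}`;
    case "orchestration.domainEvent":
      return `domain event ${push.data.eventType}`;
  }
}

// A mismatched pair such as { channel: "terminal.event", data: { cwd: "/" } }
// does not type-check against Push.
const pushSummary = describePush({
  channel: "server.welcome",
  sequence: 1,
  data: { cwd: "/repo" },
});
```

Compared with `channel: string` plus `data: unknown`, the union moves channel/payload mismatches from runtime surprises to compile-time errors.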
50+
51+
### 3. Introduce explicit server readiness and a single push pipeline — DONE
52+
- `apps/server/src/wsServer/pushBus.ts`: `ServerPushBus` with `publishAll` (broadcast) and `publishClient` (targeted) methods, backed by one ordered path. All pushes flow through it.
53+
- `apps/server/src/wsServer/readiness.ts`: `ServerReadiness` with Deferred-based barriers for HTTP listening, push bus, keybindings, terminal subscriptions, and orchestration subscriptions.
54+
- `server.welcome` is emitted only after connection-scoped and server-scoped readiness is complete.
55+
- `wsServer.ts` no longer publishes directly from ad hoc background streams.
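
A single ordered push path with sequence numbering could be sketched as below. Only the method names `publishAll` and `publishClient` come from the text above; the class, the `addClient` method, the synchronous delivery callback, and the choice to keep one counter per client are assumptions made for illustration.

```typescript
interface SketchPush {
  channel: string;
  sequence: number;
  data: unknown;
}
type Deliver = (push: SketchPush) => void;

class SketchPushBus {
  private readonly clients = new Map<string, { deliver: Deliver; nextSequence: number }>();

  addClient(id: string, deliver: Deliver): void {
    this.clients.set(id, { deliver, nextSequence: 1 });
  }

  /** Targeted push: stamps the next sequence number for that client. */
  publishClient(id: string, channel: string, data: unknown): void {
    const client = this.clients.get(id);
    if (client === undefined) return;
    client.deliver({ channel, sequence: client.nextSequence++, data });
  }

  /** Broadcast: goes through the same stamping path as targeted pushes. */
  publishAll(channel: string, data: unknown): void {
    this.clients.forEach((_client, id) => this.publishClient(id, channel, data));
  }
}

const seenByA: number[] = [];
const seenByB: number[] = [];
const pushBus = new SketchPushBus();

pushBus.addClient("a", (push) => seenByA.push(push.sequence));
pushBus.publishClient("a", "server.welcome", {});
pushBus.addClient("b", (push) => seenByB.push(push.sequence));
pushBus.publishClient("b", "server.welcome", {});
pushBus.publishAll("terminal.event", { chunk: "ls" });
pushBus.publishAll("orchestration.domainEvent", {});
```

Because broadcasts and targeted pushes share one stamping path, a client can detect gaps or reordering by checking that `sequence` only ever increases by one.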
56+
57+
### 4. Turn background watchers into explicit runtimes — DONE
58+
- `apps/server/src/keybindings.ts` refactored as explicit `KeybindingsShape` service with `start`, `ready`, `snapshot` semantics.
59+
- Initial config load, cache warmup, and dual watcher attachment (`fs.watch` + `watchFileWithStatPolling`) complete before `ready` resolves.
60+
- `watchFileWithStatPolling.ts` is the thin adapter for environments where `fs.watch` is unreliable.
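
The core of a stat-polling watcher is a comparison between consecutive stat readings. The sketch below isolates that comparison so it runs without touching the filesystem; it is not the contents of `watchFileWithStatPolling.ts`, and `createStatPoller`, `FileStat`, and the injected `readStat` callback are hypothetical names. In a real adapter, `readStat` would presumably wrap `fs.statSync` (returning `undefined` on error) and `poll` would run on a `setInterval`.

```typescript
interface FileStat {
  mtimeMs: number;
  size: number;
}

/**
 * Returns a `poll` function. Each call reads the stat once and fires `onChange`
 * when mtime or size differs from the previous reading. `readStat` returns
 * undefined when the file does not exist.
 */
function createStatPoller(
  readStat: () => FileStat | undefined,
  onChange: () => void,
): () => void {
  let previous = readStat();
  return () => {
    const current = readStat();
    const changed =
      previous?.mtimeMs !== current?.mtimeMs || previous?.size !== current?.size;
    previous = current;
    if (changed) onChange();
  };
}

// Drive the poller with a fake stat source instead of a real file.
let fakeStat: FileStat | undefined = { mtimeMs: 1, size: 10 };
let changeCount = 0;
const poll = createStatPoller(
  () => fakeStat,
  () => {
    changeCount += 1;
  },
);

poll(); // unchanged: no callback
fakeStat = { mtimeMs: 2, size: 12 };
poll(); // modified: callback fires
fakeStat = undefined;
poll(); // deleted: callback fires
poll(); // still missing: no callback
```

Injecting the stat source is also what makes a "fake watch source" test possible without waiting on real file events.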
61+
62+
### 5. Replace polling-based orchestration waiting with receipts — DONE
63+
- `RuntimeReceiptBus` service defines three receipt types: `CheckpointBaselineCapturedReceipt`, `CheckpointDiffFinalizedReceipt` (with `status: "ready"|"missing"|"error"`), and `TurnProcessingQuiescedReceipt`.
64+
- `CheckpointReactor`, `ProviderCommandReactor`, and `ProviderRuntimeIngestion` use `DrainableWorker` and publish receipts on completion.
65+
- Integration harness and checkpoint tests await receipts instead of polling snapshots and git refs.
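
Waiting on a receipt rather than polling might look like the sketch below. The three receipt kinds and the `status` values come from the bullets above; the `type` tags, the `threadId` field, and the `SketchReceiptBus` class are hypothetical, and a plain in-memory waiter list stands in for the PubSub-backed bus.

```typescript
type RuntimeReceipt =
  | { type: "checkpoint.baseline.captured"; threadId: string }
  | { type: "checkpoint.diff.finalized"; threadId: string; status: "ready" | "missing" | "error" }
  | { type: "turn.processing.quiesced"; threadId: string };

interface ReceiptWaiter {
  matches: (receipt: RuntimeReceipt) => boolean;
  resolve: (receipt: RuntimeReceipt) => void;
}

class SketchReceiptBus {
  private readonly waiters: ReceiptWaiter[] = [];

  /** Deliver a receipt to every waiter whose predicate matches, then forget those waiters. */
  publish(receipt: RuntimeReceipt): void {
    for (let i = this.waiters.length - 1; i >= 0; i--) {
      const waiter = this.waiters[i];
      if (waiter !== undefined && waiter.matches(receipt)) {
        this.waiters.splice(i, 1);
        waiter.resolve(receipt);
      }
    }
  }

  /** Resolves with the first matching receipt published after this call. */
  waitFor(matches: (receipt: RuntimeReceipt) => boolean): Promise<RuntimeReceipt> {
    return new Promise<RuntimeReceipt>((resolve) => {
      this.waiters.push({ matches, resolve });
    });
  }
}

async function runReceiptDemo(): Promise<string> {
  const bus = new SketchReceiptBus();
  const quiesced = bus.waitFor(
    (receipt) => receipt.type === "turn.processing.quiesced" && receipt.threadId === "t1",
  );

  // Earlier milestones arrive first and do not satisfy the waiter.
  bus.publish({ type: "checkpoint.baseline.captured", threadId: "t1" });
  bus.publish({ type: "checkpoint.diff.finalized", threadId: "t1", status: "ready" });
  bus.publish({ type: "turn.processing.quiesced", threadId: "t1" });

  return (await quiesced).type;
}
```

Note that this sketch drops receipts published before `waitFor` is called, so a test built on it must subscribe before triggering the work.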
66+
67+
### 6. Centralize client transport state and decoding — DONE
68+
- `apps/web/src/wsTransport.ts` implements an explicit connection state machine: `connecting`, `open`, `reconnecting`, `closed`, `disposed`.
69+
- Two-phase decode (JSON parse → Schema validate) happens at the transport boundary. `wsNativeApi.ts` receives pre-validated messages.
70+
- Cached latest welcome/config modeled as explicit `latestPushByChannel` state.
71+
72+
### 7. Replace ad hoc test helpers with semantic test clients — MOSTLY DONE
73+
- `DrainableWorker` replaces timing-sensitive `Effect.sleep` with deterministic `drain()` across reactor tests.
74+
- Orchestration harness waits on receipts/barriers instead of `waitForThread`, `waitForGitRef`, and retry loops.
75+
- Behavioral assertions moved to deterministic unit-style harnesses; narrow integration tests kept for real filesystem/socket behavior.
76+
- **Deferred:** Shared `WsTestClient` helper (connect, awaitSemanticWelcome, awaitTypedPush, trackSequence, matchRpcResponseById). Tests use direct transport subscription instead.
77+
78+
## Provider-Coupling Guardrails
79+
- No generic runtime API may depend on Codex-native event names, thread IDs, or request payload shapes.
80+
- No readiness barrier may be defined as "Codex has emitted X." Readiness is owned by the server runtime, not by provider event order.
81+
- No websocket channel payload may contain raw provider-native payloads unless the channel is explicitly debug/internal.
82+
- Any provider-specific divergence must be exposed through provider capabilities from `ProviderService.getCapabilities()`, not `if provider === "codex"` branches in shared runtime code.
83+
- Generic tests must use canonical `ProviderRuntimeEvent` fixtures. Codex-specific ordering and translation tests stay in adapter/app-server suites only.
84+
- Keep UI/provider-specific knobs such as Codex-only options scoped to provider UX code. Do not pull them into generic transport or orchestration state.
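
The capability guardrail can be made concrete with a short contrast. This is a sketch only: the source names `ProviderService.getCapabilities()` but not its return shape, so the `supportsResume` flag, the `ProviderLike` interface, and `planSessionStart` are all hypothetical.

```typescript
// Hypothetical capability flags.
interface ProviderCapabilities {
  supportsResume: boolean;
}

interface ProviderLike {
  kind: string;
  getCapabilities(): ProviderCapabilities;
}

/** Shared runtime code branches on what a provider can do, never on provider.kind. */
function planSessionStart(provider: ProviderLike, hasPriorSession: boolean): "resume" | "start" {
  return hasPriorSession && provider.getCapabilities().supportsResume ? "resume" : "start";
}

const resumableProvider: ProviderLike = {
  kind: "codex",
  getCapabilities: () => ({ supportsResume: true }),
};
const freshOnlyProvider: ProviderLike = {
  kind: "exampleProvider", // hypothetical second provider
  getCapabilities: () => ({ supportsResume: false }),
};

const plans = [
  planSessionStart(resumableProvider, true),
  planSessionStart(freshOnlyProvider, true),
  planSessionStart(resumableProvider, false),
];
```

Adding a provider then means declaring its capabilities, with no edits to shared runtime branches.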
85+
86+
## Test Plan
87+
- Contracts:
88+
- schema tests for typed push envelopes and structured decode diagnostics
89+
- ordering tests for `sequence`
90+
- Server:
91+
- readiness tests proving `server.welcome` cannot precede runtime readiness
92+
- push bus tests proving terminal/config/orchestration pushes are serialized and typed
93+
- keybindings runtime tests with fake watch source plus one real watcher integration test
94+
- Orchestration:
95+
- receipt tests proving checkpoint refs and projections are complete before completion signals resolve
96+
- replacement of polling-based checkpoint/integration waits with receipt-based waits
97+
- Web:
98+
- transport tests for invalid JSON, invalid envelope, invalid payload, reconnect queue flushing, cached semantic state
99+
- Validation gate:
100+
- `bun run lint`
101+
- `bun run typecheck`
102+
- `mise exec -- bun run test`
103+
- repeated full-suite run after cutover to confirm flake removal
104+
105+
## Assumptions and Defaults
106+
- This remains a single-provider product during the cutover, but the runtime contracts must stay provider-neutral.
107+
- No backward-compatibility layer is required for old websocket push envelopes.
108+
- The goal is deterministic runtime behavior first; reducing retries and sleeps in tests is a consequence, not the primary mechanism.
109+
- If a completion signal cannot be expressed provider-neutrally, it does not belong in the shared runtime layer and must stay adapter-local.
