Commit a71b57d

Stabilize websocket/runtime orchestration flows
Add typed websocket push envelopes with ordered sequence numbers, structured decode diagnostics, and a client transport state machine. On the server, route pushes through a shared push bus, gate welcome delivery on explicit readiness barriers, and make keybindings startup an explicit runtime with a polling fallback when file watching is unavailable. Add orchestration runtime receipts for checkpoint baseline capture, checkpoint diff finalization, and turn quiescence, then update the integration harness/tests to wait on those receipts instead of timing-sensitive polling and provider-specific turn ids. This removes the websocket/config update and checkpoint integration flakes while keeping the runtime contracts provider-neutral.
1 parent 9ca6145 commit a71b57d

37 files changed

Lines changed: 1808 additions & 652 deletions

.docs/architecture.md

Lines changed: 125 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -5,17 +5,139 @@ T3 Code runs as a **Node.js WebSocket server** that wraps `codex app-server` (JS
55
```
66
┌─────────────────────────────────┐
77
│ Browser (React + Vite) │
8-
│ Connected via WebSocket │
8+
│ wsTransport (state machine) │
9+
│ Typed push decode at boundary │
910
└──────────┬──────────────────────┘
1011
│ ws://localhost:3773
1112
┌──────────▼──────────────────────┐
1213
│ apps/server (Node.js) │
1314
│ WebSocket + HTTP static server │
14-
│ ProviderManager │
15-
│ CodexAppServerManager │
15+
│ ServerPushBus (ordered pushes) │
16+
│ ServerReadiness (startup gate) │
17+
│ OrchestrationEngine │
18+
│ ProviderService │
19+
│ CheckpointReactor │
20+
│ RuntimeReceiptBus │
1621
└──────────┬──────────────────────┘
1722
│ JSON-RPC over stdio
1823
┌──────────▼──────────────────────┐
1924
│ codex app-server │
2025
└─────────────────────────────────┘
2126
```
27+
28+
## Components
29+
30+
- **Browser app**: The React app renders session state, owns the client-side WebSocket transport, and treats typed push events as the boundary between server runtime details and UI state.
31+
32+
- **Server**: `apps/server` is the main coordinator. It serves the web app, accepts WebSocket requests, waits for startup readiness before welcoming clients, and sends all outbound pushes through a single ordered push path.
33+
34+
- **Provider runtime**: `codex app-server` does the actual provider/session work. The server talks to it over JSON-RPC on stdio and translates those runtime events into the app's orchestration model.
35+
36+
- **Background workers**: Long-running async flows such as runtime ingestion, command reaction, and checkpoint processing run as queue-backed workers. This keeps work ordered, reduces timing races, and gives tests a deterministic way to wait for the system to go idle.
37+
38+
- **Runtime signals**: The server emits lightweight typed receipts when important async milestones finish, such as checkpoint capture, diff finalization, or a turn becoming fully quiescent. Tests and orchestration code wait on these signals instead of polling internal state.
39+
40+
## Event Lifecycle
41+
42+
### Startup and client connect
43+
44+
```mermaid
45+
sequenceDiagram
46+
participant Browser
47+
participant Transport as WsTransport
48+
participant Server as wsServer
49+
participant Layers as serverLayers
50+
participant Ready as ServerReadiness
51+
participant Push as ServerPushBus
52+
53+
Browser->>Transport: Load app and open WebSocket
54+
Transport->>Server: Connect
55+
Server->>Layers: Start runtime services
56+
Server->>Ready: Wait for startup barriers
57+
Ready-->>Server: Ready
58+
Server->>Push: Publish server.welcome
59+
Push-->>Transport: Ordered welcome push
60+
Transport-->>Browser: Hydrate initial state
61+
```
62+
63+
1. The browser boots [`WsTransport`][1] and registers typed listeners in [`wsNativeApi`][2].
64+
2. The server accepts the connection in [`wsServer`][3] and brings up the runtime graph defined in [`serverLayers`][7].
65+
3. [`ServerReadiness`][4] waits until the key startup barriers are complete.
66+
4. Once the server is ready, [`wsServer`][3] sends `server.welcome` from the contracts in [`ws.ts`][6] through [`ServerPushBus`][5].
67+
5. The browser receives that ordered push through [`WsTransport`][1], and [`wsNativeApi`][2] uses it to seed local client state.
68+
69+
### User turn flow
70+
71+
```mermaid
72+
sequenceDiagram
73+
participant Browser
74+
participant Transport as WsTransport
75+
participant Server as wsServer
76+
participant Provider as ProviderService
77+
participant Codex as codex app-server
78+
participant Ingest as ProviderRuntimeIngestion
79+
participant Engine as OrchestrationEngine
80+
participant Push as ServerPushBus
81+
82+
Browser->>Transport: Send user action
83+
Transport->>Server: Typed WebSocket request
84+
Server->>Provider: Route request
85+
Provider->>Codex: JSON-RPC over stdio
86+
Codex-->>Ingest: Provider runtime events
87+
Ingest->>Engine: Normalize into orchestration events
88+
Engine-->>Server: Domain events
89+
Server->>Push: Publish orchestration.domainEvent
90+
Push-->>Browser: Typed push
91+
```
92+
93+
1. A user action in the browser becomes a typed request through [`WsTransport`][1] and the browser API layer in [`nativeApi`][12].
94+
2. [`wsServer`][3] decodes that request using the shared WebSocket contracts in [`ws.ts`][6] and routes it to the right service.
95+
3. [`ProviderService`][8] starts or resumes a session and talks to `codex app-server` over JSON-RPC on stdio.
96+
4. Provider-native events are pulled back into the server by [`ProviderRuntimeIngestion`][9], which converts them into orchestration events.
97+
5. [`OrchestrationEngine`][10] persists those events, updates the read model, and exposes them as domain events.
98+
6. [`wsServer`][3] pushes those updates to the browser through [`ServerPushBus`][5] on channels defined in [`orchestration.ts`][11].
99+
100+
### Async completion flow
101+
102+
```mermaid
103+
sequenceDiagram
104+
participant Server as wsServer
105+
participant Worker as Queue-backed workers
106+
participant Cmd as ProviderCommandReactor
107+
participant Checkpoint as CheckpointReactor
108+
participant Receipt as RuntimeReceiptBus
109+
participant Push as ServerPushBus
110+
participant Browser
111+
112+
Server->>Worker: Enqueue follow-up work
113+
Worker->>Cmd: Process provider commands
114+
Worker->>Checkpoint: Process checkpoint tasks
115+
Checkpoint->>Receipt: Publish completion receipt
116+
Cmd-->>Server: Produce orchestration changes
117+
Checkpoint-->>Server: Produce orchestration changes
118+
Server->>Push: Publish resulting state updates
119+
Push-->>Browser: User-visible push
120+
```
121+
122+
1. Some work continues after the initial request returns, especially in [`ProviderRuntimeIngestion`][9], [`ProviderCommandReactor`][13], and [`CheckpointReactor`][14].
123+
2. These flows run as queue-backed workers using [`DrainableWorker`][16], which helps keep side effects ordered and test synchronization deterministic.
124+
3. When a milestone completes, the server emits a typed receipt on [`RuntimeReceiptBus`][15], such as checkpoint completion or turn quiescence.
125+
4. Tests and orchestration code wait on those receipts instead of polling git state, projections, or timers.
126+
5. Any user-visible state changes produced by that async work still go back through [`wsServer`][3] and [`ServerPushBus`][5].
127+
128+
[1]: ../apps/web/src/wsTransport.ts
129+
[2]: ../apps/web/src/wsNativeApi.ts
130+
[3]: ../apps/server/src/wsServer.ts
131+
[4]: ../apps/server/src/wsServer/readiness.ts
132+
[5]: ../apps/server/src/wsServer/pushBus.ts
133+
[6]: ../packages/contracts/src/ws.ts
134+
[7]: ../apps/server/src/serverLayers.ts
135+
[8]: ../apps/server/src/provider/Layers/ProviderService.ts
136+
[9]: ../apps/server/src/orchestration/Layers/ProviderRuntimeIngestion.ts
137+
[10]: ../apps/server/src/orchestration/Layers/OrchestrationEngine.ts
138+
[11]: ../packages/contracts/src/orchestration.ts
139+
[12]: ../apps/web/src/nativeApi.ts
140+
[13]: ../apps/server/src/orchestration/Layers/ProviderCommandReactor.ts
141+
[14]: ../apps/server/src/orchestration/Layers/CheckpointReactor.ts
142+
[15]: ../apps/server/src/orchestration/Layers/RuntimeReceiptBus.ts
143+
[16]: ../packages/shared/src/DrainableWorker.ts

.docs/provider-architecture.md

Lines changed: 17 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,9 @@
33
The web app communicates with the server via WebSocket using a simple JSON-RPC-style protocol:
44

55
- **Request/Response**: `{ id, method, params }` → `{ id, result }` or `{ id, error }`
6-
- **Push events**: `{ type: "push", channel, data }` for orchestration read-model updates
6+
- **Push events**: typed envelopes with `channel`, `sequence` (monotonic per connection), and channel-specific `data`
7+
8+
Push channels: `server.welcome`, `server.configUpdated`, `terminal.event`, `orchestration.domainEvent`. Payloads are schema-validated at the transport boundary (`wsTransport.ts`). Decode failures produce structured `WsDecodeDiagnostic` with `code`, `reason`, and path info.
79

810
Methods mirror the `NativeApi` interface defined in `@t3tools/contracts`:
911

@@ -12,3 +14,17 @@ Methods mirror the `NativeApi` interface defined in `@t3tools/contracts`:
1214
- `shell.openInEditor`, `server.getConfig`
1315

1416
Codex is the only implemented provider. `claudeCode` is reserved in contracts/UI.
17+
18+
## Client transport
19+
20+
`wsTransport.ts` manages connection state: `connecting` → `open` → `reconnecting` → `closed` → `disposed`. Outbound requests are queued while disconnected and flushed on reconnect. Inbound pushes are decoded and validated at the boundary, then cached per channel. Subscribers can opt into `replayLatest` to receive the last push on subscribe.
21+
22+
## Server-side orchestration layers
23+
24+
Provider runtime events flow through queue-based workers:
25+
26+
1. **ProviderRuntimeIngestion** — consumes provider runtime streams, emits orchestration commands
27+
2. **ProviderCommandReactor** — reacts to orchestration intent events, dispatches provider calls
28+
3. **CheckpointReactor** — captures git checkpoints on turn start/complete, publishes runtime receipts
29+
30+
All three use `DrainableWorker` internally and expose `drain()` for deterministic test synchronization.
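
The `drain()` contract described above can be illustrated with a small promise-based sketch. This is not the `DrainableWorker` from `packages/shared`, whose implementation is not shown in this diff; the class name `DrainableWorkerSketch` and its `failures` field are hypothetical. It only shows how a serialized queue gives tests a deterministic "idle" point.

```typescript
// Hypothetical stand-in for a queue-backed worker; not the real DrainableWorker.
class DrainableWorkerSketch<T> {
  // Errors are collected instead of thrown so one failing item does not stall the queue.
  readonly failures: unknown[] = [];
  private tail: Promise<void> = Promise.resolve();
  private readonly handler: (item: T) => Promise<void> | void;

  constructor(handler: (item: T) => Promise<void> | void) {
    this.handler = handler;
  }

  // Chains the item onto the tail so items run one at a time, in enqueue order.
  enqueue(item: T): void {
    this.tail = this.tail
      .then(() => this.handler(item))
      .catch((error: unknown) => {
        this.failures.push(error);
      });
  }

  // Resolves once the queue is idle, including items enqueued while draining.
  async drain(): Promise<void> {
    let observed: Promise<void>;
    do {
      observed = this.tail;
      await observed;
    } while (observed !== this.tail);
  }
}
```

A test would enqueue work and then `await worker.drain()` before asserting, rather than sleeping or polling.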

.docs/workspace-layout.md

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,4 +3,5 @@
33
- `/apps/server`: Node.js WebSocket server. Wraps Codex app-server, serves the built web app, and opens the browser on start.
44
- `/apps/web`: React + Vite UI. Session control, conversation, and provider event rendering. Connects to the server via WebSocket.
55
- `/apps/desktop`: Electron shell. Spawns a desktop-scoped `t3` backend process and loads the shared web app.
6-
- `/packages/contracts`: Shared Zod schemas and TypeScript contracts for provider events, WebSocket protocol, and model/session types.
6+
- `/packages/contracts`: Shared effect/Schema schemas and TypeScript contracts for provider events, WebSocket protocol, and model/session types.
7+
- `/packages/shared`: Shared runtime utilities consumed by both server and web. Uses explicit subpath exports (e.g. `@t3tools/shared/git`, `@t3tools/shared/DrainableWorker`) — no barrel index.
Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
1+
# Plan: Provider-Neutral Runtime Determinism and Flake Elimination
2+
3+
## Summary
4+
Replace timing-sensitive websocket and orchestration behavior with explicit typed runtime boundaries, ordered push delivery, and server-owned completion receipts. The cutover is broad and single-shot: no compatibility shim, no mixed old/new transport. The design must reduce flakes without baking Codex-specific lifecycle semantics into generic runtime code.
5+
6+
## Key Changes
7+
### 1. Strengthen the generic boundaries, not the Codex boundary
8+
- Keep `ProviderRuntimeEvent` as the canonical provider event contract and `ProviderService` as the only cross-provider facade.
9+
- Do not expose raw Codex payloads or Codex event ordering outside `CodexAdapter.ts` and `codexAppServerManager.ts`.
10+
- Do not expand `ProviderKind` in this change. The runtime stays provider-neutral by contract, while the product remains Codex-only in concrete support.
11+
12+
### 2. Replace loose websocket envelopes with channel-indexed typed pushes
13+
- Refactor `packages/contracts/src/ws.ts` so push messages are derived from a channel-to-schema map instead of `channel: string` plus `data: unknown`.
14+
- Add `sequence: number` to every server push. Ordering becomes explicit and testable.
15+
- Add structured decode diagnostics with stable machine fields: `code`, `reason`, `expected`, `actual`, `path`, optional `jsonOffset`.
16+
- Remove runtime/test dependence on engine-specific pretty strings. Human-readable formatting remains a logging helper only.
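
A minimal sketch of what a channel-indexed push envelope with structured diagnostics could look like. The payload shapes, the validator functions, and `decodePush` are all assumptions for illustration. The real contracts live in `packages/contracts/src/ws.ts` and use schema decoders rather than the hand-written checks below.

```typescript
// Hypothetical payload shapes; the real schemas live in packages/contracts/src/ws.ts.
interface PushPayloads {
  "server.welcome": { cwd: string };
  "server.configUpdated": { issues: string[] };
}
type PushChannel = keyof PushPayloads;

// The envelope union is derived from the channel map, so `data` is typed per channel.
type PushEnvelope = {
  [C in PushChannel]: { type: "push"; channel: C; sequence: number; data: PushPayloads[C] };
}[PushChannel];

// Stable machine-readable fields, following the list in the plan.
interface DecodeDiagnostic {
  code: "invalid_json" | "invalid_envelope" | "invalid_payload";
  reason: string;
  path: string;
  expected?: string;
  actual?: string;
}

type DecodeResult =
  | { ok: true; push: PushEnvelope }
  | { ok: false; diagnostic: DecodeDiagnostic };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Hand-written validators standing in for real schema decoders, one per channel.
const payloadValidators: Record<PushChannel, (data: unknown) => boolean> = {
  "server.welcome": (data) => isRecord(data) && typeof data["cwd"] === "string",
  "server.configUpdated": (data) => isRecord(data) && isStringArray(data["issues"]),
};

function isPushChannel(value: unknown): value is PushChannel {
  return typeof value === "string" && Object.prototype.hasOwnProperty.call(payloadValidators, value);
}

function fail(diagnostic: DecodeDiagnostic): DecodeResult {
  return { ok: false, diagnostic };
}

function decodePush(raw: string): DecodeResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return fail({ code: "invalid_json", reason: "message is not valid JSON", path: "$" });
  }
  if (!isRecord(parsed) || parsed["type"] !== "push") {
    return fail({ code: "invalid_envelope", reason: "not a push envelope", path: "$.type", expected: '"push"' });
  }
  const { channel, sequence, data } = parsed;
  if (!isPushChannel(channel)) {
    return fail({ code: "invalid_envelope", reason: "unknown channel", path: "$.channel", actual: String(channel) });
  }
  if (typeof sequence !== "number" || !Number.isInteger(sequence)) {
    return fail({ code: "invalid_envelope", reason: "sequence must be an integer", path: "$.sequence", expected: "integer", actual: typeof sequence });
  }
  if (!payloadValidators[channel](data)) {
    return fail({ code: "invalid_payload", reason: `payload does not match ${channel}`, path: "$.data" });
  }
  // The cast is safe only because the validator for this channel just passed.
  return { ok: true, push: { type: "push", channel, sequence, data } as PushEnvelope };
}
```

Tests can then assert on `diagnostic.code` and `diagnostic.path` instead of matching human-readable error strings.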
17+
18+
### 3. Introduce explicit server readiness and a single push pipeline
19+
- Add a `ServerPushBus` service in `apps/server` backed by one ordered queue/pubsub path. All pushes go through it: `server.welcome`, `server.configUpdated`, terminal events, orchestration domain events.
20+
- Add a `ServerReadiness` service with explicit barriers for:
21+
- HTTP listening
22+
- push bus ready
23+
- keybindings runtime ready
24+
- terminal subscriptions ready
25+
- orchestration subscriptions ready
26+
- Strengthen `server.welcome` semantics: it is emitted only after connection-scoped and server-scoped readiness is complete.
27+
- `wsServer` should never publish directly from ad hoc background streams once the bus exists.
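
The relationship between readiness barriers and the single push path can be sketched as follows. The class names, barrier identifiers, and `welcomeWhenReady` helper are hypothetical and callback-based for brevity. The actual `ServerReadiness` and `ServerPushBus` services are not shown in this plan and may be structured differently.

```typescript
// Barrier names follow the plan's list; the identifiers themselves are hypothetical.
type Barrier =
  | "httpListening"
  | "pushBusReady"
  | "keybindingsRuntimeReady"
  | "terminalSubscriptionsReady"
  | "orchestrationSubscriptionsReady";

class ServerReadinessSketch {
  private readonly pending: Set<Barrier>;
  private waiters: Array<() => void> = [];

  constructor(barriers: readonly Barrier[]) {
    this.pending = new Set(barriers);
  }

  // Marks one barrier complete and releases waiters once none remain.
  markReady(barrier: Barrier): void {
    this.pending.delete(barrier);
    if (this.pending.size === 0) {
      const released = this.waiters;
      this.waiters = [];
      for (const run of released) run();
    }
  }

  // Runs the callback immediately if already ready, otherwise after the last barrier.
  whenReady(callback: () => void): void {
    if (this.pending.size === 0) callback();
    else this.waiters.push(callback);
  }
}

interface ServerPush {
  channel: string;
  sequence: number;
  data: unknown;
}

// Every outbound push goes through publish(), which is the only place sequences are assigned.
class ServerPushBusSketch {
  private nextSequence = 1;
  private readonly deliver: (push: ServerPush) => void;

  constructor(deliver: (push: ServerPush) => void) {
    this.deliver = deliver;
  }

  publish(channel: string, data: unknown): void {
    const push: ServerPush = { channel, sequence: this.nextSequence, data };
    this.nextSequence += 1;
    this.deliver(push);
  }
}

// Welcome is published only after every barrier has been marked ready.
function welcomeWhenReady(readiness: ServerReadinessSketch, bus: ServerPushBusSketch, welcome: unknown): void {
  readiness.whenReady(() => bus.publish("server.welcome", welcome));
}
```

Because sequence assignment happens in one place, a test can assert that `server.welcome` carries the first sequence number and that later pushes are strictly increasing.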
28+
29+
### 4. Turn background watchers into explicit runtimes
30+
- Extract keybindings watching into a `KeybindingsRuntime` service with `start`, `ready`, `snapshot`, and `changes`.
31+
- Initial config load, startup sync, cache warmup, and watcher attachment complete before `ready` resolves.
32+
- Keep the real `fs.watch` adapter thin. Most behavior tests use a fake watch source and deterministic change stream.
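
One way to picture the `start` / `ready` / `snapshot` / `changes` surface with a fake watch source is sketched below. Everything here is an assumption: the `WatchSource` interface, the synchronous `load` function, and the boolean `ready()` are simplifications, and the plan describes `ready` as something that resolves, which suggests an async API in the real service.

```typescript
type Keybindings = Record<string, string>;

// Hypothetical seam: a real adapter would wrap fs.watch, a fake one is driven by tests.
interface WatchSource {
  // Registers a change callback and returns an unsubscribe function.
  watch(onChange: () => void): () => void;
}

class KeybindingsRuntimeSketch {
  private current: Keybindings = {};
  private started = false;
  private stopWatching: () => void = () => {};
  private readonly listeners: Array<(next: Keybindings) => void> = [];
  private readonly load: () => Keybindings;
  private readonly source: WatchSource;

  constructor(load: () => Keybindings, source: WatchSource) {
    this.load = load;
    this.source = source;
  }

  // Initial load and watcher attachment both finish before the runtime reports ready.
  start(): void {
    this.current = this.load();
    this.stopWatching = this.source.watch(() => {
      this.current = this.load();
      for (const listener of this.listeners) listener(this.current);
    });
    this.started = true;
  }

  ready(): boolean {
    return this.started;
  }

  snapshot(): Keybindings {
    if (!this.started) throw new Error("keybindings runtime not started");
    return this.current;
  }

  changes(listener: (next: Keybindings) => void): void {
    this.listeners.push(listener);
  }

  stop(): void {
    this.stopWatching();
    this.started = false;
  }
}

// Deterministic watch source for behavior tests; no filesystem involved.
class FakeWatchSource implements WatchSource {
  private callbacks: Array<() => void> = [];

  watch(onChange: () => void): () => void {
    this.callbacks.push(onChange);
    return () => {
      this.callbacks = this.callbacks.filter((cb) => cb !== onChange);
    };
  }

  emitChange(): void {
    for (const cb of [...this.callbacks]) cb();
  }
}
```

With this seam, only one integration test needs a real watcher. The rest can call `emitChange()` and assert on the resulting snapshot.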
33+
34+
### 5. Replace polling-based orchestration waiting with receipts
35+
- Add server-owned completion receipts for operations that tests currently infer by polling:
36+
- checkpoint capture complete
37+
- checkpoint diff finalized
38+
- turn processing quiesced
39+
- A receipt means append, projection, and required side effects are complete. It must not mean the provider emitted a certain event sequence.
40+
- Update the orchestration harness and checkpoint tests to await receipts/barriers instead of polling snapshots and git refs.
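
The receipt idea can be sketched as a small bus that tests await instead of polling. The receipt `kind` strings, the `threadId` field, and the replay-from-history behavior are assumptions made for this example. The real `RuntimeReceiptBus` is not shown in this plan.

```typescript
// Hypothetical receipt shapes covering the three milestones named in the plan.
type RuntimeReceipt =
  | { kind: "checkpoint.captured"; threadId: string }
  | { kind: "checkpoint.diffFinalized"; threadId: string }
  | { kind: "turn.quiesced"; threadId: string };

interface ReceiptWaiter {
  matches: (receipt: RuntimeReceipt) => boolean;
  resolve: (receipt: RuntimeReceipt) => void;
}

class RuntimeReceiptBusSketch {
  private readonly history: RuntimeReceipt[] = [];
  private waiters: ReceiptWaiter[] = [];

  // Called by the server only after append, projection, and side effects are complete.
  publish(receipt: RuntimeReceipt): void {
    this.history.push(receipt);
    const stillWaiting: ReceiptWaiter[] = [];
    for (const waiter of this.waiters) {
      if (waiter.matches(receipt)) waiter.resolve(receipt);
      else stillWaiting.push(waiter);
    }
    this.waiters = stillWaiting;
  }

  // Resolves with the first matching receipt, including ones published before the call.
  // Replaying history is a design choice of this sketch so late waiters cannot miss a receipt.
  waitFor(matches: (receipt: RuntimeReceipt) => boolean): Promise<RuntimeReceipt> {
    const past = this.history.find(matches);
    if (past !== undefined) return Promise.resolve(past);
    return new Promise<RuntimeReceipt>((resolve) => {
      this.waiters.push({ matches, resolve });
    });
  }

  pendingCount(): number {
    return this.waiters.length;
  }
}
```

A harness would write `await receipts.waitFor((r) => r.kind === "turn.quiesced" && r.threadId === id)` where it previously looped on snapshots or git refs.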
41+
42+
### 6. Centralize client transport state and decoding
43+
- Refactor `apps/web/src/wsTransport.ts` into an explicit connection state machine: `connecting`, `open`, `reconnecting`, `closed`, `disposed`.
44+
- Decode and validate typed push payloads at the transport boundary, not downstream in `apps/web/src/wsNativeApi.ts`.
45+
- Keep cached latest welcome/config behavior only if it is modeled as explicit state, not as a late-subscriber race workaround.
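
A rough sketch of the transport as an explicit state machine with queued requests and cached latest pushes. The transition table is an assumption (the plan lists the states but not the allowed edges), and `WsTransportSketch`, `sendRaw`, and `currentState` are hypothetical names rather than the API of `apps/web/src/wsTransport.ts`.

```typescript
type TransportState = "connecting" | "open" | "reconnecting" | "closed" | "disposed";

// Assumed transition table; the plan names the states but not the edges between them.
const allowedTransitions: Record<TransportState, readonly TransportState[]> = {
  connecting: ["open", "reconnecting", "closed", "disposed"],
  open: ["reconnecting", "closed", "disposed"],
  reconnecting: ["open", "closed", "disposed"],
  closed: ["disposed"],
  disposed: [],
};

class WsTransportSketch {
  private state: TransportState = "connecting";
  private readonly queue: string[] = [];
  private readonly latest = new Map<string, unknown>();
  private readonly listeners = new Map<string, Array<(data: unknown) => void>>();
  private readonly sendRaw: (message: string) => void;

  constructor(sendRaw: (message: string) => void) {
    this.sendRaw = sendRaw;
  }

  currentState(): TransportState {
    return this.state;
  }

  // Rejects illegal transitions and flushes queued requests whenever the socket opens.
  transition(next: TransportState): void {
    if (!allowedTransitions[this.state].includes(next)) {
      throw new Error(`illegal transition ${this.state} -> ${next}`);
    }
    this.state = next;
    if (next === "open") {
      for (const message of this.queue.splice(0)) this.sendRaw(message);
    }
  }

  // Sends immediately when open, otherwise queues until the next open.
  request(message: string): void {
    if (this.state === "disposed") throw new Error("transport disposed");
    if (this.state === "open") this.sendRaw(message);
    else this.queue.push(message);
  }

  // Called with an already decoded push; caches it per channel as explicit state.
  receive(channel: string, data: unknown): void {
    this.latest.set(channel, data);
    for (const listener of this.listeners.get(channel) ?? []) listener(data);
  }

  subscribe(channel: string, listener: (data: unknown) => void, options: { replayLatest?: boolean } = {}): void {
    this.listeners.set(channel, [...(this.listeners.get(channel) ?? []), listener]);
    if (options.replayLatest && this.latest.has(channel)) listener(this.latest.get(channel));
  }
}
```

Modeling the cache as part of the transport's state means a late subscriber gets the latest welcome or config through `replayLatest` by design, not as a side effect of timing.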
46+
47+
### 7. Replace ad hoc test helpers with semantic test clients
48+
- Add a shared `WsTestClient` for server/websocket tests:
49+
- connect
50+
- await semantic welcome
51+
- await typed push by channel and optional predicate
52+
- track `sequence`
53+
- match RPC responses by id
54+
- Add one orchestration test harness that waits on receipts/barriers instead of custom `waitForThread`, `waitForGitRef`, and arbitrary retry loops.
55+
- Keep only a narrow set of integration tests for real filesystem/watch/socket behavior. Move behavioral assertions to deterministic unit-style harnesses.
56+
57+
## Provider-Coupling Guardrails
58+
- No generic runtime API may depend on Codex-native event names, thread IDs, or request payload shapes.
59+
- No readiness barrier may be defined as “Codex has emitted X.” Readiness is owned by the server runtime, not by provider event order.
60+
- No websocket channel payload may contain raw provider-native payloads unless the channel is explicitly debug/internal.
61+
- Any provider-specific divergence must be exposed through provider capabilities from `ProviderService.getCapabilities()`, not `if provider === "codex"` branches in shared runtime code.
62+
- Generic tests must use canonical `ProviderRuntimeEvent` fixtures. Codex-specific ordering and translation tests stay in adapter/app-server suites only.
63+
- Keep UI/provider-specific knobs such as Codex-only options scoped to provider UX code. Do not pull them into generic transport or orchestration state.
64+
65+
## Test Plan
66+
- Contracts:
67+
- schema tests for typed push envelopes and structured decode diagnostics
68+
- ordering tests for `sequence`
69+
- Server:
70+
- readiness tests proving `server.welcome` cannot precede runtime readiness
71+
- push bus tests proving terminal/config/orchestration pushes are serialized and typed
72+
- keybindings runtime tests with fake watch source plus one real watcher integration test
73+
- Orchestration:
74+
- receipt tests proving checkpoint refs and projections are complete before completion signals resolve
75+
- replacement of polling-based checkpoint/integration waits with receipt-based waits
76+
- Web:
77+
- transport tests for invalid JSON, invalid envelope, invalid payload, reconnect queue flushing, cached semantic state
78+
- Validation gate:
79+
- `bun run lint`
80+
- `bun run typecheck`
81+
- `mise exec -- bun run test`
82+
- repeated full-suite run after cutover to confirm flake removal
83+
84+
## Assumptions and Defaults
85+
- This remains a single-provider product during the cutover, but the runtime contracts must stay provider-neutral.
86+
- No backward-compatibility layer is required for old websocket push envelopes.
87+
- The goal is deterministic runtime behavior first; reducing retries and sleeps in tests is a consequence, not the primary mechanism.
88+
- If a completion signal cannot be expressed provider-neutrally, it does not belong in the shared runtime layer and must stay adapter-local.
