Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
420 changes: 420 additions & 0 deletions reason/260525-0055-ccxray-auth-design/candidate-A.md

Large diffs are not rendered by default.

658 changes: 658 additions & 0 deletions reason/260525-0055-ccxray-auth-design/candidate-AB.md

Large diffs are not rendered by default.

703 changes: 703 additions & 0 deletions reason/260525-0055-ccxray-auth-design/candidate-B.md

Large diffs are not rendered by default.

23 changes: 23 additions & 0 deletions reason/260525-0055-ccxray-auth-design/critique-A.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# Critique of Candidate A

WEAKNESS-1 [FATAL]: "WebSocket upgrade: Accepts ONLY `Authorization: Bearer`, never cookie, never `?token=`. Codex sets this when launched via ccxray's launcher." — Browsers cannot set arbitrary headers on WebSocket handshakes via the `WebSocket` constructor. The dashboard's own browser-side code therefore cannot authenticate any WS upgrade under this rule. The candidate hand-waves "Codex sets this" but says nothing about how the dashboard UI itself opens WS connections (e.g., for live streaming features), and the threat model explicitly distinguishes "browser" from "CLI." Either the dashboard has no WS (in which case why specify this gate at all?) or the rule is unimplementable for the dashboard — the design never resolves which.

WEAKNESS-2 [FATAL]: "CLI/curl: Bearer authenticated. Exempt from Origin/Host CSRF check because bearer cannot be sent by a browser victim cross-origin without explicit JS that must already have the bearer (OWASP custom header pattern)." — This is wrong about the bearer header and wrong about the Host check. (a) The Host header IS sent by browsers and a DNS-rebind attacker controls it; exempting bearer-auth requests from the Host check leaves an open hole for any victim who has the bearer cached in a browser extension, password manager autofill, or a service worker — and more critically, exempts the very requests Codex/Claude Code send through localhost from rebind protection. (b) Saying "Host check enforced" for upstream-proxy in §2 directly contradicts "Origin check skipped for upstream-proxy" combined with the table in §4 claiming Host allowlist defends rebind — the candidate states both that Host is checked and that bearer requests are exempt from CSRF checks (which the §3 text bundles together as "for every cookie-auth'd request AND every non-GET regardless of auth method"). Which is it?

WEAKNESS-3 [FATAL]: "User opens `http://localhost:5577/?token=<AUTH_TOKEN>`. Server 302s to `/_auth?token=<AUTH_TOKEN>&next=/`." — The token is now leaked into the server access log, the browser history, the Referer header of any resource the redirect target loads (Referrer-Policy on `/_auth` is too late — the leak happens on the inbound `/` request, and the redirect chain itself is logged), and any process listing observing the URL bar. The candidate claims "Token in URL exactly once at `/_auth`" but in fact the token appears at `/?token=`, then `/_auth?token=`, i.e., twice, and the first hop has no Referrer-Policy control because it's the entry point. The mitigation row #3 is materially false.

WEAKNESS-4 [MAJOR]: "stores `sha256(S)` in in-memory `Set<string>` with expiry now+8h" — Storing in a `Set<string>` has no expiry mechanism; Sets don't evict. The text says "with expiry now+8h" but never describes the sweep. With an 8h window and unbounded entries (every reload of an expired bookmark redeems a new session), the Set grows monotonically until restart. Worse, "Server restart: in-memory sessions wiped" means every hub idle-shutdown (5s per the project notes) invalidates all browser sessions — this directly contradicts the smooth-UX claim and makes the cookie approach near-useless in hub mode where idle shutdown is the norm.

WEAKNESS-5 [MAJOR]: "Hub mode: hub holds canonical session set, clients proxy through hub. New hub on crash recovery generates fresh ipcSecret and fresh session set." — This is incompatible with the project's documented hub model where `~/.ccxray/hub.json` is the discovery file and clients connect directly to the hub port. If the hub holds the session set, then a crash-recovery fork (per the project notes: clients monitor pid every 5s and fork a new hub) means every browser session dies every time any client restarts the hub. Combined with idle shutdown after 5s, the cookie has an effective lifetime measured in seconds, not 8 hours. The "Max-Age=28800" is theater.

WEAKNESS-6 [MAJOR]: "SameSite=Strict + Origin check on non-GET (defense in depth)" + "Origin check skipped for upstream-proxy" — The upstream-proxy path is exactly where state-changing requests to Anthropic go (POST /v1/messages, etc.). If the classifier routes a request to `upstream-proxy` based on path, an attacker who controls a victim's browser and knows the bearer (or can ride a cookie session, since the unified middleware "accepts either the bearer header or the cookie") can POST to `/v1/messages` cross-origin with no Origin check. The candidate's own unified middleware undermines the CSRF claim: cookie auth + upstream-proxy classification + Origin skip = CSRF-able expensive API calls billed to the user.

WEAKNESS-7 [MAJOR]: "The CLI launcher injects `ANTHROPIC_AUTH_TOKEN=<AUTH_TOKEN>` as a request header via the launcher's env." — `ANTHROPIC_AUTH_TOKEN` is the Anthropic API credential variable; conflating it with ccxray's `AUTH_TOKEN` means either (a) the proxy's shared secret is now identical to the user's Anthropic key, which is a credential-coupling disaster, or (b) the launcher is overwriting the user's real Anthropic token with ccxray's shared secret, breaking upstream auth. The candidate does not say which. Codex's `-c request_headers` claim is similarly hand-waved — no actual header name is specified for the bearer injection into Codex's HTTP layer, and Codex's WS transport (per the project notes, the main session uses WS on `/v1/responses`) cannot accept arbitrary headers from `-c` config in the way implied.

WEAKNESS-8 [MAJOR]: "treats absence of `AUTH_TOKEN` as '127.0.0.1-only, anonymous OK'" combined with "`AUTH_TOKEN` unset + bound to 0.0.0.0 now requires explicit `CCXRAY_ALLOW_ANONYMOUS_REMOTE=1` opt-in." — Any local process on a multi-user machine (shared dev box, CI runner, devcontainer with other tenants) can hit 127.0.0.1:5577 and exfiltrate every recorded request/response, including the user's Anthropic key in captured headers. "127.0.0.1-only" is not a security boundary on shared hosts, and the candidate elsewhere acknowledges XSS-in-conversation as a vector — meaning conversation content can include hostile data — yet the default posture exposes everything to any local UID with zero auth.

WEAKNESS-9 [MINOR]: "tests: 3 new files (auth-cookie, auth-csrf, auth-rebind), <80 lines each" — 240 lines of tests for a security middleware that handles bearer, cookie, Origin, Host, WS upgrade, hub IPC, anonymous mode, constant-time comparison, session expiry, and rotation is wildly under-scoped. The candidate offers a line budget instead of a coverage matrix; this is a tell that the design hasn't been mentally executed against its own threat table.

WEAKNESS-10 [MINOR]: "mismatched length runs fake hash before short-circuit" — Performing a "fake hash" before short-circuit does not equalize timing meaningfully unless the fake work is calibrated to the real work's distribution. `timingSafeEqual` requires equal-length buffers; the actual safe pattern is to hash both sides to a fixed-width digest and compare those. The candidate's phrasing reveals a misunderstanding of where the timing channel lives.

VERDICT: The design's CSRF/rebind defense collapses because the unified middleware exempts bearer/upstream-proxy from Origin checks while accepting cookies on the same paths, and the cookie session lifetime is contradicted by the project's hub idle-shutdown and crash-recovery behavior — making both the threat-model claims and the UX claims false.
141 changes: 141 additions & 0 deletions reason/260525-0055-ccxray-auth-design/errata.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,141 @@
# Errata — review findings against candidate-AB.md

**Reviewer:** Codex CLI v0.133.0-alpha.1 (gpt-5.5), independent pass on this PR.
**Date:** 2026-05-25.

This file records concrete corrections to `candidate-AB.md` that surfaced during external review and empirical verification. The original `candidate-AB.md` is preserved as the historical record of the reason loop's winning synthesis; the deviations below are what the implementation will actually ship.

---

## 1. Blocking corrections

### 1.1 HttpOnly cookie + JS `document.cookie` check are mutually exclusive

`candidate-AB.md` §2.3 Flow B (around L132) and §3.2 (around L261) describe:

- `Set-Cookie: ccxray_s=...; HttpOnly; SameSite=Strict; Path=/; Max-Age=28800`
- Inline bootstrap script: `else if (!document.cookie.includes('ccxray_s=')) { ... }`

These are incompatible. `HttpOnly` cookies are intentionally invisible to JavaScript; `document.cookie` will never contain `ccxray_s=`, so the "no session" branch fires regardless of whether a valid cookie exists.

**Implementation deviation.** Replace the `document.cookie.includes(...)` probe with a server-side auth-status endpoint:

```js
const status = await fetch('/_auth/status', { credentials: 'same-origin' });
if (status.status === 401) document.body.textContent = 'No session. Run `ccxray open`.';
```

The cookie remains `HttpOnly` (XSS-in-conversation defense preserved). Cost: one extra GET on cold load, no server-side state change. The new endpoint costs ~10 LOC and lives in `server/routes/auth.js`.

Affected commits: **1.3** (bootstrap flow), **3.1** (final cleanup).

### 1.2 `net.Socket._handle.getpeereid` is not a public Node API

`candidate-AB.md` §3.7 (around L394) claims Node ≥ 18 on Linux/macOS exposes `socket._handle.getpeereid`. Verified locally on Node v22.22.2/darwin — the method does not exist on `pipe_wrap.Pipe.prototype` nor on the `net` public API. There is no public Node interface to read `SO_PEERCRED`/`getpeereid(2)` without a native addon.

**Implementation deviation.** Defend peer identity at the **filesystem layer**, not the Node API:

- `~/.ccxray/` mode `0700` (already in plan).
- `~/.ccxray/hub.sock` mode `0600`.
- Other UIDs receive `EACCES` from `connect(2)` at the kernel; the connection never reaches Node code.

The peer-UID claim downgrades from "primary gate" to "belt-and-suspenders we cannot ship without a native addon (out of scope per zero-new-deps constraint)." The threat model is preserved because filesystem permissions are the actual access control on local Unix sockets — peer-credential checks would only catch a same-UID attacker, which is outside this design's scope.

Affected commit: **2.3** (Unix socket hub IPC).

### 1.3 Codex CLI key name is `http_headers`, not `request_headers`

`candidate-AB.md` §3.5 (around L366) writes:

> `-c request_headers='X-Ccxray-Auth=<K_upstream>'` via the codex per-request-header config. Verified to propagate to the WebSocket upgrade HTTP request.

Empirical verification against Codex v0.133.0-alpha.1:

```
$ codex exec --strict-config -c 'request_headers={X-Ccxray-Auth="test"}' "x"
Error loading config.toml: unknown configuration field `request_headers` in -c/--config override
```

Codex does support header injection, but through `model_providers.<name>.http_headers` (plus `env_http_headers` for env-derived headers) and a top-level `model_provider = "<name>"` selecting which provider applies. Verified by spy-server test — `X-Ccxray-Auth: test-value-123` did appear on the outbound HTTP request.

**Implementation deviation.** ccxray's Codex launcher (Phase 1.4) should construct overrides like:

```js
codex \
-c 'model_providers.ccxray={name="ccxray", base_url="http://localhost:5577/v1", wire_api="responses", http_headers={"X-Ccxray-Auth"="<K_upstream>"}}' \
-c 'model_provider="ccxray"' \
...args
```

**Spike resolved (2026-05-25).** Codex v0.133.0-alpha.1 hard-rejects any attempt to override builtin providers:

```
$ codex exec -c 'model_providers.openai.http_headers={"X-Test"="v"}' ...
Error: model_providers contains reserved built-in provider IDs: `openai`.
Built-in providers cannot be overridden. Rename your custom provider
(for example, `openai-custom`).
```

Probed all plausible shortcut keys (`openai_http_headers`, `chatgpt_http_headers`, `default_http_headers`, etc.) — none exist. Only `chatgpt_base_url` and `openai_base_url` are top-level builtin shortcuts; there is no header equivalent.

**Decision: spawn-time auth-mode detection in `server/providers.js`.**

ccxray's Codex launcher detects which Codex auth path the user is on and chooses:

- **API-key Codex** (env `OPENAI_API_KEY` set): inject `-c 'model_providers.ccxray={…, http_headers={"X-Ccxray-Auth"="…"}}'` + `-c 'model_provider="ccxray"'`. Header enforcement active end-to-end.
- **ChatGPT-OAuth Codex** (no `OPENAI_API_KEY`): skip the `model_provider` override entirely (would break OAuth login). Continue with `-c 'openai_base_url=…'` and `-c 'chatgpt_base_url=…'` as today. **The upstream domain's `X-Ccxray-Auth` requirement does not apply to this path** — ccxray's upstream verifier additionally accepts loopback-unauth requests that look like Codex-on-ChatGPT (presence of `chatgpt-account-id` + JWT-shaped `Authorization`).

Why this is acceptable for the threat model:

- The only attacker class who can reach this path is a same-machine process on a different UID. Same-UID attackers are already trusted (they can read `~/.ccxray/local-secret`).
- Such an attacker cannot exfiltrate credentials: they would have to supply their own `Authorization` + `chatgpt-account-id`, which they cannot forge without having already compromised the ccxray user's ChatGPT session (a same-UID attack).
- Residual risk reduces to **cost amplification / log pollution by other-UID local attackers**, not credential theft. The mitigation primitive is to bind ccxray to a Unix socket (out of scope of this migration; deferred as a future hardening item) or run ccxray under a dedicated UID.

Affected commits:
- **1.4** (warn-only launcher injection): detect API-key vs ChatGPT-OAuth mode and inject accordingly. WS gate accepts loopback-unauth for ChatGPT-shaped requests with a warn log.
- **2.1** (enforcement): block bearer + `?token=` on `/v1/*`; **except** retain loopback-unauth allowance for the ChatGPT-OAuth Codex path, gated by a single helper `isLoopbackChatGPTCodex(req)` that checks (a) `req.socket.localAddress` is loopback, (b) `Authorization` is JWT-shaped, (c) `chatgpt-account-id` header present.

---

## 2. Non-blocking corrections

### 2.1 Threat-table residual risks overstated

`candidate-AB.md` §4 (around L429–L434) marks the residual risk for threats 1, 2, and 6 as "None." That phrasing is stronger than the architecture warrants: "no residual risk if Host allowlist, CORS denial, and header injection are implemented as specified" is more accurate. The mitigations are structural and high-quality, but calling them "None" reads as a guarantee, not a defense.

**Implementation deviation.** None at the code level. README + commit messages use the more careful phrasing.

### 2.2 Cookie name inconsistency between `overview.md` and `candidate-AB.md`

- `overview.md` L19: `ccxray_session=<payload>.<HMAC(K_session, payload)>`
- `candidate-AB.md` L231: `ccxray_s = base64url(payload) "." base64url(hmac)`

**Implementation deviation.** Pick **`ccxray_s`** (shorter, lower per-request header overhead; matches the more detailed spec). Update `overview.md` to match — single-line fix.

### 2.3 Phase 1 "no breakage" claim relies on launcher injection working

If Phase 1.4 launcher injection fails (e.g. user's Codex version doesn't take `model_providers.X.http_headers`), Phase 1 verifyUpstream still accepts legacy bearer/no-credential so spawned CLIs remain unbroken. The "no breakage" property holds during Phase 1 specifically because enforcement is deferred to Phase 2. This is correct in the design but worth restating in commit messages.

**Implementation deviation.** None.

---

## 3. Out-of-scope finding worth filing separately

During the empirical Codex test, spawning Codex with `model_provider="ccxray"` and a `base_url` pointing at a localhost spy server caused Codex to attach the user's **ChatGPT OAuth JWT** (full bearer + `chatgpt-account-id`) to the outbound request, despite the provider config not naming `chatgpt`.

This is a credential-leak surface in Codex itself, not in ccxray. It is unrelated to this migration but should be filed upstream and is a reminder that ccxray's own logging must always redact `Authorization` and `chatgpt-account-id` headers (already in plan).

---

## 4. What changes in the implementation plan

| Commit | Original plan | Revised plan |
|---|---|---|
| 1.3 | Inline `document.cookie.includes('ccxray_s=')` probe | `GET /_auth/status` probe (+10 LOC, new endpoint in `server/routes/auth.js`) |
| 1.4 | `-c request_headers='X-Ccxray-Auth=…'` for Codex | API-key Codex: `-c 'model_providers.ccxray={…, http_headers={…}}'` + `-c 'model_provider="ccxray"'`. ChatGPT-OAuth Codex: skip `model_provider` override; upstream verifier accepts loopback-unauth for that path. Spike completed; no further investigation needed before Commit 1.4 |
| 2.3 | peer-UID via `socket._handle.getpeereid` | Filesystem mode `0600` socket + `0700` parent dir as the real gate; no Node-API peer-credential check |
| Docs | Threat table claims "None" residuals | Use "low residual risk if X is implemented as specified" |
| Docs | `ccxray_session` (in `overview.md`) | Standardize on `ccxray_s` |

No commit is added or removed. The total surface remains 8 commits across Phases 1–3.
Loading
Loading