
fix(shell): Windows pre-CEF cache-lock wait to stop relaunch panic (TAURI-RUST-F) #3210

Merged
senamakel merged 2 commits into tinyhumansai:main from oxoxDev:fix/tauri-rust-f-cef-singleton-wait
Jun 3, 2026



oxoxDev commented Jun 2, 2026

Summary

  • Add a Windows pre-CEF cache-lock wait so a relaunch never calls cef::initialize() while a dying prior instance still holds the CEF user-data-dir.
  • Fixes a fatal Windows-only startup panic — assertion left == right failed, left: 0, right: 1 — raised when the vendored tauri-runtime-cef asserts cef::initialize() == 1 and gets 0 on a locked cache (Sentry TAURI-RUST-F, ~3.2k events, live on 0.57.5).
  • New cef_singleton_wait module: bounded (5 s, exponential backoff) wait for prior-instance processes to exit, then proceed; clean exit if still held. Windows analog of the existing macOS reap_stale_openhuman_processes guard.

Problem

The existing pre-CEF Win32 named-mutex guard (run() in lib.rs) stops a concurrent second launch. It does not cover the sequential relaunch race:

  1. Instance A's run() returns → the RAII mutex guard drops → the named mutex is released.
  2. A is still tearing down — its CEF browser process keeps the cache lock for a short window.
  3. Instance B launches (auto-update relaunch, fast quit+reopen, a restart flow), acquires the now-free mutex, and calls cef::initialize() while A's cache lock is live → returns 0 → the vendored runtime's assert_eq!(result, 1) panics.
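To make the ordering concrete, here is a minimal, host-runnable Rust model of steps 1–3. It is a sketch, not the shell's code: two AtomicBools stand in for the Win32 named mutex and the CEF cache lock, and MutexGuard, try_acquire_mutex, fake_cef_initialize and replay_relaunch_race are hypothetical names introduced only for illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Hypothetical stand-ins for the two resources involved in the race.
static NAMED_MUTEX: AtomicBool = AtomicBool::new(false);
static CACHE_LOCK: AtomicBool = AtomicBool::new(false);

/// RAII guard modelling the Win32 named mutex: released on Drop.
pub struct MutexGuard;

impl Drop for MutexGuard {
    fn drop(&mut self) {
        NAMED_MUTEX.store(false, Ordering::SeqCst);
    }
}

/// Succeeds only if no other instance currently holds the mutex.
pub fn try_acquire_mutex() -> Option<MutexGuard> {
    NAMED_MUTEX
        .compare_exchange(false, true, Ordering::SeqCst, Ordering::SeqCst)
        .ok()
        .map(|_| MutexGuard)
}

/// Models cef::initialize(): returns 0 while another process holds the cache lock.
pub fn fake_cef_initialize() -> i32 {
    if CACHE_LOCK.load(Ordering::SeqCst) { 0 } else { 1 }
}

/// Replays steps 1-3 and returns (B got the mutex, B's initialize result).
pub fn replay_relaunch_race() -> (bool, i32) {
    // Instance A starts: it takes the mutex and its CEF process locks the cache.
    let guard_a = try_acquire_mutex().expect("A is the first instance");
    CACHE_LOCK.store(true, Ordering::SeqCst);
    // Step 1: A's run() returns, so the RAII guard drops and frees the mutex.
    drop(guard_a);
    // Step 2: A's CEF browser process is still tearing down; the cache lock stays held.
    // Step 3: B launches, acquires the now-free mutex, and initializes CEF.
    let guard_b = try_acquire_mutex();
    let init_result = fake_cef_initialize();
    (guard_b.is_some(), init_result)
}
```

In this model B acquires the mutex yet sees 0 from the stand-in initialize, which is exactly the value that trips the vendored assert_eq!(result, 1).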

prepare_process_cache_path() creates the per-user cache dir but never checks whether it is actively locked, and Chromium's SingletonLock symlink that cef_preflight reads on macOS/Linux does not exist on Windows — so Windows had no cache-lock guard beyond the lifetime-mismatched mutex.

Solution

  • New app/src-tauri/src/cef_singleton_wait.rs:
    • Pure decide(other_instances, elapsed, budget) + backoff_delay(attempt) — unit-tested on any host.
    • #[cfg(windows)] win::count_other_app_instances() — Toolhelp32 snapshot counting processes with our exe basename, excluding self.
    • #[cfg(windows)] wait_for_cache_release() — poll with backoff for up to 5 s: Proceed as soon as the count is 0, KeepWaiting while other instances remain and budget is left, and GiveUp once the budget is spent → forward deep links + exit(0).
  • Wired in lib.rs after the Win32 mutex + prepare_process_cache_path, before the Tauri builder — beside the macOS reap.
  • Why counting same-exe PIDs is correct despite CEF spawning same-exe subprocesses: this runs before cef::initialize(), so our own CEF subprocesses don't exist yet, and it runs after the mutex guard, which guarantees we're the only top-level instance — so any other same-exe PIDs are the dying prior instance's stragglers, dropping to 0 once it exits.
  • This prevents the panic rather than suppressing it: cef::initialize() is simply never called against a live lock. No catch_unwind, no assert masking.
  • Follow-up (not in this PR): the true root fix is making tauri-runtime-cef return an Err instead of asserting on cef::initialize() == 0; tracked as a vendored-CEF change.
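A minimal sketch of the pure decision logic described above, assuming the semantics stated in this description: Proceed at zero other instances, KeepWaiting while budget remains, GiveUp once it is spent. The WaitDecision variant names and the decide/backoff_delay signatures follow the PR text; the 50 ms base delay, the 800 ms cap, and treating elapsed == budget as exhausted are assumptions, not the module's actual constants.

```rust
use std::time::Duration;

/// Outcome of one poll of the pre-CEF wait loop.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitDecision {
    /// No other same-exe process is left: safe to call cef::initialize().
    Proceed,
    /// Stragglers remain and the budget is not spent: sleep and poll again.
    KeepWaiting,
    /// Stragglers remain after the whole budget: forward deep links and exit.
    GiveUp,
}

// Assumed backoff constants (illustrative, not the module's real values).
const BACKOFF_BASE: Duration = Duration::from_millis(50);
const BACKOFF_CAP: Duration = Duration::from_millis(800);

/// Pure decision: depends only on its inputs, so it is testable on any host.
pub fn decide(other_instances: usize, elapsed: Duration, budget: Duration) -> WaitDecision {
    if other_instances == 0 {
        WaitDecision::Proceed
    } else if elapsed < budget {
        WaitDecision::KeepWaiting
    } else {
        WaitDecision::GiveUp
    }
}

/// Exponential backoff: BACKOFF_BASE * 2^attempt, capped at BACKOFF_CAP.
pub fn backoff_delay(attempt: u32) -> Duration {
    // checked_shl yields None once attempt >= 32, so huge attempts saturate
    // to u32::MAX instead of overflowing; saturating_mul guards the multiply.
    let factor = 1u32.checked_shl(attempt).unwrap_or(u32::MAX);
    BACKOFF_BASE.saturating_mul(factor).min(BACKOFF_CAP)
}
```

Keeping both functions free of Win32 calls is what lets the proceed, keep-waiting, give-up boundary, cap and overflow cases run on a Linux or macOS test host.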

Submission Checklist

If a section does not apply to this change, mark the item as N/A with a one-line reason. Do not delete items.

  • Tests added or updated (happy path + at least one failure / edge case) per Testing Strategy
  • Diff coverage ≥ 80% — the decision logic (decide, backoff_delay) is covered by 5 host unit tests (proceed / keep-waiting / give-up boundary / backoff cap / overflow). The #[cfg(windows)] Toolhelp enumeration + wait loop are not instrumentable on the Linux coverage runner (compiled out off-Windows); validated by the Windows CI build + manual repro instead.
  • N/A: behaviour-only platform-guard fix — no added/removed/renamed coverage-matrix feature row.
  • N/A: no coverage-matrix feature IDs affected.
  • No new external network dependencies introduced (mock backend used per Testing Strategy)
  • N/A: no release-cut smoke surface changed — startup guard only; no user-facing flow.
  • N/A: Sentry-only issue, no GitHub issue to close — Sentry-Issue: TAURI-RUST-F (see Related).

Impact

  • Windows desktop only. Zero cost in the common case (no prior instance → proceed immediately); a ≤5 s bounded wait only inside the rare relaunch-race window, then proceed or clean-exit.
  • Removes a fatal startup crash; no behaviour change on macOS/Linux (module is cfg(windows); pure logic also compiled under test).
  • Validation note: the bug is a Windows-only CEF init race that cannot be reproduced on a macOS dev host. Locally verified: cargo fmt, Tauri-shell cargo check/clippy/test (5 new tests pass). The cfg(windows) glue compiles via the Windows CI job; the actual race is being reproduced/smoked on real Windows by @sanil-23.
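The "zero cost in the common case" and "≤5 s bounded" claims can be checked with a loop that takes its process counter and sleep function as parameters. This is a hypothetical test harness, not the shipped wait_for_cache_release(): wait_for_release_with and Outcome are illustrative names, it tracks virtual elapsed time from the sleeps it requests instead of reading a real clock, and its 50 ms to 1 s doubling schedule is an assumption.

```rust
use std::time::Duration;

/// Result of the hypothetical harness, with the total (virtual) time waited.
#[derive(Debug, PartialEq, Eq)]
pub enum Outcome {
    Proceed { waited: Duration },
    GiveUp { waited: Duration },
}

/// Poll `count_others` until it reports 0 or `budget` is spent.
/// Elapsed time is the sum of requested sleeps (a virtual clock, for testability).
pub fn wait_for_release_with(
    budget: Duration,
    mut count_others: impl FnMut() -> usize,
    mut sleep: impl FnMut(Duration),
) -> Outcome {
    let mut waited = Duration::ZERO;
    let mut delay = Duration::from_millis(50); // assumed initial backoff
    loop {
        // Common case: no prior instance, so return before any sleep.
        if count_others() == 0 {
            return Outcome::Proceed { waited };
        }
        if waited >= budget {
            return Outcome::GiveUp { waited };
        }
        // Clip the last step so the total wait never exceeds the budget.
        let step = delay.min(budget - waited);
        sleep(step);
        waited += step;
        delay = (delay * 2).min(Duration::from_secs(1)); // assumed cap
    }
}
```

With a counter that reports 0 on the first call, no sleep happens at all; with a counter that never reaches 0, the final step is clipped so the total wait lands exactly on the budget before giving up.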

Related

  • Closes: N/A — Sentry-only
  • Sentry-Issue: TAURI-RUST-F
  • Follow-up: vendored tauri-runtime-cef cef::initialize() assert → Err (true root fix; broader blast radius).
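For the follow-up, one possible shape is to turn CEF's integer status into a Result at the call site instead of asserting. This is a hedged sketch only: CefInitError and initialize_checked are hypothetical names, and the closure stands in for the real cef::initialize() call, whose exact signature in the vendored runtime is not shown here.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error carrying CEF's non-success status code.
#[derive(Debug, PartialEq, Eq)]
pub struct CefInitError {
    pub code: i32,
}

impl fmt::Display for CefInitError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "cef::initialize() returned {} (expected 1)", self.code)
    }
}

impl Error for CefInitError {}

/// Run an initializer that reports success as 1, converting any other
/// status into an Err instead of panicking like assert_eq!(result, 1).
pub fn initialize_checked(init: impl FnOnce() -> i32) -> Result<(), CefInitError> {
    match init() {
        1 => Ok(()),
        code => Err(CefInitError { code }),
    }
}
```

Callers could then log the error and exit cleanly, which is why the PR describes this as the root fix with a broader blast radius: every caller of the runtime's init path would need to handle the Err.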

Summary by CodeRabbit

  • Bug Fixes
    • Improved Windows startup stability by implementing a resource-release wait mechanism during application launch. The application now waits for prior instances to fully release critical resources before proceeding with initialization, preventing crashes that could occur during rapid sequential relaunches or repeated quick restarts.

oxoxDev added 2 commits June 2, 2026 17:07
The vendored tauri-runtime-cef asserts cef::initialize()==1 and panics
(left:0 right:1) when the CEF user-data-dir is still locked by another
OpenHuman process. The existing Win32 mutex only covers concurrent second
launches; a sequential relaunch (auto-update, fast quit+reopen, restart)
can still call cef::initialize() while the prior instance's cache lock is
held during teardown.

Add cef_singleton_wait: a bounded (5s, exponential backoff) wait that counts
straggler processes from a dying prior instance (same exe, excluding self —
correct here because our own CEF subprocesses don't exist pre-init and the
mutex guarantees we're the singleton) and proceeds once they exit, or exits
cleanly if still held. The panic is prevented (initialize never runs against
a live lock), not suppressed. Pure decide()/backoff logic is unit-tested on
any host; the Win32 Toolhelp enumeration is windows-only.
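The straggler count this commit describes reduces to a filter over a process snapshot. A minimal sketch, assuming a hypothetical ProcEntry type in place of the Toolhelp32 process entries the real code walks, and an ASCII case-insensitive basename match (Windows file names are case-insensitive); count_other_instances is an illustrative name and the OpenHuman.exe basename in the usage is made up.

```rust
/// Hypothetical snapshot entry; the real code reads PID and exe name from Toolhelp32.
pub struct ProcEntry<'a> {
    pub pid: u32,
    pub exe_name: &'a str,
}

/// Count processes sharing our exe basename, excluding ourselves.
/// Valid pre-init only: our own CEF subprocesses do not exist yet, and the
/// named mutex already guarantees we are the only top-level instance.
pub fn count_other_instances(procs: &[ProcEntry<'_>], self_pid: u32, self_exe: &str) -> usize {
    procs
        .iter()
        .filter(|p| p.pid != self_pid && p.exe_name.eq_ignore_ascii_case(self_exe))
        .count()
}
```

For a snapshot holding our own PID plus two differently-cased copies of our exe and an unrelated process, the count is 2; once the dying instance's stragglers exit, it drops to 0 and the wait proceeds.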
…-RUST-F)

Call cef_singleton_wait::wait_for_cache_release() after the Win32 mutex and
prepare_process_cache_path, before the Tauri builder constructs the CEF
runtime — the Windows analog of the macOS reap_stale_openhuman_processes
guard. Closes the relaunch-race window that re-triggers the cef::initialize
assert panic.

oxoxDev requested review from a team and sanil-23 June 2, 2026 11:38

oxoxDev commented Jun 2, 2026

@sanil-23 could you repro on real Windows once? This is a Windows-only CEF init race I can't reproduce on macOS. Repro: launch OpenHuman, then trigger a fast relaunch — auto-update relaunch, or quit (Cmd/Alt+F4) and immediately reopen, or a restart flow — while the prior instance is still tearing down. Pre-fix that hits the cef::initialize assert panic (left:0 right:1, TAURI-RUST-F); with this PR the new instance should wait briefly then open (logs: [cef-singleton-wait] ...). Thanks!


coderabbitai Bot commented Jun 2, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: dd1de759-fc4f-414d-8b5e-56377f3424c1

📥 Commits

Reviewing files that changed from the base of the PR and between cac09cb and b7d8b2f.

📒 Files selected for processing (3)
  • app/src-tauri/Cargo.toml
  • app/src-tauri/src/cef_singleton_wait.rs
  • app/src-tauri/src/lib.rs

📝 Walkthrough

The PR introduces a Windows-specific pre-CEF initialization guard that prevents panics when sequential relaunches occur while a prior process still holds the CEF cache lock. It adds bounded polling with exponential backoff, process enumeration via ToolHelp APIs, and clean process exit with deep-link forwarding if the wait budget is exhausted.

Changes

Windows CEF singleton cache-lock wait guard

  • Dependency and wait decision logic
    Files: app/src-tauri/Cargo.toml, app/src-tauri/src/cef_singleton_wait.rs (lines 1–94)
    Summary: Adds the Windows ToolHelp feature; introduces the constants, the WaitDecision enum, and the pure decide and backoff_delay functions that govern polling based on the other-instance count and the elapsed-time budget.
  • Windows CEF singleton wait polling and process enumeration
    Files: app/src-tauri/src/cef_singleton_wait.rs (lines 95–227)
    Summary: The main wait_for_cache_release() loop repeatedly counts other same-executable instances via ToolHelp snapshot APIs and uses decide and backoff_delay to proceed immediately, sleep and retry, or exit cleanly (after attempting deep-link forwarding) once the budget is exhausted.
  • Tests and startup integration
    Files: app/src-tauri/src/cef_singleton_wait.rs (lines 229–283), app/src-tauri/src/lib.rs
    Summary: Unit tests validate the pure decision outcomes and exponential-backoff timing under boundary and extreme conditions; the module is declared behind cfg(any(target_os = "windows", test)), and wait_for_cache_release() is invoked during Windows startup before CEF initialization.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • tinyhumansai/openhuman#1723: Modifies Windows pre-CEF startup via named mutex for single-instance enforcement; this PR adds a cache-lock wait alongside similar startup guards with corresponding windows-sys feature changes.
  • tinyhumansai/openhuman#2669: Hardens existing mutex-based single-instance guard in the same pre-CEF initialization path that this PR extends with a new cache-lock wait mechanism.
  • tinyhumansai/openhuman#2469: Adds Windows deep-link IPC forwarding (deep_link_ipc_windows), which this PR relies on to forward pending openhuman:// URLs when the wait budget expires.

Suggested labels

bug

Suggested reviewers

  • sanil-23
  • M3gA-Mind
  • graycyrus

Poem

🐇 A CEF cache lock stood in the way,
So the rabbit built a wait—with backoff at play!
Count instances, sleep, check the time,
If the lock's still held, exit so fine.
Deep links forwarded, the path runs clear! 🎯

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main change: adding a Windows pre-CEF cache-lock wait mechanism to prevent relaunch panics, which matches the core objective of the changeset.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.


coderabbitai Bot added the bug label Jun 2, 2026

oxoxDev commented Jun 2, 2026

Correction to the PR body: there is no Windows-target CI job in this repo, so the #[cfg(windows)] code in cef_singleton_wait.rs (the Toolhelp32 enumeration + wait loop) is not compiled by CI — Linux/macOS runners compile it out. The pure decide/backoff_delay logic is host-tested + in CI; the Windows glue's compilation rests on a local Windows build.

@sanil-23 — two asks when you get a Windows box: (1) confirm it compiles (pnpm rust:check / cargo build for the shell on Windows), and (2) the runtime repro from the earlier comment. (1) is the more important gate since CI can't catch a Windows compile error here.

senamakel merged commit f2154de into tinyhumansai:main Jun 3, 2026
23 of 26 checks passed