fix(shell): Windows pre-CEF cache-lock wait to stop relaunch panic (TAURI-RUST-F) #3210
The vendored tauri-runtime-cef asserts cef::initialize()==1 and panics (left:0 right:1) when the CEF user-data-dir is still locked by another OpenHuman process. The existing Win32 mutex only covers concurrent second launches; a sequential relaunch (auto-update, fast quit+reopen, restart) can still call cef::initialize() while the prior instance's cache lock is held during teardown. Add cef_singleton_wait: a bounded (5s, exponential backoff) wait that counts straggler processes from a dying prior instance (same exe, excluding self — correct here because our own CEF subprocesses don't exist pre-init and the mutex guarantees we're the singleton) and proceeds once they exit, or exits cleanly if still held. The panic is prevented (initialize never runs against a live lock), not suppressed. Pure decide()/backoff logic is unit-tested on any host; the Win32 Toolhelp enumeration is windows-only.
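A minimal sketch of what the pure `decide()`/backoff logic described above might look like. The function names and the 5 s budget come from the commit message; the `Decision` enum name, the parameter types, the base delay, the per-sleep cap, and the choice to give up exactly at the budget boundary are assumptions made for illustration, not the PR's actual code.

```rust
use std::time::Duration;

/// Outcome of one poll of the pre-CEF wait (enum name is hypothetical).
#[derive(Debug, PartialEq, Eq)]
enum Decision {
    Proceed,
    KeepWaiting,
    GiveUp,
}

// Only the 5 s budget is stated in the PR; the two delay values are assumed.
const BUDGET: Duration = Duration::from_secs(5);
const BASE_DELAY_MS: u64 = 50; // hypothetical first delay
const MAX_DELAY_MS: u64 = 800; // hypothetical per-sleep cap

/// Pure decision with no I/O, so it can be unit-tested on any host.
/// Assumption: reaching the budget exactly counts as giving up.
fn decide(other_instances: usize, elapsed: Duration, budget: Duration) -> Decision {
    if other_instances == 0 {
        Decision::Proceed
    } else if elapsed < budget {
        Decision::KeepWaiting
    } else {
        Decision::GiveUp
    }
}

/// Exponential backoff: BASE * 2^attempt, capped at MAX_DELAY_MS.
/// `checked_shl` and `saturating_mul` keep very large attempts from overflowing.
fn backoff_delay(attempt: u32) -> Duration {
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    let ms = BASE_DELAY_MS.saturating_mul(factor).min(MAX_DELAY_MS);
    Duration::from_millis(ms)
}
```

Keeping both functions free of Win32 calls is what lets the proceed, keep-waiting, give-up, cap, and overflow cases be exercised on a non-Windows host, for example `decide(0, elapsed, BUDGET)` for the proceed case.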
…-RUST-F) Call cef_singleton_wait::wait_for_cache_release() after the Win32 mutex and prepare_process_cache_path, before the Tauri builder constructs the CEF runtime — the Windows analog of the macOS reap_stale_openhuman_processes guard. Closes the relaunch-race window that re-triggers the cef::initialize assert panic.
@sanil-23 could you repro on real Windows once? This is a Windows-only CEF init race I can't reproduce on macOS. Repro: launch OpenHuman, then trigger a fast relaunch — auto-update relaunch, or quit (Cmd/Alt+F4) and immediately reopen, or a restart flow — while the prior instance is still tearing down. Pre-fix that hits the |
📝 Walkthrough

The PR introduces a Windows-specific pre-CEF initialization guard that prevents panics when a sequential relaunch occurs while a prior process still holds the CEF cache lock. It adds bounded polling with exponential backoff, process enumeration via Toolhelp APIs, and a clean process exit with deep-link forwarding if the wait budget is exhausted.

Changes

Windows CEF singleton cache-lock wait guard
Estimated code review effort: 3 (Moderate), ~25 minutes
Correction to the PR body: there is no Windows-target CI job in this repo, so the @sanil-23 — two asks when you get a Windows box: (1) confirm it compiles ( |
Summary
- On Windows, a sequential relaunch can call `cef::initialize()` while a dying prior instance still holds the CEF user-data-dir.
- The resulting panic is `assertion left == right failed, left: 0, right: 1`, raised when the vendored `tauri-runtime-cef` asserts `cef::initialize() == 1` and gets `0` on a locked cache (Sentry TAURI-RUST-F, ~3.2k events, live on 0.57.5).
- Adds the `cef_singleton_wait` module: a bounded (5 s, exponential backoff) wait for prior-instance processes to exit, then proceed; clean exit if the lock is still held. This is the Windows analog of the existing macOS `reap_stale_openhuman_processes` guard.

Problem
The existing pre-CEF Win32 named-mutex guard (`run()` in `lib.rs`) stops a concurrent second launch. It does not cover the sequential relaunch race:

1. Instance A quits: `run()` returns → the RAII mutex guard drops → the named mutex is released.
2. The relaunched instance acquires the mutex and calls `cef::initialize()` while A's cache lock is live → returns `0` → the vendored runtime's `assert_eq!(result, 1)` panics.

`prepare_process_cache_path()` creates the per-user cache dir but never checks whether it is actively locked, and Chromium's `SingletonLock` symlink that `cef_preflight` reads on macOS/Linux does not exist on Windows. Windows therefore had no cache-lock guard beyond the mutex, whose lifetime does not match that of the cache lock.

Solution
- New module `app/src-tauri/src/cef_singleton_wait.rs`:
  - `decide(other_instances, elapsed, budget)` + `backoff_delay(attempt)`: pure logic, unit-tested on any host.
  - `#[cfg(windows)] win::count_other_app_instances()`: Toolhelp32 snapshot counting processes with our exe basename, excluding self.
  - `#[cfg(windows)] wait_for_cache_release()`: poll/backoff up to 5 s; `Proceed` when the count is 0, else `KeepWaiting`, else `GiveUp` → forward deep links + `exit(0)`.
- Called from `lib.rs` after the Win32 mutex + `prepare_process_cache_path`, before the Tauri builder, beside the macOS reap.
- The wait runs before `cef::initialize()`, so our own CEF subprocesses don't exist yet, and it runs after the mutex guard, which guarantees we're the only top-level instance. Any other same-exe PIDs are therefore the dying prior instance's stragglers, and the count drops to 0 once it exits.
- The panic is prevented, not suppressed: `cef::initialize()` is simply never called against a live lock. No `catch_unwind`, no assert masking.
- The root fix is for `tauri-runtime-cef` to return an `Err` instead of asserting when `cef::initialize()` returns `0`; tracked as a vendored-CEF change.

Submission Checklist
- Pure logic (`decide`, `backoff_delay`) is covered by 5 host unit tests (proceed / keep-waiting / give-up boundary / backoff cap / overflow). The `#[cfg(windows)]` Toolhelp enumeration + wait loop are not instrumentable on the Linux coverage runner (compiled out off-Windows); validated by the Windows CI build + manual repro instead.
- `Sentry-Issue: TAURI-RUST-F` (see Related).

Impact
- Windows-only (`cfg(windows)`; pure logic also compiled under `test`).
- Checks run: `cargo fmt`, Tauri-shell `cargo check`/`clippy`/`test` (5 new tests pass). The `cfg(windows)` glue compiles via the Windows CI job; the actual race is being reproduced/smoked on real Windows by @sanil-23.

Related
- `Sentry-Issue: TAURI-RUST-F`
- `tauri-runtime-cef` `cef::initialize()` assert → `Err` (true root fix; broader blast radius).
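
For illustration only, the poll/backoff loop that the Solution attributes to `wait_for_cache_release()` could be shaped roughly as follows. This is a host-testable sketch, not the PR's code: the name `wait_until_released`, the `WaitOutcome` enum, the injected `count_others` and `sleep` closures, and the 50 ms / 800 ms delay values are all hypothetical. In the real module the counter would be the Windows-only Toolhelp32 enumeration.

```rust
use std::time::{Duration, Instant};

/// Result of the bounded wait (hypothetical type for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum WaitOutcome {
    /// No other same-exe processes remain; safe to initialize CEF.
    Released,
    /// Budget exhausted; the caller would forward deep links and exit(0).
    StillHeld,
}

/// Polls `count_others` until it reports 0 or `budget` elapses.
/// The counter and sleeper are injected so the loop runs on any host.
fn wait_until_released<C, S>(budget: Duration, mut count_others: C, mut sleep: S) -> WaitOutcome
where
    C: FnMut() -> usize,
    S: FnMut(Duration),
{
    let start = Instant::now();
    let mut delay = Duration::from_millis(50); // assumed first delay
    let max_delay = Duration::from_millis(800); // assumed per-sleep cap

    loop {
        if count_others() == 0 {
            return WaitOutcome::Released;
        }
        let elapsed = start.elapsed();
        if elapsed >= budget {
            return WaitOutcome::StillHeld;
        }
        // Never sleep past the deadline, then double the delay up to the cap.
        sleep(delay.min(budget - elapsed));
        delay = (delay * 2).min(max_delay);
    }
}
```

A production caller would pass `std::thread::sleep` as the sleeper and a 5 s budget. Checking the count before the budget means a prior instance that has already exited costs no delay at all.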