
fix(db): avoid long flush stall on restart#211

Open
EddieHouston wants to merge 1 commit into Blockstream:new-index from EddieHouston:fix/restart-flush-stall

Conversation

@EddieHouston (Collaborator) commented Apr 23, 2026

Summary

Fix a flush stall that occurs on restart of a mature electrs DB built from 91b883a or later.

On restart, DB::open applied bulk-load L0 triggers (64/256/512) unconditionally. Later in
update(), enable_auto_compaction() reset them to RocksDB defaults (4/20/36). If L0 held
more than 36 files at that point — normal after a bulk-load that was interrupted or followed
by a restart — the tightened level0_stop_writes_trigger immediately put the DB into
pre-flush stall territory. The subsequent db.flush() parked inside
WaitUntilFlushWouldNotStallWrites until background compaction drained L0 below 36.

On testnet with 173 L0 files this caused over 1 hour of indexer freeze on restart.
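The stall condition can be illustrated with a minimal standalone sketch. This only models the idea behind RocksDB's WaitUntilFlushWouldNotStallWrites (the real check considers more state than the L0 file count), and flush_would_stall is an illustrative name, not a RocksDB API:

```rust
// Illustrative model (not electrs or RocksDB code): a manual flush is
// delayed while completing it would leave the DB at/above the
// level0_stop_writes_trigger, forcing it to wait for background
// compaction to drain L0 first.
fn flush_would_stall(l0_files: u32, stop_trigger: u32) -> bool {
    // A flush adds one more L0 file; stall if that crosses the stop trigger.
    l0_files + 1 > stop_trigger
}

fn main() {
    // Under the bulk-load profile (stop trigger 512), 173 L0 files are fine.
    assert!(!flush_would_stall(173, 512));
    // After resetting to RocksDB defaults (stop trigger 36), the same
    // 173 files park the flush until compaction drains L0 below 36.
    assert!(flush_would_stall(173, 36));
    println!("ok");
}
```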

Fix

  • DB::open() checks the F sentinel before choosing triggers. When F is absent
    (initial sync incomplete), it widens L0 triggers for bulk load (64/256/512). When F is
    present (normal restart), it leaves RocksDB defaults (4/20/36) in place — no runtime
    set_options() needed.

  • apply_bulk_load_triggers() — new method that widens L0 triggers and disables
    pending-compaction-bytes stalls for initial sync throughput.

  • apply_steady_state_triggers() — restores RocksDB defaults after full_compaction()
    drains L0. Called once inside the F sentinel gate in start_auto_compactions().

  • start_auto_compactions() replaces the old enable_auto_compaction() trigger-reset
    logic. On first completion of initial sync (F absent), it runs a one-time full compaction,
    restores steady-state triggers, and writes the F sentinel. On subsequent calls (F
    present), it only enables auto-compaction.

  • Operator logging at info level for which trigger profile is active and whether full
    compaction runs or is skipped.
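The sentinel gate in DB::open can be sketched roughly as follows. The trigger values come from the PR description, but the type and function names here are illustrative, not the electrs API:

```rust
/// L0 trigger profile chosen once at DB::open (sketch only).
#[derive(Debug, PartialEq)]
struct L0Triggers {
    file_num_compaction: i32, // level0_file_num_compaction_trigger
    slowdown_writes: i32,     // level0_slowdown_writes_trigger
    stop_writes: i32,         // level0_stop_writes_trigger
}

// Wide profile for initial sync (bulk load)...
const BULK_LOAD: L0Triggers =
    L0Triggers { file_num_compaction: 64, slowdown_writes: 256, stop_writes: 512 };
// ...and RocksDB defaults for normal restarts.
const STEADY_STATE: L0Triggers =
    L0Triggers { file_num_compaction: 4, slowdown_writes: 20, stop_writes: 36 };

/// The F sentinel marks that the one-time full compaction has completed:
/// absent -> initial sync still pending, widen triggers for bulk load;
/// present -> normal restart, leave RocksDB defaults in place.
fn choose_triggers(f_sentinel_present: bool) -> &'static L0Triggers {
    if f_sentinel_present { &STEADY_STATE } else { &BULK_LOAD }
}

fn main() {
    assert_eq!(choose_triggers(false), &BULK_LOAD);   // fresh DB: bulk load
    assert_eq!(choose_triggers(true), &STEADY_STATE); // restart: defaults
    println!("ok");
}
```

Since the choice is made before the DB is opened, no runtime set_options() call (and no wide-to-tight transition) ever happens on a restart.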

Test plan

  • cargo check clean
  • cargo test --lib — 8/8 pass, including new_index::db::tests::*
  • cargo test --test electrum — 4/4 pass
  • cargo test --test rest — 22/22 pass
  • Deploy to testnet and verify restart completes flush in under a second
  • Confirm that the Manual flush start and Manual flush finished entries in the RocksDB LOG are within milliseconds of each other
  • Verify synced tip resumes advancing from ZMQ notifications immediately after restart

@EddieHouston EddieHouston self-assigned this Apr 23, 2026
@EddieHouston EddieHouston force-pushed the fix/restart-flush-stall branch from d102699 to bef02e3 Compare April 23, 2026 12:10
@EddieHouston EddieHouston force-pushed the fix/restart-flush-stall branch from 1ebdfd6 to 5e80660 Compare April 28, 2026 12:50
Randy808 previously approved these changes Apr 29, 2026
@philippem (Collaborator) commented:

The description is misleading because update() is called once on initial startup, not in the main loop. The steady-state compaction parameters are applied once, regardless of the number of L0 files that happen to be on disk.

The PR code no longer applies the steady-state options that we had previously, which means we end up with the defaults after a restart, not the steady state. They are written to an options file on disk, but that file needs to be explicitly loaded and reapplied.

https://github.com/facebook/rocksdb/wiki/RocksDB-Options-File

Are you able to repro L0 accumulation locally? I attempted by repeatedly killing electrs after startup but L0 did not increase (it was loading a full bitcoin mainnet).

@EddieHouston (Collaborator, Author) commented May 5, 2026

> The description is misleading because update() is called once on initial startup, not in the main loop. The steady-state compaction parameters are applied once, regardless of the number of L0 files that happen to be on disk.

Updating description.

> The PR code no longer applies the steady-state options that we had previously, which means we end up with the defaults after a restart, not the steady state. They are written to an options file on disk, but that file needs to be explicitly loaded and reapplied.

The DB always opens with Options::default(), not from the OPTIONS file. RocksDB does write an OPTIONS file to disk automatically (on every open/set_options call), but electrs never reads it back; every startup builds its options from scratch in code.

So the steady-state values (4/20/36, 64 GiB/256 GiB) are already in effect on every startup. apply_steady_state_triggers() only needs to run after apply_bulk_load_triggers() within the same process.
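A tiny sketch of that point (assumed structure, not the actual electrs code): because options are rebuilt in code on every open, the Default values below, which mirror RocksDB's own L0 trigger defaults, are what a restart starts from, with no OPTIONS file involved:

```rust
// Hypothetical stand-in for the L0 trigger fields that electrs sets on
// rocksdb::Options; the point is that Default::default() already yields
// the steady-state profile on every startup.
#[derive(Debug, PartialEq)]
struct L0TriggerOpts {
    file_num_compaction_trigger: i32,
    slowdown_writes_trigger: i32,
    stop_writes_trigger: i32,
}

impl Default for L0TriggerOpts {
    // RocksDB's own defaults = the steady-state values from the discussion.
    fn default() -> Self {
        L0TriggerOpts {
            file_num_compaction_trigger: 4,
            slowdown_writes_trigger: 20,
            stop_writes_trigger: 36,
        }
    }
}

fn main() {
    // Every startup begins here; only a process still in initial sync
    // widens these before returning to steady state.
    let opts = L0TriggerOpts::default();
    assert_eq!(opts.stop_writes_trigger, 36);
    println!("defaults in effect at open: {:?}", opts);
}
```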

> https://github.com/facebook/rocksdb/wiki/RocksDB-Options-File

> Are you able to repro L0 accumulation locally? I attempted by repeatedly killing electrs after startup but L0 did not increase (it was loading a full bitcoin mainnet).

TBD, looking into how to create a bunch of L0 files locally to reproduce this.

…entinel

  DB::open now checks the full-compaction sentinel 'F' before choosing L0
  triggers: bulk-load (64/256/512) when absent, RocksDB defaults (4/20/36)
  when present. This eliminates the wide-to-tight trigger transition that
  parked db.flush() in WaitUntilFlushWouldNotStallWrites when L0 exceeded
  36 files after restart.
