
Conversation

@AaronRM (Contributor) commented Dec 5, 2025

This pull request introduces an implementation of the Quiver write-ahead log (WAL). The most significant change from the initial spec is a rewrite and clarification of the WAL file rotation and checkpointing mechanism. Documentation has been updated to reflect the new design.

@github-actions github-actions bot added the rust Pull requests that update Rust code label Dec 5, 2025
codecov bot commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 96.69247% with 94 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.74%. Comparing base (90148f9) to head (d6096d2).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1537      +/-   ##
==========================================
+ Coverage   83.48%   83.74%   +0.26%     
==========================================
  Files         428      433       +5     
  Lines      118652   121877    +3225     
==========================================
+ Hits        99054   102068    +3014     
- Misses      19064    19275     +211     
  Partials      534      534              
Components             Coverage Δ
otap-dataflow          85.08% <96.69%> (+0.36%) ⬆️
query_abstraction      80.61% <ø> (ø)
query_engine           90.26% <ø> (+0.01%) ⬆️
syslog_cef_receivers   ∅ <ø> (∅)
otel-arrow-go          53.50% <ø> (ø)

"MIT-0",
"Apache-2.0",
"Unicode-3.0",
"BSD-2-Clause",
@AaronRM (PR author) commented:

Required due to the array-ref transitive dependency (from blake3) using this license. BSD-2-Clause is even more permissive than BSD-3-Clause (which is already allowed here), so I don't believe this should be a concern.

@AaronRM AaronRM marked this pull request as ready for review December 5, 2025 22:57
@AaronRM AaronRM requested a review from a team as a code owner December 5, 2025 22:57
test_crashed: false,
};
writer.next_sequence = writer.coordinator.detect_next_sequence()?;
Ok(writer)
@lalitb (Member) commented Dec 6, 2025:

Is this an issue if the collector crashes mid-write? The WAL file could have a partial/corrupt record at the end. After restart, WalWriter::open() will seek to physical EOF, so subsequent writes appended after the corrupted data could result in:

Before crash:  [Batch 1] [Batch 2] [Batch 3] [partial...]
After restart: [Batch 1] [Batch 2] [Batch 3] [garbage] [Batch 4] [Batch 5]

Maybe this is more theoretical. Either way, it would be good to have a test that simulates a partial write and verifies the recovery behavior (whether that's truncation, detection, or something else).
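For illustration, here is a minimal, self-contained sketch of the truncation variant of such a test. It frames records with the same 4-byte little-endian length prefix the reader code below uses, simulates a torn write, and checks that a scan-and-truncate recovery pass restores a clean tail. truncate_partial_tail is a hypothetical helper, not the PR's actual API:

```rust
use std::fs::{File, OpenOptions};
use std::io::{Read, Seek, SeekFrom, Write};

/// Hypothetical helper: scan a length-prefixed WAL and truncate any trailing
/// partial record, so appends after restart land on a valid record boundary.
fn truncate_partial_tail(path: &std::path::Path) -> std::io::Result<()> {
    let mut file = OpenOptions::new().read(true).write(true).open(path)?;
    let len = file.metadata()?.len();
    let mut pos = 0u64;
    let mut len_buf = [0u8; 4];
    loop {
        // Fewer than 4 bytes left: a torn length header.
        if pos + 4 > len {
            break;
        }
        file.seek(SeekFrom::Start(pos))?;
        file.read_exact(&mut len_buf)?;
        let entry_len = u32::from_le_bytes(len_buf) as u64;
        // Body extends past EOF: a torn entry.
        if pos + 4 + entry_len > len {
            break;
        }
        pos += 4 + entry_len;
    }
    file.set_len(pos) // keep complete records, drop the partial tail
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("quiver_partial_write_test.wal");
    {
        let mut f = File::create(&path)?;
        for batch in [b"batch-1", b"batch-2", b"batch-3"] {
            f.write_all(&(batch.len() as u32).to_le_bytes())?;
            f.write_all(batch)?;
        }
        // Simulate a crash mid-write: a header promising 100 bytes,
        // followed by only 3 bytes of payload.
        f.write_all(&100u32.to_le_bytes())?;
        f.write_all(b"par")?;
    }
    truncate_partial_tail(&path)?;
    // Only the three complete records survive; "Batch 4" would now be
    // appended at a valid boundary instead of after garbage.
    assert_eq!(File::open(&path)?.metadata()?.len(), 3 * (4 + 7));
    Ok(())
}
```

Truncating to the last complete record boundary is the simplest of the options listed above; a detection-only strategy would instead stop replay at the torn record and surface an error.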

}

let entry_len = u32::from_le_bytes(len_buf) as usize;
self.buffer.resize(entry_len, 0);
@lalitb (Member) commented Dec 6, 2025:

Here the reader trusts the length read from the file and allocates accordingly. If the WAL file is corrupted (or maliciously crafted) and the length is large enough (say 0xFFFFFFFF), the reader will try to allocate that much. The 4-byte field caps a single allocation at 4 GB, but every df_instance performing such an allocation at once could result in an OOM crash. Should we have some kind of maximum-size check (say, on the basis that a WAL file won't be more than 64 MB)?
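A sketch of the kind of guard being suggested; the constant, its value, and the function shape are illustrative rather than taken from the PR:

```rust
use std::io::{Error, ErrorKind, Read};

/// Hypothetical cap on a single WAL entry; a real implementation would
/// likely derive this from configuration rather than a constant.
const MAX_WAL_ENTRY_BYTES: usize = 64 * 1024 * 1024;

fn read_entry<R: Read>(reader: &mut R, buffer: &mut Vec<u8>) -> std::io::Result<()> {
    let mut len_buf = [0u8; 4];
    reader.read_exact(&mut len_buf)?;
    let entry_len = u32::from_le_bytes(len_buf) as usize;
    // Reject oversized lengths before allocating, so a corrupted or
    // malicious header cannot drive a multi-gigabyte allocation.
    if entry_len > MAX_WAL_ENTRY_BYTES {
        return Err(Error::new(
            ErrorKind::InvalidData,
            format!("WAL entry length {entry_len} exceeds cap {MAX_WAL_ENTRY_BYTES}"),
        ));
    }
    buffer.resize(entry_len, 0);
    reader.read_exact(buffer)
}
```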

}

fn wal_path(config: &QuiverConfig) -> PathBuf {
config.data_dir.join("wal").join("quiver.wal")
@lalitb (Member) commented Dec 6, 2025:

Just confirming - each df_engine instance has a separate data_dir, right? Otherwise multiple instances would conflict on this path.

flush_policy,
max_wal_size: u64::MAX,
max_rotated_files: 8,
rotation_target_bytes: 64 * 1024 * 1024,
@lalitb (Member) commented Dec 6, 2025:

Both of the magic numbers above seem to be important operational defaults that affect capacity behavior. It would be good to define them as named constants to make them more discoverable, along with doc comments explaining the rationale behind these default values.
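For example, something along these lines (the names and the wording of the rationale are illustrative, not from the PR):

```rust
/// Default number of rotated WAL files retained before the oldest is
/// reclaimed; 8 files of 64 MiB give a 512 MiB default on-disk budget.
const DEFAULT_MAX_ROTATED_FILES: usize = 8;

/// Default size at which the active WAL file is rotated; 64 MiB keeps
/// individual files small enough for fast startup scans.
const DEFAULT_ROTATION_TARGET_BYTES: u64 = 64 * 1024 * 1024;

// At the construction site:
//     max_rotated_files: DEFAULT_MAX_ROTATED_FILES,
//     rotation_target_bytes: DEFAULT_ROTATION_TARGET_BYTES,
```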

path,
segment_cfg_hash,
flush_policy,
max_wal_size: u64::MAX,
A member commented:

More of a question: what will happen if we have reached the limit of 8 rotated WAL files of 64 MB each while this max_wal_size limit is still not reached?

}
}
Ok(highest.map_or(0, |seq| seq.wrapping_add(1)))
}
@lalitb (Member) commented Dec 6, 2025:

detect_next_sequence() scans all entries in all WAL files on startup. For the default capacity (64 MB × 8 = 512 MB) this is fine, but would it make sense to persist the last sequence in the checkpoint sidecar for faster recovery?
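A sketch of what that fast path could look like; the sidecar file name, its encoding, and the fallback shape are all hypothetical:

```rust
use std::path::Path;

/// Hypothetical recovery fast path: read the last sequence from a
/// checkpoint sidecar if present, else fall back to the full WAL scan.
fn recover_next_sequence(
    data_dir: &Path,
    full_scan: impl FnOnce() -> std::io::Result<u64>,
) -> std::io::Result<u64> {
    let sidecar = data_dir.join("wal").join("quiver.checkpoint");
    match std::fs::read(&sidecar) {
        Ok(bytes) if bytes.len() == 8 => {
            let last = u64::from_le_bytes(bytes.try_into().expect("length checked"));
            // Assumes the sidecar is rewritten on every committed append; a
            // real implementation would still scan any entries newer than
            // the checkpoint and take the maximum.
            Ok(last.wrapping_add(1))
        }
        // Missing or malformed sidecar: fall back to the full startup scan.
        _ => full_scan(),
    }
}
```

At open time this could be called as recover_next_sequence(&config.data_dir, || coordinator.detect_next_sequence()), keeping the existing full scan as the fallback for first boot or a damaged sidecar.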

