[otap-df-quiver] Add Write-Ahead-Log (WAL) implementation to Quiver #1537
base: main
Conversation
… be based on logical cursor.
Codecov Report
Additional details and impacted files:
@@            Coverage Diff             @@
##             main    #1537      +/-   ##
==========================================
+ Coverage   83.48%   83.74%   +0.26%
==========================================
  Files         428      433       +5
  Lines      118652   121877    +3225
==========================================
+ Hits        99054   102068    +3014
- Misses      19064    19275     +211
  Partials      534      534
"MIT-0",
"Apache-2.0",
"Unicode-3.0",
"BSD-2-Clause",
Required due to the array-ref transitive dependency (from blake3) using this license. BSD-2-Clause is even more permissive than BSD-3-Clause (which is already allowed here), so I don't believe this should be a concern.
    test_crashed: false,
};
writer.next_sequence = writer.coordinator.detect_next_sequence()?;
Ok(writer)
Is this an issue if the collector crashes mid-write? The WAL file could have a partial/corrupt record at the end. After restart, WalWriter::open() will seek to physical EOF, and subsequent writes appended after the corrupted data could result in:
Before crash: [Batch 1] [Batch 2] [Batch 3] [partial...]
After restart: [Batch 1] [Batch 2] [Batch 3] [garbage] [Batch 4] [Batch 5]
Maybe this is more theoretical. Either way, it would be good to have a test that simulates a partial write and verifies the recovery behavior (whether that's truncation, detection, or something else), along the lines of the sketch below.
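Rough sketch of such a test; WalReader, its open/read_next signatures, and the record layout (4-byte little-endian length prefix + payload, taken from the reader code further down) are assumptions on my part, not the crate's actual API:

```rust
#[test]
fn recovers_after_partial_tail_record() -> std::io::Result<()> {
    use std::io::Write;

    let dir = tempfile::tempdir()?;
    let path = dir.path().join("quiver.wal");

    // One complete length-prefixed record: 4-byte little-endian length + payload.
    let payload = b"batch-1";
    let mut file = std::fs::File::create(&path)?;
    file.write_all(&(payload.len() as u32).to_le_bytes())?;
    file.write_all(payload)?;

    // Simulate a crash mid-write: a length prefix that promises more bytes
    // than were actually flushed before the process died.
    file.write_all(&1024u32.to_le_bytes())?;
    file.write_all(b"partial")?;
    drop(file);

    // On restart the WAL should either truncate the torn record or stop at the
    // last complete entry, so later appends never land after garbage bytes.
    let mut reader = WalReader::open(&path)?; // hypothetical reader API
    assert_eq!(reader.read_next()?.as_deref(), Some(&payload[..]));
    assert!(reader.read_next()?.is_none());
    Ok(())
}
```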
}

let entry_len = u32::from_le_bytes(len_buf) as usize;
self.buffer.resize(entry_len, 0);
Here the reader trusts the length read from the file and allocates based on it. If the WAL file is corrupted (or maliciously crafted) and the length is large enough (say 0xFFFF…), the reader will try to allocate that much. The 4-byte prefix limits a single allocation to 4 GB, but every df_instance doing this allocation could still result in an OOM crash. Should we have some kind of max-length check (say, assuming the WAL won't be more than 64 MB)? Something like the sketch below, perhaps.
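Illustrative only; the constant, error type, and helper name are suggestions, not existing code in this PR:

```rust
/// Suggested upper bound on a single decoded WAL entry (64 MiB).
const MAX_WAL_ENTRY_LEN: usize = 64 * 1024 * 1024;

/// Hypothetical error carrying the length the on-disk prefix claimed.
#[derive(Debug)]
struct OversizedEntry {
    declared_len: usize,
}

fn validated_entry_len(len_buf: [u8; 4]) -> Result<usize, OversizedEntry> {
    let entry_len = u32::from_le_bytes(len_buf) as usize;
    if entry_len > MAX_WAL_ENTRY_LEN {
        // Reject the record instead of trusting the on-disk length blindly,
        // so a corrupted or malicious prefix cannot force a multi-GB allocation.
        return Err(OversizedEntry { declared_len: entry_len });
    }
    Ok(entry_len)
}
```

The reader would then resize the buffer only after the check, e.g. `self.buffer.resize(validated_entry_len(len_buf)?, 0)`, surfacing the error through whatever WAL error type already exists.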
}

fn wal_path(config: &QuiverConfig) -> PathBuf {
    config.data_dir.join("wal").join("quiver.wal")
Just confirming - each df_engine instance has a separate data_dir, right? Otherwise multiple instances would conflict on this path.
flush_policy,
max_wal_size: u64::MAX,
max_rotated_files: 8,
rotation_target_bytes: 64 * 1024 * 1024,
Both of the magic numbers above seem to be important operational defaults that affect capacity behavior. It would be good to define them as named constants, with doc comments explaining the rationale behind the default values, so they're more discoverable; see the sketch below.
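Roughly like this, maybe (constant names, field types, and doc wording are just suggestions; the values are the ones already used in the diff):

```rust
/// Default number of rotated WAL files retained before the oldest is dropped.
pub const DEFAULT_MAX_ROTATED_FILES: usize = 8;

/// Target size of a single WAL file before rotation kicks in (64 MiB). Together
/// with DEFAULT_MAX_ROTATED_FILES this bounds the default on-disk footprint.
pub const DEFAULT_ROTATION_TARGET_BYTES: u64 = 64 * 1024 * 1024;
```

The struct literal would then read `max_rotated_files: DEFAULT_MAX_ROTATED_FILES, rotation_target_bytes: DEFAULT_ROTATION_TARGET_BYTES`.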
path,
segment_cfg_hash,
flush_policy,
max_wal_size: u64::MAX,
More of a question: what happens if we've already hit the limit of 8 rotated WAL files of 64 MB each while this max_wal_size limit still hasn't been reached?
        }
    }
    Ok(highest.map_or(0, |seq| seq.wrapping_add(1)))
}
detect_next_sequence() scans all entries in all WAL files on startup. For the default capacity (64 MB * 8 = 512 MB) this is fine, but would it make sense to persist the last sequence in the checkpoint sidecar for faster recovery? A possible shape is sketched below.
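All names in this sketch are hypothetical; only the fallback scan (detect_next_sequence) exists in the PR, and it assumes the sidecar is written no earlier than the WAL entries it covers:

```rust
/// Hypothetical checkpoint sidecar carrying the last known sequence.
struct Checkpoint {
    /// Highest sequence number known when the checkpoint was written.
    last_sequence: Option<u64>,
}

fn next_sequence(
    checkpoint: Option<&Checkpoint>,
    scan_all_wal_files: impl FnOnce() -> std::io::Result<u64>,
) -> std::io::Result<u64> {
    if let Some(seq) = checkpoint.and_then(|cp| cp.last_sequence) {
        // Fast path: reuse the persisted sequence instead of re-reading every
        // entry of every rotated WAL file on startup.
        return Ok(seq.wrapping_add(1));
    }
    // Fallback: the existing full scan over all WAL files.
    scan_all_wal_files()
}
```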
This pull request introduces an implementation of the Quiver write-ahead log (WAL). The most significant change from the initial spec is a rewrite and clarification of the WAL file rotation and checkpointing mechanism. Documentation has been updated to reflect the new design.