
Conversation

@AlexKehayov
Contributor

Description:

  • Added FileBlockItemWriterBenchmark, containing benchmark tests that measure throughput, latency, compression ratio, and write-only performance across various block configurations.
  • Tested performance with original approach (results below)
  • Added async compression: a background thread with backpressure control (a minimal sketch follows this list)
  • Consolidated Buffers - single 2MB buffer (instead of triple 1MB+256KB+4MB)
  • Increased GZIP Buffer - 1MB (from 256KB)
  • Modified benchmarks to fit the new async approach
  • Tested the optimized logic (results below)
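
A minimal sketch of the async-compression-with-backpressure idea from the list above: a background thread drains a bounded queue and writes through a single large buffered GZIP stream. The class name, queue capacity, and exact buffer sizes are assumptions for illustration, not the actual FileBlockItemWriter change.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.zip.GZIPOutputStream;

/**
 * Illustrative only: serialized block items are handed to a background thread
 * that compresses and writes them; the bounded queue provides backpressure,
 * because producers block in put() once the queue fills up.
 */
final class AsyncGzipBlockWriter implements AutoCloseable {
    private static final byte[] END_OF_BLOCK = new byte[0];

    private final BlockingQueue<byte[]> queue = new ArrayBlockingQueue<>(64);
    private final Thread compressor;

    AsyncGzipBlockWriter(final Path blockFile) throws IOException {
        // Single consolidated 2 MB write buffer in front of the file,
        // plus a 1 MB internal GZIP buffer (raised from 256 KB in this PR).
        final OutputStream fileOut =
                new BufferedOutputStream(Files.newOutputStream(blockFile), 2 * 1024 * 1024);
        final GZIPOutputStream gzipOut = new GZIPOutputStream(fileOut, 1024 * 1024);

        compressor = new Thread(() -> {
            try (gzipOut) {
                while (true) {
                    final byte[] item = queue.take();
                    if (item == END_OF_BLOCK) {
                        break; // end-of-block sentinel: flush and finish the gzip stream
                    }
                    gzipOut.write(item);
                }
            } catch (final IOException e) {
                throw new UncheckedIOException("Async compression failed", e);
            } catch (final InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "block-compressor");
        compressor.start();
    }

    /** Enqueue one serialized item; blocks if the compressor falls behind. */
    void writeItem(final byte[] serializedItem) throws InterruptedException {
        queue.put(serializedItem);
    }

    /** Signal end of block and wait until the compressed file is fully written. */
    @Override
    public void close() throws InterruptedException {
        queue.put(END_OF_BLOCK);
        compressor.join();
    }
}
```

The bounded queue is what provides the backpressure: if compression ever fell behind, writeItem would block instead of letting memory grow unbounded.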

Summarized results:

  • Average Speedup: 2.1-2.9x faster across all configurations
  • Best Case: 3.06x faster (100 items × 5000 bytes)
  • Typical Block: 2.11x faster (500 items × 2000 bytes)
  • Write Performance: 1.43-1.62x faster (buffering improvements)
  • Original: Process 1,019 blocks in 1 second (998ms latency)
  • Optimized: Process 2,166 blocks in 1 second (473ms latency)
  • No backpressure risk - Compression completes in <0.5ms, well under 2s block time

Overall Performance (Block: 500 items × 2000 bytes)

| Metric | Original (Baseline) | Optimized | Improvement |
| --- | --- | --- | --- |
| Throughput | 1.019 ops/ms (1,019 blocks/sec) | 2.166 ops/ms (2,166 blocks/sec) | 2.13x faster |
| Average Time | 0.998 ms/block | 0.473 ms/block | 2.11x faster |
| Compression Ratio | 15% (85% reduction) | 15% (85% reduction) | Same |
| Write-Only Time | 97 µs | 68 µs | 1.43x faster |

Original bench results:
Bench original.rtf

Optimized bench results:
Bench optimized.rtf

Related issue(s):
Closes #22128

Notes for reviewer:
Unit tests covering the changes and documentation will be added if the approach is considered useful.

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

@AlexKehayov AlexKehayov requested review from a team as code owners November 21, 2025 14:27
@lfdt-bot

lfdt-bot commented Nov 21, 2025

Snyk checks have passed. No issues have been found so far.

| Status | Scanner | Critical | High | Medium | Low | Total (0) |
| --- | --- | --- | --- | --- | --- | --- |
| ✅ | Open Source Security | 0 | 0 | 0 | 0 | 0 issues |


@AlexKehayov AlexKehayov self-assigned this Nov 21, 2025
@derektriley
Contributor

Could we make a new implementation in a class like FileBlockItemWriterV2?

Then could we have the benchmark run against both implementations?

And let's make sure the benchmark covers both small blocks and blocks up to, say, 500 MB.
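
For reference, a comparison like this could be expressed as one JMH class with both implementations as @Benchmark methods and a @Param spanning block sizes from roughly 25 KB up to about 500 MB of raw item data. Everything below (class, enum, and method names, parameter values, the placeholder writeBlock body) is a hypothetical sketch, not the PR's actual benchmark.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

/** Illustrative side-by-side benchmark; real writer calls would replace writeBlock(). */
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Fork(1)
@Warmup(iterations = 2)
@Measurement(iterations = 5)
public class FileBlockItemWriterComparisonBench {

    // items-per-block x item-size-bytes: roughly 25 KB up to ~500 MB of raw data per block
    @Param({"50x500", "500x2000", "2000x5000", "5000x20000", "10000x50000"})
    public String blockSpec;

    private byte[][] items;

    @Setup(Level.Trial)
    public void setUp() {
        final String[] parts = blockSpec.split("x");
        final int itemsPerBlock = Integer.parseInt(parts[0]);
        final int itemSizeBytes = Integer.parseInt(parts[1]);
        items = new byte[itemsPerBlock][];
        for (int i = 0; i < itemsPerBlock; i++) {
            // Random payloads here; a realistic benchmark would use representative
            // block items so compression ratios are meaningful.
            items[i] = new byte[itemSizeBytes];
            ThreadLocalRandom.current().nextBytes(items[i]);
        }
    }

    @Benchmark
    public void originalWriter(final Blackhole bh) {
        bh.consume(writeBlock(WriterKind.ORIGINAL));
    }

    @Benchmark
    public void newWriterV2(final Blackhole bh) {
        bh.consume(writeBlock(WriterKind.V2));
    }

    // Placeholder: in the real benchmark this would open the chosen writer,
    // write every item, and close the block (triggering compression and flush).
    private long writeBlock(final WriterKind kind) {
        long total = 0;
        for (final byte[] item : items) {
            total += item.length;
        }
        return total;
    }

    private enum WriterKind { ORIGINAL, V2 }
}
```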

@codecov

codecov bot commented Nov 24, 2025

Codecov Report

❌ Patch coverage is 1.20192% with 411 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...p/blocks/impl/streaming/FileBlockItemWriterV2.java | 0.00% | 200 Missing ⚠️ |
| ...p/blocks/impl/streaming/FileBlockItemWriterV3.java | 0.00% | 187 Missing ⚠️ |
| ...app/blocks/impl/streaming/FileBlockItemWriter.java | 0.00% | 15 Missing ⚠️ |
| .../node/app/hapi/utils/blocks/BlockStreamAccess.java | 38.46% | 4 Missing and 4 partials ⚠️ |
| .../com/hedera/node/app/blocks/BlockStreamModule.java | 0.00% | 1 Missing ⚠️ |

Impacted file tree graph

@@             Coverage Diff              @@
##               main   #22287      +/-   ##
============================================
- Coverage     70.97%   70.71%   -0.27%     
+ Complexity    24632    24631       -1     
============================================
  Files          2698     2700       +2     
  Lines        105072   105475     +403     
  Branches      11043    11088      +45     
============================================
+ Hits          74577    74582       +5     
- Misses        26472    26870     +398     
  Partials       4023     4023              
| Files with missing lines | Coverage Δ | Complexity Δ |
| --- | --- | --- |
| .../com/hedera/node/app/blocks/BlockStreamModule.java | 84.21% <0.00%> (ø) | 7.00 <0.00> (ø) |
| .../node/app/hapi/utils/blocks/BlockStreamAccess.java | 43.47% <38.46%> (-2.31%) ⬇️ | 21.00 <0.00> (+1.00) |
| ...app/blocks/impl/streaming/FileBlockItemWriter.java | 40.10% <0.00%> (-1.56%) | 21.00 <0.00> (ø) |
| ...p/blocks/impl/streaming/FileBlockItemWriterV3.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |
| ...p/blocks/impl/streaming/FileBlockItemWriterV2.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (?) |

... and 13 files with indirect coverage changes



Signed-off-by: Alex Kehayov <[email protected]>
@AlexKehayov
Contributor Author

@derektriley I made some changes to compare both implementations side by side, and added a parameter for running the benchmark with 500 MB blocks. Attaching the output here.
comparison bench.rtf

Summary of the results:

| Block Size | Items × Size | Original (ops/ms) | Optimized (ops/ms) | Speedup |
| --- | --- | --- | --- | --- |
| 25 KB | 50 × 500 | 5.385 | 12.665 | 2.35x |
| 1 MB | 500 × 2000 | 0.957 | 2.165 | 2.26x |
| 10 MB | 2000 × 5000 | 0.218 | 0.423 | 1.94x |
| 100 MB | 5000 × 20000 | 0.051 | 0.095 | 1.86x |
| 500 MB | 10000 × 50000 | 0.014 | 0.024 | 1.71x |

Buffer performance:

| Items × Size | Original (µs) | Optimized (µs) | Speedup |
| --- | --- | --- | --- |
| 2000 × 2000 | 150 | 117 | 1.28x |
| 5000 × 5000 | 296 | 252 | 1.17x |
| 5000 × 20000 | 18,457 | 366 | 50.4x |
| 10000 × 20000 | 36,847 | 741 | 49.7x |
| 10000 × 50000 | 67,640 | 1,645 | 41.1x |

@AlexKehayov
Contributor Author

After revising the async approach further, I found it introduces behavior that is incompatible with other system components: blocks must be persisted to disk in order and be available on disk as soon as they are marked CLOSED. I switched to a different async approach that guarantees these conditions, but the performance gains did not justify the added complexity.
I then tried a synchronous approach with only an in-memory buffering upgrade, which sped up writing items significantly; however, since compression accounts for more than 90% of the CPU time, it contributed almost nothing to overall performance.
V3 keeps the synchronous approach but uses ZSTD compression instead of GZIP: performance is significantly better and the compressed blocks are even smaller. For testing and realism, the readers in this PR are updated to read both .gz and .zst files.
Will post a table with results once the benchmark completes.
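
A rough sketch of the V3 idea, assuming the zstd-jni library (com.github.luben.zstd) for the ZSTD streams; the class and method names are illustrative, not the PR's actual code.

```java
import com.github.luben.zstd.ZstdInputStream;
import com.github.luben.zstd.ZstdOutputStream;
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

/** Illustrative: synchronous ZSTD (level 1) block writing, plus a reader that accepts .gz and .zst. */
final class ZstdBlockIo {

    /** Open a compressing stream for a new .zst block file, compressing synchronously at ZSTD level 1. */
    static OutputStream openZstdBlockStream(final Path blockFile) throws IOException {
        final OutputStream fileOut = new BufferedOutputStream(
                Files.newOutputStream(blockFile), 2 * 1024 * 1024);
        return new ZstdOutputStream(fileOut, 1); // level 1: fast, and still beats GZIP level 6 on ratio here
    }

    /** Open a decompressing stream, picking the codec from the file extension. */
    static InputStream openBlockStream(final Path blockFile) throws IOException {
        final InputStream fileIn = new BufferedInputStream(Files.newInputStream(blockFile));
        final String name = blockFile.getFileName().toString();
        if (name.endsWith(".zst")) {
            return new ZstdInputStream(fileIn);
        }
        if (name.endsWith(".gz")) {
            return new GZIPInputStream(fileIn);
        }
        return fileIn; // uncompressed block file
    }

    private ZstdBlockIo() {}
}
```

Dispatching on the file extension keeps the readers compatible with existing .gz blocks while new blocks are written as .zst.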

@AlexKehayov AlexKehayov marked this pull request as draft November 26, 2025 13:43
@AlexKehayov
Contributor Author

V3 results:
bench v3.rtf

Throughput (ops/ms): V3 is faster

| Block profile | Original (ops/ms) | V3 (ops/ms) | Speedup |
| --- | --- | --- | --- |
| Small blocks (2KB avg) | 0.277 | 2.376 | 8.6x |
| Medium blocks (20KB avg) | 0.129 | 1.471 | 11.4x |
| Large blocks (50KB avg) | 0.068 | 1.019 | 15.0x |
| Very large blocks (50KB avg, 10K items) | 0.013 | 0.213 | 16.4x |

Compression ratio: V3 is better

| Block | Original (GZIP) | V3 (ZSTD) | V3 vs GZIP |
| --- | --- | --- | --- |
| 2KB avg, 2K items | 113,541 bytes (15%) | 99,660 bytes (15%) | 12% smaller |
| 2KB avg, 10K items | 572,640 bytes (15%) | 501,519 bytes (15%) | 12% smaller |
| 20KB avg, 2K items | 159,834 bytes (9%) | 134,601 bytes (6%) | 16% smaller |
| 20KB avg, 10K items | 803,673 bytes (6%) | 679,230 bytes (6%) | 15% smaller |
| 50KB avg, 2K items | 236,457 bytes (6%) | 147,843 bytes (3%) | 37% smaller |
| 50KB avg, 10K items | 1,191,924 bytes (6%) | 738,951 bytes (3%) | 38% smaller |

Performance: V3 is 8–16x faster across all block sizes.
Compression: V3 produces smaller files in every case, up to 38% smaller for larger blocks.
Compression ratio: for 20KB+ items V3 compresses to 3–6% of the original size vs GZIP's 6–9%; small (2KB) items sit around 15% with both codecs, though V3's output is still about 12% smaller.
Overall, V3 (ZSTD level 1) outperforms the original (GZIP level 6) in both speed and compressed size, with the biggest gains on larger blocks.

@timfn-hg
Contributor

FWIW there is a ticket for PBJ to explore using Zstd. While that ticket focuses on the gRPC side of things, it may be nice to explore the Zstd implementations mentioned there so everything uses the same implementation. Just a thought.


Development

Successfully merging this pull request may close these issues:
  • Create JMH Benchmark for FileBlockItemWriter and optimize performance
