@BiteTheDDDDt BiteTheDDDDt commented Dec 1, 2025

What problem does this PR solve?

Doris currently uses a crc32c implementation taken from rocksdb, but it performs noticeably worse than google/crc32c.

66,663,538 rows, int column:
  crc32c-rocksdb  684.879 ms
  crc32c-google   206.360 ms

66,663,538 rows, varchar column:
  crc32c-rocksdb  1 sec 368 ms
  crc32c-google   391.290 ms

The rocksdb/crc32c code already has unit tests (be/test/util/crc32c_test.cpp), and they cover the replacement as well, so this change is safe.

This pull request updates the codebase to use the more efficient and modern CRC32C hashing algorithm in place of the older CRC32 implementation. The changes include switching hash functions throughout the code, updating the CRC32C utility implementation to use an external library, and adding the required third-party dependency. This improves hash performance and consistency, and prepares the codebase for future compatibility.

Hashing algorithm migration:

  • Replaced all usages of HashUtil::crc_hash with HashUtil::crc32c_hash in block_bloom_filter.hpp, column_dictionary.h, and function_string.h to utilize CRC32C for better performance and reliability.

  • Added the new crc32c_hash method to HashUtil and marked the old crc_hash as deprecated, retaining it only for backward compatibility with historical data.

CRC32C utility refactor and dependency management:

  • Refactored crc32c.cpp and crc32c.h to use the external crc32c library, removing the previous custom implementation and lookup tables. Added new utility functions for CRC32C operations.

  • Added the crc32c third-party dependency in the build configuration to support the new CRC32C utility.
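For readers unfamiliar with the Extend/Value style of CRC32C interface, here is a minimal sketch of what such a thin utility layer might look like. It is not the code from this PR: the namespace `crc32c_sketch` and the helper `ValueInChunks` are hypothetical, and a portable bitwise loop stands in for the external library so that the example compiles on its own. In a real wrapper the body of `Extend` would presumably delegate to the library instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

namespace crc32c_sketch {  // hypothetical namespace, not the one used in Doris

// Portable bit-at-a-time CRC32C (Castagnoli, reflected polynomial 0x82F63B78).
// Assumption: this loop is only a stand-in for the external library call.
inline uint32_t Extend(uint32_t crc, const char* data, size_t n) {
    crc = ~crc;  // recover the internal state from a previously finalized CRC
    for (size_t i = 0; i < n; ++i) {
        crc ^= static_cast<uint8_t>(data[i]);
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; XOR in the polynomial when the dropped bit was 1.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;  // finalize
}

// One-shot CRC32C of a buffer.
inline uint32_t Value(const char* data, size_t n) {
    return Extend(0, data, n);
}

// Hypothetical helper: hashes `s` in two pieces split at `cut` to show that
// chaining Extend calls gives the same result as a single Value call.
inline uint32_t ValueInChunks(const std::string& s, size_t cut) {
    if (cut > s.size()) cut = s.size();
    uint32_t crc = Extend(0, s.data(), cut);
    return Extend(crc, s.data() + cut, s.size() - cut);
}

}  // namespace crc32c_sketch
```

With this shape, `Value("123456789", 9)` yields the standard CRC-32C check value 0xE3069283, and `ValueInChunks` returns the same value for any split point. Callers that hash data incrementally rely on that property.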

Build and header updates:

  • Updated includes in hash_util.hpp to reference the new CRC32C utility.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@BiteTheDDDDt
Contributor Author

run buildall


Contributor

@zclllyybb zclllyybb left a comment

Should we add a test to make sure the hash results stay consistent? And would this change affect storage distribution?
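One way such a consistency test could look is a differential check: run two independent CRC32C implementations over the same inputs and require identical results, plus a few known-answer vectors. The sketch below is an illustration under stated assumptions, not code from this PR. The function names are hypothetical, and a bitwise and a table-driven implementation stand in for the old and new code paths, since the actual rocksdb-derived and google/crc32c implementations are not reproduced here.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

namespace crc32c_check {  // hypothetical namespace for this sketch

// Reference: bit-at-a-time CRC32C (reflected polynomial 0x82F63B78).
inline uint32_t crc32c_bitwise(const uint8_t* p, size_t n) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; ++i) {
        crc ^= p[i];
        for (int b = 0; b < 8; ++b)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

// Builds the 256-entry lookup table for the byte-at-a-time variant.
inline std::array<uint32_t, 256> make_table() {
    std::array<uint32_t, 256> t{};
    for (uint32_t i = 0; i < 256; ++i) {
        uint32_t c = i;
        for (int b = 0; b < 8; ++b)
            c = (c >> 1) ^ (0x82F63B78u & (0u - (c & 1u)));
        t[i] = c;
    }
    return t;
}

// Candidate: table-driven CRC32C, standing in for the implementation under test.
inline uint32_t crc32c_table(const uint8_t* p, size_t n) {
    static const std::array<uint32_t, 256> table = make_table();
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; ++i)
        crc = table[(crc ^ p[i]) & 0xFFu] ^ (crc >> 8);
    return crc ^ 0xFFFFFFFFu;
}

// Compares both implementations on pseudo-random buffers of every length
// from 0 to max_len and returns how many lengths disagreed.
inline size_t count_mismatches(size_t max_len, uint32_t seed) {
    std::mt19937 rng(seed);
    size_t mismatches = 0;
    for (size_t len = 0; len <= max_len; ++len) {
        std::vector<uint8_t> buf(len);
        for (auto& byte : buf) byte = static_cast<uint8_t>(rng() & 0xFFu);
        if (crc32c_bitwise(buf.data(), len) != crc32c_table(buf.data(), len))
            ++mismatches;
    }
    return mismatches;
}

// Known-answer vectors for CRC32C over 32-byte buffers (RFC 3720, B.4).
inline bool known_answers_hold() {
    std::vector<uint8_t> zeros(32, 0x00), ones(32, 0xFF), asc(32), desc(32);
    for (size_t i = 0; i < 32; ++i) {
        asc[i] = static_cast<uint8_t>(i);
        desc[i] = static_cast<uint8_t>(31 - i);
    }
    return crc32c_table(zeros.data(), 32) == 0x8A9136AAu &&
           crc32c_table(ones.data(), 32) == 0x62A8AB43u &&
           crc32c_table(asc.data(), 32) == 0x46DD794Eu &&
           crc32c_table(desc.data(), 32) == 0x113FDB5Cu;
}

}  // namespace crc32c_check
```

A unit test would assert that `count_mismatches(...)` returns 0 and that `known_answers_hold()` is true. If the old and new code paths agree on every input in this way, values derived from the hash would not change. Whether that holds for every call site touched by this PR is exactly what such a test would need to establish.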
