Remove bounds checks from extend_from_within_overlapping
#186
I came across a challenge to remove bounds checks from this function in https://flexineering.com/posts/lz4-011/
This is the code currently in tree: https://godbolt.org/z/W3zqcbnd5
This is the proposed code: https://godbolt.org/z/an1sYhc4c
As you can see, all the bounds checks from the hot loop are gone. On the flip side, there is more assembly total. Not sure why, perhaps the compiler unrolls the loop more aggressively?
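For readers who don't want to open the godbolt links, here is a minimal sketch of one general way to move bounds checks out of an overlapping-copy loop. It is not the code from this PR. The function name, the free-function signature and the parameter names (`output`, `pos`, `start`, `num_bytes`) are assumptions made for illustration. The idea is to slice the affected region once up front, view it as a slice of `Cell<u8>`, and zip two overlapping sub-slices so that the loop body does no indexing at all. Whether the optimizer then produces better code still has to be checked in the generated assembly.

```rust
use std::cell::Cell;

/// Hypothetical stand-in, not the in-tree function: copies `num_bytes` bytes
/// starting at `start` to position `pos` in the same buffer. The source and
/// destination ranges may overlap (`start <= pos`), in which case bytes
/// written earlier in the loop are read again later, as LZ4 match copies need.
/// Returns the new write position.
fn extend_from_within_overlapping_sketch(
    output: &mut [u8],
    pos: usize,
    start: usize,
    num_bytes: usize,
) -> usize {
    assert!(start <= pos, "source must not start after the write position");
    let offset = pos - start;

    // All slicing (and therefore all bounds checking) happens here, once,
    // before the loop. This panics if the buffer is too small.
    let region = &mut output[start..pos + num_bytes];

    // Viewing the bytes as cells allows two overlapping views of one region
    // in safe code.
    let cells = Cell::from_mut(region).as_slice_of_cells();
    let src = &cells[..num_bytes];
    let dst = &cells[offset..];

    // Forward, byte-by-byte copy; no indexing inside the loop body.
    for (d, s) in dst.iter().zip(src) {
        d.set(s.get());
    }

    pos + num_bytes
}
```

With a buffer `[1, 2, 3, 0, 0, 0, 0, 0]`, `pos = 3`, `start = 1` and `num_bytes = 5`, this repeats the two-byte pattern `2, 3` into the tail of the buffer and returns 8.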
It's possible to get the number of bounds checks down to 2, but that produces even more assembly: https://godbolt.org/z/dKfbcv98r
You can try this version if you check out commit dd23da8.
I have not benchmarked whether this is actually faster or slower in practice. It's possible that the extra assembly that has to be loaded into the instruction cache will offset the gains from removing the bounds-check branches, which were already perfectly predictable.
This is likely going to run into all the same issues as #141, but I wanted to give it another try.