Conversation

@Shnatsel commented Jun 16, 2025

I came across a challenge to remove bounds checks from this function in https://flexineering.com/posts/lz4-011/

This is the code currently in tree: https://godbolt.org/z/W3zqcbnd5

This is the proposed code: https://godbolt.org/z/an1sYhc4c

As you can see, all the bounds checks are gone from the hot loop. On the flip side, there is more assembly in total. I'm not sure why; perhaps the compiler unrolls the loop more aggressively?

It's possible to get the number of bounds checks down to two, but that produces even more assembly: https://godbolt.org/z/dKfbcv98r
You can try this version by checking out commit dd23da8.

I have not benchmarked whether this is actually faster or slower in practice. It's possible that the increase in the total amount of assembly loaded into instruction cache will offset the gains from removing the already perfectly predictable branches of the bounds checks.
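One crude way to sanity-check this tradeoff outside the crate (the workload below is my own assumption, not the project's benchmark suite) is a minimal `Instant`-based harness over many short overlapping copies, which is the pattern LZ4 decompression tends to produce:

```rust
use std::time::Instant;

// The byte-by-byte copy under discussion, reproduced so the harness is
// self-contained.
fn extend_from_within_overlapping(output: &mut [u8], pos: &mut usize, start: usize, num_bytes: usize) {
    let offset = *pos - start;
    for i in start + offset..start + offset + num_bytes {
        output[i] = output[i - offset];
    }
    *pos += num_bytes;
}

fn main() {
    let mut buf = vec![0u8; 1 << 16];
    buf[0] = 42;
    let t = Instant::now();
    for _ in 0..1_000 {
        let mut pos = 1;
        // LZ4 matches are typically short, so time many small copies.
        while pos + 8 <= buf.len() {
            let start = pos - 1;
            extend_from_within_overlapping(&mut buf, &mut pos, start, 8);
        }
    }
    println!("elapsed: {:?}", t.elapsed());
    // With offset 1, the seed byte is replicated through the buffer.
    assert!(buf[..1024].iter().all(|&b| b == 42));
}
```

A real comparison should of course use the crate's own benchmarks, since timing in isolation misses the instruction-cache effects on surrounding code.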

This is likely going to run into all the same issues as #141, but I wanted to give it another try.

@PSeitz (Owner) commented Jul 17, 2025

Thanks for the PR! Sadly this seems to be 1-2% slower.

I think it's a promising approach.

pub fn extend_from_within_overlapping(output: &mut [u8], pos: &mut usize, start: usize, num_bytes: usize) {
    let offset = *pos - start;
    for i in start + offset..start + offset + num_bytes {
        output[i] = output[i - offset];
    }
    *pos += num_bytes;
}
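The overlapping semantics of this loop are load-bearing: with `offset == 1` each iteration reads the byte written by the previous one, so a single seed byte gets replicated, which is exactly the LZ4 overlapping-match case. A small self-contained demo:

```rust
fn extend_from_within_overlapping(output: &mut [u8], pos: &mut usize, start: usize, num_bytes: usize) {
    let offset = *pos - start;
    for i in start + offset..start + offset + num_bytes {
        output[i] = output[i - offset];
    }
    *pos += num_bytes;
}

fn main() {
    // offset = 1: each copied byte reads the byte just written,
    // so the single seed byte is replicated.
    let mut buf = vec![0u8; 5];
    buf[0] = b'a';
    let mut pos = 1;
    extend_from_within_overlapping(&mut buf, &mut pos, 0, 4);
    assert_eq!(buf, b"aaaaa".to_vec());
    assert_eq!(pos, 5);
}
```

This is also why a plain `memcpy` cannot be substituted when the match overlaps the output position.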

If offset is large enough, we can switch to copying larger chunks and the assembly indeed does seem to do this.

The main difference, I think, is that the original version does a 16-byte check and copy, while the Cell version has a 32-byte variant with some additional checks.

For this function the tradeoff probably favors fewer checks, since we usually don't copy that much.
I wish we had a way to hint that information to the compiler.
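On stable Rust there is no direct "inputs are usually small" hint, but one conventional approximation is to mark the bulk-copy path `#[cold]` and `#[inline(never)]`, which biases code layout and inlining toward the small path. A hedged sketch (function names are illustrative, not the crate's actual API):

```rust
pub fn extend_from_within_overlapping(
    output: &mut [u8],
    pos: &mut usize,
    start: usize,
    num_bytes: usize,
) {
    let offset = *pos - start;
    if offset >= 8 && num_bytes >= 8 {
        return copy_large(output, pos, start, num_bytes);
    }
    // Hot path: short overlapping copies, kept small and inlinable.
    for i in start + offset..start + offset + num_bytes {
        output[i] = output[i - offset];
    }
    *pos += num_bytes;
}

#[cold]
#[inline(never)]
fn copy_large(output: &mut [u8], pos: &mut usize, start: usize, num_bytes: usize) {
    // Same semantics; a real implementation would copy wider chunks here.
    let offset = *pos - start;
    for i in start + offset..start + offset + num_bytes {
        output[i] = output[i - offset];
    }
    *pos += num_bytes;
}

fn main() {
    let mut buf = vec![0u8; 12];
    buf[0] = 9;
    let mut pos = 1;
    extend_from_within_overlapping(&mut buf, &mut pos, 0, 4); // small path
    assert_eq!(&buf[..5], &[9, 9, 9, 9, 9]);
    assert_eq!(pos, 5);
}
```

Whether this actually helps here would still need benchmarking; `#[cold]` only influences layout and branch-weight heuristics, not the checks themselves.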

@Shnatsel (Author) commented

Sadly there doesn't seem to be a way to tell the compiler that the expected inputs are going to be small, other than profile-guided optimization, which the user of the library would have to do themselves and is therefore impractical here.

@Shnatsel closed this Jul 17, 2025
@PSeitz (Owner) commented Jul 17, 2025

There is the nightly `#[optimize]` attribute, which is already in use and could be applied here.
If the generated assembly is small, it is usually well suited to small inputs.

#![feature(optimize_attribute)]

use std::cell::Cell;

#[optimize(size)]
pub fn extend_from_within_overlapping(
    slice: &mut [u8],
    pos: &mut usize,
    start: usize,
    num_bytes: usize,
) {
    let offset = *pos - start;

    if offset >= 8 && num_bytes >= 8 {
        // Large copy: dispatch to a chunked implementation.
        extend_from_within_overlapping_large(slice, pos, start, num_bytes)
    } else {
        // Fallback: byte-by-byte via Cell, which allows writing through
        // shared references even when source and destination overlap.
        let cells: &[Cell<u8>] = Cell::from_mut(slice).as_slice_of_cells();
        let src = &cells[start..];
        let dst = &src[offset..];
        for (s, d) in src.iter().zip(dst.iter()).take(num_bytes) {
            d.set(s.get());
        }
        *pos += num_bytes;
    }
}

#[inline]
pub fn extend_from_within_overlapping_large(
    slice: &mut [u8],
    pos: &mut usize,
    start: usize,
    num_bytes: usize,
) {
    // ...
}
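The large-copy body is elided in the comment above. One possible sketch (my own, not from the PR; spelling of the name normalized): since `offset >= 8`, copying in chunks of at most 8 bytes keeps each chunk's source fully inside the already-written region, preserving the byte-replication semantics of overlapping matches while letting the compiler emit wider loads and stores.

```rust
// Hypothetical sketch of the elided large-copy path (not from the PR).
// Precondition: offset >= 8, so each 8-byte chunk's source range ends at
// or before its destination and never reads bytes written in the same step.
pub fn extend_from_within_overlapping_large(
    slice: &mut [u8],
    pos: &mut usize,
    start: usize,
    num_bytes: usize,
) {
    let offset = *pos - start;
    debug_assert!(offset >= 8);
    let mut copied = 0;
    while copied < num_bytes {
        let n = (num_bytes - copied).min(8);
        let src = start + copied;
        let dst = *pos + copied;
        // copy_within has memmove semantics and bounds-checks the ranges.
        slice.copy_within(src..src + n, dst);
        copied += n;
    }
    *pos += num_bytes;
}

fn main() {
    let mut buf: Vec<u8> = (1..=8).collect();
    buf.resize(24, 0);
    let mut pos = 8;
    // offset = 8, num_bytes = 16: the 8-byte pattern is repeated twice.
    extend_from_within_overlapping_large(&mut buf, &mut pos, 0, 16);
    assert_eq!(pos, 24);
    assert_eq!(&buf[8..16], &buf[0..8]);
    assert_eq!(&buf[16..24], &buf[0..8]);
}
```

Whether chunked `copy_within` beats a hand-rolled loop here would again come down to benchmarking, since each `copy_within` call carries its own bounds checks.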
