Commit c7c8f20

Tortar and LilithHafner authored
Unroll loop in _rand_slow_path (#171)
--------- Co-authored-by: Lilith Orion Hafner <[email protected]>
1 parent ffc7f7a commit c7c8f20

1 file changed, +3 -4 lines changed

src/WeightVectors.jl

Lines changed: 3 additions & 4 deletions
@@ -248,10 +248,9 @@ function _rand_slow_path(rng::AbstractRNG, m::Memory{UInt64}, i)
     m2 = signed(m[2])
     x = zero(UInt64)
     checkbounds(m, 2m2-2Sys.WORD_SIZE+2093:2m2+2093)
-    @inbounds for i in Sys.WORD_SIZE:-1:0 # This loop is backwards so that memory access is forwards. TODO for perf, we can get away with shaving 1 to 10 off of this loop.
-        # This can underflow from significand sums into weights, but that underflow is safe because it can only happen if all the latter weights are zero. Be careful about this when re-arranging the memory layout!
-        x += m[2m2-2i+2093] >> (i - 1)
-    end
+    # TODO for perf, we can get away with shaving 1 to 10 off of these operations.
+    # This can underflow from significand sums into weights, but that underflow is safe because it can only happen if all the latter weights are zero. Be careful about this when re-arranging the memory layout!
+    Base.Cartesian.@nexprs 65 i -> (@inbounds x += m[2m2-2i+2+2093] >> (i - 2))
 
     # x is computed by rounding down at a certain level and then summing (and adding 1)
     # m[5] will be computed by rounding up at a more precise level and then summing
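
The index and shift arithmetic in the unrolled line can be checked against the original loop in isolation. The following is a minimal sketch, not the package's code: `sum_loop`, `sum_unrolled`, and `base` are hypothetical names, `base` stands in for `2m2 + 2093`, a plain `Vector{UInt64}` replaces `Memory{UInt64}`, and `Sys.WORD_SIZE` is assumed to be 64 (which the literal `65` in the diff also assumes). The macro's `i` runs from 1 to 65 and corresponds to the loop's `i + 1`, so the terms are added in the opposite order; since `UInt64` addition wraps and is commutative, the sums should agree.

```julia
using Random  # only used to build a reproducible test vector

# Loop form, mirroring the removed lines. `j` runs 64:-1:0.
# For j == 0 the shift count is -1, which Julia treats as a left shift by 1.
function sum_loop(m::Vector{UInt64}, base::Int)
    x = zero(UInt64)
    checkbounds(m, base-2*64:base)  # one hoisted check covers every access below
    @inbounds for j in 64:-1:0
        x += m[base-2j] >> (j - 1)
    end
    return x
end

# Unrolled form, mirroring the added line. `@nexprs 65 i -> ...` expands to
# 65 copies of the expression with i = 1, 2, ..., 65 substituted as literals.
function sum_unrolled(m::Vector{UInt64}, base::Int)
    x = zero(UInt64)
    checkbounds(m, base-2*64:base)
    Base.Cartesian.@nexprs 65 i -> (@inbounds x += m[base-2i+2] >> (i - 2))
    return x
end

m = rand(MersenneTwister(1), UInt64, 200)
println(sum_loop(m, 150) == sum_unrolled(m, 150))
```

Because the `checkbounds` call precedes the `@inbounds` accesses in both forms, an out-of-range `base` raises a `BoundsError` before any unchecked read happens.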
