Merged
2 changes: 1 addition & 1 deletion Project.toml
@@ -1,7 +1,7 @@
name = "UnrolledUtilities"
uuid = "0fe1646c-419e-43be-ac14-22321958931b"
authors = ["CliMA Contributors <[email protected]>"]
version = "0.1.8"
version = "0.1.9"

[compat]
julia = "1.9"
41 changes: 29 additions & 12 deletions docs/src/developer_guide.md
@@ -5,7 +5,7 @@ CurrentModule = UnrolledUtilities
## How to Unroll

There are two general ways to implement loop unrolling in Julia—recursively
splatting iterator contents and manually generating unrolled expressions. For
splatting iterator contents and manually constructing unrolled expressions. For
example, a recursively unrolled version of the `foreach` function is

```julia
@@ -14,23 +14,42 @@ _unrolled_foreach(f) = nothing
_unrolled_foreach(f, item, items...) = (f(item); _unrolled_foreach(f, items...))
```

In contrast, a generatively unrolled implementation of this function looks like
In contrast, a manually unrolled implementation of this function looks like

```julia
unrolled_foreach(f, itr) = _unrolled_foreach(Val(length(itr)), f, itr)
@generated _unrolled_foreach(::Val{N}, f, itr) where {N} =
    Expr(:block, (:(f(generic_getindex(itr, $n))) for n in 1:N)..., nothing)
```
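
As a rough, self-contained illustration of the generated form above, the sketch below can be run outside the package. It assumes that `generic_getindex` behaves like `getindex` for tuples; `demo_getindex` and the `demo_` prefixed functions are hypothetical stand-ins, not names from UnrolledUtilities.

```julia
# Hypothetical stand-in for the package's generic_getindex (assumed to forward
# to getindex; the real definition is not shown in this diff).
demo_getindex(itr, n) = getindex(itr, n)

# The length is lifted into the type domain so the generator can see it.
demo_unrolled_foreach(f, itr) = _demo_unrolled_foreach(Val(length(itr)), f, itr)

# The generator only sees argument types, so N comes from Val{N}. It returns a
# block with one call to f per index, followed by `nothing`.
@generated _demo_unrolled_foreach(::Val{N}, f, itr) where {N} =
    Expr(:block, (:(f(demo_getindex(itr, $n))) for n in 1:N)..., nothing)

# Usage: record the items of a heterogeneous tuple in the order they are visited.
visited = Any[]
demo_unrolled_foreach(x -> push!(visited, x), (1, 2.0, "three"))
```

After the call, `visited` holds the three tuple items in their original order, and the call itself returns `nothing`.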

To switch between recursive and generative unrolling, this package defines the
following function:
Julia's compiler can only pass up to 32 values through function arguments
without allocating heap memory, so recursive unrolling is not type-stable for
iterators with lengths greater than 32. However, automatically generating
functions often requires more time and memory resources during compilation than
writing hard-coded functions. Recursive inlining adds overhead to compilation
as well, but this is typically smaller than the overhead of generated functions
for short iterators. To avoid sacrificing latency by using generated functions,
several hard-coded methods can be added to the manually unrolled implementation:

```@docs
rec_unroll
```julia
_unrolled_foreach(::Val{0}, f, itr) = nothing
_unrolled_foreach(::Val{1}, f, itr) = (f(generic_getindex(itr, 1)); nothing)
_unrolled_foreach(::Val{2}, f, itr) =
    (f(generic_getindex(itr, 1)); f(generic_getindex(itr, 2)); nothing)
_unrolled_foreach(::Val{3}, f, itr) =
    (f(generic_getindex(itr, 1)); f(generic_getindex(itr, 2)); f(generic_getindex(itr, 3)); nothing)
```
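
The combination of hard-coded methods with a generated fallback might be sketched as follows. This is an illustration under assumptions, not the package's source: `demo_getindex` and `demo_foreach` are hypothetical names, and only lengths 0 through 2 are hard-coded here to keep the example short.

```julia
# Hypothetical stand-in for the package's generic_getindex.
demo_getindex(itr, n) = getindex(itr, n)

demo_foreach(f, itr) = _demo_foreach(Val(length(itr)), f, itr)

# Hard-coded methods for short iterators. Val{0}, Val{1}, and Val{2} are more
# specific than Val{N}, so dispatch prefers them over the generated method.
_demo_foreach(::Val{0}, f, itr) = nothing
_demo_foreach(::Val{1}, f, itr) = (f(demo_getindex(itr, 1)); nothing)
_demo_foreach(::Val{2}, f, itr) =
    (f(demo_getindex(itr, 1)); f(demo_getindex(itr, 2)); nothing)

# Generated fallback that handles every other length.
@generated _demo_foreach(::Val{N}, f, itr) where {N} =
    Expr(:block, (:(f(demo_getindex(itr, $n))) for n in 1:N)..., nothing)

# Usage: the first call hits a hard-coded method, the second the generated one.
total = Ref(0)
demo_foreach(x -> total[] += x, (1, 2))
demo_foreach(x -> total[] += x, (1, 2, 3, 4))
```

Both calls produce the same observable behavior; the difference is only in which method the compiler has to build.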

The default choice for `rec_unroll` is motivated by the benchmarks for
[Generative vs. Recursive Unrolling](@ref).
With this modification, manual unrolling does not exceed the compilation
requirements of recursive unrolling across a wide range of use cases. Since it
also avoids type instabilities for arbitrarily large iterators, a combination
of hard-coded and generated functions with manual unrolling serves as the basis
of all unrolled functions defined in this package. Similarly, the
[`ntuple(f, ::Val{N})`](https://github.com/JuliaLang/julia/blob/v1.11.0/base/ntuple.jl)
function in `Base` uses this strategy to implement loop unrolling.
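
For reference, a minimal call to the `Base` function mentioned above looks like this; the closure and the length are arbitrary choices for illustration.

```julia
# Val(3) carries the length in its type, so the three calls to the closure
# can be unrolled instead of executed in a runtime loop.
squares = ntuple(i -> i^2, Val(3))
```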

For benchmarks that compare these two implementations, see
[Manual vs. Recursive Unrolling](@ref).

## Interface API

@@ -54,10 +73,8 @@ StaticSequence

To unroll over a statically sized iterator of some user-defined type `T`, follow
these steps:
- To enable recursive unrolling, add a method for `iterate(::T, [state])`
- To enable generative unrolling, add a method for `getindex(::T, n)` (or for
`generic_getindex(::T, n)` if `getindex` should not be defined for iterators
of type `T`)
- Add a method for `getindex(::T, n)`, or for `generic_getindex(::T, n)` if
`getindex` should not be defined for iterators of type `T`
- If every unrolled function that needs to construct an iterator when given an
iterator of type `T` can return a `Tuple` instead, stop here
- Otherwise, to return a non-`Tuple` iterator whenever it is efficient to do so,
39 changes: 17 additions & 22 deletions src/UnrolledUtilities.jl
@@ -53,6 +53,14 @@ include("unrollable_iterator_interface.jl")
# Analogue of the non-public Base._InitialValue for reduction and accumulation.
struct NoInit end

@inline empty_reduction_value(::NoInit) =
    error("unrolled_reduce requires an init value for empty iterators")
@inline empty_reduction_value(init) = init

@inline first_reduction_value(op, itr, ::NoInit) = generic_getindex(itr, 1)
@inline first_reduction_value(op, itr, init) =
    op(init, generic_getindex(itr, 1))

# Analogue of ∘, but with only one function argument and guaranteed inlining.
# Base's ∘ leads to type instabilities in unit tests on Julia 1.10 and 1.11.
@inline ⋅(f1::F1, f2::F2) where {F1, F2} = x -> (@inline f1(f2(x)))
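
The file that consumes the two reduction helpers added above is not part of this excerpt, so the following is only a guess at how they could be used. `_demo_reduce`, `demo_reduce`, and `demo_getindex` are hypothetical names; the helper definitions are copied from the diff with `generic_getindex` swapped for the stand-in.

```julia
struct NoInit end  # mirrors the definition shown in the diff

# Hypothetical stand-in for the package's generic_getindex.
demo_getindex(itr, n) = getindex(itr, n)

empty_reduction_value(::NoInit) =
    error("unrolled_reduce requires an init value for empty iterators")
empty_reduction_value(init) = init

first_reduction_value(op, itr, ::NoInit) = demo_getindex(itr, 1)
first_reduction_value(op, itr, init) = op(init, demo_getindex(itr, 1))

# Assumed shape of hard-coded reduce methods built on the helpers: the empty
# case returns init (or errors), and longer cases fold from the first value.
_demo_reduce(::Val{0}, op, itr, init) = empty_reduction_value(init)
_demo_reduce(::Val{1}, op, itr, init) = first_reduction_value(op, itr, init)
_demo_reduce(::Val{2}, op, itr, init) =
    op(first_reduction_value(op, itr, init), demo_getindex(itr, 2))

demo_reduce(op, itr; init = NoInit()) =
    _demo_reduce(Val(length(itr)), op, itr, init)

# Usage
without_init = demo_reduce(+, (1, 2))
with_init = demo_reduce(+, (1, 2); init = 10)
empty_with_init = demo_reduce(+, (); init = 0)
```

Moving the empty-iterator check into a `Val{0}` method means the error path is selected by dispatch rather than by a runtime `isempty` branch, which appears to be the motivation for replacing the old ternary in `unrolled_reduce`.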
@@ -134,17 +142,16 @@ include("StaticBitVector.jl")
unrolled_drop_into(inferred_output_type(itr), itr, val_N)

##
## Functions unrolled using either recursion or generated expressions
## Functions unrolled using either hard-coded or generated expressions
##

include("recursively_unrolled_functions.jl")
include("generatively_unrolled_functions.jl")
include("manually_unrolled_functions.jl")

# The unrolled_map function could also be implemented in terms of ntuple, but
# then it would be subject to the same recursion limit as ntuple. On Julia 1.10,
# this leads to type instabilities in several unit tests for nested iterators.
@inline unrolled_map_into_tuple(f::F, itr) where {F} =
    (rec_unroll(itr) ? rec_unrolled_map : gen_unrolled_map)(f, itr)
    _unrolled_map(Val(length(itr)), f, itr)
@inline unrolled_map_into(output_type, f::F, itr) where {F} =
    constructor_from_tuple(output_type)(unrolled_map_into_tuple(f, itr))
@inline unrolled_map(f::F, itr) where {F} =
@@ -154,32 +161,26 @@ include("generatively_unrolled_functions.jl")

@inline unrolled_any(itr) = unrolled_any(identity, itr)
@inline unrolled_any(f::F, itr) where {F} =
    (rec_unroll(itr) ? rec_unrolled_any : gen_unrolled_any)(f, itr)
    _unrolled_any(Val(length(itr)), f, itr)

@inline unrolled_all(itr) = unrolled_all(identity, itr)
@inline unrolled_all(f::F, itr) where {F} =
    (rec_unroll(itr) ? rec_unrolled_all : gen_unrolled_all)(f, itr)
    _unrolled_all(Val(length(itr)), f, itr)

@inline unrolled_foreach(f::F, itr) where {F} =
    (rec_unroll(itr) ? rec_unrolled_foreach : gen_unrolled_foreach)(f, itr)
    _unrolled_foreach(Val(length(itr)), f, itr)
@inline unrolled_foreach(f, itrs...) = unrolled_foreach(splat(f), zip(itrs...))

@inline unrolled_reduce(op::O, itr, init) where {O} =
    isempty(itr) && init isa NoInit ?
    error("unrolled_reduce requires an init value for empty iterators") :
    (rec_unroll(itr) ? rec_unrolled_reduce : gen_unrolled_reduce)(op, itr, init)
    _unrolled_reduce(Val(length(itr)), op, itr, init)
@inline unrolled_reduce(op::O, itr; init = NoInit()) where {O} =
    unrolled_reduce(op, itr, init)

@inline unrolled_mapreduce(f::F, op::O, itrs...; init = NoInit()) where {F, O} =
    unrolled_reduce(op, unrolled_map(f, itrs...), init)

@inline unrolled_accumulate_into_tuple(op::O, itr, init) where {O} =
    (rec_unroll(itr) ? rec_unrolled_accumulate : gen_unrolled_accumulate)(
        op,
        itr,
        init,
    )
    _unrolled_accumulate(Val(length(itr)), op, itr, init)
@inline unrolled_accumulate_into(output_type, op::O, itr, init) where {O} =
    constructor_from_tuple(output_type)(
        unrolled_accumulate_into_tuple(op, itr, init),
@@ -209,13 +210,7 @@ include("generatively_unrolled_functions.jl")
    itr,
    itrs...,
) where {F, I, E} =
    (rec_unroll(itr) ? rec_unrolled_ifelse : gen_unrolled_ifelse)(
        f,
        get_if,
        get_else,
        itr,
        itrs...,
    )
    _unrolled_ifelse(Val(length(itr)), f, get_if, get_else, itr, itrs...)

##
## Unrolled functions without any analogues in Base
78 changes: 0 additions & 78 deletions src/generatively_unrolled_functions.jl

This file was deleted.
