Combining tests and benchmarks #10

@folkertdev

Description

When incrementally optimizing and benchmarking a function, the benchmark and the regression test often use very similar code.

hashTriangleTest =
    let
        toList ( a, b, c ) =
            [ a, b, c ]
    in
        Test.fuzz fuzzTriangle "hash triangle behaves as before" <|
            \triangle ->
                AdjacencyList.hashTriangle triangle
                    |> toList
                    |> Expect.equal (AdjacencyList.hashTriangleOld triangle)

hashTriangleBenchmark =
    Benchmark.describe "SweepHull"
        [ -- nest as many descriptions as you like
          Benchmark.compare "sharesEdge shared"
            "old"
            (\_ -> AdjacencyList.hashTriangleOld triangle)
            "new"
            (\_ -> AdjacencyList.hashTriangle triangle)
        ]

Besides the fact that fuzzing can find performance bottlenecks that would otherwise be missed (see also #3),
testing the benchmark (or benchmarking the test) gives free tests/benchmarks. It has happened to me that I made a mistake in the benchmark code and performance looked much better than it actually was.
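A combined primitive could look something like the sketch below. Nothing in it exists in elm-benchmark today: `compareWithTest`, its signature, and the fixed `input` argument used for the benchmark half are all invented purely for illustration.

```elm
import Benchmark exposing (Benchmark)
import Expect
import Fuzz exposing (Fuzzer)
import Test exposing (Test)


{-| Hypothetical sketch: produce an equivalence test and a comparison
benchmark for the same pair of functions in one declaration. This API
does not exist in elm-benchmark; the name, the signature, and the
single fixed `input` for the benchmark are assumptions.
-}
compareWithTest : String -> Fuzzer a -> a -> (a -> b) -> (a -> b) -> ( Test, Benchmark )
compareWithTest name fuzzer input old new =
    ( -- the test half: fuzz inputs and check old/new agree
      Test.fuzz fuzzer (name ++ " behaves as before") <|
        \x -> new x |> Expect.equal (old x)
    , -- the benchmark half: compare old vs. new on a fixed input
      Benchmark.compare name
        "old"
        (\_ -> old input)
        "new"
        (\_ -> new input)
    )
```

With a helper of this shape, the `hashTriangle` test and benchmark above would collapse into a single declaration.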

For completeness, here is my message from Slack:

I've been incrementally optimizing an algorithm (for delaunay triangulation) using elm-benchmark and have some feedback.

My typical workflow for improving a function looks something like

  • write an improved (hopefully faster) version of the same function
  • write a test (often fuzz test) to check your new implementation is equivalent
  • write a benchmark to check that the new implementation is actually faster
  • remove the old code
  • remove the equivalency test (or maybe put the old version in the tests)
  • remove the benchmark

Most of these steps are tedious and repetitive.
Editor integration/code generation could help make this better, but another improvement would be to combine the benchmark and the test.
Additionally, for more complex functions, performance can vary a lot between easy and difficult inputs, so benchmarking on a diverse set of inputs gives more accurate results. Is this something you've thought about?
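Short of fuzzed benchmark inputs, the closest approximation today is to hand-pick representative cases and benchmark each one, assuming elm-benchmark's `Benchmark.benchmark : String -> (() -> a) -> Benchmark`. The fixtures `easyTriangle` and `hardTriangle` below are hypothetical stand-ins for such a set:

```elm
import Benchmark exposing (Benchmark)


-- Sketch: cover the input space with one benchmark per hand-picked case.
-- `easyTriangle` and `hardTriangle` are hypothetical fixtures; a fuzzer
-- generating such a set automatically is what this issue asks for.
diverseInputs : Benchmark
diverseInputs =
    Benchmark.describe "hashTriangle across inputs"
        [ Benchmark.benchmark "easy input" (\_ -> AdjacencyList.hashTriangle easyTriangle)
        , Benchmark.benchmark "hard input" (\_ -> AdjacencyList.hashTriangle hardTriangle)
        ]
```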

Some other thoughts:

  • are the primitives available to write non-micro benchmarks?
  • a UI that can start/cancel a particular benchmark (I'm not a big fan of benchmarks running on page load)
