perf: optimize Uni.await() for known item/failure #2019

mkouba · 2025-11-28T17:18:09Z

optimize the case where the item/failure is known and subscription/synchronization is not needed

Background

In Quarkus, we often design non-blocking APIs with Mutiny. We also provide a blocking variant of the API that usually defaults to Uni.await().indefinitely(). However, UniBlockingAwait is fairly expensive in terms of performance.

It's not unusual for some parts of the API to return Uni.createFrom().item(), e.g. when the default implementation is blocking.

In this pull request, we try to optimize this path and bypass the subscription/synchronization unless really needed.
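To make the idea concrete, here is a minimal sketch of what "bypass the subscription/synchronization" can mean. It does not use Mutiny and is not the code from this patch: `Source`, `KnownItem`, `awaitWithLatch` and `await` are hypothetical stand-ins, and the latch-based general path is only an assumption about how a blocking await is typically built.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

// Hypothetical stand-in types for illustration; these are NOT Mutiny classes.
final class AwaitFastPathSketch {

    /** Minimal asynchronous source: calls the consumer once with the item. */
    interface Source<T> {
        void subscribe(Consumer<T> onItem);
    }

    /** A source whose item is already known when it is created. */
    static final class KnownItem<T> implements Source<T> {
        final T item;

        KnownItem(T item) {
            this.item = item;
        }

        @Override
        public void subscribe(Consumer<T> onItem) {
            onItem.accept(item);
        }
    }

    /** General path: subscribe, then block the caller on a latch until the item arrives. */
    static <T> T awaitWithLatch(Source<T> source) {
        CountDownLatch latch = new CountDownLatch(1);
        AtomicReference<T> ref = new AtomicReference<>();
        source.subscribe(item -> {
            ref.set(item);
            latch.countDown();
        });
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return ref.get();
    }

    /** Optimized path: a known item is returned directly, with no subscriber and no latch. */
    static <T> T await(Source<T> source) {
        if (source instanceof KnownItem) {
            return ((KnownItem<T>) source).item;
        }
        return awaitWithLatch(source);
    }

    public static void main(String[] args) {
        Source<String> known = new KnownItem<>("foo");
        // Item delivered from another thread, so the latch path is really needed here.
        Source<String> async = onItem -> new Thread(() -> onItem.accept("bar")).start();
        System.out.println(await(known));
        System.out.println(await(async));
    }
}
```

In this sketch the known-item case allocates nothing and never touches the latch, while any other source still goes through the general path.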

I've created a simple benchmark for quarkus-flags: https://github.com/mkouba/quarkus-flags-benchmarks and the results are really good.

codecov · 2025-11-28T17:33:36Z

Codecov Report

❌ Patch coverage is 90.90909% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.16%. Comparing base (95dfed7) to head (215e727).
⚠️ Report is 5 commits behind head on main.

Files with missing lines	Patch %	Lines
.../main/java/io/smallrye/mutiny/groups/UniAwait.java	94.44%	1 Missing ⚠️
...mallrye/mutiny/operators/uni/UniBlockingAwait.java	50.00%	1 Missing ⚠️

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #2019      +/-   ##
============================================
- Coverage     89.18%   89.16%   -0.03%     
- Complexity     3112     3122      +10     
============================================
  Files           412      412              
  Lines         13274    13295      +21     
  Branches       1684     1688       +4     
============================================
+ Hits          11838    11854      +16     
  Misses          812      812              
- Partials        624      629       +5

Files with missing lines	Coverage Δ
...rators/uni/builders/UniCreateFromKnownFailure.java	`100.00% <100.00%> (ø)`
...operators/uni/builders/UniCreateFromKnownItem.java	`100.00% <100.00%> (ø)`
.../main/java/io/smallrye/mutiny/groups/UniAwait.java	`96.00% <94.44%> (-4.00%)`	⬇️
...mallrye/mutiny/operators/uni/UniBlockingAwait.java	`86.04% <50.00%> (-9.20%)`	⬇️

... and 10 files with indirect coverage changes

jponge · 2025-11-28T18:34:02Z

Hi @mkouba, and thanks for looking into this.

Do you have any benchmark / test result to share here?

mkouba · 2025-12-01T08:40:46Z

> Hi @mkouba, and thanks for looking into this.
>
> Do you have any benchmark / test result to share here?

Yes, so there's a simple benchmark that basically tests Uni.createFrom().item(foo).await().indefinitely() (which is exactly the use case where a blocking implementation of an API returns Uni) and the results (for benchmark with throughput mode) look like:

RESULTS SUMMARY          |999-SNAPSHOT_2.9.5 (Base)|999-SNAPSHOT_999-SNAPSHOT
=========================|Score    |Error  |Diff   |Score    |Error  |Diff   
-------------------------|-------------------------|-------------------------
FlagComputeBenchmark     |     2569|     78|       |   118068|   3616| +4496%

999-SNAPSHOT_2.9.5 is using Mutiny 2.9.5. 999-SNAPSHOT_999-SNAPSHOT is using this patch.
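As a quick sanity check of the Diff column, the relative change can be recomputed from the two reported scores. The class and method names below are made up for this illustration; only the two score values come from the table above.

```java
// Hypothetical helper for illustration; only the two scores are taken from the table.
final class BenchmarkDiffSketch {

    /** Relative change of the patched score versus the base score, in percent. */
    static double diffPercent(double base, double patched) {
        return (patched - base) / base * 100.0;
    }

    public static void main(String[] args) {
        double base = 2569;      // score in the 999-SNAPSHOT_2.9.5 (Base) column
        double patched = 118068; // score in the 999-SNAPSHOT_999-SNAPSHOT column
        System.out.println("diff in percent: " + Math.round(diffPercent(base, patched)));
        System.out.println("patched / base:  " + patched / base);
    }
}
```

The rounded result matches the reported +4496%, i.e. the patched score is roughly 46 times the base score.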

Ofc, you should take these results with a grain of salt. I didn't use a dedicated machine etc. On the other hand, it's clear that if we avoid the allocations and synchronization in UniBlockingAwait#await() it should be much faster.

jponge · 2025-12-01T10:40:41Z

Thanks @mkouba (BTW I had a look at your JMH tests, they're correct).

My only real concern is that we'd be introducing a special case, but any extra operator like .map() will not get any boost, even if it's all a CPU-bound workload. Of course we could imagine further optimizations like RxJava or Reactor do, but we discussed that in https://inria.hal.science/hal-03409277/file/paper-author-version.pdf 😃

jponge · 2025-12-01T10:41:53Z

Just so I understand fully, can you please point me to some impacting places where we need that boost?

jponge · 2025-12-01T10:43:42Z

On the commit message, I'd use perf: as a prefix, see https://www.conventionalcommits.org/en/v1.0.0/#summary (the proposed set of prefixes here is not mandatory, but they are widespread and the tool we use to check the commit assumes these conventions)

mkouba · 2025-12-01T11:35:16Z

> Thanks @mkouba (BTW I had a look at your JMH tests, they're correct).
>
> My only real concern is that we'd be introducing a special case, but any extra operator like .map() will not get any boost, even if it's all a CPU-bound workload. Of course we could imagine further optimizations like RxJava or Reactor do, but we discussed that in https://inria.hal.science/hal-03409277/file/paper-author-version.pdf 😃

I do understand your concern, it's not nice at all 🤷.

Speaking of the referenced paper - I think that this optimization is more related to the reactive/imperative integration which can be perceived as a misuse, but it's IMO necessary for practical APIs.

> Just so I understand fully, can you please point me to some impacting places where we need that boost?

Yes, so typically the API provides a method like Uni<Foo> ping(), but also a convenience method like Foo pingAndAwait(), or the users are instructed to call ping().await() if not using a reactive stream. However, ping() implementations are often blocking, and so they simply return Uni.createFrom().item(). Now, this optimization is exactly about the use case where pingAndAwait() is called and the implementation is blocking, i.e. Uni.createFrom().item() is used.

If you're curious about real APIs, then for example HttpUpgradeCheck#perform() from quarkus-websockets-next, or InitialCheck#perform from the MCP server, or Flag#compute() that was used in the benchmark.
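The API shape described above can be sketched without Mutiny on the classpath. This is only an analogy: `java.util.concurrent.CompletableFuture` stands in for `Uni`, `completedFuture(...)` plays the role of `Uni.createFrom().item(...)`, and `join()` plays the role of `await().indefinitely()`. `PingService` and `BlockingPingService` are hypothetical names, not types from Quarkus or Mutiny.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical API for illustration; CompletableFuture is a stand-in for Uni.
final class PingApiSketch {

    /** An asynchronous method plus a blocking convenience variant built on top of it. */
    interface PingService {
        CompletableFuture<String> ping();

        default String pingAndAwait() {
            // Blocks the caller until the asynchronous result is available.
            return ping().join();
        }
    }

    /** Blocking implementation: the result is computed eagerly, then wrapped as already completed. */
    static final class BlockingPingService implements PingService {
        @Override
        public CompletableFuture<String> ping() {
            String result = "pong"; // computed synchronously on the calling thread
            return CompletableFuture.completedFuture(result);
        }
    }

    public static void main(String[] args) {
        PingService service = new BlockingPingService();
        System.out.println(service.pingAndAwait());
    }
}
```

When `pingAndAwait()` is called on such an implementation, the value already exists before the await starts, which is the situation the optimization targets.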

mkouba · 2025-12-01T12:00:46Z

I cannot reproduce the TCK failure locally 🤔.

jponge · 2025-12-01T13:56:56Z

> I cannot reproduce the TCK failure locally 🤔.

Weird, it sometimes happens that CI runners get stuck, so timing checks get broken, etc.

jponge · 2025-12-01T14:17:06Z

These CI failures do not make sense, especially as you haven't touched anything relevant to these.

jponge · 2025-12-01T14:18:34Z

@ozangunalp it looks like we have TCK failures in CI about pausable Multi 😉

ozangunalp · 2025-12-01T14:29:26Z

Oh well

ozangunalp · 2025-12-01T14:32:06Z

It is still the same NPE in queue.clear(). Maybe we need to clear inside the drain loop.

jponge · 2025-12-01T14:33:29Z

Return of the infamous drain loop 🤣

More seriously, can I let you check and open a PR? There's no urgency, this isn't in a release yet.

jponge

I approve this, but I will always be cautious about such kind of protocol-bypass 😄

Let's merge after the TCK fix for pausable Multi is here.

mkouba · 2025-12-02T09:01:29Z

> I approve this, but I will always be cautious about such kind of protocol-bypass 😄

Makes sense.

> Let's merge after the TCK fix for pausable Multi is here.

👍

jponge · 2025-12-02T11:02:14Z

@mkouba it looks like your commit is not signed or the signature is not verified, and we need this (https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification#ssh-commit-signature-verification)

jponge self-assigned this Nov 28, 2025

mkouba force-pushed the uni-await-opt branch from 5ca9033 to cb747b2 Compare December 1, 2025 08:21

mkouba force-pushed the uni-await-opt branch from cb747b2 to f96838c Compare December 1, 2025 09:34

mkouba force-pushed the uni-await-opt branch from f96838c to e9b2044 Compare December 1, 2025 11:35

mkouba changed the title ~~optimization: Uni.await()~~ perf: optimize Uni.await() for known item/failure Dec 1, 2025

jponge mentioned this pull request Dec 1, 2025

feat: flag to control the Infrastructure executor shutdown on update #2020

Merged

jponge approved these changes Dec 1, 2025

ozangunalp mentioned this pull request Dec 2, 2025

fix: demand pauser clear queue inside drain loop to avoid concurrent read access #2021

Merged

mkouba force-pushed the uni-await-opt branch from 9dedb96 to 5116b62 Compare December 2, 2025 12:14

perf: optimize Uni.await() for known item/failure

215e727

- optimize the case where the item/failure is known and subscription/synchronization is not needed

mkouba force-pushed the uni-await-opt branch from 5116b62 to 215e727 Compare December 2, 2025 12:41

jponge merged commit 245cffe into smallrye:main Dec 2, 2025
8 checks passed
