Avoid suspending ephemeral agents by vigoo · Pull Request #3339 · golemcloud/golem

vigoo · 2026-05-05T15:00:38Z

Resolves #3327

Golem suspends agents in various situations, and before this PR it did so for durable and ephemeral agents alike. Ephemeral agents, however, cannot recover from being suspended. This PR changes how each of these suspend cases behaves for ephemeral agents:

* Short sleeps (above the configured suspend threshold): ephemeral agents now sleep in-process when the requested sleep is within `suspend.ephemeral_max_sleep`.
* Long sleeps: if an ephemeral agent sleeps longer than `suspend.ephemeral_max_sleep`, the invocation fails with `EphemeralSleepTooLong`.
* Promise waits / all pollables blocked: ephemeral agents no longer take the durable-worker “all promise-backed pollables are blocked → suspend” shortcut. They wait in-process until the pollable becomes ready.
* Fuel exhaustion: ephemeral agents no longer suspend when account fuel cannot be borrowed mid-invocation. If allowed, they continue using bounded local overdraft; otherwise the invocation fails with `EphemeralFuelExhausted`.
* Fuel overdraft accounting: when an ephemeral invocation uses local overdraft, only the actually consumed overdraft is recorded as account debt at invocation end.
* Monthly HTTP/RPC budget exhaustion: ephemeral agents fail immediately with `EphemeralCannotSuspend` instead of suspending until budget replenishment.
* Quota throttling: ephemeral agents fail immediately with `EphemeralCannotSuspend` instead of suspending, and no resume action is scheduled for them.
* External interrupt: interrupting an ephemeral agent still interrupts the running invocation; the CLI now warns and asks for confirmation because the agent cannot be resumed afterward.
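
The sleep and fuel cases above can be sketched roughly as follows. This is only an illustration, not the code from this PR: `AgentMode`, `SuspendConfig`, `SleepOutcome`, `decide_sleep`, and `EphemeralFuel` are hypothetical names, the error names are represented as plain strings, and the exact boundary conditions (`<=` versus `<`) are assumptions.

```rust
use std::time::Duration;

/// Hypothetical agent kind; the real executor has its own representation.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum AgentMode {
    Durable,
    Ephemeral,
}

/// Hypothetical view of the two settings mentioned in the description.
pub struct SuspendConfig {
    /// Sleeps at or below this threshold never trigger a suspend (assumed boundary).
    pub suspend_threshold: Duration,
    /// Stand-in for `suspend.ephemeral_max_sleep`.
    pub ephemeral_max_sleep: Duration,
}

#[derive(Debug, PartialEq)]
pub enum SleepOutcome {
    SleepInProcess,
    Suspend,
    Fail(&'static str),
}

/// Decide what a requested sleep turns into for each agent kind.
pub fn decide_sleep(mode: AgentMode, requested: Duration, cfg: &SuspendConfig) -> SleepOutcome {
    if requested <= cfg.suspend_threshold {
        return SleepOutcome::SleepInProcess;
    }
    match mode {
        AgentMode::Durable => SleepOutcome::Suspend,
        AgentMode::Ephemeral if requested <= cfg.ephemeral_max_sleep => SleepOutcome::SleepInProcess,
        AgentMode::Ephemeral => SleepOutcome::Fail("EphemeralSleepTooLong"),
    }
}

/// Simplified fuel state of one ephemeral invocation (hypothetical model).
pub struct EphemeralFuel {
    /// Fuel already borrowed from the account.
    pub available: u64,
    /// Upper bound on local overdraft; 0 means overdraft is not allowed.
    pub overdraft_limit: u64,
    /// Overdraft actually consumed so far.
    pub overdraft_used: u64,
}

impl EphemeralFuel {
    /// Consume fuel, falling back to bounded overdraft instead of suspending.
    /// On failure nothing is consumed.
    pub fn consume(&mut self, amount: u64) -> Result<(), &'static str> {
        let from_available = amount.min(self.available);
        let shortfall = amount - from_available;
        if self.overdraft_used.saturating_add(shortfall) > self.overdraft_limit {
            return Err("EphemeralFuelExhausted");
        }
        self.available -= from_available;
        self.overdraft_used += shortfall;
        Ok(())
    }

    /// Debt to record at invocation end: only the overdraft actually consumed.
    pub fn debt_at_end(&self) -> u64 {
        self.overdraft_used
    }
}
```

In this model a durable agent above the threshold still suspends, while an ephemeral agent either sleeps in-process or fails, and an invocation that ends with unused overdraft headroom records only what it consumed.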

The idea behind failing on quota exhaustion is that it is the caller's responsibility to retry. If we allowed these ephemeral agents to keep running and blocked them in memory until they could resume, that could exhaust other resources, since they would have to stay in memory for a potentially long time.
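
A minimal sketch of this division of responsibility, assuming hypothetical names throughout (`QuotaOutcome`, `on_quota_exhausted`, `invoke_with_retry`); the retry policy shown is just one possible caller-side choice, not something this PR prescribes:

```rust
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum QuotaOutcome {
    /// Durable agents can be parked until the budget is replenished.
    SuspendUntilReplenished,
    /// Ephemeral agents fail right away; no resume action is scheduled.
    Fail(&'static str),
}

/// Executor side (hypothetical): what happens when a quota or budget runs out.
pub fn on_quota_exhausted(is_ephemeral: bool) -> QuotaOutcome {
    if is_ephemeral {
        QuotaOutcome::Fail("EphemeralCannotSuspend")
    } else {
        QuotaOutcome::SuspendUntilReplenished
    }
}

/// Caller side (hypothetical): retry a failed invocation with exponential backoff.
/// Always makes at least one attempt and returns the last error once
/// `max_attempts` is reached.
pub fn invoke_with_retry<T, E>(
    max_attempts: u32,
    base_delay: Duration,
    mut invoke: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt: u32 = 0;
    loop {
        match invoke() {
            Ok(value) => return Ok(value),
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => {
                // Delay doubles per attempt; the shift is capped to avoid overflow.
                std::thread::sleep(base_delay.saturating_mul(1u32 << attempt.min(16)));
                attempt += 1;
            }
        }
    }
}
```

The waiting happens in the caller, which can bound it or give up, rather than in an executor that would otherwise hold the blocked agent in memory.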

thesparq · 2026-05-05T17:47:11Z

Please, does this mean that ephemeral agents can't use promises? For instance, suppose I want an agent that is created, does its work, and during that work creates a promise that requires the user's action. Assume the user delays for hours, during which the agent is suspended without consuming resources; the user later fulfills the promise, and the agent resumes, completes the work, and then dies. An example would be a multi-step form or survey agent that does not need to be completely durable or everlasting, just durable for the lifetime of the form or survey: the form state has to be saved on the backend (the agent) for the duration of the form, and when the form is submitted, the ephemeral agent dies off.

vigoo · 2026-05-06T09:52:15Z

Ephemeral agents are for short-lived, non-durable scenarios, such as request handlers. They have no durability guarantees, so after any crash, rebalancing, or similar event they simply fail to finish. Because of this, they should not be used for anything long-running, such as waiting for an external promise. What you are looking for is the durable agent; I think there is a misunderstanding about how durable and ephemeral agents differ.

Both durable and ephemeral agents write to the oplog and leave a "trace" that can be observed later with tools like golem agent oplog. Ephemeral agents write this oplog more lazily to achieve higher performance, so if they don't finish because of a restart or similar event (as mentioned above), some entries may not have been written out yet. That's a trade-off.

Even though ephemeral agents write an oplog for observability reasons, they can never recover state the way durable agents can. That's the difference. If an ephemeral agent goes out of memory, it's gone forever. Also, an ephemeral agent can be invoked only once (technically it can be invoked more than once, but it starts from an empty state every time, so doing so does not make much sense).

Note that the following features are available for durable agents and may cover what you are looking for:

* Idle agents are suspended. Whenever an agent is waiting for something, or has nothing in its invocation queue, it can go out of memory and consume no further resources. The only "cost" is storage space (but note that this is exactly the same for ephemeral agents).
* Ephemeral agents are always _phantom agents_ by default, which means they get an auto-generated uuid as part of their identity (besides the constructor parameters). But the phantom agent feature can also be used for durable agents, if needed.

mschuwalow · 2026-05-06T10:28:05Z

+                if record_ephemeral_promise_wait {
+                    inc_promise_waiting();
+                }
                let either_result = futures::future::select(poll, interrupt_signal).await;

We might need to introduce a timeout here, to avoid polluting the executor with ephemeral agents blocked on promises that never complete.
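
One way to picture such a bound, sketched with the standard library only: a channel stands in for the promise-backed pollable, and `wait_for_promise`, `WaitError`, and `max_wait` are hypothetical names. The actual code path is async (`futures::future::select`), so a real change would more likely add a timer as another branch of the select or wrap the poll in something like `tokio::time::timeout`; that is an assumption, not what the PR does.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

#[derive(Debug, PartialEq)]
pub enum WaitError {
    /// The promise was not completed within `max_wait`.
    TimedOut,
    /// Nobody can complete the promise any more (all senders dropped).
    Abandoned,
}

/// Block in-process until the promise completes, but never longer than
/// `max_wait`, so a never-completed promise cannot pin the agent forever.
pub fn wait_for_promise<T>(completion: &Receiver<T>, max_wait: Duration) -> Result<T, WaitError> {
    match completion.recv_timeout(max_wait) {
        Ok(value) => Ok(value),
        Err(RecvTimeoutError::Timeout) => Err(WaitError::TimedOut),
        Err(RecvTimeoutError::Disconnected) => Err(WaitError::Abandoned),
    }
}
```

On `TimedOut` the invocation would fail like the other ephemeral cases, leaving the retry decision to the caller.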

noise64 · 2026-05-06T10:35:34Z

                let poll = Host::poll(&mut io_data, in_);
                pin_mut!(poll);

+                if record_ephemeral_promise_wait {

Is this guaranteed to be balanced?

Probably not in case of a crashing executor
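
For the in-process paths (early return, interrupt, panic unwinding), one common way to keep such a counter balanced is a drop guard. The sketch below is an assumption about how that could look, not the PR's code: `PromiseWaitGuard` and `wait_with_gauge` are hypothetical, and a plain `AtomicI64` stands in for whatever metric `inc_promise_waiting()` updates. As the reply above notes, nothing in-process can rebalance the value if the executor itself crashes.

```rust
use std::sync::atomic::{AtomicI64, Ordering};

/// Increments the gauge on creation and decrements it when dropped,
/// so every exit path from the wait restores the previous value.
pub struct PromiseWaitGuard<'a> {
    gauge: &'a AtomicI64,
}

impl<'a> PromiseWaitGuard<'a> {
    pub fn enter(gauge: &'a AtomicI64) -> Self {
        gauge.fetch_add(1, Ordering::Relaxed);
        Self { gauge }
    }
}

impl Drop for PromiseWaitGuard<'_> {
    fn drop(&mut self) {
        self.gauge.fetch_sub(1, Ordering::Relaxed);
    }
}

/// Run `wait` while optionally counting it as an ephemeral promise wait.
/// `record` plays the role of `record_ephemeral_promise_wait` in the diff.
pub fn wait_with_gauge<T>(gauge: &AtomicI64, record: bool, wait: impl FnOnce() -> T) -> T {
    // The guard (if any) lives until the end of this function.
    let _guard = record.then(|| PromiseWaitGuard::enter(gauge));
    wait()
}
```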

thesparq · 2026-05-06T15:42:17Z

I now understand you better. I guess what I was looking for was an auto-deleting durable agent 😂: durable, but once it finishes it deletes itself. This means that for my use cases I need to use durable agents but somehow delete them afterwards, because they serve only one particular purpose and it would not be wise to keep them around consuming storage when I know they will never be invoked again.

vigoo added 3 commits May 5, 2026 16:56

Avoid suspending ephemeral agents

d2d2946

Fix

7e60d3c

Merge branch 'main' into ephemeral-no-suspend

9e2463f

mschuwalow approved these changes May 6, 2026

mschuwalow reviewed May 6, 2026

noise64 reviewed May 6, 2026

vigoo added 2 commits May 7, 2026 11:31

Fixes

c82c7b9

Merge branch 'main' into ephemeral-no-suspend

bfd8623

vigoo merged commit ef7c1fc into main May 7, 2026
51 checks passed

vigoo deleted the ephemeral-no-suspend branch May 7, 2026 11:17

github-actions Bot locked and limited conversation to collaborators May 7, 2026
