Expire stale pending plan questions on cold app startup#862
Expire stale pending plan questions on cold app startup#862marvin-bitterlich wants to merge 2 commits intopingdotgg:mainfrom
Conversation
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository UI Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| left: OrchestrationThreadActivity, | ||
| right: OrchestrationThreadActivity, | ||
| ): number { | ||
| if (left.sequence !== undefined && right.sequence !== undefined) { |
There was a problem hiding this comment.
🟢 Low src/pendingUserInput.ts:19
The compareActivitiesByOrder function sorts any activity with a sequence property after activities without one, which breaks chronological ordering in derivePendingUserInputs. For example, if a user-input.requested has a sequence but the later user-input.resolved does not (common for async acknowledgments), the resolution sorts before the request. The function processes the resolution first (no-op), then the request, leaving the input stuck as "pending" instead of "resolved".
Consider using createdAt as the primary sort key and only using sequence as a tiebreaker when both activities have it, or document if the current behavior is intentional for a specific ordering requirement.
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file packages/shared/src/pendingUserInput.ts around line 19:
The `compareActivitiesByOrder` function sorts any activity with a `sequence` property *after* activities without one, which breaks chronological ordering in `derivePendingUserInputs`. For example, if a `user-input.requested` has a `sequence` but the later `user-input.resolved` does not (common for async acknowledgments), the resolution sorts before the request. The function processes the resolution first (no-op), then the request, leaving the input stuck as "pending" instead of "resolved".
Consider using `createdAt` as the primary sort key and only using `sequence` as a tiebreaker when both activities have it, or document if the current behavior is intentional for a specific ordering requirement.
Evidence trail:
packages/shared/src/pendingUserInput.ts lines 16-29: The `compareActivitiesByOrder` function returns 1 when `left.sequence !== undefined` and `right.sequence === undefined` (line 22), placing activities with sequence AFTER those without.
packages/shared/src/pendingUserInput.ts lines 77-102: `derivePendingUserInputs` uses this sort order (line 79), then processes activities sequentially - adding to map for 'user-input.requested' (lines 88-96) and deleting for 'user-input.resolved' (lines 98-100). If resolved sorts before requested due to the sequence logic, the delete is a no-op and the request gets added after, leaving the input incorrectly marked as pending.
There was a problem hiding this comment.
Good catch. This ordering logic was extracted from the existing web-side derivePendingUserInputs implementation into the new shared helper, so the issue is not unique to the new cold-start path, but this PR does make it shared between web and server and is the right place to fix it.
I’m going to update the comparator so createdAt remains the primary ordering key, with sequence only used as a tiebreaker when both activities have it, and add a regression test for mixed sequenced / unsequenced request/close activities.
18a75b5 to
1808699
Compare
|
hmm as long as we send the right |
That was my first thought too, and I might be missing something about how Codex handles resumed pending prompts. From what I can tell, we already pass So my current assumption is:
But I may be missing a Codex guarantee there, so if resumed sessions are supposed to replay those pending requests, I’m happy to switch direction. Most of my work has been with Claude Code which does not resume pending tool calls at all so I presumed Codex does the same 😅 |
1808699 to
c521a68
Compare
|
(Did a force push to actually start signing my commits again on this device) |
Closes #856
Why
If T3 Code is fully closed while a thread is waiting on a plan-mode /
request_user_inputquestion, reopening the app can leave that thread stuck.The pending question is reconstructed from persisted thread activity, but the live provider-side request context does not survive a full app restart. As a result, the UI can show a question that no longer has any valid way to be answered.
The important part here is when to recover from that state. We did not want to wait until the user tries to submit an answer, because by that point they have already spent time reading the question and thinking through a response. The correct time to reconcile this mismatch is when the app cold-starts and resumes persisted state, so the UI is already in a clean state when the user returns to the thread.
How
This adds a dedicated cold-start lifecycle step to orchestration startup.
On startup, before the normal orchestration reactors begin processing, the server:
user-input.expiredactivityThis also extracts the existing pending user-input derivation logic into shared runtime code so the server and web use the same rules for deciding whether a prompt is still open. That keeps the new cold-start reconciliation behavior aligned with the existing UI state model instead of introducing a second copy of that logic.
Decisions
This is handled during cold-start reconciliation, not when the user submits an answer.
Expiring the stale prompt only after the user interacts would create a poor recovery flow and would surface the problem at the latest possible moment.
This lives in a dedicated cold-start lifecycle, not in
ProviderCommandReactor.The goal was not to add a one-off abstraction, but to give startup-only repair logic a clear home that is separate from normal event reaction. This keeps the boundary explicit and makes future cold-start cleanup work easier to add in the same place.
The fix is intentionally scoped to cold app restart.
The reported bug is specifically about reopening the app with persisted state from a previous process. Broader in-process session-loss recovery can be handled separately if needed, but it is a different lifecycle boundary.
Stale prompts are expired, not resolved.
After restart, the original provider-side request context is gone, so marking the prompt as resolved would imply a successful interaction that never actually happened.
This is a server-side state fix, not a UI-only workaround.
Hiding the stale prompt in the UI would leave the persisted orchestration state inconsistent with what the user sees.
Note
Expire stale pending plan questions on cold app startup
ColdStartLifecycleservice that runs at orchestration reactor startup, before other reactors, and expires any pendinguser-input.requestedprompts by appending auser-input.expiredactivity with reasonserver-restart.derivePendingUserInputslogic intopackages/shared/src/pendingUserInput.ts, which now includesturnIdin results and treatsuser-input.expiredas a closing event.derivePendingUserInputsnow delegates to the shared implementation, so expired prompts are no longer shown as pending in the UI.Macroscope summarized c521a68.