feat(view): Incremental view construction and conversation state view property#2141
feat(view): Incremental view construction and conversation state view property#2141
Conversation
…ties Replace @cached_property with a manually-invalidated cache for View.manipulation_indices. The cached_property was never invalidated when add_event or enforce_properties mutated self.events, causing stale indices to be returned. Now _invalidate_manipulation_indices() is called in every branch of add_event and enforce_properties that modifies the events list. Co-authored-by: openhands <openhands@all-hands.dev>
Coverage Report •
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
|
Hi! I started running the condenser tests on your PR. You will receive a comment with the results shortly. Note: These are non-blocking tests that validate condenser functionality across different LLMs. |
Condenser Test Results (Non-Blocking)
🧪 Integration Tests ResultsOverall Success Rate: 100.0% 📊 Summary
📋 Detailed Resultslitellm_proxy_anthropic_claude_opus_4_5_20251101
litellm_proxy_gpt_5.1_codex_max
Skipped Tests:
|
all-hands-bot
left a comment
There was a problem hiding this comment.
🟡 Acceptable - Solid design making View a first-class object. Manual cache invalidation is correct but fragile. No critical issues.
Key Insight: The real win here isn't performance—it's making View an explicit representation of the agent's attention window, accessible throughout the system. Good architectural move.
See inline comments for improvement opportunities.
| for property in ALL_PROPERTIES: | ||
| results &= property.manipulation_indices(self.events) | ||
|
|
||
| self._cached_manipulation_indices = results |
There was a problem hiding this comment.
🟠 Important: Manual cache invalidation is fragile. This works now, but it's easy to forget _invalidate_manipulation_indices() when adding new methods that modify self.events.
Suggestion: Consider documenting this invariant clearly, or use a different pattern like:
- Wrap
self.eventsin a custom list type that auto-invalidates on mutation - Use Pydantic's
@computed_field(cached=True)if the model config allows it
Not blocking—current implementation is correct—but this is a maintenance risk.
| Since enforcement is intended as a fallback to inductively maintaining the | ||
| properties via the associated manipulation indices, any time a property must be | ||
| enforced a warning is logged. | ||
|
|
||
| Modifies the view in-place. | ||
| """ | ||
| for property in ALL_PROPERTIES: | ||
| events_to_forget = property.enforce(current_view_events, all_events) | ||
| events_to_forget = property.enforce(self.events, all_events) | ||
| if events_to_forget: | ||
| logger.warning( | ||
| f"Property {property.__class__} enforced, " | ||
| f"{len(events_to_forget)} events dropped." | ||
| ) | ||
| return View.enforce_properties( | ||
| [ | ||
| event | ||
| for event in current_view_events | ||
| if event.id not in events_to_forget | ||
| ], | ||
| all_events, | ||
| ) | ||
| return current_view_events | ||
| self.events = [ | ||
| event for event in self.events if event.id not in events_to_forget | ||
| ] | ||
| self._invalidate_manipulation_indices() | ||
|
|
||
| @staticmethod | ||
| def from_events(events: Sequence[Event]) -> View: | ||
| """Create a view from a list of events, respecting the semantics of any | ||
| condensation events. | ||
| # If we've forgotten events to enforce the properties, we'll need to | ||
| # attempt to apply each property again. Once we get all the way through | ||
| # the properties without any kind of modification, we can exit the loop. | ||
| self.enforce_properties(all_events) |
There was a problem hiding this comment.
🟡 Suggestion: The recursive call with break is a bit harder to follow than necessary. The old tail-recursive pattern was clearer.
Consider making the recursion explicit:
def enforce_properties(self, all_events: Sequence[Event]) -> None:
"""Enforce all properties, recursively dropping events until stable."""
while True:
changed = False
for property in ALL_PROPERTIES:
events_to_forget = property.enforce(self.events, all_events)
if events_to_forget:
logger.warning(
f"Property {property.__class__} enforced, "
f"{len(events_to_forget)} events dropped."
)
self.events = [
event for event in self.events
if event.id not in events_to_forget
]
self._invalidate_manipulation_indices()
changed = True
break # Restart property checks
if not changed:
break # All properties satisfiedThis makes it clearer that we're iterating until a fixed point.
| logger.debug(f"Skipping non-LLMConvertibleEvent of type {type(event)}") | ||
|
|
||
| @staticmethod | ||
| def from_events(events: Sequence[Event]) -> View: |
There was a problem hiding this comment.
🟢 Nit: This incrementally builds the view, which is good. But there's no test coverage for the case where a non-LLMConvertibleEvent (that's not condensation-related) is added.
Consider adding a test that verifies these are silently skipped:
def test_add_event_skips_non_llm_convertible():
view = View()
view.add_event(ConversationStateUpdateEvent(...))
assert len(view.events) == 0Not critical, just helps document the behavior.
|
@OpenHands Do a /codereview-roasted on this PR. Post your review as a comment on the PR. Note that it will be rendered as markdown. Once you posted, take a step back and look with fresh eyes: re-read critically your review and the PR, and analyze every point of it, is it actually real, is it minor, is it major compared to what it seemed? Post your re-analysis as a new comment on the PR. |
|
I'm on it! enyst can track my progress at all-hands.dev |
/codereview-roastedTaste rating: 🔴 Needs improvementYou took a relatively simple data model ("the event log is the source of truth") and replaced it with two mutable sources of truth: That can be fine if you make it impossible to get out of sync. Right now it’s very possible, and the failure mode is the worst kind: silent wrong behavior. [CRITICAL ISSUES]
[IMPROVEMENT OPPORTUNITIES]
[TESTING GAPS]
VERDICT❌ Needs rework before I’d be comfortable calling this “safer/faster”. The concept (incremental view) is fine; the current implementation relies on “everyone always uses the right mutation path” and that’s not an engineering strategy. KEY INSIGHTYou didn’t add a |
Re-analysis of my /codereview-roasted (fresh eyes)Re-reading both the PR and my own comment, here’s what I think is actually important vs what’s more “defense-in-depth / taste”. Still major / real
If only one thing gets addressed, it should be this: either make Probably medium / depends on invariants
So: I still think a post-condensation sanity/enforcement hook is a good idea, but I’d downgrade this from “guaranteed breakage” to “cheap guardrail that prevents future weirdness / version skew issues”. Probably minor
NetMy original review wasn’t wrong about the core risk, but I was too harsh in implying the condensation enforcement change is necessarily a runtime bug. The API/invariant mismatch between |
This comment was marked as duplicate.
This comment was marked as duplicate.
|
Thank you for this! And sorry I didn’t get to review yet, brain frozen. 🙏 I would like to note here:
I have not yet seen issues with the local SDK alone, though I don’t know up to what number of events. This is O(n) though, I think that will show. Funny one was on V0, we had one call to That was last year, when to my knowledge, when conversations got to 2k events they qualified as “very long”. 😅 But I think we are not there anymore, by a… long shot. I think of 60k events as qualifying for long sessions now! 😂 [1] FWIW on V1 conversations I think max I had on the cloud was around ~2k, and sandboxes crashed, so for now I don’t know more. 😭 |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @csmith49, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
@OpenHands Tell me what this PR is doing, how it works, what it changes, and what you think of it? Respond to my question on github, using |
|
I'm on it! enyst can track my progress at all-hands.dev |
|
(OpenHands-GPT-5.2) What This PR DoesSummary: This PR introduces incremental construction and adds a persistent property to . How It Works
What I Think of ItThe Good:
The Concern (valid from the review):
The Verdict:
The performance "benefits" are negligible (as the PR acknowledges - we're 99% network-bound), but making first-class has long-term architectural value. |
This comment was marked as duplicate.
This comment was marked as duplicate.
enyst
left a comment
There was a problem hiding this comment.
Thank you! Let's do this!
|
[Automatic Post]: This PR seems to be currently waiting for review. @all-hands-bot @enyst, could you please take a look when you have a chance? |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @csmith49, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
2 similar comments
|
[Automatic Post]: It has been a while since there was any activity on this PR. @csmith49, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @csmith49, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
This has gone stale, closing in favor of a more incremental approach. See #2586 for the view construction logic -- the conversation state changes will be done in a separate PR. |
Summary
This PR updates the conversation state with a persistent
Viewobject that is incrementally updated as events are recorded.To support this,
Viewcreation has been refactored to be incremental. Events are added one at a time using theView.add_eventfunction, which updates the view in-place and handles condensation application, unhandled condensation request tracking, and so on.This should improve performance (we no longer need to rebuild a view from all events every time the agent takes a step) and make
Viewa first-class object accessible wherever the conversation state is.Design Decisions
Views are maintained by
ConversationStateobjects as a private field, and exposed as a property. This ensures the view is not serialized when the conversation state is saved. Instead, we rebuild the view from the whole list of events when the state is deserialized.The idea is to avoid accidentally serializing events outside the file system store we maintain, but there is the tradeoff that loading conversation states might be slightly more expensive. This is an easy decision to change should the need arise.
Performance "Benefits"
While reducing the number of
View.from_eventscalls will technically improve performance, in most use cases we expect it will be incredibly minor gains.While working on this PR I had a stack trace sampler monitoring the OpenHands ACP instance that was assisting me. On that trace, something like 99% of the samples from the
Agent.stepfunction were API calls to the LLM. Other traces have that number lower (to make room for critic calls and callback handlers), butViewconstruction has never been more than 0.5% of the duration ofAgent.step.Point is, we're hugely network-bound.
The real benefit of this PR is making
Viewa first-class object accessible to more of the system. It exposes precisely the list of events that are converted to messages and sent to the LLM, and so represents the agent's "attention window". That has to be helpful outside the condenser.Breaking Changes
View.condensationsfield removed.View.enforce_propertieschanged from a static method to an in-place modification of the calling view.prepare_llm_messagesno longer takes a list events as input. Instead of converting that list of events into a view it takes a view directly.Lastly, because
View.enforce_propertiesis only called when a view is constructed directly from a list of events (instead of being incrementally built), we now only enforce properties when conversation states are loaded from disk.Checklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:b6e828d-pythonRun
All tags pushed for this build
About Multi-Architecture Support
b6e828d-python) is a multi-arch manifest supporting both amd64 and arm64b6e828d-python-amd64) are also available if needed