Skip to content

Add TCP network transport to ACPAgent#2156

Closed
simonrosenberg wants to merge 12 commits intomainfrom
feat/acp-agent-tcp-transport
Closed

Add TCP network transport to ACPAgent#2156
simonrosenberg wants to merge 12 commits intomainfrom
feat/acp-agent-tcp-transport

Conversation

@simonrosenberg
Copy link
Collaborator

@simonrosenberg simonrosenberg commented Feb 20, 2026

Summary

  • Add TCP transport mode to ACPAgent so it can connect to an already-running ACP server over the network via asyncio.open_connection(host, port), as an alternative to the existing subprocess mode
  • Introduce acp_host and acp_port fields with a @model_validator ensuring exactly one transport (subprocess or TCP) is configured
  • Refactor _start_acp_server() to branch on transport mode and update _cleanup() to close the TCP writer

Motivation

When using APIRemoteWorkspace, the agent runs inside a remote K8s pod whose container image doesn't include Node.js or the ACP command. Rather than requiring custom container images, this lets ACPAgent connect to an ACP server running elsewhere on the network.

Test plan

  • 4 new validation tests (TCP creation, both-set rejection, neither-set rejection, host-without-port rejection)
  • 2 new serialization tests (TCP round-trip, dict deserialization)
  • 1 new init state test (TCP transport connects with mocked executor)
  • 3 new cleanup tests (TCP writer close, skip process in TCP mode, error handling)
  • All 63 ACP agent tests pass
  • All 2343 SDK tests pass (no regressions)
  • ruff check and ruff format --check clean

Note: This PR is stacked on feat/acp-agent-ask-agent.

🤖 Generated with Claude Code


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:9d49428-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-9d49428-python \
  ghcr.io/openhands/agent-server:9d49428-python

All tags pushed for this build

ghcr.io/openhands/agent-server:9d49428-golang-amd64
ghcr.io/openhands/agent-server:9d49428-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:9d49428-golang-arm64
ghcr.io/openhands/agent-server:9d49428-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:9d49428-java-amd64
ghcr.io/openhands/agent-server:9d49428-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:9d49428-java-arm64
ghcr.io/openhands/agent-server:9d49428-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:9d49428-python-amd64
ghcr.io/openhands/agent-server:9d49428-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:9d49428-python-arm64
ghcr.io/openhands/agent-server:9d49428-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:9d49428-golang
ghcr.io/openhands/agent-server:9d49428-java
ghcr.io/openhands/agent-server:9d49428-python

About Multi-Architecture Support

  • Each variant tag (e.g., 9d49428-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 9d49428-python-amd64) are also available if needed

simonrosenberg and others added 8 commits February 19, 2026 16:45
Add ACPAgent, an AgentBase subclass that delegates to ACP-compatible
servers (Claude Code, Gemini CLI, etc.) instead of direct LLM calls.
The ACP server manages its own LLM, tools, and execution lifecycle.

Closes #590

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Record per-turn token usage from PromptResponse.usage in step()
- Record incremental cost from UsageUpdate notifications in session_update()
- Yield sentinel LLM from get_all_llms() for telemetry pipeline
- Wire _llm_ref on client for dynamic metrics/telemetry access
- Trigger stats callback after each step for GUI updates
- Extract notification drain delay to configurable constant
- Add audit logging for auto-approved permission requests
- Add debug logging in cleanup for process/executor errors
- Wrap example in try/finally for robust cleanup
- Add 8 telemetry tests in TestACPAgentTelemetry

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ition

Add ACP_NOTIFICATION_DRAIN_DELAY env var override, yield to the event
loop before the timed sleep, and add a TODO for protocol-level fix.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix RequestPermissionResponse constructor (outcome, not result)
- Add proper signatures to Client protocol stub methods
- Narrow content[0] type in tests with isinstance checks
- Pass required args in fs/terminal stub tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Renumber example 36 -> 40 to avoid conflict with
  36_event_json_to_openai_messages.py
- Add type: ignore[reportMissingImports] to all acp imports (optional dep)
- Add TYPE_CHECKING import in __init__.py to satisfy pyright __all__ check
- Add @requires_acp skip marker for tests needing the acp package

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move acp from optional to required deps, removing:
- Lazy import guard in init_state
- type: ignore[reportMissingImports] comments
- @requires_acp skip markers on tests
- test_missing_acp_sdk_raises_import_error test

The __getattr__ in __init__.py is kept to avoid a circular import.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The ask_agent API (stateless question answering) failed for ACPAgent
because its sentinel LLM can't make real LLM calls. This uses the ACP
protocol's fork_session to create a temporary copy of the session,
prompt it, collect the response, and discard the fork — preserving
read-only semantics.

- Add AgentBase.ask_agent() default method (returns None → use LLM path)
- Override in ACPAgent with fork_session logic + token usage tracking
- Add fork text routing in _OpenHandsACPClient.session_update()
- Dispatch agent override in LocalConversation.ask_agent()
- Add 7 tests for ask_agent and fork text routing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ACPAgent currently requires spawning the ACP server as a local subprocess,
which needs Node.js installed in the container. When using APIRemoteWorkspace
with K8s pods that lack Node.js, this forces custom container images.

Add an alternative TCP transport mode where ACPAgent connects to an
already-running ACP server over the network via asyncio.open_connection().
The ClientSideConnection works unchanged since it accepts any
StreamReader/StreamWriter pair.

Transport is configured via mutually exclusive fields validated by a
model_validator: either acp_command (subprocess) or acp_host+acp_port (TCP).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

github-actions bot commented Feb 20, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
openhands-sdk/openhands/sdk/agent
   acp_agent.py3375085%114–115, 169–172, 174–175, 202, 204, 208, 214, 225–226, 231, 298, 363, 419–420, 457, 467, 472, 483–486, 540–542, 544, 546–548, 550–551, 559–560, 562–563, 567, 573–575, 577–578, 615, 619–620, 754–755
TOTAL18790564869% 

@enyst enyst requested a review from all-hands-bot February 20, 2026 19:44
@simonrosenberg simonrosenberg force-pushed the feat/acp-agent-ask-agent branch 5 times, most recently from af8df4e to 2f59de1 Compare February 20, 2026 23:14
Base automatically changed from feat/acp-agent-ask-agent to main February 21, 2026 09:43
simonrosenberg and others added 2 commits February 21, 2026 06:59
Resolve conflicts between TCP transport feature and upstream changes
(ACPAgent refactoring from #2133, ask_agent from #2145).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Demonstrates the real-world use case: ACPAgent connects to a remote ACP
server over TCP while running inside a K8s pod provisioned via the
Runtime API. The pod image doesn't need Node.js since the ACP server
runs as a separate network service.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@github-actions
Copy link
Contributor

github-actions bot commented Feb 21, 2026

API breakage checks (Griffe)

Result: Passed

Action log

@enyst enyst requested review from all-hands-bot and removed request for all-hands-bot February 22, 2026 20:31
print("\nDone!")


if __name__ == "__main__":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this and make it a simple script, like all others are? It may be easier

Copy link
Collaborator

@all-hands-bot all-hands-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟢 Good taste - Clean solution to a real problem.

Linus-Style Assessment:

Solving the right problem: Remote K8s pods without Node.js is a legitimate deployment scenario. The TCP transport mode is pragmatic and well-motivated.

Data structures: Adding optional fields with a validator ensuring mutual exclusivity is the right approach. No unnecessary complexity.

Implementation: The branching in _start_acp_server() is clean and straightforward. Nested async functions are localized and readable. Cleanup properly handles both modes.

Tests: Comprehensive coverage (validation, serialization, both transports, error cases). All 63 existing tests pass.

Verdict: ✅ Worth merging. This is quality work—approve and ship it.

@enyst
Copy link
Collaborator

enyst commented Feb 22, 2026

@OpenHands Fetch docs about ACP emerging standard from Zed Industries, so you understand what this PR, and ACP Agent here, are compatible with. Note that it’s ACP from Zed, not another similar acronym.

Do a /codereview-roasted on the PR. Don’t exaggerate, think before you speak. When I ask you to summarize, include the full review instead. Really, during this session, if I ask you to summarize, include the full review instead. Note that it will be rendered as markdown.

@openhands-ai

This comment was marked as outdated.

@openhands-ai

This comment was marked as duplicate.

Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@simonrosenberg
Could you please show some video or something, to see how it works?

I seem to see that the user needs to know some URL of the ACP server? When/how is the ACP server spawned? Maybe it’s just me, or maybe we could have a little more instructions, if possible?

@enyst
Copy link
Collaborator

enyst commented Feb 22, 2026

(OpenHands-GPT-5.2)

ACP (Agent Client Protocol) docs (Zed Industries) — what this PR is “compatible with”

Authoritative sources:

Key points relevant to this PR:

  1. ACP is JSON-RPC 2.0 over a bidirectional transport

    • Messages are JSON-RPC 2.0 requests/responses/notifications.
    • Messages MUST be UTF-8 encoded.
  2. What transports ACP currently defines

    • The spec currently defines:
      • stdio transport (normative)
      • Streamable HTTP (explicitly marked as a draft proposal in progress)
    • Transport doc explicitly says agents/clients SHOULD support stdio whenever possible.
  3. stdio transport framing requirements (important)

    • Messages are delimited by newlines (\n).
    • Messages MUST NOT contain embedded newlines.
    • Agent MUST NOT write non-ACP bytes to stdout; logs go to stderr (though the real world doesn’t always comply).
  4. Custom transports are allowed, but they’re “bring your own framing + conventions”

    • The spec is transport-agnostic in principle, and says implementations MAY implement custom transports, as long as they preserve JSON-RPC message format/lifecycle.
    • But: TCP is not a standardized ACP transport today. So “ACP over TCP” is, by spec, a custom transport convention, not something you can assume other ACP implementations support out-of-the-box.

That’s the lens I’m using below: this PR is implementing a custom TCP transport that happens to reuse the stdio-style stream model.


/codereview-roasted — PR #2156 “Add TCP network transport to ACPAgent”

Taste Rating

🟡 Acceptable — solves a real deployment problem with minimal code, but it quietly invents “ACP over TCP” despite the spec not defining TCP transport, and it bakes in framing/behavior assumptions that deserve to be explicit.


[CRITICAL ISSUES]

  • [openhands-sdk/openhands/sdk/agent/acp_agent.py:283-294, 436-467] Compatibility claim vs spec reality
    • ACP spec defines stdio and (draft) Streamable HTTP transports. TCP is not defined.
    • This code is fine as a custom transport, but the PR currently reads like “ACP supports TCP transport mode” rather than “OpenHands supports a TCP socket connection using a stream-compatible ACP framing”.
    • Why it matters: “ACP server” in the ecosystem is typically a subprocess speaking stdio; if someone points acp_host/acp_port at a random “ACP-compatible agent”, odds are it won’t be listening on a TCP socket at all.
    • Actionable fix (conceptual, not bikeshedding): be explicit in docstrings/field descriptions/example that this requires a server that implements a custom TCP listener speaking ACP over a newline-delimited stream.

[IMPROVEMENT OPPORTUNITIES]

  • [acp_agent.py:450-466] Framing assumptions are implicit

    • The stdio transport section requires newline-delimited JSON without embedded newlines. This TCP mode implicitly relies on the same “readline() gives one JSON-RPC message” assumption (either in OpenHands code or the ACP python library).
    • If the remote side ever uses a different framing (length-prefix, HTTP, websockets, etc.), this will just hang or explode in parsing.
    • Actionable: document the framing expectation for TCP mode (“newline-delimited JSON-RPC messages, stdio-style framing”). Right now it’s tribal knowledge.
  • [acp_agent.py:452-466] No timeout / cancellation story for network connect

    • asyncio.open_connection() can hang unpleasantly (DNS, SYN retries, blackholed routes).
    • In subprocess mode, failure is fast-ish; in TCP mode, you can block agent initialization indefinitely.
    • Actionable: allow a configurable connect timeout (even a simple default like 5–30s), or route a timeout through AsyncExecutor.run_async(..., timeout=...).
  • [acp_agent.py:646-652] TCP cleanup closes writer but doesn’t wait for close

    • StreamWriter.close() is not the full close; await writer.wait_closed() is the actual “it’s closed” barrier.
    • You already have an async executor; not awaiting is a “works until it doesn’t” resource leak / flake source.
    • Actionable: if the writer is asyncio’s StreamWriter, do close() then await wait_closed() in an async cleanup path (best-effort, swallow errors like you already do).
  • [acp_agent.py:320-333] Transport validator is okay, but it’s not fully symmetric

    • You require acp_port when acp_host is set. Good.
    • But you don’t reject acp_port when acp_host is unset (so users can provide meaningless config and wonder why it’s ignored).
    • Also, there’s still no guard against acp_command=[] (empty list) which will later blow up at self.acp_command[0]. You touched validation; it’s the obvious place to make it impossible to create a time bomb.
    • Actionable: enforce:
      • acp_command is a non-empty list when set
      • acp_port must be unset unless acp_host is set (or at least warn/raise)
  • [acp_agent.py:450-467] TCP mode doesn’t reuse stdout “junk filtering”

    • Subprocess mode filters non-JSON-RPC stdout lines because real implementations sometimes violate the rules.
    • TCP mode assumes the remote side is clean; in practice, someone will put socat/netcat in front of a CLI and you’ll get the same garbage problem, just now over a socket.
    • Actionable: consider applying the same _filter_jsonrpc_lines concept to TCP reads too (or at least make the limitation explicit).
  • [examples/01_standalone_sdk/41_acp_agent_remote_example.py] The example is good, but it subtly overpromises

    • It talks about “ACP server running as a separate service on the network.”
    • Most ACP “agents” people know today are stdio subprocesses, not daemons. So readers may assume they can just run claude-code-acp somewhere and point at it—no, they need something that wraps it into a TCP server, or an actual daemon implementation.
    • Actionable: one sentence in prerequisites: “You need an ACP implementation that listens on TCP and speaks newline-delimited JSON-RPC (custom transport).”

[TESTING GAPS]

  • Tests added are generally pragmatic (validation, serialization, cleanup behavior). 👍
  • One missing angle that would catch real breakage:
    • A test that asserts TCP mode’s initialization path actually calls asyncio.open_connection(host, port) and wires reader/writer into ClientSideConnection correctly.
    • Current “TCP transport connects” test mostly verifies state assignment via a mocked executor return tuple. That’s fine for unit structure, but it won’t catch swapping reader/writer or calling the wrong initializer.

Not mandatory, but it’s the one place where a tiny regression would be painful.


VERDICT

Worth merging, provided everyone is clear this is a custom TCP transport convention (not ACP-standard transport today). The implementation is straightforward and the validation/tests are sensible; the main risk is user confusion + interoperability assumptions.

KEY INSIGHT

You didn’t “add TCP transport to ACP” — you added a TCP socket connection option that assumes stdio-style ACP stream framing, and that assumption needs to be explicit so users don’t misconfigure it and blame ACP.

- Simplify example script to flat style matching other examples
- Clarify TCP is a custom transport (not ACP-standard) in docstrings
- Document newline-delimited JSON-RPC framing expectations
- Add configurable connect timeout (default 30s) for TCP mode
- Use await writer.wait_closed() in TCP cleanup
- Reject acp_port without acp_host, guard against empty acp_command
- Apply _filter_jsonrpc_lines to TCP reads (same as subprocess mode)
- Add test asserting asyncio.open_connection is called correctly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@simonrosenberg simonrosenberg marked this pull request as draft February 23, 2026 18:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants