Last updated: 2026-02-28 21:49 (UTC+8)
ClawNet is not an agent framework.
ClawNet is a multi-agent interaction kernel that makes agent interaction:
- Observable — interactions are typed intents, not raw text
- Replayable — every run is event-sourced and deterministic
- Attributable — failures produce responsibility chains
When a multi-agent system fails, ClawNet can precisely identify:
- Which agent
- At which interaction
- Violated which invariant
ClawNet does not make agents smarter. It makes agent interaction controllable, accountable, and evolvable.
Multi-agent systems are increasingly used to solve complex tasks, but most existing frameworks fail at a fundamental engineering requirement: when the system fails, we cannot explain why in an actionable way.
Today, multi-agent failures are typically described as:
- "the model hallucinated"
- "agents misunderstood each other"
- "the prompt needs tuning"
These explanations are not acceptable in production engineering, because they provide:
- no reproducible root cause
- no regression test
- no concrete corrective action
The core problem is not intelligence — it is observability. Multi-agent systems are distributed decision systems:
- state is fragmented across agents
- decisions are interdependent
- errors emerge late and indirectly
Yet most frameworks treat agent interaction as unstructured chat logs. This is equivalent to running a distributed system without tracing, replay, or postmortems.
Every mature engineering domain went through the same transition:
| Domain | Before | After |
|---|---|---|
| Web | ad-hoc requests | HTTP specification |
| Microservices | logs only | tracing & spans |
| Distributed systems | best effort | consensus & invariants |
| Multi-agent | prompt chaining | interaction kernel |
ClawNet does not add complexity — it exposes the complexity that already exists.
```bash
go run ./cmd/clawnetd
```

You should see:
- a small 3-agent run (`Planner`, `Worker`, `Verifier`)
- an intentional failure (`INSUFFICIENT_EVIDENCE`)
- a generated FailureReport with a responsibility chain

```
ClawNet - Phase 1 Engineering Skeleton
Running examples/simple_task ...
-----
Run finished in: 109.292us
FailureReport:
  RunID         = 4c2add4c5332f0a6...
  TaskID        = simple_task_v0
  Reason        = INSUFFICIENT_EVIDENCE
  FailingAgent  = worker
  FailingIntent = CLAIM
  MessageID     = de620323edd9b6e1...
  UpstreamChain = de620323... -> e23e2bba...
  Explanation   = Verifier challenged the latest CLAIM due to
                  missing or weak EVIDENCE references (v0 heuristic).
```
This tells you: Worker made a CLAIM without providing evidence refs. Verifier challenged it. ClawNet traced the responsibility chain back to the exact message where the failure originated.
ClawNet replaces free-form chat with typed intents. Every agent interaction must declare its semantic purpose:
```go
// internal/protocol/intent.go
type IntentType string

const (
	IntentRequest   IntentType = "REQUEST"   // Ask another agent to do something
	IntentResponse  IntentType = "RESPONSE"  // Answer a request
	IntentClaim     IntentType = "CLAIM"     // Assert a result (requires evidence)
	IntentEvidence  IntentType = "EVIDENCE"  // Provide supporting data for a claim
	IntentChallenge IntentType = "CHALLENGE" // Dispute a claim
	IntentStatus    IntentType = "STATUS"    // Report progress
	IntentDecision  IntentType = "DECISION"  // Advance the task state machine
)
```

Each message carries structured metadata for tracing and attribution:
```go
// internal/protocol/message.go
type Message struct {
	MessageID      string     // Unique ID for this interaction
	RunID          string     // Groups messages into a single run
	TaskID         string     // The task being worked on
	FromAgent      string     // Who sent this
	ToAgents       []string   // Who receives this
	Intent         IntentType // Semantic purpose (not free text)
	Payload        any        // Content (variable, doesn't affect system layer)
	Confidence     float64    // Agent's self-assessed confidence
	Refs           []string   // Responsibility chain — links to prior messages
	IdempotencyKey string
	Timestamp      time.Time
}
```

The moat is Intent + Refs, not Payload. Intents make interactions computable; Refs make failures traceable.
Readers familiar with distributed tracing may ask: "Why not just use OpenTelemetry?"
| | OpenTelemetry | ClawNet |
|---|---|---|
| Designed for | Passive observation of existing systems | Active enforcement of interaction structure |
| Data model | Generic spans and events | Typed intents with semantic meaning |
| Agent awareness | None — treats agents as black boxes | First-class — agents have roles, constraints, responsibility |
| Failure analysis | "Span X took 5s" | "Agent Y made CLAIM Z without evidence, violating invariant" |
| Replay | Not supported | Event-sourced strict/soft replay |
| Attribution | Latency & error rates | Causal responsibility chains |
OpenTelemetry answers: "What happened and how long did it take?" ClawNet answers: "Which agent, at which interaction, violated which invariant, and why?"
ClawNet can export to OTel-compatible backends for visualization, but its core value is the typed interaction protocol and causal attribution that OTel was never designed to provide.
```
clawnet/
├── cmd/clawnetd/main.go                 # Entry point
├── docs/
│   ├── ClawNet_Master_Design.md         # Comprehensive design spec
│   ├── ClawNet_Engineering_Architecture_Map.md
│   └── ClawNet_Whitepaper.md
├── examples/
│   ├── simple_task/simple_task.go       # Minimal 3-agent failure scenario
│   └── benchmark_task/README.md         # Future benchmark spec
└── internal/
    ├── protocol/                        # Intent types & message schema
    │   ├── intent.go                    # 7 IntentTypes (REQUEST, RESPONSE, CLAIM, ...)
    │   └── message.go                   # Message struct with refs for attribution
    ├── eventstore/                      # Append-only event log
    │   ├── event.go                     # Event struct & 5 EventTypes
    │   └── store.go                     # InMemoryStore (v0, thread-safe)
    ├── task/                            # Task state machine
    │   ├── state.go                     # 6 states: INIT→PLANNING→EXECUTING→VERIFYING→SUCCESS/FAILED
    │   └── transition.go                # FSM validation & Apply()
    ├── constraints/                     # Resource constraints (rounds, messages, budget)
    │   └── constraints.go
    ├── kernel/                          # Central orchestration
    │   └── kernel.go                    # Message routing, state transitions, event emission
    └── attribution/                     # Failure analysis
        └── failure.go                   # FailureReport & 8 FailureReasons
```
```sql
CREATE TABLE traces (
    trace_id         UUID PRIMARY KEY,
    created_at       TIMESTAMPTZ DEFAULT now(),
    total_token_in   BIGINT DEFAULT 0,
    total_token_out  BIGINT DEFAULT 0,
    total_cost_usd   NUMERIC(18,8) DEFAULT 0,
    total_latency_ms BIGINT DEFAULT 0,
    max_depth        INT DEFAULT 0,
    fanout_max       INT DEFAULT 0,
    cycle_count      INT DEFAULT 0
);

CREATE TABLE interactions (
    interaction_id        UUID PRIMARY KEY,
    trace_id              UUID REFERENCES traces(trace_id),
    parent_interaction_id UUID,
    from_type  TEXT,
    from_id    TEXT,
    to_type    TEXT,
    to_id      TEXT,
    kind       TEXT,
    status     TEXT,
    ts_start   TIMESTAMPTZ,
    ts_end     TIMESTAMPTZ,
    token_in   BIGINT DEFAULT 0,
    token_out  BIGINT DEFAULT 0,
    cost_usd   NUMERIC(18,8) DEFAULT 0,
    latency_ms BIGINT DEFAULT 0,
    tags       JSONB DEFAULT '{}',
    payload_id UUID
);
```

```go
type Tracer struct {
	exporter Exporter
}

func (t *Tracer) StartTrace(ctx context.Context, name string, tags map[string]any) (context.Context, *Trace)
func (t *Tracer) StartSpan(ctx context.Context, kind string, from, to Endpoint, tags map[string]any) (context.Context, *Span)

type Span struct {
	InteractionID string
	TraceID       string
	ParentID      *string
}

func (s *Span) SetTokens(in, out int64)
func (s *Span) SetCostUSD(cost float64)
func (s *Span) SetStatus(status string)
func (s *Span) End(err error)

type Exporter interface {
	EmitInteraction(interaction Interaction) error
	EmitBatch(interactions []Interaction) error
	Close() error
}
```

The collector's `POST /ingest` endpoint accepts a batch of interactions. `GET /traces/{trace_id}` returns:
- the full interaction list
- computed metrics (cost, depth, fan-out, cycles)
| Order | Phase | Name | Status |
|---|---|---|---|
| 1st | Phase 4 | Market Validation | NOT STARTED |
| 2nd | Phase 1 | Interaction Trace MVP | IN PROGRESS — Engineering Skeleton Complete |
| 3rd | Phase 3 | Formal Specification / Whitepaper | NOT STARTED |
| 4th | Phase 2 | Fundraising Narrative | NOT STARTED |
- Phase 4 first — Validate pain before building infrastructure. Confirm that debugging LLM workflows is a real, unsolved problem.
- Phase 1 second — Build the minimal working product that solves the validated pain.
- Phase 3 third — Formalize what we've learned into a specification, giving ClawNet academic and engineering credibility.
- Phase 2 last — With validation + product + spec in hand, fundraise with evidence, not promises.
Goal: Prove that multi-agent observability is a real, paying pain point before writing more infrastructure code.
- Identify 15-20 teams actively building multi-agent / LLM-orchestration systems
- Conduct structured interviews (30 min each) focusing on:
- How do they debug when an agent chain fails?
- What tools do they use today? (logs? LangSmith? custom?)
- How much time/money is lost on undiagnosable failures?
- Document recurring themes and pain severity scores
- Classify discovered pains into categories:
- Behavior observability — "I can't see what happened"
- Loop / fan-out detection — "Agents run in circles and burn tokens"
- Cost attribution — "I don't know which agent costs the most"
- Failure traceability — "I can't reproduce the bug"
- Governance / compliance — "I need audit trails for AI decisions"
- Rank by frequency and severity
- Validate assumption: "The first real pain is behavior observability, not multi-agent legality"
- Map existing solutions (LangSmith, Arize, Helicone, OpenTelemetry, custom solutions)
- Identify gaps: what do they NOT solve for multi-agent interaction?
- Define ClawNet's unique wedge: interaction-level tracing with causal attribution
- From interviews, identify 3-5 teams willing to be early design partners
- Define success criteria with each partner (what would make them use/pay for ClawNet?)
- Establish feedback loop for Phase 1 MVP testing
- Problem Discovery Report (interview summaries + patterns)
- Pain Point Matrix (ranked by frequency x severity)
- Competitive Gap Analysis
- List of 3-5 design partners with success criteria
Goal: Build a minimal, working interaction trace system that records, reconstructs, and analyzes multi-agent interactions.
The engineering skeleton (~459 lines of Go) implements the following components:
| Component | Status | Description |
|---|---|---|
| `protocol/intent.go` | Done | 7 IntentTypes (REQUEST, RESPONSE, CLAIM, EVIDENCE, CHALLENGE, STATUS, DECISION) |
| `protocol/message.go` | Done | Message schema with refs for attribution chain |
| `eventstore/event.go` | Done | Event struct & 5 EventTypes |
| `eventstore/store.go` | Done | InMemoryStore (append-only, thread-safe) |
| `task/state.go` | Done | Task struct with 6 states |
| `task/transition.go` | Done | FSM validation with allowed transitions |
| `constraints/constraints.go` | Done | ConstraintSet (rounds, messages, budget, deadline) |
| `kernel/kernel.go` | Partial | Message routing + state transitions (only DECISION handled) |
| `attribution/failure.go` | Partial | FailureReport (only INSUFFICIENT_EVIDENCE implemented) |
| `examples/simple_task` | Done | Working 3-agent demo scenario |
- Handle all intent types in kernel (not just DECISION)
- Emit `MESSAGE_RECEIVED` events (currently defined but unused)
- Integrate `BUDGET_UPDATED` event emission
- Enforce budget constraint in `checkConstraints()` (currently tracked but not enforced)
- Implement all 8 FailureReasons in attribution:
  - `TIMEOUT` — deadline exceeded
  - `BUDGET_EXCEEDED` — cost limit hit
  - `CONFLICTING_CLAIMS` — agents disagree on facts
  - `TOOL_ERROR` — external tool failure
  - `INVALID_STATE` — illegal state transition attempted
  - `LOOP_DETECTED` — circular interaction pattern
  - `POLICY_VIOLATION` — governance rule broken
- Implement `Tracer` SDK (StartTrace / StartSpan API)
- Implement `Span` with token, cost, and status tracking
- Implement `Exporter` interface with at least two backends:
  - `StdoutExporter` — for local development
  - `HTTPExporter` — for collector ingestion
- Record LLM calls and tool calls as spans
- Reconstruct parent-child relationships (interaction graph)
- Implement `POST /ingest` endpoint (accept interaction batches)
- Implement `GET /traces/{trace_id}` endpoint (return full trace + metrics)
- Compute trace-level metrics:
- Total tokens (in/out)
- Total cost (USD)
- Total latency
- Max interaction depth
- Max fan-out
- Cycle count
- Replace InMemoryStore with Postgres-backed store
- Implement `traces` table (schema above)
- Implement `interactions` table (schema above)
- Add database migrations
- Maintain backward compatibility with InMemoryStore for testing
- Detect loops (agent A → B → A cycles)
- Detect excessive fan-out (one agent spawning too many parallel interactions)
- Cost attribution per agent / per interaction path
- Alert / flag on anomalous patterns
- Implement strict replay — event-only, fully deterministic
- Implement soft replay — replay structure with live LLM calls
- Replay CLI: `clawnet replay <trace_id> [--mode strict|soft]`
- Implement `benchmark_task` example (multi-round, multi-agent stress test)
- Add unit tests for all packages
- Add integration test: full trace → ingest → query round-trip
- Add failure scenario tests for each FailureReason
- Working Tracer SDK (Go)
- Collector service with REST API
- Postgres persistence
- Loop / fan-out / cost detection
- Replay (strict + soft)
- Comprehensive test suite
Goal: Formalize the interaction model into a rigorous specification. This gives ClawNet academic credibility and makes the protocol implementable by third parties.
- Define formal grammar for interaction types (intent taxonomy)
- Specify message envelope format (fields, types, constraints)
- Define interaction graph as a DAG with typed edges
- Specify causal ordering guarantees
- Define idempotency semantics
- Formalize how context propagates through interaction chains
- Define trace context format (analogous to W3C Trace Context for agents)
- Specify baggage propagation rules
- Define sampling strategies for high-volume systems
- Classify interaction patterns:
- Sequential — A → B → C
- Fan-out — A → {B, C, D}
- Fan-in — {B, C, D} → A
- Loop — A → B → A (with termination conditions)
- Delegation — A → B (with authority transfer)
- Challenge-Response — A claims, B verifies
- Define pattern detection algorithms
- Specify pattern-level invariants (e.g., "fan-out must converge")
- Define policy language for interaction constraints
- Specify role-based interaction permissions
- Define audit trail requirements
- Specify compliance reporting format
- Write formal whitepaper covering:
- Problem statement (with data from Phase 4 interviews)
- Interaction model (from 3.1)
- Architecture (from Phase 1 implementation)
- Evaluation (from Phase 1 benchmark results)
- Comparison with existing approaches
- Peer review with 2-3 distributed systems researchers
- Publish (arXiv or similar)
- Interaction Model Specification (versioned document)
- Pattern Taxonomy Reference
- Governance Policy Language Spec
- Published Whitepaper
Goal: Position ClawNet as "the control plane for AI agent ecosystems" with validated evidence.
- Craft core positioning: ClawNet = the control plane for AI agent ecosystems
- Build story arc:
- The problem (from Phase 4 market validation data)
- The insight (interaction observability, not agent intelligence)
- The product (from Phase 1 working MVP)
- The moat (from Phase 3 formal specification)
- Prepare one-liner, elevator pitch, and full narrative versions
- Compile design partner testimonials / case studies
- Document quantitative results:
- Debug time reduction (before/after ClawNet)
- Cost savings from loop/fan-out detection
- Failure attribution accuracy
- Prepare live demo script (trace a multi-agent failure end-to-end)
- Size the TAM/SAM/SOM for multi-agent observability tooling
- Define pricing model (usage-based? seat-based? hybrid?)
- Identify go-to-market strategy:
- Open-source core + commercial cloud
- Developer-first adoption (bottom-up)
- Enterprise sales (top-down)
- Pitch deck (12-15 slides)
- Financial model (projections, unit economics)
- Technical deep-dive appendix (for technical investors)
- Demo environment (hosted, one-click)
- Build target investor list (AI infra, developer tools, enterprise SaaS)
- Warm introductions through design partners and advisors
- Fundraising timeline and process management
- Pitch deck
- Financial model
- Live demo environment
- Evidence package (testimonials + metrics)
- Target investor list
```
              WE ARE HERE
                   |
                   v
Phase 4: Market Validation .............. [ NOT STARTED ]
                   |
                   v
Phase 1: Interaction Trace MVP .......... [ IN PROGRESS — Skeleton Complete ]
                   |   ~459 lines of Go
                   |   Core: protocol, eventstore, task FSM,
                   |         kernel, attribution
                   |   Working demo: 3-agent failure scenario
                   |   Next: Tracer SDK, Collector API, Postgres,
                   |         pattern detection, replay
                   v
Phase 3: Formal Specification ........... [ NOT STARTED ]
                   |
                   v
Phase 2: Fundraising Narrative .......... [ NOT STARTED ]
```
ClawNet does not add complexity.
It exposes the complexity that already exists.