StormHub/Agents.Code

Agents.Code — Autonomous 3-Agent Coding Harness

A GAN-inspired autonomous coding harness built with the Claude Agent SDK. It takes a short prompt and builds a full-stack application (Next.js frontend + .NET backend) using three specialized agents: a planner, a generator, and an evaluator.

Architecture

User Prompt → [Planner] → spec.md → [Generator] → app → [Evaluator/QA] → feedback
                                          ↑                                    │
                                          └────────── fix round ◄──────────────┘
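| Agent     | Role                                                     | Tools            |
|-----------|----------------------------------------------------------|------------------|
| Planner   | Expands 1-4 sentence prompt into full product spec       | File I/O         |
| Generator | Builds Next.js + .NET app from spec                      | File, Bash, Git  |
| Evaluator | QA tests running app, grades against criteria            | Playwright MCP   |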
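One way to picture the division of labor is as a small typed registry. This is only a sketch: `AgentName`, `AgentSpec`, `AGENTS`, and `canUse` are hypothetical names chosen for illustration and are not taken from this repository's source. The roles and tool lists come from the table above, and the input/output artifacts follow the architecture diagram.

```typescript
// Hypothetical registry describing the three agents. None of these
// identifiers are from the repository; they only mirror the table above.
type AgentName = "planner" | "generator" | "evaluator";

interface AgentSpec {
  role: string;    // what the agent is responsible for
  tools: string[]; // tool families the agent is allowed to use
  input: string;   // artifact the agent consumes
  output: string;  // artifact the agent produces
}

const AGENTS: Record<AgentName, AgentSpec> = {
  planner: {
    role: "Expands a 1-4 sentence prompt into a full product spec",
    tools: ["File I/O"],
    input: "user prompt",
    output: "spec.md",
  },
  generator: {
    role: "Builds the Next.js + .NET app from the spec",
    tools: ["File", "Bash", "Git"],
    input: "spec.md (plus QA feedback on fix rounds)",
    output: "app",
  },
  evaluator: {
    role: "QA tests the running app and grades it against criteria",
    tools: ["Playwright MCP"],
    input: "running app",
    output: "feedback",
  },
};

// Returns true when the named agent lists the given tool family.
function canUse(agent: AgentName, tool: string): boolean {
  return AGENTS[agent].tools.includes(tool);
}
```

Keeping the tool lists separate per agent reflects the table: only the generator gets shell and Git access, and only the evaluator drives a browser.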

Quick Start

# Install dependencies
npm install

# Set your API key
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY

# Install the Chromium browser for Playwright (used by the evaluator agent)
npx playwright install chromium

# Run the harness
npx tsx src/index.ts "Build a task management app with kanban boards"

Options

--output-dir <path>     Output directory (default: ./output)
--artifacts-dir <path>  Artifacts directory (default: ./artifacts)
--model <model>         Claude model (default: claude-opus-4-5-20250918)
--max-rounds <n>        Max QA rounds (default: 3)
--max-budget <usd>      Max budget in USD (default: 50)
--api-key <key>         API key (or set ANTHROPIC_API_KEY)
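As a rough illustration of how these flags and their defaults could be resolved, here is a minimal hand-rolled parser. It is a sketch under assumptions, not the harness's actual argument handling: `HarnessOptions`, `parseHarnessArgs`, and `parsePositive` are hypothetical names, and the rule that `--api-key` overrides `ANTHROPIC_API_KEY` is an assumption. The flag names and default values are copied from the list above.

```typescript
// Hypothetical options shape; field names are illustrative only.
interface HarnessOptions {
  prompt: string;
  outputDir: string;
  artifactsDir: string;
  model: string;
  maxRounds: number;
  maxBudgetUsd: number;
  apiKey: string | undefined;
}

// Parses a positive number, optionally requiring an integer.
function parsePositive(flag: string, raw: string, integerOnly: boolean): number {
  const n = Number(raw);
  const valid = Number.isFinite(n) && n > 0 && (!integerOnly || Number.isInteger(n));
  if (!valid) throw new Error(`Invalid value for ${flag}: ${raw}`);
  return n;
}

function parseHarnessArgs(
  argv: string[],
  env: Record<string, string | undefined> = {},
): HarnessOptions {
  // Defaults as documented in the Options list.
  const opts: HarnessOptions = {
    prompt: "",
    outputDir: "./output",
    artifactsDir: "./artifacts",
    model: "claude-opus-4-5-20250918",
    maxRounds: 3,
    maxBudgetUsd: 50,
    apiKey: env["ANTHROPIC_API_KEY"], // assumption: the flag, if given, wins over this
  };
  const positionals: string[] = [];
  const rest = [...argv];
  while (rest.length > 0) {
    const arg = rest.shift()!;
    if (!arg.startsWith("--")) {
      positionals.push(arg); // anything that is not a flag is part of the prompt
      continue;
    }
    const value = rest.shift();
    if (value === undefined) throw new Error(`Missing value for ${arg}`);
    switch (arg) {
      case "--output-dir": opts.outputDir = value; break;
      case "--artifacts-dir": opts.artifactsDir = value; break;
      case "--model": opts.model = value; break;
      case "--max-rounds": opts.maxRounds = parsePositive(arg, value, true); break;
      case "--max-budget": opts.maxBudgetUsd = parsePositive(arg, value, false); break;
      case "--api-key": opts.apiKey = value; break;
      default: throw new Error(`Unknown option: ${arg}`);
    }
  }
  opts.prompt = positionals.join(" ");
  return opts;
}
```

For example, `parseHarnessArgs(["Build a task management app with kanban boards", "--max-rounds", "5"])` would keep every default except `maxRounds`, which becomes 5.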

How It Works

  1. Planning: The planner agent takes your short (1-4 sentence) prompt and expands it into an ambitious product spec (spec.md) covering features, user stories, design direction, and AI-powered functionality.

  2. Building: The generator agent reads the spec and builds the complete application (Next.js frontend + .NET backend), committing to git at milestones.

  3. QA: The evaluator agent uses Playwright to interact with the running app like a real user. It grades the app against four criteria (Product Depth, Functionality, Visual Design, Code Quality) and files specific bugs with file/line references.

  4. Iteration: If QA fails, the generator receives the evaluator's feedback and fixes the reported issues. The loop repeats for up to --max-rounds rounds (default: 3).
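The four steps above can be sketched as a plan, build, evaluate, fix loop. This is a minimal illustration with the agents stubbed behind an interface, not the harness's real control flow: `Agents`, `QaReport`, `Bug`, `runHarness`, `passes`, the numeric score scale, and `PASS_THRESHOLD` are all assumptions introduced here, since the README does not say how a QA round is judged to have failed. Only the four criteria names and the round limit come from the text.

```typescript
type Criterion = "Product Depth" | "Functionality" | "Visual Design" | "Code Quality";

// A bug filed by the evaluator, with a file/line reference.
interface Bug {
  file: string;
  line: number;
  description: string;
}

interface QaReport {
  scores: Record<Criterion, number>; // assumption: one numeric score per criterion
  bugs: Bug[];
}

// Hypothetical agent interface; real agents would call the Claude Agent SDK.
interface Agents {
  plan(prompt: string): Promise<string>;                      // returns the spec text
  generate(spec: string, feedback?: QaReport): Promise<void>; // builds, or fixes using feedback
  evaluate(): Promise<QaReport>;                              // QA against the running app
}

const PASS_THRESHOLD = 7; // assumption: every criterion must reach this score

function passes(report: QaReport): boolean {
  return Object.values(report.scores).every((score) => score >= PASS_THRESHOLD);
}

// Runs at least one QA round, then fix rounds until QA passes or maxRounds is reached.
async function runHarness(
  prompt: string,
  agents: Agents,
  maxRounds = 3,
): Promise<{ passed: boolean; rounds: number; report: QaReport }> {
  const spec = await agents.plan(prompt);
  await agents.generate(spec);
  let report = await agents.evaluate();
  let rounds = 1;
  while (!passes(report) && rounds < maxRounds) {
    await agents.generate(spec, report); // fix round driven by evaluator feedback
    report = await agents.evaluate();
    rounds++;
  }
  return { passed: passes(report), rounds, report };
}
```

Passing the whole `QaReport` back into `generate` mirrors the "fix round" arrow in the architecture diagram: the generator sees both the scores and the specific bugs rather than a bare pass/fail flag.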

Target Stack

  • Frontend: Next.js 14+ (App Router), TypeScript, Tailwind CSS
  • Backend: .NET 10, ASP.NET Core Web API, Entity Framework Core
  • Database: SQLite (dev) / PostgreSQL (prod)

Skills & Plugins

# Frontend design skill
npx skills add vercel-labs/agent-skills

# Next.js best practices
npx skills add vercel-labs/next-skills --skill next-best-practices

# Chromium browser for Playwright (needed by the evaluator's Playwright MCP tools)
npx playwright install chromium
