A GAN-inspired autonomous coding harness built with the Claude Agent SDK. Takes a short prompt and autonomously builds a full-stack application (Next.js + .NET) using three specialized agents.
User Prompt → [Planner] → spec.md → [Generator] → app → [Evaluator/QA] → feedback
                                         ↑                                    │
                                         └────────── fix round ◄──────────────┘
| Agent | Role | Tools |
|---|---|---|
| Planner | Expands 1-4 sentence prompt into full product spec | File I/O |
| Generator | Builds Next.js + .NET app from spec | File, Bash, Git |
| Evaluator | QA tests running app, grades against criteria | Playwright MCP |
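
As a rough illustration, the agent table can be written as a typed lookup. This is a minimal sketch: the names `AgentName`, `AgentSpec`, `AGENTS`, and `canUse` are hypothetical and are not taken from the harness source. Only the roles and tool groups come from the table.

```typescript
// Hypothetical types: illustrative only, not the harness's actual code.
type AgentName = "planner" | "generator" | "evaluator";

interface AgentSpec {
  role: string;
  tools: readonly string[];
}

// Roles and tool groups copied from the table above.
const AGENTS: Record<AgentName, AgentSpec> = {
  planner: {
    role: "Expands a 1-4 sentence prompt into a full product spec",
    tools: ["File I/O"],
  },
  generator: {
    role: "Builds the Next.js + .NET app from the spec",
    tools: ["File", "Bash", "Git"],
  },
  evaluator: {
    role: "QA tests the running app and grades it against criteria",
    tools: ["Playwright MCP"],
  },
};

// Returns true when the named agent is allowed to use the given tool group.
function canUse(agent: AgentName, tool: string): boolean {
  return AGENTS[agent].tools.includes(tool);
}
```

Keeping the tool list per agent narrow mirrors the separation in the table: only the generator gets shell and git access, and only the evaluator drives a browser.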
# Install dependencies
npm install
# Set your API key
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY
# Install Playwright (for evaluator agent)
npx playwright install chromium
# Run the harness
npx tsx src/index.ts "Build a task management app with kanban boards"

- `--output-dir <path>`: Output directory (default: `./output`)
- `--artifacts-dir <path>`: Artifacts directory (default: `./artifacts`)
- `--model <model>`: Claude model (default: `claude-opus-4-5-20250918`)
- `--max-rounds <n>`: Max QA rounds (default: 3)
- `--max-budget <usd>`: Max budget in USD (default: 50)
- `--api-key <key>`: API key (or set `ANTHROPIC_API_KEY`)
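
The options above could be parsed along these lines. This is a sketch under assumptions, not the harness's actual parser: `HarnessOptions`, `parseOptions`, and `parsePositive` are hypothetical names, and letting `--api-key` override `ANTHROPIC_API_KEY` is an assumed precedence. The flag names and defaults are the ones listed above.

```typescript
// Hypothetical option shape; field names are illustrative.
interface HarnessOptions {
  prompt: string;
  outputDir: string;
  artifactsDir: string;
  model: string;
  maxRounds: number;
  maxBudgetUsd: number;
  apiKey?: string;
}

// Defaults as documented in the option list.
const DEFAULTS = {
  outputDir: "./output",
  artifactsDir: "./artifacts",
  model: "claude-opus-4-5-20250918",
  maxRounds: 3,
  maxBudgetUsd: 50,
};

// Parses a numeric flag value and rejects anything that is not a positive number.
function parsePositive(flag: string, raw: string): number {
  const n = Number(raw);
  if (!Number.isFinite(n) || n <= 0) {
    throw new Error(`${flag} must be a positive number, got "${raw}"`);
  }
  return n;
}

// argv excludes the node binary and script path; env is passed in so the
// function stays easy to test. Assumption: the flag wins over the env variable.
function parseOptions(
  argv: string[],
  env: Record<string, string | undefined> = {},
): HarnessOptions {
  const opts: HarnessOptions = { prompt: "", ...DEFAULTS, apiKey: env.ANTHROPIC_API_KEY };
  const positionals: string[] = [];
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (!arg.startsWith("--")) {
      positionals.push(arg);
      continue;
    }
    const value = argv[++i];
    if (value === undefined) throw new Error(`Missing value for ${arg}`);
    switch (arg) {
      case "--output-dir": opts.outputDir = value; break;
      case "--artifacts-dir": opts.artifactsDir = value; break;
      case "--model": opts.model = value; break;
      case "--max-rounds": opts.maxRounds = parsePositive(arg, value); break;
      case "--max-budget": opts.maxBudgetUsd = parsePositive(arg, value); break;
      case "--api-key": opts.apiKey = value; break;
      default: throw new Error(`Unknown option: ${arg}`);
    }
  }
  if (positionals.length !== 1) throw new Error("Expected exactly one prompt argument");
  opts.prompt = positionals[0];
  return opts;
}
```

For example, `parseOptions(["Build a task management app", "--max-rounds", "5"])` keeps every default except the round limit.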
- Planning: The planner agent takes your short prompt and expands it into an ambitious product spec with features, user stories, design direction, and AI-powered features.
- Building: The generator agent reads the spec and builds the complete application (Next.js frontend + .NET backend), committing to git at milestones.
- QA: The evaluator agent uses Playwright to interact with the running app like a real user. It grades against four criteria (Product Depth, Functionality, Visual Design, Code Quality) and files specific bugs with file/line references.
- Iteration: If QA fails, the generator gets the evaluator's feedback and fixes issues. This loop repeats up to `--max-rounds` times.
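
The four phases above can be sketched as a single control loop. Everything here is an assumption made for illustration: the `Agents` interface, the `QaReport` shape, the 0-10 score scale, and the "every criterion at or above a threshold" pass rule are hypothetical and are not taken from the harness source. Only the four criteria names and the default of 3 rounds come from this README.

```typescript
// Hypothetical types and names; illustrative only.
type Criterion = "Product Depth" | "Functionality" | "Visual Design" | "Code Quality";

interface QaReport {
  scores: Record<Criterion, number>; // assumed 0-10 scale
  bugs: string[]; // bug descriptions with file/line references
}

interface Agents {
  plan(prompt: string): Promise<string>; // returns the spec text
  generate(spec: string, feedback?: QaReport): Promise<void>; // builds or fixes the app
  evaluate(spec: string): Promise<QaReport>; // tests the running app
}

// Assumed pass rule: every criterion meets the threshold.
function passes(report: QaReport, threshold = 7): boolean {
  return Object.values(report.scores).every((score) => score >= threshold);
}

async function runHarness(
  prompt: string,
  agents: Agents,
  maxRounds = 3,
): Promise<{ passed: boolean; rounds: number }> {
  const spec = await agents.plan(prompt); // Planning
  await agents.generate(spec); // Building
  for (let round = 1; round <= maxRounds; round++) {
    const report = await agents.evaluate(spec); // QA
    if (passes(report)) return { passed: true, rounds: round };
    // Iteration: hand the feedback back unless this was the last round.
    if (round < maxRounds) await agents.generate(spec, report);
  }
  return { passed: false, rounds: maxRounds };
}
```

Passing the agents in as an interface keeps the loop independent of how each agent is implemented, so it can be exercised with stubs before wiring in real model calls.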
- Frontend: Next.js 14+ (App Router), TypeScript, Tailwind CSS
- Backend: .NET 10, ASP.NET Core Web API, Entity Framework Core
- Database: SQLite (dev) / PostgreSQL (prod)
# Frontend design skill
npx skills add vercel-labs/agent-skills
# Next.js best practices
npx skills add vercel-labs/next-skills --skill next-best-practices
# Playwright MCP (used by evaluator)
npx playwright install chromium

- Harness Design for Long-Running Apps (Anthropic Engineering)
- Claude Agent SDK (official docs)
- DotNet Skills (.NET agent skills)