Vibetest takes the chat transcript from a vibe-coding app, extracts formal functional requirements, tests them using Browser Use, and reports the results. You can then pass this report back to your vibe-coding agent to create a meta-agentic loop.
Q-NCLC/
├── data/
│   ├── test-cases/   # Input test cases (URL + chat transcripts)
│   ├── results/      # Output logs from test runs
│   └── legacy/       # Old experiment files
└── packages/
    ├── agent1/       # UX task extraction from conversations
    ├── agent2/       # Browser-based UX testing
    ├── agent3/       # Test results grouping and analysis
    ├── vibetester/   # Pipeline orchestrating Agent 1 + Agent 2 + Agent 3
    └── shared/       # Shared utilities (logging, LLM providers)

The system consists of three agents working in a pipeline:
- Agent 1 (Requirement Extractor): Analyzes natural language conversations between a user and a coding assistant to extract formal testing requirements. It produces a structured list of atomic test steps. Uses DSPy for prompt optimization with few-shot learning.
- Agent 2 (Browser Tester): Takes the test steps from Agent 1 and executes them in a real browser environment to verify the application's behavior. It acts as an automated user and generates individual test results for each atomic step.
- Agent 3 (Test Grouper): Groups related atomic test steps from Agent 1 and their results from Agent 2 into meaningful, cohesive test scenarios. It identifies test types (validation, functional, integration, workflow) and provides aggregated reporting.
Together, they form the Vibetester pipeline, testing the application's behavior based on the user's intent and providing structured test reports.
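The hand-off between the three agents can be pictured as three functions chained together. The sketch below is illustrative only: the type names (AtomicStep, StepResult, Scenario), the function names, and the transcript shape (a list of role/content messages) are assumptions made for this example, not the actual interfaces of the agent1, agent2, and agent3 packages. The stand-in bodies do no LLM or browser work.

```python
from dataclasses import dataclass


@dataclass
class AtomicStep:
    """One atomic test step, as Agent 1 might emit it (hypothetical shape)."""
    step_id: int
    description: str


@dataclass
class StepResult:
    """Outcome of one step, as Agent 2 might report it (hypothetical shape)."""
    step_id: int
    passed: bool


@dataclass
class Scenario:
    """A group of related step results, as Agent 3 might build it (hypothetical shape)."""
    name: str
    test_type: str  # e.g. "validation", "functional", "integration", "workflow"
    results: list[StepResult]

    @property
    def passed(self) -> bool:
        # A scenario passes only if every step in it passed.
        return all(r.passed for r in self.results)


def extract_requirements(transcript: list[dict]) -> list[AtomicStep]:
    """Stand-in for Agent 1: turns each user message into one step."""
    user_msgs = [m["content"] for m in transcript if m["role"] == "user"]
    return [AtomicStep(i, text) for i, text in enumerate(user_msgs, start=1)]


def run_in_browser(url: str, steps: list[AtomicStep]) -> list[StepResult]:
    """Stand-in for Agent 2: marks every step as passed without opening a browser."""
    return [StepResult(s.step_id, True) for s in steps]


def group_results(steps: list[AtomicStep], results: list[StepResult]) -> list[Scenario]:
    """Stand-in for Agent 3: places all results in a single scenario."""
    return [Scenario("all steps", "functional", results)]


def run_pipeline(url: str, transcript: list[dict]) -> list[Scenario]:
    """Chain the three stages in the order described above."""
    steps = extract_requirements(transcript)
    results = run_in_browser(url, steps)
    return group_results(steps, results)
```

Keeping each stage as a function with plain data in and plain data out is one way to let the stages be run and tested separately, which matches the separate agent1 and agent2 commands.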
- Python 3.12 or higher
- uv
- Clone the repo
- Sync dependencies:
  uv sync
  Note: uv sync automatically creates a virtual environment in .venv. If you use VSCode and Pylance cannot resolve packages, open the command palette, execute "Python: Select Interpreter", and choose the environment in .venv.
- Duplicate .env.example to .env and add your API keys (e.g., BROWSER_USE_API_KEY, OPENAI_API_KEY, etc.).
The full pipeline extracts UX requirements from a test case and tests them in a browser.
# Recommended: Use unified test case file (contains URL and transcript)
uv run vibetester -tc pitch-humanity-simple.json
# Legacy: Separate transcript and URL
uv run vibetester -t my_transcript.json -u https://myapp.example.com

This will:
- Load the test case from ./data/test-cases/pitch-humanity-simple.json
- Extract UX requirements using Agent 1
- Test them in a browser using Agent 2
- Group atomic test results into meaningful scenarios using Agent 3
- Save results to ./data/results/ (when --logging is enabled)
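The first step, loading a unified test case, might look roughly like the sketch below. The README only states that the file contains a URL and a transcript, so the JSON field names "url" and "transcript" and the helper names resolve_test_case and parse_test_case are assumptions for illustration; check a real file in data/test-cases/ for the actual schema.

```python
import json
from pathlib import Path

# Directory named in the steps above.
TEST_CASE_DIR = Path("data/test-cases")


def resolve_test_case(name: str, base_dir: Path = TEST_CASE_DIR) -> Path:
    """Map a bare file name to its location in the test-case directory."""
    return base_dir / name


def parse_test_case(text: str) -> tuple[str, list]:
    """Parse test-case JSON and return (url, transcript).

    The "url" and "transcript" keys are assumed, not confirmed by the project.
    """
    data = json.loads(text)
    missing = [key for key in ("url", "transcript") if key not in data]
    if missing:
        raise ValueError(f"test case is missing fields: {missing}")
    return data["url"], data["transcript"]
```

Failing early on a missing field keeps a malformed test case from reaching the slower extraction and browser stages.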
Run uv run vibetester --help for all options, or refer to the dedicated README.
uv run agent1

Run uv run agent1 --help for all options, or refer to the dedicated README.
Agent 1 uses a pre-compiled DSPy model for fast inference. To retrain/optimize with new examples:
uv run agent1-compile

See the Agent 1 README for details on the compilation workflow.

uv run agent2

Run uv run agent2 --help for all options, or refer to the dedicated README.
Note: Do not interact with the browser window that Agent 2 opens, as doing so can disrupt the test run.
Logging is disabled by default. Enable it via:
- CLI flag: --logging
- Environment variable: LOGGING=true
Logs are saved to ./data/results/ as JSON files with timestamps.
- Add a dependency to a package (e.g. Agent 1):
  uv add --package agent1 <package-name>