This repository hosts a minimal SDK (plus CLI) for loading the new PA Bench scenarios directly into the Vibrant Labs worlds and running the verifier that ships with each scenario.
Use the supplied pyproject.toml when installing:
python -m pip install -e .
This exposes the CLI entry point pa-bench, which delegates to the
SDK's pa_bench_sdk.cli module.
Each scenario folder already bundles the clone state under gomail
and gocalendar. Load one with:
python -m pa_bench_sdk.cli load-scenario scenario_001_multi_meeting_coordination
Or use the pa-bench script installed via pip:
pa-bench load-scenario scenario_005_conflict_detection
Pass --data-path if your scenarios live elsewhere, --gomail-url / --gocalendar-url
to override instance URLs, and --worlds-base-url to set the Worlds API URL.
After loading, run the verifier against the live clones:
python -m pa_bench_sdk.cli verify scenario_001_multi_meeting_coordination
The CLI fetches the current state for each clone, imports the scenario's
verifier.py, and prints the reward plus each TaskVerifier's verdict.
It exits with a non-zero status if any check fails.
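The verify step can be pictured roughly as follows. This is a minimal sketch, not the SDK's actual code: `load_verifier_module`, `exit_status`, and the shape of the verdicts (a mapping from check name to pass/fail) are assumptions made for illustration. It only shows how a scenario's verifier.py could be imported from its path with importlib, and how per-check verdicts could map to a process exit status.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_verifier_module(scenario_dir: Path) -> ModuleType:
    """Import <scenario_dir>/verifier.py as a standalone module (hypothetical helper)."""
    path = scenario_dir / "verifier.py"
    spec = importlib.util.spec_from_file_location(f"verifier_{scenario_dir.name}", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load verifier from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


def exit_status(verdicts: dict[str, bool]) -> int:
    """Return 0 when every check passed, 1 otherwise (assumed convention)."""
    return 0 if all(verdicts.values()) else 1
```

A CLI built this way would pass the result of `exit_status(...)` to `sys.exit`, so a shell script or CI job can detect a failed check without parsing the printed output.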
data/: Scenario folders as provided by the recent data batch. Each folder contains task.json, data.json, and verifier.py.
pa_bench_sdk/: SDK modules (scenario, worlds, verifier, cli).
gordon/: Local stand-in for the original gordon.TaskVerifier.
tests/: Pytest test cases verifying loader, verifier, and CLI.
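Given the folder layout above, a loader might first confirm that a scenario folder is complete before touching any clone. The sketch below is an illustration only: `read_scenario` and `REQUIRED_FILES` are hypothetical names, not the API of pa_bench_sdk.scenario, and it makes no assumption about the contents of task.json or data.json beyond their being valid JSON.

```python
import json
from pathlib import Path

# The three files each scenario folder is described as containing.
REQUIRED_FILES = ("task.json", "data.json", "verifier.py")


def read_scenario(data_root: Path, name: str) -> dict:
    """Check that a scenario folder is complete and return its parsed JSON files."""
    folder = data_root / name
    missing = [f for f in REQUIRED_FILES if not (folder / f).is_file()]
    if missing:
        raise FileNotFoundError(f"{folder} is missing: {', '.join(missing)}")
    return {
        "task": json.loads((folder / "task.json").read_text(encoding="utf-8")),
        "data": json.loads((folder / "data.json").read_text(encoding="utf-8")),
        "verifier_path": folder / "verifier.py",
    }
```

Failing early with the list of missing files keeps a half-copied scenario from being pushed into a live clone.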
If you already have clone URLs in environment variables or .env, the CLI
will reuse them. If they are missing, the SDK automatically creates clones
on Worlds Vibrant Labs by POSTing to /envs/<type>/create (same flow as
/datasets-anthropic/pa-bench/create_instances.py) and persists the new URLs
to .env.
WORLDS_BASE_URL=http://your-worlds-server:8080
GOMAIL_INSTANCE_URL=http://xxx.worlds.vibrantlabs.com
GOCALENDAR_INSTANCE_URL=http://yyy.worlds.vibrantlabs.com
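The reuse-or-create behavior described above can be sketched as below. This is a hedged illustration rather than the SDK's implementation: `read_dotenv`, `resolve_instance_url`, and the `create_clone` callback are hypothetical, the precedence of environment variables over .env is an assumption, and the callback stands in for the POST to /envs/<type>/create so that the example makes no network calls.

```python
import os
from pathlib import Path
from typing import Callable


def read_dotenv(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_instance_url(
    key: str,
    env_type: str,
    dotenv_path: Path,
    create_clone: Callable[[str], str],
) -> str:
    """Return the URL for `key`, creating and persisting a clone only if needed."""
    # Assumed precedence: process environment first, then the .env file.
    url = os.environ.get(key) or read_dotenv(dotenv_path).get(key)
    if url:
        return url
    # Hypothetical callback: would POST to /envs/<env_type>/create and return the URL.
    url = create_clone(env_type)
    existing = dotenv_path.read_text(encoding="utf-8") if dotenv_path.is_file() else ""
    separator = "" if not existing or existing.endswith("\n") else "\n"
    with dotenv_path.open("a", encoding="utf-8") as fh:
        fh.write(f"{separator}{key}={url}\n")
    return url
```

Because the new URL is appended to .env, a second call with the same key finds it there and does not create another clone.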