
vibrantlabsai/PA-Bench

PA Bench SDK

This repository hosts a minimal SDK (plus a CLI) for loading the new PA Bench scenarios directly into Vibrant Labs Worlds instances and running the verifier that ships with each scenario.

Installing

Install the package in editable mode from the repository root, using the supplied pyproject.toml:

python -m pip install -e .

This exposes the CLI entry point pa-bench, which delegates to the SDK's pa_bench_sdk.cli module.

Loading scenarios

Each scenario folder already bundles the clone state under gomail and gocalendar. Load one with:

python -m pa_bench_sdk.cli load-scenario scenario_001_multi_meeting_coordination

Or use the pa-bench script installed via pip:

pa-bench load-scenario scenario_005_conflict_detection

Pass --data-path if your scenarios live elsewhere, --gomail-url / --gocalendar-url to override instance URLs, and --worlds-base-url to set the Worlds API URL.
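When driving the loader from a Python harness, it can help to assemble the command line in one place. The helper below is a hypothetical sketch: build_load_args is not part of the SDK, and it assumes the four documented flags are accepted by the load-scenario subcommand after the scenario name (check pa-bench --help and adjust the ordering if they turn out to be top-level options).

```python
import sys
from typing import List, Optional


def build_load_args(
    scenario: str,
    data_path: Optional[str] = None,
    gomail_url: Optional[str] = None,
    gocalendar_url: Optional[str] = None,
    worlds_base_url: Optional[str] = None,
) -> List[str]:
    """Assemble a `python -m pa_bench_sdk.cli load-scenario ...` command line.

    Only flags that were actually provided are appended, so the CLI's own
    defaults (and any URLs picked up from the environment or .env) apply otherwise.
    """
    args = [sys.executable, "-m", "pa_bench_sdk.cli", "load-scenario", scenario]
    overrides = {
        "--data-path": data_path,
        "--gomail-url": gomail_url,
        "--gocalendar-url": gocalendar_url,
        "--worlds-base-url": worlds_base_url,
    }
    for flag, value in overrides.items():  # dicts keep insertion order
        if value is not None:
            args += [flag, value]
    return args
```

A call such as subprocess.run(build_load_args("scenario_005_conflict_detection", data_path="/path/to/scenarios"), check=True) then raises if loading fails. Using sys.executable keeps the command tied to the interpreter the package was installed into.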

Verifying

After loading, run the verifier against the live clones:

python -m pa_bench_sdk.cli verify scenario_001_multi_meeting_coordination

The CLI fetches the current state for each clone, imports the scenario's verifier.py, and prints the reward plus each TaskVerifier's verdict. It exits with a non-zero status if any check fails.
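Because the CLI signals failure through its exit status, a harness only needs the return code to decide pass/fail. The sketch below relies on nothing beyond that documented behavior; run_verifier and its cmd_prefix parameter are hypothetical names, and the prefix is injectable so the helper can be exercised without a live Worlds deployment.

```python
import subprocess
import sys
from typing import List, Optional, Tuple


def run_verifier(
    scenario: str,
    cmd_prefix: Optional[List[str]] = None,
) -> Tuple[bool, str]:
    """Run `verify <scenario>` and return (passed, captured stdout).

    Pass/fail comes solely from the exit status: the CLI exits non-zero
    when any check fails. The reward and verdict text are returned as-is.
    """
    prefix = cmd_prefix if cmd_prefix is not None else [sys.executable, "-m", "pa_bench_sdk.cli"]
    completed = subprocess.run(
        [*prefix, "verify", scenario],
        capture_output=True,
        text=True,
    )
    return completed.returncode == 0, completed.stdout
```

In an evaluation loop this would be called once per episode: load the scenario, let the agent act on the live clones, then call run_verifier with the same scenario name, since the clones only hold the state of the scenario most recently loaded.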

Directory layout

  • data/: Scenario folders as provided by the recent data batch. Each folder contains task.json, data.json, and verifier.py.
  • pa_bench_sdk/: SDK modules (scenario, worlds, verifier, cli).
  • gordon/: Local stand-in for the original gordon.TaskVerifier.
  • tests/: Pytest tests covering the loader, verifier, and CLI.
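Since every scenario folder under data/ is expected to contain task.json, data.json, and verifier.py, a small sketch can list the scenarios that are complete. list_scenarios is a hypothetical helper, not an SDK function; it simply checks for those three files.

```python
from pathlib import Path
from typing import List

# Files each scenario folder is documented to contain.
REQUIRED_FILES = ("task.json", "data.json", "verifier.py")


def list_scenarios(data_dir: str = "data") -> List[str]:
    """Return sorted names of folders under data_dir that contain all required files."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    return sorted(
        entry.name
        for entry in root.iterdir()
        if entry.is_dir() and all((entry / name).is_file() for name in REQUIRED_FILES)
    )


if __name__ == "__main__":
    for scenario in list_scenarios():
        print(scenario)
```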

Preparing Vibrant Labs worlds

If clone URLs are already set in environment variables or in .env, the CLI reuses them. If they are missing, the SDK automatically creates clones on Vibrant Labs Worlds by POSTing to /envs/<type>/create (the same flow as /datasets-anthropic/pa-bench/create_instances.py) and persists the new URLs to .env. For example:

WORLDS_BASE_URL=http://your-worlds-server:8080
GOMAIL_INSTANCE_URL=http://xxx.worlds.vibrantlabs.com
GOCALENDAR_INSTANCE_URL=http://yyy.worlds.vibrantlabs.com
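To see which of these URLs a run would pick up, the sketch below reads the three variables from the process environment and falls back to a plain KEY=VALUE .env file. It is an illustration only: the precedence (environment first, then .env) is an assumption rather than documented SDK behavior, and the parser handles just simple lines, not the full .env syntax. A key that resolves to None is one the SDK would need to create a clone for.

```python
import os
from pathlib import Path
from typing import Dict, Optional

URL_KEYS = ("WORLDS_BASE_URL", "GOMAIL_INSTANCE_URL", "GOCALENDAR_INSTANCE_URL")


def read_dotenv(path: str = ".env") -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    env_file = Path(path)
    if not env_file.is_file():
        return values
    for raw in env_file.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_urls(dotenv_path: str = ".env") -> Dict[str, Optional[str]]:
    """Assumed precedence: process environment first, then .env entries."""
    file_values = read_dotenv(dotenv_path)
    return {key: os.environ.get(key, file_values.get(key)) for key in URL_KEYS}


if __name__ == "__main__":
    for key, url in resolve_urls().items():
        print(f"{key}={url if url is not None else '<missing>'}")
```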
