
Commit ef3080b

Initial commit with automatic conversion script from HELM eval to unified schema
1 parent 11725a9 commit ef3080b

File tree

14 files changed, +11654 -484 lines changed


README.md

Lines changed: 45 additions & 2 deletions
@@ -26,7 +26,7 @@ Convert eval log from Inspect AI into json format with following command:
 uv run inspect log convert path_to_eval_file_generated_by_inspect --to json --output-dir inspect_json
 ```
 
-Then we can convert Inspect evaluation log into unified schema via eval_converters/inspect/converter.py. Conversion for example data can be generated via below script:
+Then we can convert Inspect evaluation log into unified schema via `eval_converters/inspect/converter.py`. Conversion for example data can be generated via below script:
 
 ```bash
 uv run python3 -m eval_converters.inspect.converter
@@ -63,7 +63,50 @@ options:
   --source_organization_logo_url SOURCE_ORGANIZATION_LOGO_URL
 ```
 
-## Tests
+
+### HELM
+
+HELM already writes its run outputs as JSON files, so there is no separate log-conversion step before running the converter.
+
+You can convert a HELM evaluation log into the unified schema via `eval_converters/helm/converter.py`. For example, using the test data in this repository:
+
+```bash
+uv run python3 -m eval_converters.helm.converter --log_dirpath tests/data/helm
+```
+
+The automatic conversion script requires the following files generated by HELM to work correctly:
+- per_instance_stats.json
+- run_spec.json
+- scenario_state.json
+- scenario.json
+- stats.json
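As a quick sanity check before converting, you can confirm that a run directory actually contains these files. The snippet below is only an illustrative sketch; `RUN_DIR` is a placeholder for your own HELM run output directory, not a path in this repository.

```bash
# Illustrative pre-flight check: report any required HELM output file that is missing.
RUN_DIR=path_to_your_helm_run_directory   # placeholder
for f in per_instance_stats.json run_spec.json scenario_state.json scenario.json stats.json; do
  [ -f "$RUN_DIR/$f" ] || echo "missing: $RUN_DIR/$f"
done
```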
+
+The full usage for converting your own HELM evaluation log into the unified schema is shown below:
+
+```bash
+usage: converter.py [-h] [--log_dirpath LOG_DIRPATH] [--huggingface_dataset HUGGINGFACE_DATASET] [--output_dir OUTPUT_DIR] [--source_organization_name SOURCE_ORGANIZATION_NAME]
+                    [--evaluator_relationship {first_party,third_party,collaborative,other}] [--source_organization_url SOURCE_ORGANIZATION_URL]
+                    [--source_organization_logo_url SOURCE_ORGANIZATION_LOGO_URL]
+
+options:
+  -h, --help            show this help message and exit
+  --log_dirpath LOG_DIRPATH
+                        Path to directory with a single evaluation or multiple evaluations to convert
+  --huggingface_dataset HUGGINGFACE_DATASET
+  --output_dir OUTPUT_DIR
+  --source_organization_name SOURCE_ORGANIZATION_NAME
+                        Organization which pushed the evaluation to the evalHub.
+  --evaluator_relationship {first_party,third_party,collaborative,other}
+                        Relationship of the evaluation author to the model
+  --source_organization_url SOURCE_ORGANIZATION_URL
+  --source_organization_logo_url SOURCE_ORGANIZATION_LOGO_URL
+```
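For illustration, a fuller invocation with the options documented above might look like the sketch below; the module path follows the `eval_converters/helm/converter.py` reference earlier, and every option value is a placeholder rather than a recommended setting.

```bash
# Illustrative sketch only: all option values below are placeholders.
uv run python3 -m eval_converters.helm.converter \
  --log_dirpath path_to_your_helm_run_directory \
  --output_dir converted_evals \
  --source_organization_name "Your Organization" \
  --evaluator_relationship third_party
```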
+
+### Tests
 
 Run below script to perform unit tests for all evaluation platforms.
 