
Conversation

@geemi725 (Collaborator) commented Jun 6, 2025

This PR refactors:

  • grading + eval scripts to work with the new field evaluation mode
  • unit tests
  • pre-commit / linting
  • README
  • renames run_zeroshot_evals.py -> generate_zeroshot_evals.py

@geemi725 geemi725 marked this pull request as ready for review June 11, 2025 22:41
@geemi725 geemi725 requested review from jonlaurent and ludomitch June 11, 2025 22:41
@geemi725 geemi725 self-assigned this Jun 11, 2025
@geemi725 geemi725 changed the title from "WIP: Update eval scripts with work with 'answer and grading modes'" to "Refactors BB dataset, eval + grading scripts" Jun 12, 2025
@jonlaurent (Collaborator)

I can't comment much on the code, but it looks good to me. Is the README still a TODO?

@geemi725 (Collaborator, Author)

@jonlaurent README is now updated.

unsure=unsure,
partial_match=False,
llm_match=False,
)
@ludomitch (Collaborator)

Do we need to update the open_ended_prompt_template to ensure it outputs things that can be evaluated by the exact-match grade_range_verifier? A bit scared the LLM's output may be correct but will still fail the grade_verifier method.
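
To illustrate the worry (a minimal sketch, not the repo's implementation — grade_range_verifier's real signature isn't shown here, and normalize_range is an invented helper):

import re

def grade_range_verifier(answer: str, target: str) -> bool:
    # Assumed behavior: exact string match against a canonical "low-high" range.
    return answer.strip() == target.strip()

def normalize_range(text: str) -> str:
    # Invented helper: map "1 to 5", "1 - 5", "between 1 and 5" onto "1-5".
    nums = re.findall(r"\d+(?:\.\d+)?", text)
    return "-".join(nums[:2]) if len(nums) >= 2 else text.strip()

# A numerically correct but differently formatted LLM answer fails exact match...
assert grade_range_verifier("1 to 5", "1-5") is False
# ...while normalizing first recovers the match:
assert grade_range_verifier(normalize_range("1 to 5"), "1-5") is True

Tightening the prompt template to request the canonical format, or normalizing answers before comparison, would be two ways to address this.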

@geemi725 (Collaborator, Author)

@ludomitch yes, feel free to suggest changes here

@geemi725 geemi725 merged commit 5c54e35 into main Jun 12, 2025
2 checks passed