feat!: version 13 - dataset evaluators #9642
Draft: mikeldking wants to merge 39 commits into main from version-13
RogerHYang (Contributor) requested changes on Sep 25, 2025 and left a comment:
blocking feature branch
Force-pushed from 001f109 to b65ea42
An error occurred while trying to automatically change base from feat/version-12 to main (September 29, 2025 18:13)
Force-pushed from fc49ed1 to 2b19c56
Force-pushed from f4ab1f0 to 2ef94fd
* add minimal evaluators menu
* handle selection
* styling
* add menu footer
* set menu max width
* replace footer button with link

* feat: Add name field to evaluators form
* Reorganize choices and their default state
* Disable prompt save, tools, response format
* Move input mapping field from select to combobox
* Update arrow icon
* Persist input mapping fields across labels
* Do not render response format or tools if they are saved as provider default

* Add dummy evaluation payloads to single playground run
* Implement for chat mutations and subscription over dataset
* Ruff 🐶 and update graphql schema
* compile relay
* frontend
* Add dataset example id and repetition number
* Address feedback
* Update input typing
* Update relay
* Load and display real global evaluators

Co-authored-by: Alexander Song <[email protected]>
Co-authored-by: Tony Powell <[email protected]>

* Add filter and sort capabilities to evaluators
* Improve clarity of allowed sort columns

* add EvaluatorSelect to dataset page
* stub out evaluator config dialog and rework data fetching
* add readonly prompt messages to eval config modal
* add output config to modal
* add dataset example preview and input mapping section to modal
* wire up add evaluator mutation
* add suspense boundaries
* Refactor promptVersionToInstance to depend on inline fragment
* remove unnecessary type annotations:

Co-authored-by: Tony Powell <[email protected]>

…0152)
* output config resolver
* clean

* evaluator crud
* clean
* patch mutation
* update
* types
* Revert "types"
  This reverts commit 25579b5.
* type ignore
* plural delete
* clean
* decorator
* fix metadata
* clean
* clean
* already exists
* test
* simplify
* test
* simplify

* add annotation name to eval select
* address feedback

…s useful (#10187)
* feat(evaluators): provide a useful correctness pre-built evaluator
* feat(evaluators): provide a useful correctness pre-built evaluator
* simplify

* evaluator prompt validation
* cursor tests
* clean
* condense
* test
* clean
* clean
* test
* parse pydantic errors
* clean
* validate mutations
* fix tests
* validate choices
* test with form
* test
* type check
* clean

* include only dataset-specific evaluators in playground eval selector
* fix dataset page tab selection
* add aria label to dialog
* add annotation names to playground select
* handle long annotation names
* separate components for DatasetEvaluatorSelect and PlaygroundEvaluatorSelect
* remove extra opacity css var
* updates to Menu
* updates to evaluator menus
* fix menu item flicker
* wip: enable mapping evaluator from playground
* formatting

* add eval outputs to playground output cell
* add evaluation details popover & trace link
* include evals in output for non-streaming playground runs
* fix unnecessary truncation of eval name
* handle evaluations on error
* fix evaluation name
* rerun CI
* prevent losing example data when handling tool chunk

Co-authored-by: Alexander Song <[email protected]>

* feat: Create distinct slideovers for evaluator use cases
* fix: manually update updated_at when creating llm_evaluator
* fix: global change to combobox, opens submenu on enter

Co-authored-by: Rick Steele <[email protected]>
Force-pushed from 289b75c to c011874
* Spike out builtin evaluator interfaces
* Get builtin evaluator if it exists
* Refine data model
* Simplify models
* Implement literal/path mapping logic
* Wire up builtin evaluators
* Persist single-run evaluations as SpanAnnotations
* Update gql schema and run relay compiler
* Fix evaluation over playground dataset run
* ruff
* Fix queries w.r.t BuiltInEvaluator
* Add built in evaluators to dataset evaluators query
* Add xfail to dataset evaluator test
* Ignore missing type stubs
* fix evaluators over single chat
* fix ts ci

Co-authored-by: Tony Powell <[email protected]>
Co-authored-by: Alexander Song <[email protected]>

* wip: enable unassigning a dataset evaluator
* update cached evaluator data upon assignment/unassignment
* add confirmation dialog
* wire up evaluator unlink with optional delete
* remove row selectability
* add comment
* use alert banner instead of toast for errors
* explicitly close dialog on successful delete/unlink

* fix evaluator config dialog header overflow
* fix dataset select overflow
* styling
* dataset select styling

* feat: Add builtin evaluator support to crosswalk table
* Fix migration and update gql schema
* Fix relationship definition
* feat: Add prebuilt evaluators to template submenu
* Tweak language
* feat: Support input mapping code evaluators
* Improve dataset messaging in evaluator form
* update default evaluator template
* Add DatasetExampleSelect component
  Also makes combobox and dataset select more responsive
* Allow users to edit evaluator input preview
* Fix db constraints for input mapping
* Wire up input mapping end to end
* Fix ruff
* use fastapi instead of starlette import
* Remove xfail and clean up input_config handling
* Ruff
* Verify evaluator id existence for type checker
* Build gql schema and run relay compiler
* Pull output from input-mapped inputs
* Ensure input config is stored as JSON
* Add minWidth prop to Select
* Fix evaluator config dialog header truncation
* Use both unique constraint and partial index
* Add builtin evaluators to dataloader
* Call lower() after str conversion
* Rename evaluator for simplicity
* Remove explicit constraint name
* Update variable name
* Address PR feedback
* Change column name from input_config to input_mapping
* Update tests and other input_config references
* Make mypy happy

Co-authored-by: Dustin Ngo <[email protected]>
Labels
feature branch: a feature branch that consolidates multiple features into a single commit on main
size:XS: This PR changes 0-9 lines, ignoring generated files.
This is the feature branch for the upcoming version 13.