Conversation

@mikeldking
Collaborator

@mikeldking mikeldking commented Sep 25, 2025

This is the feature branch for the upcoming version 13.

@mikeldking mikeldking requested review from a team as code owners September 25, 2025 20:54
@github-project-automation github-project-automation bot moved this to 📘 Todo in phoenix Sep 25, 2025
@dosubot dosubot bot added the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Sep 25, 2025
@mikeldking mikeldking changed the base branch from main to feat/version-12 September 25, 2025 20:56
@dosubot dosubot bot added the size:XS label (This PR changes 0-9 lines, ignoring generated files.) and removed the size:XXL label (This PR changes 1000+ lines, ignoring generated files.) Sep 25, 2025
@mikeldking mikeldking changed the title version 13 feat!: version 13 - dataset evaluators Sep 25, 2025
@pkg-pr-new
pkg-pr-new bot commented Sep 25, 2025

npm i https://pkg.pr.new/Arize-ai/phoenix/@arizeai/phoenix-client@9642
npm i https://pkg.pr.new/Arize-ai/phoenix/@arizeai/phoenix-mcp@9642

commit: 7a0c5f7

@mikeldking mikeldking added the feature branch label (a feature branch that consolidates multiple features into a single commit on main) Sep 25, 2025
@mikeldking mikeldking marked this pull request as draft September 25, 2025 21:22
Contributor

@RogerHYang RogerHYang left a comment

blocking feature branch

@github-project-automation github-project-automation bot moved this from 📘 Todo to 🔍. Needs Review in phoenix Sep 25, 2025
Base automatically changed from feat/version-12 to main September 29, 2025 18:13
An error occurred while trying to automatically change base from feat/version-12 to main September 29, 2025 18:13
@RogerHYang RogerHYang removed this from phoenix Oct 6, 2025
@RogerHYang RogerHYang force-pushed the version-13 branch 4 times, most recently from fc49ed1 to 2b19c56 Compare October 24, 2025 15:47
@RogerHYang RogerHYang force-pushed the version-13 branch 3 times, most recently from f4ab1f0 to 2ef94fd Compare October 29, 2025 15:31
@mikeldking mikeldking closed this Nov 4, 2025
@mikeldking mikeldking reopened this Nov 4, 2025
cephalization and others added 25 commits November 20, 2025 08:41
* add minimal evaluators menu

* handle selection

* styling

* add menu footer

* set menu max width

* replace footer button with link
* feat: Add name field to evaluators form

* Reorganize choices and their default state

* Disable prompt save, tools, response format

* Move input mapping field from select to combobox

* Update arrow icon

* Persist input mapping fields across labels

* Do not render response format or tools if they are saved as provider default
* Add dummy evaluation payloads to single playground run

* Implement for chat mutations and subscription over dataset

* Ruff 🐶 and update graphql schema

* compile relay

* frontend

* Add dataset example id and repetition number

* Address feedback

* Update input typing

* Update relay

* Load and display real global evaluators

---------

Co-authored-by: Alexander Song <[email protected]>
Co-authored-by: Tony Powell <[email protected]>
* Add filter and sort capabilities to evaluators

* Improve clarity of allowed sort columns
* add EvaluatorSelect to dataset page

* stub out evaluator config dialog and rework data fetching

* add readonly prompt messages to eval config modal

* add output config to modal

* add dataset example preview and input mapping section to modal

* wire up add evaluator mutation

* add suspense boundaries

* Refactor promptVersionToInstance to depend on inline fragment

* remove unnecessary type annotations

---------

Co-authored-by: Tony Powell <[email protected]>
* evaluator crud

* clean

* patch mutation

* update

* types

* Revert "types"

This reverts commit 25579b5.

* type ignore

* plural delete

* clean

* decorator

* fix metadata

* clean

* clean

* already exists

* test

* simplify

* test

* simplify
* add annotation name to eval select

* address feedback
…s useful (#10187)

* feat(evaluators): provide a useful correctness pre-built evaluator

* feat(evaluators): provide a useful correctness pre-built evaluator

* simplify
* evaluator prompt validation

* cursor tests

* clean

* condense

* test

* clean

* clean

* test

* parse pydantic errors

* clean

* validate mutations

* fix tests

* validate choices

* test with form

* test

* type check

* clean
* include only dataset-specific evaluators in playground eval selector

* fix dataset page tab selection

* add aria label to dialog

* add annotation names to playground select

* handle long annotation names

* separate components for DatasetEvaluatorSelect and PlaygroundEvaluatorSelect

* remove extra opacity css var

* updates to Menu

* updates to evaluator menus

* fix menu item flicker

* wip: enable mapping evaluator from playground

* formatting
* add eval outputs to playground output cell

* add evaluation details popover & trace link

* include evals in output for non-streaming playground runs

* fix unnecessary truncation of eval name

* handle evaluations on error

* fix evaluation name

* rerun CI

* prevent losing example data when handling tool chunk

---------

Co-authored-by: Alexander Song <[email protected]>
* feat: Create distinct slideovers for evaluator use cases

* fix: manually update updated_at when creating llm_evaluator

* fix: global change to combobox, opens submenu on enter

---------

Co-authored-by: Rick Steele <[email protected]>
anticorrelator and others added 4 commits November 20, 2025 15:16
* Spike out builtin evaluator interfaces

* Get builtin evaluator if it exists

* Refine data model

* Simplify models

* Implement literal/path mapping logic

* Wire up builtin evaluators

* Persist single-run evaluations as SpanAnnotations

* Update gql schema and run relay compiler

* Fix evaluation over playground dataset run

* ruff

* Fix queries w.r.t BuiltInEvaluator

* Add built in evaluators to dataset evaluators query

* Add xfail to dataset evaluator test

* Ignore missing type stubs

* fix evaluators over single chat

* fix ts ci

---------

Co-authored-by: Tony Powell <[email protected]>
Co-authored-by: Alexander Song <[email protected]>
* wip: enable unassigning a dataset evaluator

* update cached evaluator data upon assignment/unassignment

* add confirmation dialog

* wire up evaluator unlink with optional delete

* remove row selectability

* add comment

* use alert banner instead of toast for errors

* explicitly close dialog on successful delete/unlink
* fix evaluator config dialog header overflow

* fix dataset select overflow

* styling

* dataset select styling
* feat: Add builtin evaluator support to crosswalk table

* Fix migration and update gql schema

* Fix relationship definition

* feat: Add prebuilt evaluators to template submenu

* Tweak language

* feat: Support input mapping code evaluators

* Improve dataset messaging in evaluator form

* update default evaluator template

* Add DatasetExampleSelect component

Also makes combobox and dataset select more responsive

* Allow users to edit evaluator input preview

* Fix db constraints for input mapping

* Wire up input mapping end to end

* Fix ruff

* use fastapi instead of starlette import

* Remove xfail and clean up input_config handling

* Ruff

* Verify evaluator id existence for type checker

* Build gql schema and run relay compiler

* Pull output from input-mapped inputs

* Ensure input config is stored as JSON

* Add minWidth prop to Select

* Fix evaluator config dialog header truncation

* Use both unique constraint and partial index

* Add builtin evaluators to dataloader

* Call lower() after str conversion

* Rename evaluator for simplicity

* Remove explicit constraint name

* Update variable name

* Address PR feedback

* Change column name from input_config to input_mapping

* Update tests and other input_config references

* Make mypy happy

---------

Co-authored-by: Dustin Ngo <[email protected]>
Labels

feature branch: a feature branch that consolidates multiple features into a single commit on main
size:XS: This PR changes 0-9 lines, ignoring generated files.

7 participants