feat(llmobs): [MLOB-4687] add api for running evals on exported spans #15435
base: main
**Bootstrap import analysis**

Comparison of import times between this PR and base.

**Summary**

The average import time from this PR is 218 ± 3 ms.

The average import time from base is 218 ± 3 ms.

The import time difference between this PR and base is -0.1 ± 0.1 ms.

The difference is not statistically significant (z = -0.67).

**Import time breakdown**

The following import paths have shrunk:
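The summary reports a z statistic without stating how it is computed. The sketch below shows a conventional two-sample z test. It assumes the tool compares the mean import times of the PR and base runs using an unpooled standard error, and that it treats |z| < 1.96 (roughly the two-sided 5% level) as not significant. Both choices are assumptions, not the tool's documented method.

```python
import math
from statistics import mean, stdev


def two_sample_z(pr: list[float], base: list[float]) -> float:
    """Return (mean(pr) - mean(base)) divided by the unpooled standard error of that difference."""
    # Standard error of a difference of two independent sample means.
    se = math.sqrt(stdev(pr) ** 2 / len(pr) + stdev(base) ** 2 / len(base))
    return (mean(pr) - mean(base)) / se


def is_significant(z: float, critical: float = 1.96) -> bool:
    """Two-sided check; the 1.96 cut-off (about 5%) is an assumed threshold."""
    return abs(z) >= critical
```

Under this assumption, the reported z = -0.67 falls well inside ±1.96, which is consistent with the "not statistically significant" verdict.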
**Performance SLOs**

Comparing candidate nicole-cybul/custom-eval-api (a0c2fd1) with baseline main (8851ec9).

**📈 Performance Regressions (3 suites)**

**📈 iastaspects** - 118/118

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| add_aspect | ✅ 0.405µs | <10.000µs (📉 -96.0%) | ~same | ✅ 40.187MB | <41.500MB (-3.2%) | +4.4% |
| add_inplace_aspect | ✅ 0.404µs | <10.000µs (📉 -96.0%) | -1.3% | ✅ 40.206MB | <41.500MB (-3.1%) | +4.8% |
| add_inplace_noaspect | ✅ 0.317µs | <10.000µs (📉 -96.8%) | -2.9% | ✅ 40.187MB | <41.500MB (-3.2%) | +4.4% |
| add_noaspect | ✅ 0.276µs | <10.000µs (📉 -97.2%) | +0.2% | ✅ 40.364MB | <41.500MB (-2.7%) | +5.1% |
| bytearray_aspect | ✅ 1.358µs | <10.000µs (📉 -86.4%) | +3.3% | ✅ 40.324MB | <41.500MB (-2.8%) | +5.2% |
| bytearray_extend_aspect | ✅ 1.493µs | <10.000µs (📉 -85.1%) | -0.3% | ✅ 40.265MB | <41.500MB (-3.0%) | +4.9% |
| bytearray_extend_noaspect | ✅ 0.614µs | <10.000µs (📉 -93.9%) | +0.2% | ✅ 40.265MB | <41.500MB (-3.0%) | +4.7% |
| bytearray_noaspect | ✅ 0.480µs | <10.000µs (📉 -95.2%) | -0.6% | ✅ 40.226MB | <41.500MB (-3.1%) | +4.8% |
| bytes_aspect | ✅ 1.294µs | <10.000µs (📉 -87.1%) | +1.6% | ✅ 40.206MB | <41.500MB (-3.1%) | +4.8% |
| bytes_noaspect | ✅ 0.493µs | <10.000µs (📉 -95.1%) | -0.8% | ✅ 40.206MB | <41.500MB (-3.1%) | +4.7% |
| bytesio_aspect | ✅ 1.350µs | <10.000µs (📉 -86.5%) | +2.3% | ✅ 40.206MB | <41.500MB (-3.1%) | +4.7% |
| bytesio_noaspect | ✅ 0.495µs | <10.000µs (📉 -95.1%) | +0.3% | ✅ 40.187MB | <41.500MB (-3.2%) | +4.6% |
| capitalize_aspect | ✅ 0.743µs | <10.000µs (📉 -92.6%) | +1.0% | ✅ 40.246MB | <41.500MB (-3.0%) | +4.8% |
| capitalize_noaspect | ✅ 0.437µs | <10.000µs (📉 -95.6%) | +0.4% | ✅ 40.187MB | <41.500MB (-3.2%) | +4.9% |
| casefold_aspect | ✅ 0.734µs | <10.000µs (📉 -92.7%) | ~same | ✅ 40.147MB | <41.500MB (-3.3%) | +4.6% |
| casefold_noaspect | ✅ 0.366µs | <10.000µs (📉 -96.3%) | -2.1% | ✅ 40.088MB | <41.500MB (-3.4%) | +4.5% |
| decode_aspect | ✅ 0.727µs | <10.000µs (📉 -92.7%) | +0.4% | ✅ 40.206MB | <41.500MB (-3.1%) | +4.9% |
| decode_noaspect | ✅ 0.416µs | <10.000µs (📉 -95.8%) | ~same | ✅ 40.187MB | <41.500MB (-3.2%) | +4.6% |
| encode_aspect | ✅ 0.708µs | <10.000µs (📉 -92.9%) | -0.2% | ✅ 40.187MB | <41.500MB (-3.2%) | +4.6% |
| encode_noaspect | ✅ 0.401µs | <10.000µs (📉 -96.0%) | +0.4% | ✅ 40.206MB | <41.500MB (-3.1%) | +4.8% |
| format_aspect | ✅ 3.417µs | <10.000µs (📉 -65.8%) | +1.2% | ✅ 40.187MB | <41.500MB (-3.2%) | +4.9% |
| format_map_aspect | ✅ 3.543µs | <10.000µs (📉 -64.6%) | -0.7% | ✅ 40.344MB | <41.500MB (-2.8%) | +4.8% |
| format_map_noaspect | ✅ 0.770µs | <10.000µs (📉 -92.3%) | -0.4% | ✅ 40.187MB | <41.500MB (-3.2%) | +4.5% |
| format_noaspect | ✅ 0.596µs | <10.000µs (📉 -94.0%) | ~same | ✅ 40.285MB | <41.500MB (-2.9%) | +4.9% |
| index_aspect | ✅ 0.358µs | <10.000µs (📉 -96.4%) | +0.5% | ✅ 40.147MB | <41.500MB (-3.3%) | +4.5% |
| index_noaspect | ✅ 0.276µs | <10.000µs (📉 -97.2%) | -0.6% | ✅ 40.285MB | <41.500MB (-2.9%) | +4.4% |
| join_aspect | ✅ 1.346µs | <10.000µs (📉 -86.5%) | -2.7% | ✅ 40.324MB | <41.500MB (-2.8%) | +4.9% |
| join_noaspect | ✅ 0.491µs | <10.000µs (📉 -95.1%) | -0.1% | ✅ 40.265MB | <41.500MB (-3.0%) | +4.7% |
| ljust_aspect | ✅ 2.615µs | <20.000µs (📉 -86.9%) | +5.2% | ✅ 40.147MB | <41.500MB (-3.3%) | +4.0% |
| ljust_noaspect | ✅ 0.407µs | <10.000µs (📉 -95.9%) | +0.4% | ✅ 40.324MB | <41.500MB (-2.8%) | +4.8% |
| lower_aspect | ✅ 2.304µs | <10.000µs (📉 -77.0%) | +4.9% | ✅ 40.187MB | <41.500MB (-3.2%) | +4.7% |
| lower_noaspect | ✅ 0.367µs | <10.000µs (📉 -96.3%) | -0.2% | ✅ 40.246MB | <41.500MB (-3.0%) | +4.6% |
| lstrip_aspect | ✅ 2.274µs | <20.000µs (📉 -88.6%) | +3.9% | ✅ 40.383MB | <41.500MB (-2.7%) | +4.7% |
| lstrip_noaspect | ✅ 0.380µs | <10.000µs (📉 -96.2%) | -1.2% | ✅ 40.265MB | <41.500MB (-3.0%) | +5.1% |
| modulo_aspect | ✅ 1.046µs | <10.000µs (📉 -89.5%) | +5.2% | ✅ 40.088MB | <41.500MB (-3.4%) | +4.5% |
| modulo_aspect_for_bytearray_bytearray | ✅ 1.552µs | <10.000µs (📉 -84.5%) | +0.2% | ✅ 40.206MB | <41.500MB (-3.1%) | +4.7% |
| modulo_aspect_for_bytes | ✅ 0.974µs | <10.000µs (📉 -90.3%) | -4.4% | ✅ 40.246MB | <41.500MB (-3.0%) | +5.0% |
| modulo_aspect_for_bytes_bytearray | ✅ 1.239µs | <10.000µs (📉 -87.6%) | -0.4% | ✅ 40.265MB | <41.500MB (-3.0%) | +4.4% |
| modulo_noaspect | ✅ 0.629µs | <10.000µs (📉 -93.7%) | -0.4% | ✅ 40.088MB | <41.500MB (-3.4%) | +4.4% |
| replace_aspect | ✅ 4.865µs | <10.000µs (📉 -51.4%) | +1.8% | ✅ 40.364MB | <41.500MB (-2.7%) | +5.2% |
| replace_noaspect | ✅ 0.457µs | <10.000µs (📉 -95.4%) | -0.2% | ✅ 40.226MB | <41.500MB (-3.1%) | +4.8% |
| repr_aspect | ✅ 0.909µs | <10.000µs (📉 -90.9%) | +0.4% | ✅ 40.265MB | <41.500MB (-3.0%) | +4.9% |
| repr_noaspect | ✅ 0.420µs | <10.000µs (📉 -95.8%) | +0.6% | ✅ 40.167MB | <41.500MB (-3.2%) | +4.3% |
| rstrip_aspect | ✅ 1.924µs | <20.000µs (📉 -90.4%) | +2.2% | ✅ 40.364MB | <41.500MB (-2.7%) | +4.9% |
| rstrip_noaspect | ✅ 0.383µs | <10.000µs (📉 -96.2%) | +0.5% | ✅ 40.226MB | <41.500MB (-3.1%) | +4.7% |
| slice_aspect | ✅ 0.497µs | <10.000µs (📉 -95.0%) | ~same | ✅ 40.226MB | <41.500MB (-3.1%) | +4.9% |
| slice_noaspect | ✅ 0.451µs | <10.000µs (📉 -95.5%) | -0.4% | ✅ 40.265MB | <41.500MB (-3.0%) | +4.8% |
| stringio_aspect | ✅ 1.785µs | <10.000µs (📉 -82.2%) | 📈 +16.8% | ✅ 40.147MB | <41.500MB (-3.3%) | +4.7% |
| stringio_noaspect | ✅ 0.719µs | <10.000µs (📉 -92.8%) | ~same | ✅ 40.206MB | <41.500MB (-3.1%) | +4.7% |
| strip_aspect | ✅ 2.226µs | <20.000µs (📉 -88.9%) | +1.7% | ✅ 40.285MB | <41.500MB (-2.9%) | +4.9% |
| strip_noaspect | ✅ 0.389µs | <10.000µs (📉 -96.1%) | +1.3% | ✅ 40.265MB | <41.500MB (-3.0%) | +4.9% |
| swapcase_aspect | ✅ 2.508µs | <10.000µs (📉 -74.9%) | +5.3% | ✅ 40.383MB | <41.500MB (-2.7%) | +5.2% |
| swapcase_noaspect | ✅ 0.536µs | <10.000µs (📉 -94.6%) | -0.3% | ✅ 40.246MB | <41.500MB (-3.0%) | +4.7% |
| title_aspect | ✅ 2.414µs | <10.000µs (📉 -75.9%) | +2.0% | ✅ 40.147MB | <41.500MB (-3.3%) | +4.6% |
| title_noaspect | ✅ 0.501µs | <10.000µs (📉 -95.0%) | -0.7% | ✅ 40.206MB | <41.500MB (-3.1%) | +4.6% |
| translate_aspect | ✅ 3.312µs | <10.000µs (📉 -66.9%) | +4.5% | ✅ 40.344MB | <41.500MB (-2.8%) | +5.1% |
| translate_noaspect | ✅ 1.040µs | <10.000µs (📉 -89.6%) | ~same | ✅ 40.167MB | <41.500MB (-3.2%) | +4.2% |
| upper_aspect | ✅ 2.298µs | <10.000µs (📉 -77.0%) | +4.3% | ✅ 40.206MB | <41.500MB (-3.1%) | +4.4% |
| upper_noaspect | ✅ 0.368µs | <10.000µs (📉 -96.3%) | -1.4% | ✅ 40.285MB | <41.500MB (-2.9%) | +4.9% |

**📈 iastaspectsospath** - 24/24

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| ospathbasename_aspect | ✅ 5.147µs | <10.000µs (📉 -48.5%) | 📈 +20.1% | ✅ 40.285MB | <41.000MB (🟡 -1.7%) | +4.6% |
| ospathbasename_noaspect | ✅ 1.088µs | <10.000µs (📉 -89.1%) | -0.1% | ✅ 40.265MB | <41.000MB (🟡 -1.8%) | +4.9% |
| ospathjoin_aspect | ✅ 6.167µs | <10.000µs (📉 -38.3%) | +0.4% | ✅ 40.128MB | <41.000MB (-2.1%) | +4.3% |
| ospathjoin_noaspect | ✅ 2.286µs | <10.000µs (📉 -77.1%) | -0.5% | ✅ 40.187MB | <41.000MB (🟡 -2.0%) | +4.4% |
| ospathnormcase_aspect | ✅ 3.421µs | <10.000µs (📉 -65.8%) | -3.6% | ✅ 40.324MB | <41.000MB (🟡 -1.6%) | +5.1% |
| ospathnormcase_noaspect | ✅ 0.571µs | <10.000µs (📉 -94.3%) | ~same | ✅ 40.147MB | <41.000MB (-2.1%) | +4.5% |
| ospathsplit_aspect | ✅ 4.752µs | <10.000µs (📉 -52.5%) | -2.9% | ✅ 40.187MB | <41.000MB (🟡 -2.0%) | +5.0% |
| ospathsplit_noaspect | ✅ 1.598µs | <10.000µs (📉 -84.0%) | +0.4% | ✅ 40.147MB | <41.000MB (-2.1%) | +4.4% |
| ospathsplitdrive_aspect | ✅ 3.632µs | <10.000µs (📉 -63.7%) | -2.1% | ✅ 40.285MB | <41.000MB (🟡 -1.7%) | +4.9% |
| ospathsplitdrive_noaspect | ✅ 0.698µs | <10.000µs (📉 -93.0%) | ~same | ✅ 40.206MB | <41.000MB (🟡 -1.9%) | |
| ospathsplitext_aspect | ✅ 4.511µs | <10.000µs (📉 -54.9%) | -1.6% | ✅ 40.285MB | <41.000MB (🟡 -1.7%) | +4.7% |
| ospathsplitext_noaspect | ✅ 1.383µs | <10.000µs (📉 -86.2%) | +0.6% | ✅ 40.324MB | <41.000MB (🟡 -1.6%) | +5.0% |

**📈 telemetryaddmetric** - 30/30

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| 1-count-metric-1-times | ✅ 3.380µs | <20.000µs (📉 -83.1%) | 📈 +15.9% | ✅ 34.800MB | <35.500MB (🟡 -2.0%) | +4.7% |
| 1-count-metrics-100-times | ✅ 200.753µs | <220.000µs (-8.7%) | -0.4% | ✅ 34.800MB | <35.500MB (🟡 -2.0%) | +4.6% |
| 1-distribution-metric-1-times | ✅ 3.274µs | <20.000µs (📉 -83.6%) | +0.7% | ✅ 34.721MB | <35.500MB (-2.2%) | +4.3% |
| 1-distribution-metrics-100-times | ✅ 216.842µs | <230.000µs (-5.7%) | +1.7% | ✅ 34.760MB | <35.500MB (-2.1%) | +4.7% |
| 1-gauge-metric-1-times | ✅ 2.137µs | <20.000µs (📉 -89.3%) | -2.1% | ✅ 34.780MB | <35.500MB (-2.0%) | +4.3% |
| 1-gauge-metrics-100-times | ✅ 136.315µs | <150.000µs (-9.1%) | ~same | ✅ 34.741MB | <35.500MB (-2.1%) | +4.6% |
| 1-rate-metric-1-times | ✅ 3.073µs | <20.000µs (📉 -84.6%) | +1.0% | ✅ 34.780MB | <35.500MB (-2.0%) | +4.7% |
| 1-rate-metrics-100-times | ✅ 213.563µs | <250.000µs (📉 -14.6%) | ~same | ✅ 34.780MB | <35.500MB (-2.0%) | +4.9% |
| 100-count-metrics-100-times | ✅ 20.345ms | <22.000ms (-7.5%) | +0.2% | ✅ 34.780MB | <35.500MB (-2.0%) | +4.6% |
| 100-distribution-metrics-100-times | ✅ 2.274ms | <2.300ms (🟡 -1.1%) | +0.5% | ✅ 34.760MB | <35.500MB (-2.1%) | +3.8% |
| 100-gauge-metrics-100-times | ✅ 1.413ms | <1.550ms (-8.9%) | +0.5% | ✅ 34.741MB | <35.500MB (-2.1%) | +4.5% |
| 100-rate-metrics-100-times | ✅ 2.229ms | <2.550ms (📉 -12.6%) | +0.4% | ✅ 34.741MB | <35.500MB (-2.1%) | +4.3% |
| flush-1-metric | ✅ 4.583µs | <20.000µs (📉 -77.1%) | +3.9% | ✅ 35.036MB | <35.500MB (🟡 -1.3%) | +4.3% |
| flush-100-metrics | ✅ 173.269µs | <250.000µs (📉 -30.7%) | -0.4% | ✅ 35.193MB | <35.500MB (🟡 -0.9%) | +4.9% |
| flush-1000-metrics | ✅ 2.172ms | <2.500ms (📉 -13.1%) | -0.9% | ✅ 35.960MB | <36.500MB (🟡 -1.5%) | +4.7% |

**🟡 Near SLO Breach (15 suites)**

**🟡 errortrackingdjangosimple** - 6/6

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| errortracking-enabled-all | ✅ 16.307ms | <19.850ms (📉 -17.9%) | +0.3% | ✅ 69.803MB | <70.000MB (🟡 -0.3%) | +4.8% |
| errortracking-enabled-user | ✅ 16.340ms | <19.400ms (📉 -15.8%) | +0.5% | ✅ 69.861MB | <70.000MB (🟡 -0.2%) | +4.9% |
| tracer-enabled | ✅ 16.342ms | <19.450ms (📉 -16.0%) | ~same | ✅ 69.758MB | <70.000MB (🟡 -0.3%) | +4.8% |

**🟡 errortrackingflasksqli** - 6/6

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| errortracking-enabled-all | ✅ 2.066ms | <2.300ms (📉 -10.2%) | +0.1% | ✅ 55.345MB | <56.500MB (-2.0%) | +4.4% |
| errortracking-enabled-user | ✅ 2.073ms | <2.250ms (-7.9%) | +0.1% | ✅ 55.384MB | <56.500MB (🟡 -2.0%) | +4.2% |
| tracer-enabled | ✅ 2.063ms | <2.300ms (📉 -10.3%) | ~same | ✅ 55.424MB | <56.500MB (🟡 -1.9%) | +4.7% |

**🟡 flasksimple** - 18/18

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| appsec-get | ✅ 3.383ms | <4.750ms (📉 -28.8%) | ~same | ✅ 55.548MB | <66.500MB (📉 -16.5%) | +4.7% |
| appsec-post | ✅ 2.855ms | <6.750ms (📉 -57.7%) | ~same | ✅ 55.807MB | <66.500MB (📉 -16.1%) | +4.6% |
| appsec-telemetry | ✅ 3.400ms | <4.750ms (📉 -28.4%) | +0.8% | ✅ 55.552MB | <66.500MB (📉 -16.5%) | +4.9% |
| debugger | ✅ 1.869ms | <2.000ms (-6.5%) | +0.3% | ✅ 47.901MB | <49.500MB (-3.2%) | +4.7% |
| iast-get | ✅ 1.855ms | <2.000ms (-7.2%) | ~same | ✅ 44.481MB | <49.000MB (-9.2%) | +4.2% |
| profiler | ✅ 1.904ms | <2.100ms (-9.3%) | -1.4% | ✅ 48.845MB | <50.000MB (-2.3%) | +4.9% |
| resource-renaming | ✅ 3.352ms | <3.650ms (-8.2%) | -0.6% | ✅ 55.472MB | <56.000MB (🟡 -0.9%) | +4.5% |
| tracer | ✅ 3.370ms | <3.650ms (-7.7%) | -0.2% | ✅ 55.551MB | <56.500MB (🟡 -1.7%) | +4.9% |
| tracer-native | ✅ 3.374ms | <3.650ms (-7.6%) | +0.2% | ✅ 55.508MB | <60.000MB (-7.5%) | +4.8% |

**🟡 flasksqli** - 6/6

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| appsec-enabled | ✅ 2.063ms | <4.200ms (📉 -50.9%) | ~same | ✅ 55.424MB | <66.000MB (📉 -16.0%) | +4.8% |
| iast-enabled | ✅ 2.066ms | <2.800ms (📉 -26.2%) | +0.1% | ✅ 55.365MB | <62.500MB (📉 -11.4%) | +4.6% |
| tracer-enabled | ✅ 2.059ms | <2.250ms (-8.5%) | +0.1% | ✅ 55.384MB | <56.500MB (🟡 -2.0%) | +4.6% |

**🟡 httppropagationextract** - 60/60

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| all_styles_all_headers | ✅ 81.044µs | <100.000µs (📉 -19.0%) | ~same | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +4.4% |
| b3_headers | ✅ 14.212µs | <20.000µs (📉 -28.9%) | ~same | ✅ 34.780MB | <35.500MB (-2.0%) | +4.1% |
| b3_single_headers | ✅ 13.350µs | <20.000µs (📉 -33.2%) | +0.1% | ✅ 34.760MB | <35.500MB (-2.1%) | +4.2% |
| datadog_tracecontext_tracestate_not_propagated_on_trace_id_no_match | ✅ 63.699µs | <80.000µs (📉 -20.4%) | -0.4% | ✅ 34.819MB | <35.500MB (🟡 -1.9%) | +4.2% |
| datadog_tracecontext_tracestate_propagated_on_trace_id_match | ✅ 69.479µs | <80.000µs (📉 -13.2%) | +5.2% | ✅ 34.839MB | <35.500MB (🟡 -1.9%) | +4.0% |
| empty_headers | ✅ 1.651µs | <10.000µs (📉 -83.5%) | +2.1% | ✅ 34.859MB | <35.500MB (🟡 -1.8%) | +4.4% |
| full_t_id_datadog_headers | ✅ 22.533µs | <30.000µs (📉 -24.9%) | -0.1% | ✅ 34.898MB | <35.500MB (🟡 -1.7%) | +4.6% |
| invalid_priority_header | ✅ 6.531µs | <10.000µs (📉 -34.7%) | ~same | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +4.7% |
| invalid_span_id_header | ✅ 6.549µs | <10.000µs (📉 -34.5%) | +0.1% | ✅ 34.819MB | <35.500MB (🟡 -1.9%) | +4.4% |
| invalid_tags_header | ✅ 6.570µs | <10.000µs (📉 -34.3%) | +0.5% | ✅ 34.918MB | <35.500MB (🟡 -1.6%) | +4.5% |
| invalid_trace_id_header | ✅ 6.508µs | <10.000µs (📉 -34.9%) | -0.1% | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +4.5% |
| large_header_no_matches | ✅ 27.685µs | <30.000µs (-7.7%) | +0.3% | ✅ 34.898MB | <35.500MB (🟡 -1.7%) | +4.6% |
| large_valid_headers_all | ✅ 28.692µs | <40.000µs (📉 -28.3%) | +0.2% | ✅ 34.819MB | <35.500MB (🟡 -1.9%) | +4.2% |
| medium_header_no_matches | ✅ 9.908µs | <20.000µs (📉 -50.5%) | ~same | ✅ 34.898MB | <35.500MB (🟡 -1.7%) | +4.6% |
| medium_valid_headers_all | ✅ 11.303µs | <20.000µs (📉 -43.5%) | ~same | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +4.5% |
| none_propagation_style | ✅ 1.713µs | <10.000µs (📉 -82.9%) | ~same | ✅ 34.898MB | <35.500MB (🟡 -1.7%) | +4.4% |
| tracecontext_headers | ✅ 34.846µs | <40.000µs (📉 -12.9%) | +0.3% | ✅ 34.839MB | <35.500MB (🟡 -1.9%) | +4.3% |
| valid_headers_all | ✅ 6.572µs | <10.000µs (📉 -34.3%) | +0.8% | ✅ 34.839MB | <35.500MB (🟡 -1.9%) | +4.4% |
| valid_headers_basic | ✅ 6.122µs | <10.000µs (📉 -38.8%) | +0.7% | ✅ 34.800MB | <35.500MB (🟡 -2.0%) | +4.3% |
| wsgi_empty_headers | ✅ 1.605µs | <10.000µs (📉 -83.9%) | +0.3% | ✅ 34.859MB | <35.500MB (🟡 -1.8%) | +4.3% |
| wsgi_invalid_priority_header | ✅ 6.594µs | <10.000µs (📉 -34.1%) | +0.1% | ✅ 34.859MB | <35.500MB (🟡 -1.8%) | +4.5% |
| wsgi_invalid_span_id_header | ✅ 1.637µs | <10.000µs (📉 -83.6%) | +2.1% | ✅ 34.741MB | <35.500MB (-2.1%) | +4.0% |
| wsgi_invalid_tags_header | ✅ 6.637µs | <10.000µs (📉 -33.6%) | +0.5% | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +4.3% |
| wsgi_invalid_trace_id_header | ✅ 6.598µs | <10.000µs (📉 -34.0%) | ~same | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +4.5% |
| wsgi_large_header_no_matches | ✅ 28.745µs | <40.000µs (📉 -28.1%) | -0.2% | ✅ 34.839MB | <35.500MB (🟡 -1.9%) | +4.3% |
| wsgi_large_valid_headers_all | ✅ 29.764µs | <40.000µs (📉 -25.6%) | -0.1% | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +4.4% |
| wsgi_medium_header_no_matches | ✅ 10.176µs | <20.000µs (📉 -49.1%) | +0.3% | ✅ 34.859MB | <35.500MB (🟡 -1.8%) | +4.3% |
| wsgi_medium_valid_headers_all | ✅ 11.537µs | <20.000µs (📉 -42.3%) | ~same | ✅ 34.957MB | <35.500MB (🟡 -1.5%) | +4.6% |
| wsgi_valid_headers_all | ✅ 6.593µs | <10.000µs (📉 -34.1%) | +0.5% | ✅ 34.839MB | <35.500MB (🟡 -1.9%) | +4.2% |
| wsgi_valid_headers_basic | ✅ 6.157µs | <10.000µs (📉 -38.4%) | +0.7% | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +4.4% |

**🟡 httppropagationinject** - 16/16

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| ids_only | ✅ 22.119µs | <30.000µs (📉 -26.3%) | +5.7% | ✅ 34.819MB | <35.500MB (🟡 -1.9%) | +4.4% |
| with_all | ✅ 28.045µs | <40.000µs (📉 -29.9%) | +0.8% | ✅ 34.819MB | <35.500MB (🟡 -1.9%) | +4.1% |
| with_dd_origin | ✅ 24.782µs | <30.000µs (📉 -17.4%) | -0.2% | ✅ 34.819MB | <35.500MB (🟡 -1.9%) | +3.9% |
| with_priority_and_origin | ✅ 24.477µs | <40.000µs (📉 -38.8%) | +1.1% | ✅ 34.898MB | <35.500MB (🟡 -1.7%) | +4.4% |
| with_sampling_priority | ✅ 21.246µs | <30.000µs (📉 -29.2%) | +1.0% | ✅ 34.839MB | <35.500MB (🟡 -1.9%) | |
| with_tags | ✅ 25.974µs | <40.000µs (📉 -35.1%) | -0.4% | ✅ 34.819MB | <35.500MB (🟡 -1.9%) | +4.2% |
| with_tags_invalid | ✅ 27.454µs | <40.000µs (📉 -31.4%) | ~same | ✅ 34.839MB | <35.500MB (🟡 -1.9%) | +4.3% |
| with_tags_max_size | ✅ 26.536µs | <40.000µs (📉 -33.7%) | -0.1% | ✅ 34.878MB | <35.500MB (🟡 -1.8%) | +4.4% |

**🟡 iast_aspects** - 40/40

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| re_expand_aspect | ✅ 34.001µs | <40.000µs (📉 -15.0%) | +6.2% | ✅ 40.206MB | <41.000MB (🟡 -1.9%) | +4.7% |
| re_expand_noaspect | ✅ 28.659µs | <40.000µs (📉 -28.4%) | -0.3% | ✅ 40.265MB | <41.000MB (🟡 -1.8%) | +4.8% |
| re_findall_aspect | ✅ 2.914µs | <10.000µs (📉 -70.9%) | ~same | ✅ 40.324MB | <41.000MB (🟡 -1.6%) | +4.7% |
| re_findall_noaspect | ✅ 1.411µs | <10.000µs (📉 -85.9%) | -0.5% | ✅ 40.265MB | <41.000MB (🟡 -1.8%) | +4.8% |
| re_finditer_aspect | ✅ 4.465µs | <10.000µs (📉 -55.4%) | +1.2% | ✅ 40.246MB | <41.000MB (🟡 -1.8%) | +4.9% |
| re_finditer_noaspect | ✅ 1.382µs | <10.000µs (📉 -86.2%) | -2.4% | ✅ 40.167MB | <41.000MB (-2.0%) | +4.6% |
| re_fullmatch_aspect | ✅ 2.692µs | <10.000µs (📉 -73.1%) | -0.2% | ✅ 40.246MB | <41.000MB (🟡 -1.8%) | +5.0% |
| re_fullmatch_noaspect | ✅ 1.317µs | <10.000µs (📉 -86.8%) | -1.0% | ✅ 40.285MB | <41.000MB (🟡 -1.7%) | +4.8% |
| re_group_aspect | ✅ 3.012µs | <10.000µs (📉 -69.9%) | +2.1% | ✅ 40.226MB | <41.000MB (🟡 -1.9%) | +4.8% |
| re_group_noaspect | ✅ 1.614µs | <10.000µs (📉 -83.9%) | -0.5% | ✅ 40.147MB | <41.000MB (-2.1%) | +4.3% |
| re_groups_aspect | ✅ 3.142µs | <10.000µs (📉 -68.6%) | +2.9% | ✅ 40.187MB | <41.000MB (🟡 -2.0%) | +4.5% |
| re_groups_noaspect | ✅ 1.706µs | <10.000µs (📉 -82.9%) | ~same | ✅ 40.167MB | <41.000MB (-2.0%) | +4.3% |
| re_match_aspect | ✅ 2.795µs | <10.000µs (📉 -72.1%) | +2.9% | ✅ 40.246MB | <41.000MB (🟡 -1.8%) | +4.8% |
| re_match_noaspect | ✅ 1.329µs | <10.000µs (📉 -86.7%) | +0.7% | ✅ 40.246MB | <41.000MB (🟡 -1.8%) | +5.1% |
| re_search_aspect | ✅ 2.580µs | <10.000µs (📉 -74.2%) | +1.8% | ✅ 40.265MB | <41.000MB (🟡 -1.8%) | +4.7% |
| re_search_noaspect | ✅ 1.201µs | <10.000µs (📉 -88.0%) | +1.3% | ✅ 40.088MB | <41.000MB (-2.2%) | +4.3% |
| re_sub_aspect | ✅ 3.532µs | <10.000µs (📉 -64.7%) | +3.6% | ✅ 40.206MB | <41.000MB (🟡 -1.9%) | +4.9% |
| re_sub_noaspect | ✅ 1.532µs | <10.000µs (📉 -84.7%) | +1.0% | ✅ 40.265MB | <41.000MB (🟡 -1.8%) | +4.7% |
| re_subn_aspect | ✅ 3.662µs | <10.000µs (📉 -63.4%) | +1.4% | ✅ 40.305MB | <41.000MB (🟡 -1.7%) | +4.1% |
| re_subn_noaspect | ✅ 1.611µs | <10.000µs (📉 -83.9%) | +0.4% | ✅ 40.246MB | <41.000MB (🟡 -1.8%) | +4.9% |

**🟡 iastaspectssplit** - 12/12

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| rsplit_aspect | ✅ 1.544µs | <10.000µs (📉 -84.6%) | +8.9% | ✅ 40.324MB | <41.000MB (🟡 -1.6%) | +5.0% |
| rsplit_noaspect | ✅ 0.583µs | <10.000µs (📉 -94.2%) | +0.6% | ✅ 40.206MB | <41.000MB (🟡 -1.9%) | |
| split_aspect | ✅ 1.425µs | <10.000µs (📉 -85.7%) | +1.1% | ✅ 40.147MB | <41.000MB (-2.1%) | +4.6% |
| split_noaspect | ✅ 0.568µs | <10.000µs (📉 -94.3%) | -0.5% | ✅ 40.265MB | <41.000MB (🟡 -1.8%) | +4.9% |
| splitlines_aspect | ✅ 1.412µs | <10.000µs (📉 -85.9%) | +2.0% | ✅ 40.305MB | <41.000MB (🟡 -1.7%) | +5.0% |
| splitlines_noaspect | ✅ 0.586µs | <10.000µs (📉 -94.1%) | +0.2% | ✅ 40.187MB | <41.000MB (🟡 -2.0%) | +4.2% |

**🟡 otelspan** - 22/22

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| add-event | ✅ 39.749ms | <47.150ms (📉 -15.7%) | ~same | ✅ 39.500MB | <47.000MB (📉 -16.0%) | +4.7% |
| add-metrics | ✅ 261.984ms | <344.800ms (📉 -24.0%) | +1.5% | ✅ 43.841MB | <47.500MB (-7.7%) | +4.5% |
| add-tags | ✅ 315.773ms | <321.000ms (🟡 -1.6%) | +0.5% | ✅ 43.686MB | <47.500MB (-8.0%) | +4.4% |
| get-context | ✅ 79.758ms | <92.350ms (📉 -13.6%) | -0.2% | ✅ 39.636MB | <46.500MB (📉 -14.8%) | +4.3% |
| is-recording | ✅ 37.322ms | <44.500ms (📉 -16.1%) | +0.2% | ✅ 39.445MB | <47.500MB (📉 -17.0%) | +4.6% |
| record-exception | ✅ 58.449ms | <67.650ms (📉 -13.6%) | +0.1% | ✅ 39.818MB | <47.000MB (📉 -15.3%) | +4.1% |
| set-status | ✅ 43.478ms | <50.400ms (📉 -13.7%) | -0.5% | ✅ 39.422MB | <47.000MB (📉 -16.1%) | +4.5% |
| start | ✅ 37.418ms | <43.450ms (📉 -13.9%) | +2.8% | ✅ 39.351MB | <47.000MB (📉 -16.3%) | +4.5% |
| start-finish | ✅ 81.721ms | <88.000ms (-7.1%) | ~same | ✅ 37.257MB | <46.500MB (📉 -19.9%) | +4.3% |
| start-finish-telemetry | ✅ 83.299ms | <89.000ms (-6.4%) | -0.1% | ✅ 37.356MB | <46.500MB (📉 -19.7%) | +4.7% |
| update-name | ✅ 37.974ms | <45.150ms (📉 -15.9%) | -0.3% | ✅ 39.600MB | <47.000MB (📉 -15.7%) | +4.6% |

**🟡 packagespackageforrootmodulemapping** - 4/4

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| cache_off | ✅ 342.137ms | <354.300ms (-3.4%) | -0.8% | ✅ 40.697MB | <41.500MB (🟡 -1.9%) | +4.8% |
| cache_on | ✅ 0.382µs | <10.000µs (📉 -96.2%) | -0.7% | ✅ 38.681MB | <41.000MB (-5.7%) | +4.4% |

**🟡 ratelimiter** - 12/12

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| defaults | ✅ 2.351µs | <10.000µs (📉 -76.5%) | -0.7% | ✅ 35.055MB | <35.500MB (🟡 -1.3%) | +4.4% |
| high_rate_limit | ✅ 2.429µs | <10.000µs (📉 -75.7%) | +0.8% | ✅ 35.154MB | <35.500MB (🟡 -1.0%) | +4.6% |
| long_window | ✅ 2.358µs | <10.000µs (📉 -76.4%) | +0.2% | ✅ 35.134MB | <35.500MB (🟡 -1.0%) | +4.5% |
| low_rate_limit | ✅ 2.364µs | <10.000µs (📉 -76.4%) | -0.1% | ✅ 35.075MB | <35.500MB (🟡 -1.2%) | +4.6% |
| no_rate_limit | ✅ 0.829µs | <10.000µs (📉 -91.7%) | +0.8% | ✅ 35.173MB | <35.500MB (🟡 -0.9%) | +4.8% |
| short_window | ✅ 2.493µs | <10.000µs (📉 -75.1%) | +0.4% | ✅ 35.095MB | <35.500MB (🟡 -1.1%) | +4.6% |

**🟡 recursivecomputation** - 8/8

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| deep | ✅ 308.895ms | <320.950ms (-3.8%) | ~same | ✅ 35.901MB | <36.500MB (🟡 -1.6%) | +4.1% |
| deep-profiled | ✅ 328.807ms | <359.150ms (-8.4%) | -1.9% | ✅ 39.911MB | <40.500MB (🟡 -1.5%) | +4.5% |
| medium | ✅ 7.005ms | <7.400ms (-5.3%) | +0.2% | ✅ 34.839MB | <35.500MB (🟡 -1.9%) | +4.4% |
| shallow | ✅ 0.948ms | <1.050ms (-9.7%) | +1.3% | ✅ 34.721MB | <35.500MB (-2.2%) | +4.1% |

**🟡 sethttpmeta** - 32/32

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| all-disabled | ✅ 10.595µs | <20.000µs (📉 -47.0%) | +0.8% | ✅ 35.527MB | <36.000MB (🟡 -1.3%) | +4.9% |
| all-enabled | ✅ 41.137µs | <50.000µs (📉 -17.7%) | +3.0% | ✅ 35.350MB | <36.000MB (🟡 -1.8%) | +4.6% |
| collectipvariant_exists | ✅ 41.121µs | <50.000µs (📉 -17.8%) | +1.3% | ✅ 35.468MB | <36.000MB (🟡 -1.5%) | +4.3% |
| no-collectipvariant | ✅ 40.265µs | <50.000µs (📉 -19.5%) | +1.0% | ✅ 35.547MB | <36.000MB (🟡 -1.3%) | +5.1% |
| no-useragentvariant | ✅ 39.003µs | <50.000µs (📉 -22.0%) | +0.8% | ✅ 35.468MB | <36.000MB (🟡 -1.5%) | +4.7% |
| obfuscation-no-query | ✅ 40.670µs | <50.000µs (📉 -18.7%) | +0.8% | ✅ 35.370MB | <36.000MB (🟡 -1.8%) | +4.5% |
| obfuscation-regular-case-explicit-query | ✅ 75.756µs | <90.000µs (📉 -15.8%) | ~same | ✅ 35.724MB | <36.500MB (-2.1%) | +4.7% |
| obfuscation-regular-case-implicit-query | ✅ 76.630µs | <90.000µs (📉 -14.9%) | +0.4% | ✅ 35.704MB | <36.500MB (-2.2%) | +4.8% |
| obfuscation-send-querystring-disabled | ✅ 154.011µs | <170.000µs (-9.4%) | -0.1% | ✅ 35.566MB | <36.500MB (-2.6%) | +4.4% |
| obfuscation-worst-case-explicit-query | ✅ 148.926µs | <160.000µs (-6.9%) | +0.2% | ✅ 35.684MB | <36.500MB (-2.2%) | +4.4% |
| obfuscation-worst-case-implicit-query | ✅ 154.594µs | <170.000µs (-9.1%) | -0.4% | ✅ 35.606MB | <36.500MB (-2.5%) | +4.4% |
| useragentvariant_exists_1 | ✅ 39.773µs | <50.000µs (📉 -20.5%) | +1.1% | ✅ 35.350MB | <36.000MB (🟡 -1.8%) | +4.7% |
| useragentvariant_exists_2 | ✅ 40.846µs | <50.000µs (📉 -18.3%) | +0.9% | ✅ 35.350MB | <36.000MB (🟡 -1.8%) | +4.3% |
| useragentvariant_exists_3 | ✅ 40.330µs | <50.000µs (📉 -19.3%) | +1.0% | ✅ 35.448MB | <36.000MB (🟡 -1.5%) | +4.8% |
| useragentvariant_not_exists_1 | ✅ 39.609µs | <50.000µs (📉 -20.8%) | +0.6% | ✅ 35.448MB | <36.000MB (🟡 -1.5%) | +4.6% |
| useragentvariant_not_exists_2 | ✅ 39.645µs | <50.000µs (📉 -20.7%) | +0.8% | ✅ 35.586MB | <36.000MB (🟡 -1.1%) | +5.3% |

**🟡 span** - 26/26

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| add-event | ✅ 18.315ms | <22.500ms (📉 -18.6%) | +0.6% | ✅ 36.941MB | <53.000MB (📉 -30.3%) | +4.7% |
| add-metrics | ✅ 88.415ms | <93.500ms (-5.4%) | +0.4% | ✅ 41.078MB | <53.000MB (📉 -22.5%) | +4.4% |
| add-tags | ✅ 142.526ms | <155.000ms (-8.0%) | +0.7% | ✅ 41.062MB | <53.000MB (📉 -22.5%) | +4.6% |
| get-context | ✅ 17.047ms | <20.500ms (📉 -16.8%) | -0.3% | ✅ 36.789MB | <53.000MB (📉 -30.6%) | +5.0% |
| is-recording | ✅ 17.247ms | <20.500ms (📉 -15.9%) | -0.5% | ✅ 36.804MB | <53.000MB (📉 -30.6%) | +4.5% |
| record-exception | ✅ 36.772ms | <40.000ms (-8.1%) | +0.5% | ✅ 37.276MB | <53.000MB (📉 -29.7%) | +4.3% |
| set-status | ✅ 18.791ms | <22.000ms (📉 -14.6%) | +0.4% | ✅ 36.686MB | <53.000MB (📉 -30.8%) | +4.3% |
| start | ✅ 17.482ms | <20.500ms (📉 -14.7%) | +4.1% | ✅ 36.748MB | <53.000MB (📉 -30.7%) | +4.5% |
| start-finish | ✅ 50.996ms | <52.500ms (-2.9%) | -0.3% | ✅ 34.800MB | <35.500MB (🟡 -2.0%) | +4.5% |
| start-finish-telemetry | ✅ 52.361ms | <54.500ms (-3.9%) | +0.2% | ✅ 34.721MB | <35.500MB (-2.2%) | +4.6% |
| start-finish-traceid128 | ✅ 54.224ms | <57.000ms (-4.9%) | ~same | ✅ 34.662MB | <35.500MB (-2.4%) | +4.3% |
| start-traceid128 | ✅ 17.292ms | <22.500ms (📉 -23.1%) | +0.5% | ✅ 36.671MB | <53.000MB (📉 -30.8%) | +4.3% |
| update-name | ✅ 17.434ms | <22.000ms (📉 -20.8%) | +0.6% | ✅ 36.718MB | <53.000MB (📉 -30.7%) | +4.3% |

**🟡 tracer** - 6/6

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| large | ✅ 29.340ms | <32.950ms (📉 -11.0%) | +0.6% | ✅ 35.999MB | <36.500MB (🟡 -1.4%) | +4.7% |
| medium | ✅ 2.897ms | <3.200ms (-9.5%) | +0.4% | ✅ 34.800MB | <35.500MB (🟡 -2.0%) | +4.3% |
| small | ✅ 333.614µs | <370.000µs (-9.8%) | +1.9% | ✅ 34.721MB | <35.500MB (-2.2%) | +4.3% |

**📉 Performance Improvements (1 suite)**

**📉 djangosimple** - 30/30

| Benchmark | Time | Time SLO | Time vs baseline | Memory | Memory SLO | Memory vs baseline |
|---|---|---|---|---|---|---|
| appsec | ✅ 19.574ms | <22.300ms (📉 -12.2%) | ~same | ✅ 67.889MB | <70.500MB (-3.7%) | +4.1% |
| exception-replay-enabled | ✅ 1.357ms | <1.450ms (-6.4%) | -0.1% | ✅ 66.117MB | <67.500MB (-2.0%) | +4.6% |
| iast | ✅ 19.597ms | <22.250ms (📉 -11.9%) | -0.2% | ✅ 67.849MB | <70.000MB (-3.1%) | +4.2% |
| profiler | ✅ 15.400ms | <16.550ms (-7.0%) | -2.1% | ✅ 56.313MB | <57.500MB (-2.1%) | +4.6% |
| resource-renaming | ✅ 19.476ms | <21.750ms (📉 -10.5%) | -0.3% | ✅ 68.046MB | <70.500MB (-3.5%) | +4.4% |
| span-code-origin | ✅ 19.980ms | <28.200ms (📉 -29.2%) | 📉 -14.4% | ✅ 67.962MB | <71.000MB (-4.3%) | +2.8% |
| tracer | ✅ 19.548ms | <21.750ms (📉 -10.1%) | ~same | ✅ 67.849MB | <70.000MB (-3.1%) | +4.3% |
| tracer-and-profiler | ✅ 21.797ms | <23.500ms (-7.2%) | -0.2% | ✅ 69.226MB | <71.000MB (-2.5%) | +4.7% |
| tracer-dont-create-db-spans | ✅ 19.674ms | <21.500ms (-8.5%) | +0.5% | ✅ 67.790MB | <70.000MB (-3.2%) | +3.9% |
| tracer-minimal | ✅ 16.793ms | <17.500ms (-4.0%) | +0.2% | ✅ 67.810MB | <70.000MB (-3.1%) | +4.6% |
| tracer-native | ✅ 19.478ms | <21.750ms (📉 -10.4%) | ~same | ✅ 67.889MB | <72.500MB (-6.4%) | +4.3% |
| tracer-no-caches | ✅ 17.585ms | <19.650ms (📉 -10.5%) | -0.1% | ✅ 67.790MB | <70.000MB (-3.2%) | +4.6% |
| tracer-no-databases | ✅ 19.114ms | <20.100ms (-4.9%) | ~same | ✅ 67.731MB | <70.000MB (-3.2%) | +4.5% |
| tracer-no-middleware | ✅ 19.296ms | <21.500ms (📉 -10.3%) | -0.3% | ✅ 67.889MB | <70.000MB (-3.0%) | +4.9% |
| tracer-no-templates | ✅ 19.483ms | <22.000ms (📉 -11.4%) | +0.8% | ✅ 67.830MB | <70.500MB (-3.8%) | +4.6% |
ddtrace/llmobs/_llmobs.py
ml_app: Optional[str] = None,
from_timestamp: Optional[str] = None,
to_timestamp: Optional[str] = None,
evaluations: Optional[List[Callable[[Dict[str, Any]], LLMObsEvaluationResult]]] = None
Two comments here:
- I was wondering whether we should allow these evaluation functions to be asynchronous, or whether synchronous functions are fine.
- I wasn't sure what the return type of the evaluations should be. I don't think we want users to create their own evaluation metric directly, so I made a new LLMObsEvaluationResult dictionary that contains all the info we have users pass into submit_evaluation. We then handle building and validating the evaluation metric from that info, but I'm open to suggestions on this design!
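To make both questions concrete, here is a minimal sketch of an evaluation function returning a result dictionary, plus one way a caller could accept sync or async evaluators. EvaluationResultSketch is a hypothetical stand-in for LLMObsEvaluationResult whose keys are assumed to mirror submit_evaluation's inputs; the "output" key on the exported span, and the helper names exact_match, output_length and call_evaluator, are also assumptions for illustration only.

```python
import asyncio
import inspect
from typing import Any, Awaitable, Callable, Dict, TypedDict, Union


class EvaluationResultSketch(TypedDict, total=False):
    # Hypothetical fields mirroring what submit_evaluation accepts; the PR's
    # real LLMObsEvaluationResult may use different keys.
    label: str
    metric_type: str  # e.g. "categorical" or "score"
    value: Union[str, float]
    tags: Dict[str, str]


Evaluator = Callable[
    [Dict[str, Any]],
    Union[EvaluationResultSketch, Awaitable[EvaluationResultSketch]],
]


def exact_match(span: Dict[str, Any]) -> EvaluationResultSketch:
    # Assumes the exported span dict has an "output" key (hypothetical).
    output = str(span.get("output") or "")
    value = "pass" if output.strip() == "42" else "fail"
    return {"label": "exact_match", "metric_type": "categorical", "value": value}


async def output_length(span: Dict[str, Any]) -> EvaluationResultSketch:
    await asyncio.sleep(0)  # stand-in for an awaited call, e.g. an LLM judge
    output = str(span.get("output") or "")
    return {"label": "output_length", "metric_type": "score", "value": float(len(output))}


def call_evaluator(evaluator: Evaluator, span: Dict[str, Any]) -> EvaluationResultSketch:
    """Run a sync or async evaluator and return its result dict."""
    result = evaluator(span)
    if inspect.iscoroutine(result):
        # asyncio.run() raises RuntimeError if an event loop is already running
        # in this thread, so real async support would need more care.
        return asyncio.run(result)
    return result
```

Returning a plain dict keeps users away from the internal metric event format, and the async branch shows that supporting coroutines is mostly a dispatch question, with the caveat about already-running event loops noted in the comment.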
brettlangdon left a comment
release note lgtm
sabrenner left a comment
really cool! just a couple comments for now, i didn't really look at the tests yet, but the source files mostly lgtm
| pass | ||
|
|
||
|
|
||
| class LLMObsEvaluationResult(TypedDict, total=False): |
is this a type we expect users to return? if so might make more sense to put in our types module
| span_id: Optional[str] = None, | ||
| trace_id: Optional[str] = None, | ||
| tags: Optional[Dict[str, str]] = None, | ||
| span_kind: Optional[str] = None, | ||
| span_name: Optional[str] = None, | ||
| ml_app: Optional[str] = None, | ||
| from_timestamp: Optional[str] = None, | ||
| to_timestamp: Optional[str] = None, |
should we enforce at least one of these below? agree they should be optional but maybe we want to enforce either span_id/trace_id or tags to give the user a sense of direction for how to most effectively use run_evaluations. just a thought tho and maybe i missed a point elsewhere about the constraints we wanna set for this method
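A minimal sketch of the suggested constraint, assuming the rule is "at least one of span_id, trace_id, or a non-empty tags dict". The helper name validate_span_filters is hypothetical, and the real run_evaluations may settle on different rules (for example, requiring trace_id whenever span_id is given).

```python
from typing import Dict, Optional


def validate_span_filters(
    span_id: Optional[str] = None,
    trace_id: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
) -> None:
    """Hypothetical check: require an identifier or at least one tag filter."""
    has_ids = bool(span_id or trace_id)
    has_tags = bool(tags)  # an empty dict does not count as a filter
    if not (has_ids or has_tags):
        raise ValueError(
            "run_evaluations requires span_id/trace_id or tags to narrow down the exported spans"
        )
```

Failing fast here gives users a clear error instead of an unexpectedly broad export; the other filters (span_kind, span_name, ml_app, timestamps) stay optional refinements.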
| pass | ||
|
|
||
|
|
||
| class LLMObsExportSpansClient: |
i know this isn't a "writer" but i feel it somehow makes more sense to live in _writer.py. i could just be thinking about it wrong tho so feel free to ignore but wanna know what you think/if you already gave it some thought!
| if not self._api_key or not self._app_key: | ||
| raise ValueError("Both an API key and an APP key are required to make requests to the LLMObs Export API") |
we could probably do this earlier, in the constructor maybe
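A sketch of moving the key check into the constructor. ExportSpansClientSketch is a hypothetical stand-in for LLMObsExportSpansClient. The DD-API-KEY and DD-APPLICATION-KEY header names are Datadog's standard API authentication headers; the rest of the class shape is assumed.

```python
from typing import Dict, Optional


class ExportSpansClientSketch:
    """Hypothetical stand-in for LLMObsExportSpansClient that fails fast."""

    def __init__(self, api_key: Optional[str], app_key: Optional[str]) -> None:
        # Validate once at construction time instead of on every request.
        if not api_key or not app_key:
            raise ValueError(
                "Both an API key and an APP key are required to make requests to the LLMObs Export API"
            )
        self._api_key = api_key
        self._app_key = app_key

    def headers(self) -> Dict[str, str]:
        # Standard Datadog API authentication headers.
        return {"DD-API-KEY": self._api_key, "DD-APPLICATION-KEY": self._app_key}
```

One trade-off to weigh: if the client were constructed eagerly (for example when LLMObs is enabled), raising in `__init__` would affect users who never call run_evaluations, so creating the client lazily inside run_evaluations may be the safer place for this check.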
| ) | ||
| finally: | ||
| metric_type = evaluation_result.get("metric_type") or "" | ||
| telemetry.record_llmobs_submit_evaluation(join_on, metric_type, error) |
a thought - maybe we wanna have an extra tag for this submit evaluation telemetry signal that it happened through the run_evaluations method, maybe like source:run_evaluations. if you also think that's a good idea, we'd have to update the metric definition in the backend to accept a new tag, although if we're worried about cardinality for that metric we could make a new one.
either way i think it might be nice to have a signal of how many people are using this run_evaluations method vs the normal submit_evaluation one
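A sketch of what the extra tag could look like. The function submit_evaluation_tags and all tag names here are hypothetical; the existing telemetry.record_llmobs_submit_evaluation(join_on, metric_type, error) call would need its own signature and the backend metric definition extended before anything like this could ship.

```python
from typing import List, Optional, Tuple

Tags = List[Tuple[str, str]]


def submit_evaluation_tags(
    join_on: str,
    metric_type: str,
    error: Optional[str],
    source: str = "submit_evaluation",
) -> Tags:
    """Build tags for a submit-evaluation telemetry count (hypothetical tag names)."""
    tags: Tags = [
        ("join_on", join_on),
        ("metric_type", metric_type),
        # Two possible values ("submit_evaluation" / "run_evaluations"), so at
        # most doubles the number of series for this metric.
        ("source", source),
    ]
    if error:
        tags.append(("error_type", error))
    return tags
```

Since `source` only takes two values, the cardinality concern is bounded to at most doubling the existing series, which may make reusing the current metric acceptable rather than defining a new one.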
"""
Tests that enqueuing evaluation metrics works for multiple spans and evaluations.
"""
mock_export_spans.return_value = mock_exported_spans()
we might be able to leverage the vcr from the testagent like we do for experiments tests if you wanted to try that out 👀 (ref, ref)
i'd be happy to help w that if you wanted to try that and you had any questions! otherwise i don't have any explicit problem with doing mocking this way since it's unlikely we'll change the search api i'm guessing
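For comparison with a VCR-based setup, the mocking approach the test uses today can be reduced to a small standalone pattern with `unittest.mock.patch.object`. SpanExporter, count_spans and fake_exported_spans are hypothetical names; this sketch does not attempt to reproduce the test agent's VCR configuration.

```python
from typing import Any, Dict, List
from unittest import mock


class SpanExporter:
    """Hypothetical exporter; the real code calls the LLMObs Export API."""

    def export_spans(self, **filters: Any) -> List[Dict[str, Any]]:
        raise RuntimeError("network access is not available in unit tests")


def count_spans(exporter: SpanExporter, **filters: Any) -> int:
    return len(exporter.export_spans(**filters))


def fake_exported_spans() -> List[Dict[str, Any]]:
    # Canned payload playing the role of mock_exported_spans() in the test.
    return [
        {"span_id": "1", "trace_id": "10", "output": "42"},
        {"span_id": "2", "trace_id": "10", "output": "41"},
    ]


def test_count_spans_with_mock() -> None:
    exporter = SpanExporter()
    # Replace the network call with a MagicMock that returns canned spans.
    with mock.patch.object(SpanExporter, "export_spans", return_value=fake_exported_spans()) as mocked:
        assert count_spans(exporter, ml_app="my-app") == 2
    mocked.assert_called_once_with(ml_app="my-app")
```

The mock pins the response shape inside the test. A recorded cassette would instead capture the real Export API payload, which catches schema drift at the cost of re-recording when the API changes.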
"trace_id": exported_span.get("trace_id"),
},
}
metric_type = ""
i think we don't need this here right, we just use it below in the finally block, or am i missing another place?
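A sketch of the try/except/finally shape under discussion, showing why the pre-initialised `metric_type = ""` is redundant when `finally` derives it, while `evaluation_result` still needs a default in case the evaluator raises. record_submit_evaluation is a hypothetical stand-in for the telemetry call, and the "evaluation_failed" error string is assumed.

```python
import logging
from typing import Any, Callable, Dict, List, Optional, Tuple

log = logging.getLogger(__name__)

recorded: List[Tuple[str, str, Optional[str]]] = []


def record_submit_evaluation(join_on: str, metric_type: str, error: Optional[str]) -> None:
    """Hypothetical stand-in for telemetry.record_llmobs_submit_evaluation."""
    recorded.append((join_on, metric_type, error))


def evaluate_span(span: Dict[str, Any], evaluator: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
    error: Optional[str] = None
    # Needs a default: if evaluator() raises, finally still reads it.
    evaluation_result: Dict[str, Any] = {}
    try:
        evaluation_result = evaluator(span)
        # ... build and enqueue the evaluation metric here ...
    except Exception:
        error = "evaluation_failed"
        log.warning("evaluation failed for span %s", span.get("span_id"), exc_info=True)
    finally:
        # metric_type is only used here, so no earlier `metric_type = ""` is needed.
        metric_type = evaluation_result.get("metric_type") or ""
        record_submit_evaluation("span_id", metric_type, error)
```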
Description
This PR adds a run_evaluations method to the LLMObs SDK. The method takes one or more filters used to retrieve spans (via our Export API), along with a list of evaluation functions to run against those spans. Once the spans are retrieved, the evaluations are run and their results are submitted automatically to our intake. See this doc for more details.
If there is an error exporting spans from the Export API, the error is logged and an exception is raised. However, if there is an error running or submitting an evaluation for a particular span, the error is logged and no exception is raised. This prevents one faulty evaluation from blocking all other evaluations from being submitted.
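The error-handling contract described above can be sketched as a small orchestration function. This is not the PR's implementation: run_evaluations_sketch, its injected export_spans/submit callables, and the returned count are hypothetical choices that keep the example self-contained.

```python
import logging
from typing import Any, Callable, Dict, List

log = logging.getLogger(__name__)

Span = Dict[str, Any]
Evaluator = Callable[[Span], Dict[str, Any]]


def run_evaluations_sketch(
    export_spans: Callable[..., List[Span]],
    submit: Callable[[Span, Dict[str, Any]], None],
    evaluations: List[Evaluator],
    **filters: Any,
) -> int:
    """Export spans, then run and submit every evaluation against every span.

    Export errors are logged and re-raised; a failing evaluation is logged and
    skipped so the remaining evaluations are still submitted. Returns the
    number of successful submissions (a choice made for this sketch).
    """
    try:
        spans = export_spans(**filters)
    except Exception:
        log.error("failed to export spans with filters %r", filters, exc_info=True)
        raise

    submitted = 0
    for span in spans:
        for evaluation in evaluations:
            try:
                submit(span, evaluation(span))
                submitted += 1
            except Exception:
                # Isolate the failure to this (span, evaluation) pair.
                log.warning(
                    "evaluation %s failed for span %s",
                    getattr(evaluation, "__name__", repr(evaluation)),
                    span.get("span_id"),
                    exc_info=True,
                )
    return submitted
```

Keeping the export step outside the per-span loop means a bad filter or auth failure surfaces immediately to the caller, while per-evaluation failures only cost that single metric.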
Other effects:
- Refactored submit_evaluation so that I could pull out the common logic for building an evaluation metric event.
Manual Testing
To manually test this feature, I submitted a span using the following code:
I then called run_evaluations with various filters and confirmed that all of the evaluation metrics showed up in the UI, attached to the correct span.
Testing
Risks
Additional Notes