@ncybul ncybul commented Nov 26, 2025

Description

This PR adds a run_evaluations method to the LLMObs SDK. The method takes one or more filters used to retrieve spans (via our Export API) and a list of evaluation functions to run against those spans. Once the spans are retrieved, the evaluations are run and their results are submitted automatically to our intake. See this doc for more details.

If exporting spans from the Export API fails, the error is logged and an exception is raised. However, if running or submitting an evaluation for a particular span fails, the error is logged and no exception is raised. This prevents one faulty evaluation from blocking the submission of all other evals.
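The error-handling policy described above can be sketched as follows. This is a minimal illustration, not the code in this PR: `export_spans` and `run_evaluations_sketch` are hypothetical stand-ins, and the span and result shapes are assumed to be plain dicts.

```python
import asyncio
import logging
from typing import Any, Callable, Dict, List

log = logging.getLogger(__name__)

Span = Dict[str, Any]
Evaluation = Callable[[Span], Dict[str, Any]]


async def export_spans(fail: bool = False) -> List[Span]:
    """Hypothetical stand-in for the Export API call."""
    if fail:
        raise ConnectionError("export failed")
    return [{"span_id": "1"}, {"span_id": "2"}]


async def run_evaluations_sketch(
    evaluations: List[Evaluation], fail_export: bool = False
) -> List[Dict[str, Any]]:
    # An export failure is logged and re-raised: without spans there is nothing to evaluate.
    try:
        spans = await export_spans(fail=fail_export)
    except Exception:
        log.error("Failed to export spans", exc_info=True)
        raise

    submitted: List[Dict[str, Any]] = []
    for span in spans:
        for evaluation in evaluations:
            # A failure in one evaluation is logged and skipped so the rest still run.
            try:
                submitted.append(evaluation(span))
            except Exception:
                log.error("Evaluation failed for span %s", span["span_id"], exc_info=True)
    return submitted
```

With one failing and one working evaluation function, the working one still produces a result for every span, while an export failure propagates to the caller.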

Other effects:

  • This changes some of the error types raised in submit_evaluation. I made this change so I could pull out the common logic for building an evaluation metric event.

Manual Testing

To manually test this feature, I submitted a span using the following code:

from ddtrace.llmobs import LLMObs
from ddtrace.llmobs.decorators import llm

LLMObs.enable(
  ml_app="nicole-test",
  api_key=<REDACTED>,
  app_key=<REDACTED>,
)


@llm(model_name="gpt-4o-mini", model_provider="openai", name="llm_call_enriched")
def llm_call_enriched():
    LLMObs.annotate(
        input_data={"content": "hi", "role": "user"},
        output_data={"content": "hello", "role": "assistant"},
        metadata={
            "test-key": "test-value",
        },
        metrics={
            "input_tokens": 10,
            "output_tokens": 10,
            "total_tokens": 20,
        },
        tags={"test-key": "test-value"},
        tool_definitions=[
            {
                "name": "test-tool",
                "description": "A test tool",
                "schema": {
                    "test-key": "test-value",
                },
            }
        ],
    )

llm_call_enriched()

I then called run_evaluations with each of the supported filters:

from typing import Any, Callable, Dict
from ddtrace.llmobs import LLMObs, LLMObsEvaluationResult
import asyncio

LLMObs.enable(
  ml_app="nicole-test",
  api_key=<REDACTED>,
  app_key=<REDACTED>,
)

def generate_eval(value: float) -> Callable[[Dict[str, Any]], LLMObsEvaluationResult]:
    def sample_score_evaluation(exported_span: Dict[str, Any]) -> LLMObsEvaluationResult:
        return LLMObsEvaluationResult(
            metric_type="score",
            label=f"accuracy-{value}",
            value=value,
            ml_app=exported_span.get("ml_app"),
            tags={"test-key": "test-value"},
            assessment="pass",
            reasoning="The answer is correct.",
            metadata={"test-key": "test-value"},
        )
    return sample_score_evaluation


async def main():
    await LLMObs.run_evaluations(span_id="13062994909689029297", ml_app="nicole-test", evaluations=[generate_eval(1)])
    await LLMObs.run_evaluations(trace_id="692f20c900000000b74067158093b54c", ml_app="nicole-test", evaluations=[generate_eval(2)])
    await LLMObs.run_evaluations(tags={"test-key": "test-value"}, ml_app="nicole-test", evaluations=[generate_eval(3)])
    await LLMObs.run_evaluations(span_kind="llm", ml_app="nicole-test", evaluations=[generate_eval(4)])
    await LLMObs.run_evaluations(span_name="llm_call_enriched", ml_app="nicole-test", evaluations=[generate_eval(5)])
    await LLMObs.run_evaluations(from_timestamp="now-15m", to_timestamp="now", ml_app="nicole-test", evaluations=[generate_eval(6)])

if __name__ == "__main__":
    asyncio.run(main())


I then confirmed that all of the evaluation metrics showed up in the UI attached to the correct span.

Testing

Risks

Additional Notes

github-actions bot commented Nov 26, 2025

CODEOWNERS have been resolved as:

releasenotes/notes/add-llmobs-run-evaluations-api-69fff2d86a174111.yaml  @DataDog/apm-python
ddtrace/llmobs/__init__.py                                              @DataDog/ml-observability
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/llmobs/_utils.py                                                @DataDog/ml-observability
ddtrace/llmobs/_writer.py                                               @DataDog/ml-observability
tests/llmobs/_utils.py                                                  @DataDog/ml-observability
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability
tests/llmobs/test_utils.py                                              @DataDog/ml-observability

github-actions bot commented Nov 26, 2025

Bootstrap import analysis

Comparison of import times between this PR and base.

Summary

The average import time from this PR is: 218 ± 3 ms.

The average import time from base is: 218 ± 3 ms.

The import time difference between this PR and base is: -0.1 ± 0.1 ms.

The difference is not statistically significant (z = -0.67).
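For readers unfamiliar with the z value, a common way to read it is as the difference divided by its standard error, compared against a critical value such as 1.96 (two-sided, 5% level). The sketch below assumes that convention; the bot's actual method and threshold are not shown here, and the inputs are hypothetical unrounded values chosen only to be consistent with the rounded figures above.

```python
def z_score(difference: float, standard_error: float) -> float:
    """Return the difference expressed in units of its standard error."""
    return difference / standard_error


def is_significant(z: float, critical_value: float = 1.96) -> bool:
    """Two-sided test at roughly the 5% level (assumed threshold)."""
    return abs(z) > critical_value


# Hypothetical unrounded inputs: both round to the -0.1 ± 0.1 ms shown above.
z = z_score(difference=-0.08, standard_error=0.12)
print(f"z = {z:.2f}, significant: {is_significant(z)}")
```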

Import time breakdown

The following import paths have shrunk:

ddtrace.auto 1.886 ms (0.87%)
ddtrace 0.967 ms (0.44%)
ddtrace._logger 0.475 ms (0.22%)
ddtrace.internal.telemetry 0.475 ms (0.22%)
ddtrace.internal.telemetry.writer 0.475 ms (0.22%)
ddtrace.internal.utils.version 0.475 ms (0.22%)
ddtrace.version 0.475 ms (0.22%)
ddtrace.internal._unpatched 0.034 ms (0.02%)
json 0.034 ms (0.02%)
json.decoder 0.034 ms (0.02%)
re 0.034 ms (0.02%)
enum 0.034 ms (0.02%)
types 0.034 ms (0.02%)
ddtrace.bootstrap.sitecustomize 0.920 ms (0.42%)
ddtrace.bootstrap.preload 0.920 ms (0.42%)
ddtrace.internal.remoteconfig.client 0.508 ms (0.23%)

pr-commenter bot commented Nov 26, 2025

Performance SLOs

Comparing candidate nicole-cybul/custom-eval-api (a0c2fd1) with baseline main (8851ec9)
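The percentage shown next to each SLO in the results appears to be the measured value's distance from the SLO limit, relative to that limit. This reading is inferred from the reported figures, not from the bot's source, and `slo_margin_percent` is a hypothetical helper name.

```python
def slo_margin_percent(measured: float, slo_limit: float) -> float:
    """Return the distance of `measured` from `slo_limit` as a percentage of the limit.

    Negative values mean the measurement is under the limit (within the SLO).
    """
    return (measured - slo_limit) / slo_limit * 100.0


# Values taken from the report: 40.187MB against a <41.500MB limit,
# and 3.417µs against a <10.000µs limit.
print(f"{slo_margin_percent(40.187, 41.5):+.1f}%")  # -3.2%
print(f"{slo_margin_percent(3.417, 10.0):+.1f}%")   # -65.8%
```

Under this reading, a value close to 0% means the benchmark is near its limit, which matches the entries listed as near an SLO breach.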

📈 Performance Regressions (3 suites)
📈 iastaspects - 118/118

✅ add_aspect

Time: ✅ 0.405µs (SLO: <10.000µs 📉 -96.0%) vs baseline: ~same

Memory: ✅ 40.187MB (SLO: <41.500MB -3.2%) vs baseline: +4.4%


✅ add_inplace_aspect

Time: ✅ 0.404µs (SLO: <10.000µs 📉 -96.0%) vs baseline: -1.3%

Memory: ✅ 40.206MB (SLO: <41.500MB -3.1%) vs baseline: +4.8%


✅ add_inplace_noaspect

Time: ✅ 0.317µs (SLO: <10.000µs 📉 -96.8%) vs baseline: -2.9%

Memory: ✅ 40.187MB (SLO: <41.500MB -3.2%) vs baseline: +4.4%


✅ add_noaspect

Time: ✅ 0.276µs (SLO: <10.000µs 📉 -97.2%) vs baseline: +0.2%

Memory: ✅ 40.364MB (SLO: <41.500MB -2.7%) vs baseline: +5.1%


✅ bytearray_aspect

Time: ✅ 1.358µs (SLO: <10.000µs 📉 -86.4%) vs baseline: +3.3%

Memory: ✅ 40.324MB (SLO: <41.500MB -2.8%) vs baseline: +5.2%


✅ bytearray_extend_aspect

Time: ✅ 1.493µs (SLO: <10.000µs 📉 -85.1%) vs baseline: -0.3%

Memory: ✅ 40.265MB (SLO: <41.500MB -3.0%) vs baseline: +4.9%


✅ bytearray_extend_noaspect

Time: ✅ 0.614µs (SLO: <10.000µs 📉 -93.9%) vs baseline: +0.2%

Memory: ✅ 40.265MB (SLO: <41.500MB -3.0%) vs baseline: +4.7%


✅ bytearray_noaspect

Time: ✅ 0.480µs (SLO: <10.000µs 📉 -95.2%) vs baseline: -0.6%

Memory: ✅ 40.226MB (SLO: <41.500MB -3.1%) vs baseline: +4.8%


✅ bytes_aspect

Time: ✅ 1.294µs (SLO: <10.000µs 📉 -87.1%) vs baseline: +1.6%

Memory: ✅ 40.206MB (SLO: <41.500MB -3.1%) vs baseline: +4.8%


✅ bytes_noaspect

Time: ✅ 0.493µs (SLO: <10.000µs 📉 -95.1%) vs baseline: -0.8%

Memory: ✅ 40.206MB (SLO: <41.500MB -3.1%) vs baseline: +4.7%


✅ bytesio_aspect

Time: ✅ 1.350µs (SLO: <10.000µs 📉 -86.5%) vs baseline: +2.3%

Memory: ✅ 40.206MB (SLO: <41.500MB -3.1%) vs baseline: +4.7%


✅ bytesio_noaspect

Time: ✅ 0.495µs (SLO: <10.000µs 📉 -95.1%) vs baseline: +0.3%

Memory: ✅ 40.187MB (SLO: <41.500MB -3.2%) vs baseline: +4.6%


✅ capitalize_aspect

Time: ✅ 0.743µs (SLO: <10.000µs 📉 -92.6%) vs baseline: +1.0%

Memory: ✅ 40.246MB (SLO: <41.500MB -3.0%) vs baseline: +4.8%


✅ capitalize_noaspect

Time: ✅ 0.437µs (SLO: <10.000µs 📉 -95.6%) vs baseline: +0.4%

Memory: ✅ 40.187MB (SLO: <41.500MB -3.2%) vs baseline: +4.9%


✅ casefold_aspect

Time: ✅ 0.734µs (SLO: <10.000µs 📉 -92.7%) vs baseline: ~same

Memory: ✅ 40.147MB (SLO: <41.500MB -3.3%) vs baseline: +4.6%


✅ casefold_noaspect

Time: ✅ 0.366µs (SLO: <10.000µs 📉 -96.3%) vs baseline: -2.1%

Memory: ✅ 40.088MB (SLO: <41.500MB -3.4%) vs baseline: +4.5%


✅ decode_aspect

Time: ✅ 0.727µs (SLO: <10.000µs 📉 -92.7%) vs baseline: +0.4%

Memory: ✅ 40.206MB (SLO: <41.500MB -3.1%) vs baseline: +4.9%


✅ decode_noaspect

Time: ✅ 0.416µs (SLO: <10.000µs 📉 -95.8%) vs baseline: ~same

Memory: ✅ 40.187MB (SLO: <41.500MB -3.2%) vs baseline: +4.6%


✅ encode_aspect

Time: ✅ 0.708µs (SLO: <10.000µs 📉 -92.9%) vs baseline: -0.2%

Memory: ✅ 40.187MB (SLO: <41.500MB -3.2%) vs baseline: +4.6%


✅ encode_noaspect

Time: ✅ 0.401µs (SLO: <10.000µs 📉 -96.0%) vs baseline: +0.4%

Memory: ✅ 40.206MB (SLO: <41.500MB -3.1%) vs baseline: +4.8%


✅ format_aspect

Time: ✅ 3.417µs (SLO: <10.000µs 📉 -65.8%) vs baseline: +1.2%

Memory: ✅ 40.187MB (SLO: <41.500MB -3.2%) vs baseline: +4.9%


✅ format_map_aspect

Time: ✅ 3.543µs (SLO: <10.000µs 📉 -64.6%) vs baseline: -0.7%

Memory: ✅ 40.344MB (SLO: <41.500MB -2.8%) vs baseline: +4.8%


✅ format_map_noaspect

Time: ✅ 0.770µs (SLO: <10.000µs 📉 -92.3%) vs baseline: -0.4%

Memory: ✅ 40.187MB (SLO: <41.500MB -3.2%) vs baseline: +4.5%


✅ format_noaspect

Time: ✅ 0.596µs (SLO: <10.000µs 📉 -94.0%) vs baseline: ~same

Memory: ✅ 40.285MB (SLO: <41.500MB -2.9%) vs baseline: +4.9%


✅ index_aspect

Time: ✅ 0.358µs (SLO: <10.000µs 📉 -96.4%) vs baseline: +0.5%

Memory: ✅ 40.147MB (SLO: <41.500MB -3.3%) vs baseline: +4.5%


✅ index_noaspect

Time: ✅ 0.276µs (SLO: <10.000µs 📉 -97.2%) vs baseline: -0.6%

Memory: ✅ 40.285MB (SLO: <41.500MB -2.9%) vs baseline: +4.4%


✅ join_aspect

Time: ✅ 1.346µs (SLO: <10.000µs 📉 -86.5%) vs baseline: -2.7%

Memory: ✅ 40.324MB (SLO: <41.500MB -2.8%) vs baseline: +4.9%


✅ join_noaspect

Time: ✅ 0.491µs (SLO: <10.000µs 📉 -95.1%) vs baseline: -0.1%

Memory: ✅ 40.265MB (SLO: <41.500MB -3.0%) vs baseline: +4.7%


✅ ljust_aspect

Time: ✅ 2.615µs (SLO: <20.000µs 📉 -86.9%) vs baseline: +5.2%

Memory: ✅ 40.147MB (SLO: <41.500MB -3.3%) vs baseline: +4.0%


✅ ljust_noaspect

Time: ✅ 0.407µs (SLO: <10.000µs 📉 -95.9%) vs baseline: +0.4%

Memory: ✅ 40.324MB (SLO: <41.500MB -2.8%) vs baseline: +4.8%


✅ lower_aspect

Time: ✅ 2.304µs (SLO: <10.000µs 📉 -77.0%) vs baseline: +4.9%

Memory: ✅ 40.187MB (SLO: <41.500MB -3.2%) vs baseline: +4.7%


✅ lower_noaspect

Time: ✅ 0.367µs (SLO: <10.000µs 📉 -96.3%) vs baseline: -0.2%

Memory: ✅ 40.246MB (SLO: <41.500MB -3.0%) vs baseline: +4.6%


✅ lstrip_aspect

Time: ✅ 2.274µs (SLO: <20.000µs 📉 -88.6%) vs baseline: +3.9%

Memory: ✅ 40.383MB (SLO: <41.500MB -2.7%) vs baseline: +4.7%


✅ lstrip_noaspect

Time: ✅ 0.380µs (SLO: <10.000µs 📉 -96.2%) vs baseline: -1.2%

Memory: ✅ 40.265MB (SLO: <41.500MB -3.0%) vs baseline: +5.1%


✅ modulo_aspect

Time: ✅ 1.046µs (SLO: <10.000µs 📉 -89.5%) vs baseline: +5.2%

Memory: ✅ 40.088MB (SLO: <41.500MB -3.4%) vs baseline: +4.5%


✅ modulo_aspect_for_bytearray_bytearray

Time: ✅ 1.552µs (SLO: <10.000µs 📉 -84.5%) vs baseline: +0.2%

Memory: ✅ 40.206MB (SLO: <41.500MB -3.1%) vs baseline: +4.7%


✅ modulo_aspect_for_bytes

Time: ✅ 0.974µs (SLO: <10.000µs 📉 -90.3%) vs baseline: -4.4%

Memory: ✅ 40.246MB (SLO: <41.500MB -3.0%) vs baseline: +5.0%


✅ modulo_aspect_for_bytes_bytearray

Time: ✅ 1.239µs (SLO: <10.000µs 📉 -87.6%) vs baseline: -0.4%

Memory: ✅ 40.265MB (SLO: <41.500MB -3.0%) vs baseline: +4.4%


✅ modulo_noaspect

Time: ✅ 0.629µs (SLO: <10.000µs 📉 -93.7%) vs baseline: -0.4%

Memory: ✅ 40.088MB (SLO: <41.500MB -3.4%) vs baseline: +4.4%


✅ replace_aspect

Time: ✅ 4.865µs (SLO: <10.000µs 📉 -51.4%) vs baseline: +1.8%

Memory: ✅ 40.364MB (SLO: <41.500MB -2.7%) vs baseline: +5.2%


✅ replace_noaspect

Time: ✅ 0.457µs (SLO: <10.000µs 📉 -95.4%) vs baseline: -0.2%

Memory: ✅ 40.226MB (SLO: <41.500MB -3.1%) vs baseline: +4.8%


✅ repr_aspect

Time: ✅ 0.909µs (SLO: <10.000µs 📉 -90.9%) vs baseline: +0.4%

Memory: ✅ 40.265MB (SLO: <41.500MB -3.0%) vs baseline: +4.9%


✅ repr_noaspect

Time: ✅ 0.420µs (SLO: <10.000µs 📉 -95.8%) vs baseline: +0.6%

Memory: ✅ 40.167MB (SLO: <41.500MB -3.2%) vs baseline: +4.3%


✅ rstrip_aspect

Time: ✅ 1.924µs (SLO: <20.000µs 📉 -90.4%) vs baseline: +2.2%

Memory: ✅ 40.364MB (SLO: <41.500MB -2.7%) vs baseline: +4.9%


✅ rstrip_noaspect

Time: ✅ 0.383µs (SLO: <10.000µs 📉 -96.2%) vs baseline: +0.5%

Memory: ✅ 40.226MB (SLO: <41.500MB -3.1%) vs baseline: +4.7%


✅ slice_aspect

Time: ✅ 0.497µs (SLO: <10.000µs 📉 -95.0%) vs baseline: ~same

Memory: ✅ 40.226MB (SLO: <41.500MB -3.1%) vs baseline: +4.9%


✅ slice_noaspect

Time: ✅ 0.451µs (SLO: <10.000µs 📉 -95.5%) vs baseline: -0.4%

Memory: ✅ 40.265MB (SLO: <41.500MB -3.0%) vs baseline: +4.8%


✅ stringio_aspect

Time: ✅ 1.785µs (SLO: <10.000µs 📉 -82.2%) vs baseline: 📈 +16.8%

Memory: ✅ 40.147MB (SLO: <41.500MB -3.3%) vs baseline: +4.7%


✅ stringio_noaspect

Time: ✅ 0.719µs (SLO: <10.000µs 📉 -92.8%) vs baseline: ~same

Memory: ✅ 40.206MB (SLO: <41.500MB -3.1%) vs baseline: +4.7%


✅ strip_aspect

Time: ✅ 2.226µs (SLO: <20.000µs 📉 -88.9%) vs baseline: +1.7%

Memory: ✅ 40.285MB (SLO: <41.500MB -2.9%) vs baseline: +4.9%


✅ strip_noaspect

Time: ✅ 0.389µs (SLO: <10.000µs 📉 -96.1%) vs baseline: +1.3%

Memory: ✅ 40.265MB (SLO: <41.500MB -3.0%) vs baseline: +4.9%


✅ swapcase_aspect

Time: ✅ 2.508µs (SLO: <10.000µs 📉 -74.9%) vs baseline: +5.3%

Memory: ✅ 40.383MB (SLO: <41.500MB -2.7%) vs baseline: +5.2%


✅ swapcase_noaspect

Time: ✅ 0.536µs (SLO: <10.000µs 📉 -94.6%) vs baseline: -0.3%

Memory: ✅ 40.246MB (SLO: <41.500MB -3.0%) vs baseline: +4.7%


✅ title_aspect

Time: ✅ 2.414µs (SLO: <10.000µs 📉 -75.9%) vs baseline: +2.0%

Memory: ✅ 40.147MB (SLO: <41.500MB -3.3%) vs baseline: +4.6%


✅ title_noaspect

Time: ✅ 0.501µs (SLO: <10.000µs 📉 -95.0%) vs baseline: -0.7%

Memory: ✅ 40.206MB (SLO: <41.500MB -3.1%) vs baseline: +4.6%


✅ translate_aspect

Time: ✅ 3.312µs (SLO: <10.000µs 📉 -66.9%) vs baseline: +4.5%

Memory: ✅ 40.344MB (SLO: <41.500MB -2.8%) vs baseline: +5.1%


✅ translate_noaspect

Time: ✅ 1.040µs (SLO: <10.000µs 📉 -89.6%) vs baseline: ~same

Memory: ✅ 40.167MB (SLO: <41.500MB -3.2%) vs baseline: +4.2%


✅ upper_aspect

Time: ✅ 2.298µs (SLO: <10.000µs 📉 -77.0%) vs baseline: +4.3%

Memory: ✅ 40.206MB (SLO: <41.500MB -3.1%) vs baseline: +4.4%


✅ upper_noaspect

Time: ✅ 0.368µs (SLO: <10.000µs 📉 -96.3%) vs baseline: -1.4%

Memory: ✅ 40.285MB (SLO: <41.500MB -2.9%) vs baseline: +4.9%


📈 iastaspectsospath - 24/24

✅ ospathbasename_aspect

Time: ✅ 5.147µs (SLO: <10.000µs 📉 -48.5%) vs baseline: 📈 +20.1%

Memory: ✅ 40.285MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.6%


✅ ospathbasename_noaspect

Time: ✅ 1.088µs (SLO: <10.000µs 📉 -89.1%) vs baseline: -0.1%

Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.9%


✅ ospathjoin_aspect

Time: ✅ 6.167µs (SLO: <10.000µs 📉 -38.3%) vs baseline: +0.4%

Memory: ✅ 40.128MB (SLO: <41.000MB -2.1%) vs baseline: +4.3%


✅ ospathjoin_noaspect

Time: ✅ 2.286µs (SLO: <10.000µs 📉 -77.1%) vs baseline: -0.5%

Memory: ✅ 40.187MB (SLO: <41.000MB 🟡 -2.0%) vs baseline: +4.4%


✅ ospathnormcase_aspect

Time: ✅ 3.421µs (SLO: <10.000µs 📉 -65.8%) vs baseline: -3.6%

Memory: ✅ 40.324MB (SLO: <41.000MB 🟡 -1.6%) vs baseline: +5.1%


✅ ospathnormcase_noaspect

Time: ✅ 0.571µs (SLO: <10.000µs 📉 -94.3%) vs baseline: ~same

Memory: ✅ 40.147MB (SLO: <41.000MB -2.1%) vs baseline: +4.5%


✅ ospathsplit_aspect

Time: ✅ 4.752µs (SLO: <10.000µs 📉 -52.5%) vs baseline: -2.9%

Memory: ✅ 40.187MB (SLO: <41.000MB 🟡 -2.0%) vs baseline: +5.0%


✅ ospathsplit_noaspect

Time: ✅ 1.598µs (SLO: <10.000µs 📉 -84.0%) vs baseline: +0.4%

Memory: ✅ 40.147MB (SLO: <41.000MB -2.1%) vs baseline: +4.4%


✅ ospathsplitdrive_aspect

Time: ✅ 3.632µs (SLO: <10.000µs 📉 -63.7%) vs baseline: -2.1%

Memory: ✅ 40.285MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.9%


✅ ospathsplitdrive_noaspect

Time: ✅ 0.698µs (SLO: <10.000µs 📉 -93.0%) vs baseline: ~same

Memory: ✅ 40.206MB (SLO: <41.000MB 🟡 -1.9%)


✅ ospathsplitext_aspect

Time: ✅ 4.511µs (SLO: <10.000µs 📉 -54.9%) vs baseline: -1.6%

Memory: ✅ 40.285MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.7%


✅ ospathsplitext_noaspect

Time: ✅ 1.383µs (SLO: <10.000µs 📉 -86.2%) vs baseline: +0.6%

Memory: ✅ 40.324MB (SLO: <41.000MB 🟡 -1.6%) vs baseline: +5.0%


📈 telemetryaddmetric - 30/30

✅ 1-count-metric-1-times

Time: ✅ 3.380µs (SLO: <20.000µs 📉 -83.1%) vs baseline: 📈 +15.9%

Memory: ✅ 34.800MB (SLO: <35.500MB 🟡 -2.0%) vs baseline: +4.7%


✅ 1-count-metrics-100-times

Time: ✅ 200.753µs (SLO: <220.000µs -8.7%) vs baseline: -0.4%

Memory: ✅ 34.800MB (SLO: <35.500MB 🟡 -2.0%) vs baseline: +4.6%


✅ 1-distribution-metric-1-times

Time: ✅ 3.274µs (SLO: <20.000µs 📉 -83.6%) vs baseline: +0.7%

Memory: ✅ 34.721MB (SLO: <35.500MB -2.2%) vs baseline: +4.3%


✅ 1-distribution-metrics-100-times

Time: ✅ 216.842µs (SLO: <230.000µs -5.7%) vs baseline: +1.7%

Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.7%


✅ 1-gauge-metric-1-times

Time: ✅ 2.137µs (SLO: <20.000µs 📉 -89.3%) vs baseline: -2.1%

Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +4.3%


✅ 1-gauge-metrics-100-times

Time: ✅ 136.315µs (SLO: <150.000µs -9.1%) vs baseline: ~same

Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.6%


✅ 1-rate-metric-1-times

Time: ✅ 3.073µs (SLO: <20.000µs 📉 -84.6%) vs baseline: +1.0%

Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +4.7%


✅ 1-rate-metrics-100-times

Time: ✅ 213.563µs (SLO: <250.000µs 📉 -14.6%) vs baseline: ~same

Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +4.9%


✅ 100-count-metrics-100-times

Time: ✅ 20.345ms (SLO: <22.000ms -7.5%) vs baseline: +0.2%

Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +4.6%


✅ 100-distribution-metrics-100-times

Time: ✅ 2.274ms (SLO: <2.300ms 🟡 -1.1%) vs baseline: +0.5%

Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +3.8%


✅ 100-gauge-metrics-100-times

Time: ✅ 1.413ms (SLO: <1.550ms -8.9%) vs baseline: +0.5%

Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.5%


✅ 100-rate-metrics-100-times

Time: ✅ 2.229ms (SLO: <2.550ms 📉 -12.6%) vs baseline: +0.4%

Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.3%


✅ flush-1-metric

Time: ✅ 4.583µs (SLO: <20.000µs 📉 -77.1%) vs baseline: +3.9%

Memory: ✅ 35.036MB (SLO: <35.500MB 🟡 -1.3%) vs baseline: +4.3%


✅ flush-100-metrics

Time: ✅ 173.269µs (SLO: <250.000µs 📉 -30.7%) vs baseline: -0.4%

Memory: ✅ 35.193MB (SLO: <35.500MB 🟡 -0.9%) vs baseline: +4.9%


✅ flush-1000-metrics

Time: ✅ 2.172ms (SLO: <2.500ms 📉 -13.1%) vs baseline: -0.9%

Memory: ✅ 35.960MB (SLO: <36.500MB 🟡 -1.5%) vs baseline: +4.7%

🟡 Near SLO Breach (15 suites)
🟡 errortrackingdjangosimple - 6/6

✅ errortracking-enabled-all

Time: ✅ 16.307ms (SLO: <19.850ms 📉 -17.9%) vs baseline: +0.3%

Memory: ✅ 69.803MB (SLO: <70.000MB 🟡 -0.3%) vs baseline: +4.8%


✅ errortracking-enabled-user

Time: ✅ 16.340ms (SLO: <19.400ms 📉 -15.8%) vs baseline: +0.5%

Memory: ✅ 69.861MB (SLO: <70.000MB 🟡 -0.2%) vs baseline: +4.9%


✅ tracer-enabled

Time: ✅ 16.342ms (SLO: <19.450ms 📉 -16.0%) vs baseline: ~same

Memory: ✅ 69.758MB (SLO: <70.000MB 🟡 -0.3%) vs baseline: +4.8%


🟡 errortrackingflasksqli - 6/6

✅ errortracking-enabled-all

Time: ✅ 2.066ms (SLO: <2.300ms 📉 -10.2%) vs baseline: +0.1%

Memory: ✅ 55.345MB (SLO: <56.500MB -2.0%) vs baseline: +4.4%


✅ errortracking-enabled-user

Time: ✅ 2.073ms (SLO: <2.250ms -7.9%) vs baseline: +0.1%

Memory: ✅ 55.384MB (SLO: <56.500MB 🟡 -2.0%) vs baseline: +4.2%


✅ tracer-enabled

Time: ✅ 2.063ms (SLO: <2.300ms 📉 -10.3%) vs baseline: ~same

Memory: ✅ 55.424MB (SLO: <56.500MB 🟡 -1.9%) vs baseline: +4.7%


🟡 flasksimple - 18/18

✅ appsec-get

Time: ✅ 3.383ms (SLO: <4.750ms 📉 -28.8%) vs baseline: ~same

Memory: ✅ 55.548MB (SLO: <66.500MB 📉 -16.5%) vs baseline: +4.7%


✅ appsec-post

Time: ✅ 2.855ms (SLO: <6.750ms 📉 -57.7%) vs baseline: ~same

Memory: ✅ 55.807MB (SLO: <66.500MB 📉 -16.1%) vs baseline: +4.6%


✅ appsec-telemetry

Time: ✅ 3.400ms (SLO: <4.750ms 📉 -28.4%) vs baseline: +0.8%

Memory: ✅ 55.552MB (SLO: <66.500MB 📉 -16.5%) vs baseline: +4.9%


✅ debugger

Time: ✅ 1.869ms (SLO: <2.000ms -6.5%) vs baseline: +0.3%

Memory: ✅ 47.901MB (SLO: <49.500MB -3.2%) vs baseline: +4.7%


✅ iast-get

Time: ✅ 1.855ms (SLO: <2.000ms -7.2%) vs baseline: ~same

Memory: ✅ 44.481MB (SLO: <49.000MB -9.2%) vs baseline: +4.2%


✅ profiler

Time: ✅ 1.904ms (SLO: <2.100ms -9.3%) vs baseline: -1.4%

Memory: ✅ 48.845MB (SLO: <50.000MB -2.3%) vs baseline: +4.9%


✅ resource-renaming

Time: ✅ 3.352ms (SLO: <3.650ms -8.2%) vs baseline: -0.6%

Memory: ✅ 55.472MB (SLO: <56.000MB 🟡 -0.9%) vs baseline: +4.5%


✅ tracer

Time: ✅ 3.370ms (SLO: <3.650ms -7.7%) vs baseline: -0.2%

Memory: ✅ 55.551MB (SLO: <56.500MB 🟡 -1.7%) vs baseline: +4.9%


✅ tracer-native

Time: ✅ 3.374ms (SLO: <3.650ms -7.6%) vs baseline: +0.2%

Memory: ✅ 55.508MB (SLO: <60.000MB -7.5%) vs baseline: +4.8%


🟡 flasksqli - 6/6

✅ appsec-enabled

Time: ✅ 2.063ms (SLO: <4.200ms 📉 -50.9%) vs baseline: ~same

Memory: ✅ 55.424MB (SLO: <66.000MB 📉 -16.0%) vs baseline: +4.8%


✅ iast-enabled

Time: ✅ 2.066ms (SLO: <2.800ms 📉 -26.2%) vs baseline: +0.1%

Memory: ✅ 55.365MB (SLO: <62.500MB 📉 -11.4%) vs baseline: +4.6%


✅ tracer-enabled

Time: ✅ 2.059ms (SLO: <2.250ms -8.5%) vs baseline: +0.1%

Memory: ✅ 55.384MB (SLO: <56.500MB 🟡 -2.0%) vs baseline: +4.6%


🟡 httppropagationextract - 60/60

✅ all_styles_all_headers

Time: ✅ 81.044µs (SLO: <100.000µs 📉 -19.0%) vs baseline: ~same

Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.4%


✅ b3_headers

Time: ✅ 14.212µs (SLO: <20.000µs 📉 -28.9%) vs baseline: ~same

Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +4.1%


✅ b3_single_headers

Time: ✅ 13.350µs (SLO: <20.000µs 📉 -33.2%) vs baseline: +0.1%

Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.2%


✅ datadog_tracecontext_tracestate_not_propagated_on_trace_id_no_match

Time: ✅ 63.699µs (SLO: <80.000µs 📉 -20.4%) vs baseline: -0.4%

Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.2%


✅ datadog_tracecontext_tracestate_propagated_on_trace_id_match

Time: ✅ 69.479µs (SLO: <80.000µs 📉 -13.2%) vs baseline: +5.2%

Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.0%


✅ empty_headers

Time: ✅ 1.651µs (SLO: <10.000µs 📉 -83.5%) vs baseline: +2.1%

Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.4%


✅ full_t_id_datadog_headers

Time: ✅ 22.533µs (SLO: <30.000µs 📉 -24.9%) vs baseline: -0.1%

Memory: ✅ 34.898MB (SLO: <35.500MB 🟡 -1.7%) vs baseline: +4.6%


✅ invalid_priority_header

Time: ✅ 6.531µs (SLO: <10.000µs 📉 -34.7%) vs baseline: ~same

Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.7%


✅ invalid_span_id_header

Time: ✅ 6.549µs (SLO: <10.000µs 📉 -34.5%) vs baseline: +0.1%

Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.4%


✅ invalid_tags_header

Time: ✅ 6.570µs (SLO: <10.000µs 📉 -34.3%) vs baseline: +0.5%

Memory: ✅ 34.918MB (SLO: <35.500MB 🟡 -1.6%) vs baseline: +4.5%


✅ invalid_trace_id_header

Time: ✅ 6.508µs (SLO: <10.000µs 📉 -34.9%) vs baseline: -0.1%

Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.5%


✅ large_header_no_matches

Time: ✅ 27.685µs (SLO: <30.000µs -7.7%) vs baseline: +0.3%

Memory: ✅ 34.898MB (SLO: <35.500MB 🟡 -1.7%) vs baseline: +4.6%


✅ large_valid_headers_all

Time: ✅ 28.692µs (SLO: <40.000µs 📉 -28.3%) vs baseline: +0.2%

Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.2%


✅ medium_header_no_matches

Time: ✅ 9.908µs (SLO: <20.000µs 📉 -50.5%) vs baseline: ~same

Memory: ✅ 34.898MB (SLO: <35.500MB 🟡 -1.7%) vs baseline: +4.6%


✅ medium_valid_headers_all

Time: ✅ 11.303µs (SLO: <20.000µs 📉 -43.5%) vs baseline: ~same

Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.5%


✅ none_propagation_style

Time: ✅ 1.713µs (SLO: <10.000µs 📉 -82.9%) vs baseline: ~same

Memory: ✅ 34.898MB (SLO: <35.500MB 🟡 -1.7%) vs baseline: +4.4%


✅ tracecontext_headers

Time: ✅ 34.846µs (SLO: <40.000µs 📉 -12.9%) vs baseline: +0.3%

Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.3%


✅ valid_headers_all

Time: ✅ 6.572µs (SLO: <10.000µs 📉 -34.3%) vs baseline: +0.8%

Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.4%


✅ valid_headers_basic

Time: ✅ 6.122µs (SLO: <10.000µs 📉 -38.8%) vs baseline: +0.7%

Memory: ✅ 34.800MB (SLO: <35.500MB 🟡 -2.0%) vs baseline: +4.3%


✅ wsgi_empty_headers

Time: ✅ 1.605µs (SLO: <10.000µs 📉 -83.9%) vs baseline: +0.3%

Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.3%


✅ wsgi_invalid_priority_header

Time: ✅ 6.594µs (SLO: <10.000µs 📉 -34.1%) vs baseline: +0.1%

Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.5%


✅ wsgi_invalid_span_id_header

Time: ✅ 1.637µs (SLO: <10.000µs 📉 -83.6%) vs baseline: +2.1%

Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.0%


✅ wsgi_invalid_tags_header

Time: ✅ 6.637µs (SLO: <10.000µs 📉 -33.6%) vs baseline: +0.5%

Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.3%


✅ wsgi_invalid_trace_id_header

Time: ✅ 6.598µs (SLO: <10.000µs 📉 -34.0%) vs baseline: ~same

Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.5%


✅ wsgi_large_header_no_matches

Time: ✅ 28.745µs (SLO: <40.000µs 📉 -28.1%) vs baseline: -0.2%

Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.3%


✅ wsgi_large_valid_headers_all

Time: ✅ 29.764µs (SLO: <40.000µs 📉 -25.6%) vs baseline: -0.1%

Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.4%


✅ wsgi_medium_header_no_matches

Time: ✅ 10.176µs (SLO: <20.000µs 📉 -49.1%) vs baseline: +0.3%

Memory: ✅ 34.859MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.3%


✅ wsgi_medium_valid_headers_all

Time: ✅ 11.537µs (SLO: <20.000µs 📉 -42.3%) vs baseline: ~same

Memory: ✅ 34.957MB (SLO: <35.500MB 🟡 -1.5%) vs baseline: +4.6%


✅ wsgi_valid_headers_all

Time: ✅ 6.593µs (SLO: <10.000µs 📉 -34.1%) vs baseline: +0.5%

Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.2%


✅ wsgi_valid_headers_basic

Time: ✅ 6.157µs (SLO: <10.000µs 📉 -38.4%) vs baseline: +0.7%

Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.4%


🟡 httppropagationinject - 16/16

✅ ids_only

Time: ✅ 22.119µs (SLO: <30.000µs 📉 -26.3%) vs baseline: +5.7%

Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.4%


✅ with_all

Time: ✅ 28.045µs (SLO: <40.000µs 📉 -29.9%) vs baseline: +0.8%

Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.1%


✅ with_dd_origin

Time: ✅ 24.782µs (SLO: <30.000µs 📉 -17.4%) vs baseline: -0.2%

Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +3.9%


✅ with_priority_and_origin

Time: ✅ 24.477µs (SLO: <40.000µs 📉 -38.8%) vs baseline: +1.1%

Memory: ✅ 34.898MB (SLO: <35.500MB 🟡 -1.7%) vs baseline: +4.4%


✅ with_sampling_priority

Time: ✅ 21.246µs (SLO: <30.000µs 📉 -29.2%) vs baseline: +1.0%

Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%)


✅ with_tags

Time: ✅ 25.974µs (SLO: <40.000µs 📉 -35.1%) vs baseline: -0.4%

Memory: ✅ 34.819MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.2%


✅ with_tags_invalid

Time: ✅ 27.454µs (SLO: <40.000µs 📉 -31.4%) vs baseline: ~same

Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.3%


✅ with_tags_max_size

Time: ✅ 26.536µs (SLO: <40.000µs 📉 -33.7%) vs baseline: -0.1%

Memory: ✅ 34.878MB (SLO: <35.500MB 🟡 -1.8%) vs baseline: +4.4%


🟡 iast_aspects - 40/40

✅ re_expand_aspect

Time: ✅ 34.001µs (SLO: <40.000µs 📉 -15.0%) vs baseline: +6.2%

Memory: ✅ 40.206MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +4.7%


✅ re_expand_noaspect

Time: ✅ 28.659µs (SLO: <40.000µs 📉 -28.4%) vs baseline: -0.3%

Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.8%


✅ re_findall_aspect

Time: ✅ 2.914µs (SLO: <10.000µs 📉 -70.9%) vs baseline: ~same

Memory: ✅ 40.324MB (SLO: <41.000MB 🟡 -1.6%) vs baseline: +4.7%


✅ re_findall_noaspect

Time: ✅ 1.411µs (SLO: <10.000µs 📉 -85.9%) vs baseline: -0.5%

Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.8%


✅ re_finditer_aspect

Time: ✅ 4.465µs (SLO: <10.000µs 📉 -55.4%) vs baseline: +1.2%

Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.9%


✅ re_finditer_noaspect

Time: ✅ 1.382µs (SLO: <10.000µs 📉 -86.2%) vs baseline: -2.4%

Memory: ✅ 40.167MB (SLO: <41.000MB -2.0%) vs baseline: +4.6%


✅ re_fullmatch_aspect

Time: ✅ 2.692µs (SLO: <10.000µs 📉 -73.1%) vs baseline: -0.2%

Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +5.0%


✅ re_fullmatch_noaspect

Time: ✅ 1.317µs (SLO: <10.000µs 📉 -86.8%) vs baseline: -1.0%

Memory: ✅ 40.285MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.8%


✅ re_group_aspect

Time: ✅ 3.012µs (SLO: <10.000µs 📉 -69.9%) vs baseline: +2.1%

Memory: ✅ 40.226MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +4.8%


✅ re_group_noaspect

Time: ✅ 1.614µs (SLO: <10.000µs 📉 -83.9%) vs baseline: -0.5%

Memory: ✅ 40.147MB (SLO: <41.000MB -2.1%) vs baseline: +4.3%


✅ re_groups_aspect

Time: ✅ 3.142µs (SLO: <10.000µs 📉 -68.6%) vs baseline: +2.9%

Memory: ✅ 40.187MB (SLO: <41.000MB 🟡 -2.0%) vs baseline: +4.5%


✅ re_groups_noaspect

Time: ✅ 1.706µs (SLO: <10.000µs 📉 -82.9%) vs baseline: ~same

Memory: ✅ 40.167MB (SLO: <41.000MB -2.0%) vs baseline: +4.3%


✅ re_match_aspect

Time: ✅ 2.795µs (SLO: <10.000µs 📉 -72.1%) vs baseline: +2.9%

Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.8%


✅ re_match_noaspect

Time: ✅ 1.329µs (SLO: <10.000µs 📉 -86.7%) vs baseline: +0.7%

Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +5.1%


✅ re_search_aspect

Time: ✅ 2.580µs (SLO: <10.000µs 📉 -74.2%) vs baseline: +1.8%

Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.7%


✅ re_search_noaspect

Time: ✅ 1.201µs (SLO: <10.000µs 📉 -88.0%) vs baseline: +1.3%

Memory: ✅ 40.088MB (SLO: <41.000MB -2.2%) vs baseline: +4.3%


✅ re_sub_aspect

Time: ✅ 3.532µs (SLO: <10.000µs 📉 -64.7%) vs baseline: +3.6%

Memory: ✅ 40.206MB (SLO: <41.000MB 🟡 -1.9%) vs baseline: +4.9%


✅ re_sub_noaspect

Time: ✅ 1.532µs (SLO: <10.000µs 📉 -84.7%) vs baseline: +1.0%

Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.7%


✅ re_subn_aspect

Time: ✅ 3.662µs (SLO: <10.000µs 📉 -63.4%) vs baseline: +1.4%

Memory: ✅ 40.305MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +4.1%


✅ re_subn_noaspect

Time: ✅ 1.611µs (SLO: <10.000µs 📉 -83.9%) vs baseline: +0.4%

Memory: ✅ 40.246MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.9%


🟡 iastaspectssplit - 12/12

✅ rsplit_aspect

Time: ✅ 1.544µs (SLO: <10.000µs 📉 -84.6%) vs baseline: +8.9%

Memory: ✅ 40.324MB (SLO: <41.000MB 🟡 -1.6%) vs baseline: +5.0%


✅ rsplit_noaspect

Time: ✅ 0.583µs (SLO: <10.000µs 📉 -94.2%) vs baseline: +0.6%

Memory: ✅ 40.206MB (SLO: <41.000MB 🟡 -1.9%)


✅ split_aspect

Time: ✅ 1.425µs (SLO: <10.000µs 📉 -85.7%) vs baseline: +1.1%

Memory: ✅ 40.147MB (SLO: <41.000MB -2.1%) vs baseline: +4.6%


✅ split_noaspect

Time: ✅ 0.568µs (SLO: <10.000µs 📉 -94.3%) vs baseline: -0.5%

Memory: ✅ 40.265MB (SLO: <41.000MB 🟡 -1.8%) vs baseline: +4.9%


✅ splitlines_aspect

Time: ✅ 1.412µs (SLO: <10.000µs 📉 -85.9%) vs baseline: +2.0%

Memory: ✅ 40.305MB (SLO: <41.000MB 🟡 -1.7%) vs baseline: +5.0%


✅ splitlines_noaspect

Time: ✅ 0.586µs (SLO: <10.000µs 📉 -94.1%) vs baseline: +0.2%

Memory: ✅ 40.187MB (SLO: <41.000MB 🟡 -2.0%) vs baseline: +4.2%


🟡 otelspan - 22/22

✅ add-event

Time: ✅ 39.749ms (SLO: <47.150ms 📉 -15.7%) vs baseline: ~same

Memory: ✅ 39.500MB (SLO: <47.000MB 📉 -16.0%) vs baseline: +4.7%


✅ add-metrics

Time: ✅ 261.984ms (SLO: <344.800ms 📉 -24.0%) vs baseline: +1.5%

Memory: ✅ 43.841MB (SLO: <47.500MB -7.7%) vs baseline: +4.5%


✅ add-tags

Time: ✅ 315.773ms (SLO: <321.000ms 🟡 -1.6%) vs baseline: +0.5%

Memory: ✅ 43.686MB (SLO: <47.500MB -8.0%) vs baseline: +4.4%


✅ get-context

Time: ✅ 79.758ms (SLO: <92.350ms 📉 -13.6%) vs baseline: -0.2%

Memory: ✅ 39.636MB (SLO: <46.500MB 📉 -14.8%) vs baseline: +4.3%


✅ is-recording

Time: ✅ 37.322ms (SLO: <44.500ms 📉 -16.1%) vs baseline: +0.2%

Memory: ✅ 39.445MB (SLO: <47.500MB 📉 -17.0%) vs baseline: +4.6%


✅ record-exception

Time: ✅ 58.449ms (SLO: <67.650ms 📉 -13.6%) vs baseline: +0.1%

Memory: ✅ 39.818MB (SLO: <47.000MB 📉 -15.3%) vs baseline: +4.1%


✅ set-status

Time: ✅ 43.478ms (SLO: <50.400ms 📉 -13.7%) vs baseline: -0.5%

Memory: ✅ 39.422MB (SLO: <47.000MB 📉 -16.1%) vs baseline: +4.5%


✅ start

Time: ✅ 37.418ms (SLO: <43.450ms 📉 -13.9%) vs baseline: +2.8%

Memory: ✅ 39.351MB (SLO: <47.000MB 📉 -16.3%) vs baseline: +4.5%


✅ start-finish

Time: ✅ 81.721ms (SLO: <88.000ms -7.1%) vs baseline: ~same

Memory: ✅ 37.257MB (SLO: <46.500MB 📉 -19.9%) vs baseline: +4.3%


✅ start-finish-telemetry

Time: ✅ 83.299ms (SLO: <89.000ms -6.4%) vs baseline: -0.1%

Memory: ✅ 37.356MB (SLO: <46.500MB 📉 -19.7%) vs baseline: +4.7%


✅ update-name

Time: ✅ 37.974ms (SLO: <45.150ms 📉 -15.9%) vs baseline: -0.3%

Memory: ✅ 39.600MB (SLO: <47.000MB 📉 -15.7%) vs baseline: +4.6%


🟡 packagespackageforrootmodulemapping - 4/4

✅ cache_off

Time: ✅ 342.137ms (SLO: <354.300ms -3.4%) vs baseline: -0.8%

Memory: ✅ 40.697MB (SLO: <41.500MB 🟡 -1.9%) vs baseline: +4.8%


✅ cache_on

Time: ✅ 0.382µs (SLO: <10.000µs 📉 -96.2%) vs baseline: -0.7%

Memory: ✅ 38.681MB (SLO: <41.000MB -5.7%) vs baseline: +4.4%


🟡 ratelimiter - 12/12

✅ defaults

Time: ✅ 2.351µs (SLO: <10.000µs 📉 -76.5%) vs baseline: -0.7%

Memory: ✅ 35.055MB (SLO: <35.500MB 🟡 -1.3%) vs baseline: +4.4%


✅ high_rate_limit

Time: ✅ 2.429µs (SLO: <10.000µs 📉 -75.7%) vs baseline: +0.8%

Memory: ✅ 35.154MB (SLO: <35.500MB 🟡 -1.0%) vs baseline: +4.6%


✅ long_window

Time: ✅ 2.358µs (SLO: <10.000µs 📉 -76.4%) vs baseline: +0.2%

Memory: ✅ 35.134MB (SLO: <35.500MB 🟡 -1.0%) vs baseline: +4.5%


✅ low_rate_limit

Time: ✅ 2.364µs (SLO: <10.000µs 📉 -76.4%) vs baseline: -0.1%

Memory: ✅ 35.075MB (SLO: <35.500MB 🟡 -1.2%) vs baseline: +4.6%


✅ no_rate_limit

Time: ✅ 0.829µs (SLO: <10.000µs 📉 -91.7%) vs baseline: +0.8%

Memory: ✅ 35.173MB (SLO: <35.500MB 🟡 -0.9%) vs baseline: +4.8%


✅ short_window

Time: ✅ 2.493µs (SLO: <10.000µs 📉 -75.1%) vs baseline: +0.4%

Memory: ✅ 35.095MB (SLO: <35.500MB 🟡 -1.1%) vs baseline: +4.6%


🟡 recursivecomputation - 8/8

✅ deep

Time: ✅ 308.895ms (SLO: <320.950ms -3.8%) vs baseline: ~same

Memory: ✅ 35.901MB (SLO: <36.500MB 🟡 -1.6%) vs baseline: +4.1%


✅ deep-profiled

Time: ✅ 328.807ms (SLO: <359.150ms -8.4%) vs baseline: -1.9%

Memory: ✅ 39.911MB (SLO: <40.500MB 🟡 -1.5%) vs baseline: +4.5%


✅ medium

Time: ✅ 7.005ms (SLO: <7.400ms -5.3%) vs baseline: +0.2%

Memory: ✅ 34.839MB (SLO: <35.500MB 🟡 -1.9%) vs baseline: +4.4%


✅ shallow

Time: ✅ 0.948ms (SLO: <1.050ms -9.7%) vs baseline: +1.3%

Memory: ✅ 34.721MB (SLO: <35.500MB -2.2%) vs baseline: +4.1%


🟡 sethttpmeta - 32/32

✅ all-disabled

Time: ✅ 10.595µs (SLO: <20.000µs 📉 -47.0%) vs baseline: +0.8%

Memory: ✅ 35.527MB (SLO: <36.000MB 🟡 -1.3%) vs baseline: +4.9%


✅ all-enabled

Time: ✅ 41.137µs (SLO: <50.000µs 📉 -17.7%) vs baseline: +3.0%

Memory: ✅ 35.350MB (SLO: <36.000MB 🟡 -1.8%) vs baseline: +4.6%


✅ collectipvariant_exists

Time: ✅ 41.121µs (SLO: <50.000µs 📉 -17.8%) vs baseline: +1.3%

Memory: ✅ 35.468MB (SLO: <36.000MB 🟡 -1.5%) vs baseline: +4.3%


✅ no-collectipvariant

Time: ✅ 40.265µs (SLO: <50.000µs 📉 -19.5%) vs baseline: +1.0%

Memory: ✅ 35.547MB (SLO: <36.000MB 🟡 -1.3%) vs baseline: +5.1%


✅ no-useragentvariant

Time: ✅ 39.003µs (SLO: <50.000µs 📉 -22.0%) vs baseline: +0.8%

Memory: ✅ 35.468MB (SLO: <36.000MB 🟡 -1.5%) vs baseline: +4.7%


✅ obfuscation-no-query

Time: ✅ 40.670µs (SLO: <50.000µs 📉 -18.7%) vs baseline: +0.8%

Memory: ✅ 35.370MB (SLO: <36.000MB 🟡 -1.8%) vs baseline: +4.5%


✅ obfuscation-regular-case-explicit-query

Time: ✅ 75.756µs (SLO: <90.000µs 📉 -15.8%) vs baseline: ~same

Memory: ✅ 35.724MB (SLO: <36.500MB -2.1%) vs baseline: +4.7%


✅ obfuscation-regular-case-implicit-query

Time: ✅ 76.630µs (SLO: <90.000µs 📉 -14.9%) vs baseline: +0.4%

Memory: ✅ 35.704MB (SLO: <36.500MB -2.2%) vs baseline: +4.8%


✅ obfuscation-send-querystring-disabled

Time: ✅ 154.011µs (SLO: <170.000µs -9.4%) vs baseline: -0.1%

Memory: ✅ 35.566MB (SLO: <36.500MB -2.6%) vs baseline: +4.4%


✅ obfuscation-worst-case-explicit-query

Time: ✅ 148.926µs (SLO: <160.000µs -6.9%) vs baseline: +0.2%

Memory: ✅ 35.684MB (SLO: <36.500MB -2.2%) vs baseline: +4.4%


✅ obfuscation-worst-case-implicit-query

Time: ✅ 154.594µs (SLO: <170.000µs -9.1%) vs baseline: -0.4%

Memory: ✅ 35.606MB (SLO: <36.500MB -2.5%) vs baseline: +4.4%


✅ useragentvariant_exists_1

Time: ✅ 39.773µs (SLO: <50.000µs 📉 -20.5%) vs baseline: +1.1%

Memory: ✅ 35.350MB (SLO: <36.000MB 🟡 -1.8%) vs baseline: +4.7%


✅ useragentvariant_exists_2

Time: ✅ 40.846µs (SLO: <50.000µs 📉 -18.3%) vs baseline: +0.9%

Memory: ✅ 35.350MB (SLO: <36.000MB 🟡 -1.8%) vs baseline: +4.3%


✅ useragentvariant_exists_3

Time: ✅ 40.330µs (SLO: <50.000µs 📉 -19.3%) vs baseline: +1.0%

Memory: ✅ 35.448MB (SLO: <36.000MB 🟡 -1.5%) vs baseline: +4.8%


✅ useragentvariant_not_exists_1

Time: ✅ 39.609µs (SLO: <50.000µs 📉 -20.8%) vs baseline: +0.6%

Memory: ✅ 35.448MB (SLO: <36.000MB 🟡 -1.5%) vs baseline: +4.6%


✅ useragentvariant_not_exists_2

Time: ✅ 39.645µs (SLO: <50.000µs 📉 -20.7%) vs baseline: +0.8%

Memory: ✅ 35.586MB (SLO: <36.000MB 🟡 -1.1%) vs baseline: +5.3%


🟡 span - 26/26

✅ add-event

Time: ✅ 18.315ms (SLO: <22.500ms 📉 -18.6%) vs baseline: +0.6%

Memory: ✅ 36.941MB (SLO: <53.000MB 📉 -30.3%) vs baseline: +4.7%


✅ add-metrics

Time: ✅ 88.415ms (SLO: <93.500ms -5.4%) vs baseline: +0.4%

Memory: ✅ 41.078MB (SLO: <53.000MB 📉 -22.5%) vs baseline: +4.4%


✅ add-tags

Time: ✅ 142.526ms (SLO: <155.000ms -8.0%) vs baseline: +0.7%

Memory: ✅ 41.062MB (SLO: <53.000MB 📉 -22.5%) vs baseline: +4.6%


✅ get-context

Time: ✅ 17.047ms (SLO: <20.500ms 📉 -16.8%) vs baseline: -0.3%

Memory: ✅ 36.789MB (SLO: <53.000MB 📉 -30.6%) vs baseline: +5.0%


✅ is-recording

Time: ✅ 17.247ms (SLO: <20.500ms 📉 -15.9%) vs baseline: -0.5%

Memory: ✅ 36.804MB (SLO: <53.000MB 📉 -30.6%) vs baseline: +4.5%


✅ record-exception

Time: ✅ 36.772ms (SLO: <40.000ms -8.1%) vs baseline: +0.5%

Memory: ✅ 37.276MB (SLO: <53.000MB 📉 -29.7%) vs baseline: +4.3%


✅ set-status

Time: ✅ 18.791ms (SLO: <22.000ms 📉 -14.6%) vs baseline: +0.4%

Memory: ✅ 36.686MB (SLO: <53.000MB 📉 -30.8%) vs baseline: +4.3%


✅ start

Time: ✅ 17.482ms (SLO: <20.500ms 📉 -14.7%) vs baseline: +4.1%

Memory: ✅ 36.748MB (SLO: <53.000MB 📉 -30.7%) vs baseline: +4.5%


✅ start-finish

Time: ✅ 50.996ms (SLO: <52.500ms -2.9%) vs baseline: -0.3%

Memory: ✅ 34.800MB (SLO: <35.500MB 🟡 -2.0%) vs baseline: +4.5%


✅ start-finish-telemetry

Time: ✅ 52.361ms (SLO: <54.500ms -3.9%) vs baseline: +0.2%

Memory: ✅ 34.721MB (SLO: <35.500MB -2.2%) vs baseline: +4.6%


✅ start-finish-traceid128

Time: ✅ 54.224ms (SLO: <57.000ms -4.9%) vs baseline: ~same

Memory: ✅ 34.662MB (SLO: <35.500MB -2.4%) vs baseline: +4.3%


✅ start-traceid128

Time: ✅ 17.292ms (SLO: <22.500ms 📉 -23.1%) vs baseline: +0.5%

Memory: ✅ 36.671MB (SLO: <53.000MB 📉 -30.8%) vs baseline: +4.3%


✅ update-name

Time: ✅ 17.434ms (SLO: <22.000ms 📉 -20.8%) vs baseline: +0.6%

Memory: ✅ 36.718MB (SLO: <53.000MB 📉 -30.7%) vs baseline: +4.3%


🟡 tracer - 6/6

✅ large

Time: ✅ 29.340ms (SLO: <32.950ms 📉 -11.0%) vs baseline: +0.6%

Memory: ✅ 35.999MB (SLO: <36.500MB 🟡 -1.4%) vs baseline: +4.7%


✅ medium

Time: ✅ 2.897ms (SLO: <3.200ms -9.5%) vs baseline: +0.4%

Memory: ✅ 34.800MB (SLO: <35.500MB 🟡 -2.0%) vs baseline: +4.3%


✅ small

Time: ✅ 333.614µs (SLO: <370.000µs -9.8%) vs baseline: +1.9%

Memory: ✅ 34.721MB (SLO: <35.500MB -2.2%) vs baseline: +4.3%

📉 Performance Improvements (1 suite)
📉 djangosimple - 30/30

✅ appsec

Time: ✅ 19.574ms (SLO: <22.300ms 📉 -12.2%) vs baseline: ~same

Memory: ✅ 67.889MB (SLO: <70.500MB -3.7%) vs baseline: +4.1%


✅ exception-replay-enabled

Time: ✅ 1.357ms (SLO: <1.450ms -6.4%) vs baseline: -0.1%

Memory: ✅ 66.117MB (SLO: <67.500MB -2.0%) vs baseline: +4.6%


✅ iast

Time: ✅ 19.597ms (SLO: <22.250ms 📉 -11.9%) vs baseline: -0.2%

Memory: ✅ 67.849MB (SLO: <70.000MB -3.1%) vs baseline: +4.2%


✅ profiler

Time: ✅ 15.400ms (SLO: <16.550ms -7.0%) vs baseline: -2.1%

Memory: ✅ 56.313MB (SLO: <57.500MB -2.1%) vs baseline: +4.6%


✅ resource-renaming

Time: ✅ 19.476ms (SLO: <21.750ms 📉 -10.5%) vs baseline: -0.3%

Memory: ✅ 68.046MB (SLO: <70.500MB -3.5%) vs baseline: +4.4%


✅ span-code-origin

Time: ✅ 19.980ms (SLO: <28.200ms 📉 -29.2%) vs baseline: 📉 -14.4%

Memory: ✅ 67.962MB (SLO: <71.000MB -4.3%) vs baseline: +2.8%


✅ tracer

Time: ✅ 19.548ms (SLO: <21.750ms 📉 -10.1%) vs baseline: ~same

Memory: ✅ 67.849MB (SLO: <70.000MB -3.1%) vs baseline: +4.3%


✅ tracer-and-profiler

Time: ✅ 21.797ms (SLO: <23.500ms -7.2%) vs baseline: -0.2%

Memory: ✅ 69.226MB (SLO: <71.000MB -2.5%) vs baseline: +4.7%


✅ tracer-dont-create-db-spans

Time: ✅ 19.674ms (SLO: <21.500ms -8.5%) vs baseline: +0.5%

Memory: ✅ 67.790MB (SLO: <70.000MB -3.2%) vs baseline: +3.9%


✅ tracer-minimal

Time: ✅ 16.793ms (SLO: <17.500ms -4.0%) vs baseline: +0.2%

Memory: ✅ 67.810MB (SLO: <70.000MB -3.1%) vs baseline: +4.6%


✅ tracer-native

Time: ✅ 19.478ms (SLO: <21.750ms 📉 -10.4%) vs baseline: ~same

Memory: ✅ 67.889MB (SLO: <72.500MB -6.4%) vs baseline: +4.3%


✅ tracer-no-caches

Time: ✅ 17.585ms (SLO: <19.650ms 📉 -10.5%) vs baseline: -0.1%

Memory: ✅ 67.790MB (SLO: <70.000MB -3.2%) vs baseline: +4.6%


✅ tracer-no-databases

Time: ✅ 19.114ms (SLO: <20.100ms -4.9%) vs baseline: ~same

Memory: ✅ 67.731MB (SLO: <70.000MB -3.2%) vs baseline: +4.5%


✅ tracer-no-middleware

Time: ✅ 19.296ms (SLO: <21.500ms 📉 -10.3%) vs baseline: -0.3%

Memory: ✅ 67.889MB (SLO: <70.000MB -3.0%) vs baseline: +4.9%


✅ tracer-no-templates

Time: ✅ 19.483ms (SLO: <22.000ms 📉 -11.4%) vs baseline: +0.8%

Memory: ✅ 67.830MB (SLO: <70.500MB -3.8%) vs baseline: +4.6%

⚠️ Unstable Tests (2 suites)
⚠️ coreapiscenario - 10/10 (1 unstable)

⚠️ context_with_data_listeners

Time: ⚠️ 13.243µs (SLO: <20.000µs 📉 -33.8%) vs baseline: +0.2%

Memory: ✅ 34.642MB (SLO: <35.500MB -2.4%) vs baseline: +4.1%


✅ context_with_data_no_listeners

Time: ✅ 3.272µs (SLO: <10.000µs 📉 -67.3%) vs baseline: -0.6%

Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.4%


✅ get_item_exists

Time: ✅ 0.584µs (SLO: <10.000µs 📉 -94.2%) vs baseline: -1.7%

Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.4%


✅ get_item_missing

Time: ✅ 0.633µs (SLO: <10.000µs 📉 -93.7%) vs baseline: -1.0%

Memory: ✅ 34.721MB (SLO: <35.500MB -2.2%) vs baseline: +4.3%


✅ set_item

Time: ✅ 23.918µs (SLO: <30.000µs 📉 -20.3%) vs baseline: -1.8%

Memory: ✅ 34.741MB (SLO: <35.500MB -2.1%) vs baseline: +4.6%


⚠️ packagesupdateimporteddependencies - 24/24 (1 unstable)

✅ import_many

Time: ✅ 154.711µs (SLO: <170.000µs -9.0%) vs baseline: -0.6%

Memory: ✅ 39.586MB (SLO: <41.000MB -3.4%) vs baseline: +4.5%


✅ import_many_cached

Time: ✅ 120.780µs (SLO: <130.000µs -7.1%) vs baseline: -0.1%

Memory: ✅ 39.672MB (SLO: <41.000MB -3.2%) vs baseline: +5.3%


✅ import_many_stdlib

Time: ✅ 0.753ms (SLO: <1.750ms 📉 -56.9%) vs baseline: -0.6%

Memory: ✅ 39.514MB (SLO: <41.000MB -3.6%) vs baseline: +4.3%


⚠️ import_many_stdlib_cached

Time: ⚠️ 0.172ms (SLO: <1.100ms 📉 -84.3%) vs baseline: +0.1%

Memory: ✅ 39.426MB (SLO: <41.000MB -3.8%) vs baseline: +3.9%


✅ import_many_unknown

Time: ✅ 838.254µs (SLO: <890.000µs -5.8%) vs baseline: +0.8%

Memory: ✅ 39.624MB (SLO: <41.000MB -3.4%) vs baseline: +4.6%


✅ import_many_unknown_cached

Time: ✅ 792.155µs (SLO: <870.000µs -8.9%) vs baseline: -0.8%

Memory: ✅ 39.711MB (SLO: <41.000MB -3.1%) vs baseline: +4.7%


✅ import_one

Time: ✅ 19.749µs (SLO: <30.000µs 📉 -34.2%) vs baseline: -1.1%

Memory: ✅ 39.658MB (SLO: <41.000MB -3.3%) vs baseline: +4.9%


✅ import_one_cache

Time: ✅ 6.290µs (SLO: <10.000µs 📉 -37.1%) vs baseline: +0.4%

Memory: ✅ 39.468MB (SLO: <41.000MB -3.7%) vs baseline: +4.0%


✅ import_one_stdlib

Time: ✅ 18.707µs (SLO: <20.000µs -6.5%) vs baseline: -0.6%

Memory: ✅ 39.559MB (SLO: <41.000MB -3.5%) vs baseline: +4.4%


✅ import_one_stdlib_cache

Time: ✅ 6.264µs (SLO: <10.000µs 📉 -37.4%) vs baseline: +0.1%

Memory: ✅ 39.526MB (SLO: <41.000MB -3.6%) vs baseline: +4.8%


✅ import_one_unknown

Time: ✅ 45.290µs (SLO: <50.000µs -9.4%) vs baseline: -0.4%

Memory: ✅ 39.559MB (SLO: <41.000MB -3.5%) vs baseline: +4.7%


✅ import_one_unknown_cache

Time: ✅ 6.260µs (SLO: <10.000µs 📉 -37.4%) vs baseline: -0.2%

Memory: ✅ 39.538MB (SLO: <41.000MB -3.6%) vs baseline: +4.7%

✅ All Tests Passing (3 suites)
iastpropagation - 8/8

✅ no-propagation

Time: ✅ 48.512µs (SLO: <60.000µs 📉 -19.1%) vs baseline: ~same

Memory: ✅ 40.167MB (SLO: <42.000MB -4.4%) vs baseline: +4.8%


✅ propagation_enabled

Time: ✅ 174.287µs (SLO: <190.000µs -8.3%) vs baseline: +4.6%

Memory: ✅ 40.187MB (SLO: <42.000MB -4.3%) vs baseline: +4.9%


✅ propagation_enabled_100

Time: ✅ 1.926ms (SLO: <2.300ms 📉 -16.3%) vs baseline: +3.2%

Memory: ✅ 40.069MB (SLO: <42.000MB -4.6%) vs baseline: +4.5%


✅ propagation_enabled_1000

Time: ✅ 32.625ms (SLO: <34.550ms -5.6%) vs baseline: +1.4%

Memory: ✅ 40.147MB (SLO: <42.000MB -4.4%) vs baseline: +4.9%


otelsdkspan - 24/24

✅ add-event

Time: ✅ 40.462ms (SLO: <42.000ms -3.7%) vs baseline: ~same

Memory: ✅ 37.375MB (SLO: <39.000MB -4.2%) vs baseline: +5.0%


✅ add-link

Time: ✅ 36.403ms (SLO: <38.550ms -5.6%) vs baseline: ~same

Memory: ✅ 37.473MB (SLO: <39.000MB -3.9%) vs baseline: +5.3%


✅ add-metrics

Time: ✅ 220.539ms (SLO: <232.000ms -4.9%) vs baseline: +1.2%

Memory: ✅ 37.316MB (SLO: <39.000MB -4.3%) vs baseline: +3.8%


✅ add-tags

Time: ✅ 210.911ms (SLO: <221.600ms -4.8%) vs baseline: -0.3%

Memory: ✅ 37.316MB (SLO: <39.000MB -4.3%) vs baseline: +3.8%


✅ get-context

Time: ✅ 29.273ms (SLO: <31.300ms -6.5%) vs baseline: +1.2%

Memory: ✅ 37.297MB (SLO: <39.000MB -4.4%) vs baseline: +4.0%


✅ is-recording

Time: ✅ 29.166ms (SLO: <31.000ms -5.9%) vs baseline: +0.3%

Memory: ✅ 37.297MB (SLO: <39.000MB -4.4%) vs baseline: +4.9%


✅ record-exception

Time: ✅ 63.115ms (SLO: <65.850ms -4.2%) vs baseline: +0.3%

Memory: ✅ 37.356MB (SLO: <39.000MB -4.2%) vs baseline: +4.0%


✅ set-status

Time: ✅ 31.880ms (SLO: <34.150ms -6.6%) vs baseline: -0.2%

Memory: ✅ 37.277MB (SLO: <39.000MB -4.4%) vs baseline: +4.4%


✅ start

Time: ✅ 29.350ms (SLO: <30.150ms -2.7%) vs baseline: +1.9%

Memory: ✅ 37.316MB (SLO: <39.000MB -4.3%) vs baseline: +4.6%


✅ start-finish

Time: ✅ 34.035ms (SLO: <35.350ms -3.7%) vs baseline: +0.5%

Memory: ✅ 37.297MB (SLO: <39.000MB -4.4%) vs baseline: +4.6%


✅ start-finish-telemetry

Time: ✅ 33.910ms (SLO: <35.450ms -4.3%) vs baseline: ~same

Memory: ✅ 37.473MB (SLO: <39.000MB -3.9%) vs baseline: +5.0%


✅ update-name

Time: ✅ 31.154ms (SLO: <33.400ms -6.7%) vs baseline: +0.7%

Memory: ✅ 37.336MB (SLO: <39.000MB -4.3%) vs baseline: +5.0%


samplingrules - 8/8

✅ average_match

Time: ✅ 136.957µs (SLO: <290.000µs 📉 -52.8%) vs baseline: -0.2%

Memory: ✅ 34.780MB (SLO: <35.500MB -2.0%) vs baseline: +4.4%


✅ high_match

Time: ✅ 174.001µs (SLO: <480.000µs 📉 -63.7%) vs baseline: ~same

Memory: ✅ 34.760MB (SLO: <35.500MB -2.1%) vs baseline: +4.7%


✅ low_match

Time: ✅ 99.039µs (SLO: <120.000µs 📉 -17.5%) vs baseline: -0.5%

Memory: ✅ 603.513MB (SLO: <700.000MB 📉 -13.8%) vs baseline: +4.8%


✅ very_low_match

Time: ✅ 2.678ms (SLO: <8.500ms 📉 -68.5%) vs baseline: +0.6%

Memory: ✅ 70.993MB (SLO: <75.000MB -5.3%) vs baseline: +4.7%

ℹ️ Scenarios Missing SLO Configuration (10 scenarios)

The following scenarios exist in candidate data but have no SLO thresholds configured:

  • coreapiscenario-core_dispatch_listeners
  • coreapiscenario-core_dispatch_no_listeners
  • coreapiscenario-core_dispatch_with_results_listeners
  • coreapiscenario-core_dispatch_with_results_no_listeners
  • djangosimple-baseline
  • errortrackingdjangosimple-baseline
  • errortrackingflasksqli-baseline
  • flasksimple-baseline
  • flasksqli-baseline
  • sethttpmeta-obfuscation-disabled

@ncybul ncybul changed the title from "add api for running evals on exported spans" to "feat(llmobs): add api for running evals on exported spans" Dec 2, 2025
@ncybul ncybul changed the title from "feat(llmobs): add api for running evals on exported spans" to "feat(llmobs): [MLOB-4867] add api for running evals on exported spans" Dec 2, 2025
@ncybul ncybul changed the title from "feat(llmobs): [MLOB-4867] add api for running evals on exported spans" to "feat(llmobs): [MLOB-4687] add api for running evals on exported spans" Dec 2, 2025
ml_app: Optional[str] = None,
from_timestamp: Optional[str] = None,
to_timestamp: Optional[str] = None,
evaluations: Optional[List[Callable[[Dict[str, Any]], LLMObsEvaluationResult]]] = None
Contributor Author

@ncybul ncybul Dec 2, 2025

Two comments here:

  1. Was wondering if we should allow these evaluation functions to be asynchronous methods or if this is fine.

  2. I wasn't sure what the return type should be for the evaluations. I do not think we want users to directly create their own evaluation metric, so I made a new LLMObsEvaluationResult dictionary that contains all the info we have users pass into submit_evaluation. Then we handle building and validating the evaluation metric from that info, but open to suggestions on this design!
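
For reference, a minimal sketch of what a user-written evaluation might look like under the design described in point 2. The `EvaluationResultSketch` type, its field names (`label`, `metric_type`, `value`, `tags`), and the `"output"` key on the span dictionary are assumptions made for illustration; they are not taken from the actual `LLMObsEvaluationResult` definition or the Export API response.

```python
from typing import Any, Dict, TypedDict


class EvaluationResultSketch(TypedDict, total=False):
    # Hypothetical stand-in for LLMObsEvaluationResult; field names are assumed.
    label: str
    metric_type: str
    value: Any
    tags: Dict[str, str]


def output_not_empty(span: Dict[str, Any]) -> EvaluationResultSketch:
    """Return a score of 1.0 if the span has any output, else 0.0."""
    # The "output" key is an assumed span field used only for this sketch.
    output = span.get("output")
    return {
        "label": "output_not_empty",
        "metric_type": "score",
        "value": 1.0 if output else 0.0,
    }
```

A function of this shape matches the `Callable[[Dict[str, Any]], ...]` signature in the diff: it receives one exported span and returns a plain dictionary, leaving metric construction and validation to the SDK.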

@ncybul ncybul marked this pull request as ready for review December 2, 2025 21:12
@ncybul ncybul requested review from a team as code owners December 2, 2025 21:12
Member

@brettlangdon brettlangdon left a comment

release note lgtm

Contributor

@sabrenner sabrenner left a comment

really cool! just a couple comments for now, i didn't really look at the tests yet, but the source files mostly lgtm

pass


class LLMObsEvaluationResult(TypedDict, total=False):
Contributor

is this a type we expect users to return? if so might make more sense to put in our types module

Comment on lines +1750 to +1757
span_id: Optional[str] = None,
trace_id: Optional[str] = None,
tags: Optional[Dict[str, str]] = None,
span_kind: Optional[str] = None,
span_name: Optional[str] = None,
ml_app: Optional[str] = None,
from_timestamp: Optional[str] = None,
to_timestamp: Optional[str] = None,
Contributor

should we enforce at least one of these below? agree they should be optional but maybe we want to enforce either span_id/trace_id or tags to give the user a sense of direction for how to most effectively use run_evaluations. just a thought tho and maybe i missed a point elsewhere about the constraints we wanna set for this method
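
One possible shape for that check, as a rough sketch only. The helper name `require_a_filter` is hypothetical, and which filters count as "narrowing" (here `span_id`, `trace_id`, or `tags`) is just the set suggested in the comment above, not a decision made in this PR.

```python
from typing import Dict, Optional


def require_a_filter(
    span_id: Optional[str] = None,
    trace_id: Optional[str] = None,
    tags: Optional[Dict[str, str]] = None,
) -> None:
    """Raise ValueError unless at least one narrowing filter is supplied."""
    # Empty strings and empty dicts are treated the same as None here.
    if not (span_id or trace_id or tags):
        raise ValueError("Provide at least one of span_id, trace_id, or tags")
```

Every parameter stays optional in the signature, so callers can pick whichever filter they have, while a call with no filter at all fails fast instead of exporting an unbounded set of spans.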

pass


class LLMObsExportSpansClient:
Contributor

i know this isn't a "writer" but i feel it somehow makes more sense to live in _writer.py. i could just be thinking about it wrong tho so feel free to ignore but wanna know what you think/if you already gave it some thought!

Comment on lines +555 to +556
if not self._api_key or not self._app_key:
    raise ValueError("Both an API key and an APP key are required to make requests to the LLMObs Export API")
Contributor

we could probably do this earlier, in the constructor maybe
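
A sketch of what moving the check into the constructor could look like. The class name `ExportSpansClientSketch` and its constructor arguments are hypothetical; the real `LLMObsExportSpansClient` may take different parameters or read the keys from configuration.

```python
class ExportSpansClientSketch:
    """Hypothetical stand-in for LLMObsExportSpansClient."""

    def __init__(self, api_key: str, app_key: str) -> None:
        # Validate credentials once, at construction time, rather than on
        # every request to the Export API.
        if not api_key or not app_key:
            raise ValueError(
                "Both an API key and an APP key are required to make requests to the LLMObs Export API"
            )
        self._api_key = api_key
        self._app_key = app_key
```

The trade-off is that a missing key surfaces when the client is created instead of on the first export call, which only helps if the client is constructed at a point where raising is acceptable.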

    )
finally:
    metric_type = evaluation_result.get("metric_type") or ""
    telemetry.record_llmobs_submit_evaluation(join_on, metric_type, error)
Contributor

a thought - maybe we wanna have an extra tag for this submit evaluation telemetry signal that it happened through the run_evaluations method, maybe like source:run_evaluations. if you also think that's a good idea, we'd have to update the metric definition in the backend to accept a new tag, although if we're worried about cardinality for that metric we could make a new one.

either way i think it might be nice to have a signal of how many people are using this run_evaluations method vs the normal submit_evaluations one

"""
Tests that enqueuing evaluation metrics works for multiple spans and evaluations.
"""
mock_export_spans.return_value = mock_exported_spans()
Contributor

we might be able to leverage the vcr from the testagent like we do for experiments tests if you wanted to try that out 👀 (ref, ref)

i'd be happy to help w that if you wanted to try that and you had any questions! otherwise i don't have any explicit problem with doing mocking this way since it's unlikely we'll change the search api i'm guessing

        "trace_id": exported_span.get("trace_id"),
    },
}
metric_type = ""
Contributor

i think we don't need this here right, we just use it below in the finally block, or am i missing another place?

5 participants