Conversation

@neoandmatrix (Contributor)

Which problem is this PR solving?

Description of the changes

  • This PR filters out 503 calls to reduce flakiness in the metrics comparison.
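
A minimal sketch of what such a rule-based filter might look like. The names `METRIC_EXCLUSION_RULES` and `should_exclude_metric` appear in the diff below, but the rule structure shown here is an assumption, not necessarily the exact format used in this PR:

```python
# Hypothetical rule format: metric name -> {label name -> excluded label values}.
# The real rules in the PR may be structured differently.
METRIC_EXCLUSION_RULES = {
    "http_server_duration_milliseconds": {"http_status_code": {"503"}},
}


def should_exclude_metric(name: str, labels: dict) -> bool:
    """Return True if a sample matches an exclusion rule (e.g. flaky 503s)."""
    rules = METRIC_EXCLUSION_RULES.get(name)
    if not rules:
        return False
    return any(labels.get(label) in values for label, values in rules.items())
```

With rules like these, a 503 sample of a listed metric is dropped from the comparison while all other samples, including 200s of the same metric, are kept.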

How was this change tested?

Before changes: (screenshot)

After changes: (screenshot)

Checklist

@neoandmatrix neoandmatrix requested a review from a team as a code owner November 27, 2025 17:12
@dosubot dosubot bot added the enhancement label Nov 27, 2025
@neoandmatrix neoandmatrix changed the title chore: exclusion of metrics added ci: exclusion of metrics added Nov 27, 2025
Signed-off-by: Tushar Anand <[email protected]>
codecov bot commented Nov 27, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.54%. Comparing base (fbf3ccc) to head (8c4232c).
⚠️ Report is 6 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #7671   +/-   ##
=======================================
  Coverage   96.54%   96.54%           
=======================================
  Files         384      384           
  Lines       19502    19502           
=======================================
  Hits        18828    18828           
  Misses        489      489           
  Partials      185      185           
Flag Coverage Δ
badger_v1 8.82% <ø> (ø)
badger_v2 1.72% <ø> (ø)
cassandra-4.x-v1-manual 12.54% <ø> (ø)
cassandra-4.x-v2-auto 1.71% <ø> (ø)
cassandra-4.x-v2-manual 1.71% <ø> (ø)
cassandra-5.x-v1-manual 12.54% <ø> (ø)
cassandra-5.x-v2-auto 1.71% <ø> (ø)
cassandra-5.x-v2-manual 1.71% <ø> (ø)
clickhouse 1.65% <ø> (ø)
elasticsearch-6.x-v1 16.74% <ø> (ø)
elasticsearch-7.x-v1 16.77% <ø> (ø)
elasticsearch-8.x-v1 16.92% <ø> (ø)
elasticsearch-8.x-v2 1.72% <ø> (ø)
elasticsearch-9.x-v2 1.72% <ø> (ø)
grpc_v1 10.75% <ø> (ø)
grpc_v2 1.72% <ø> (ø)
kafka-3.x-v1 10.25% <ø> (ø)
kafka-3.x-v2 1.72% <ø> (ø)
memory_v2 1.72% <ø> (ø)
opensearch-1.x-v1 16.81% <ø> (ø)
opensearch-2.x-v1 16.81% <ø> (ø)
opensearch-2.x-v2 1.72% <ø> (ø)
opensearch-3.x-v2 1.72% <ø> (ø)
query 1.72% <ø> (ø)
tailsampling-processor 0.49% <ø> (ø)
unittests 95.45% <ø> (ø)

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.


Comment on lines +97 to +98
should_exclude= should_exclude_metric(sample.name, labels)
if should_exclude:
Member

Suggested change
-should_exclude= should_exclude_metric(sample.name, labels)
-if should_exclude:
+if should_exclude_metric(sample.name, labels):

Member


it would be good to count these exclusions in stats

Contributor Author


> it would be good to count these exclusions in stats

Like the ones posted in the PR, if some stats are excluded?

Member


yes (although I am not sure excluding full metrics is a good thing, per the other comment).
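
One way the exclusions could be surfaced in the stats, as suggested above. This is only a sketch: `filter_samples` and the `(name, labels)` sample shape are hypothetical and not taken from the PR:

```python
from collections import Counter

# Hypothetical: count excluded samples per metric so they can be reported
# alongside the comparison stats instead of silently disappearing.
excluded_counts = Counter()


def filter_samples(samples, should_exclude_metric):
    """Yield samples that pass the exclusion rules, counting the rest."""
    for name, labels in samples:
        if should_exclude_metric(name, labels):
            excluded_counts[name] += 1  # record for the stats summary
            continue
        yield (name, labels)
```

The comparison script could then print `excluded_counts` next to the added/removed metric stats so reviewers can see exactly what was filtered out.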

# }
}

METRIC_EXCLUSION_RULES = {
Member


rather than excluding, could we add a curl to the CI scripts that always fails (e.g. looking for a trace id we know doesn't exist)? The main point of the metrics regression checks is to make sure all metrics are covered (but ignore the actual values).

Contributor Author


> rather than excluding, could we add a curl to the CI scripts that always fails (e.g. looking for a trace id we know doesn't exist)?

I might be misunderstanding the approach, so could you clarify how the curl is meant to work? Is the idea to make a request to an endpoint that we know will fail, so that a 503 is generated and included in the comparison even when it isn't present in one of the metric sets?

Member


yes.

Contributor Author


Hi @yurishkuro, I tried to implement this approach to generate the missing metrics, but there are a few inconsistencies that I observed.

The first is identifying the exact URL or route that generates the flaky metrics. In this particular case I tried curl -v http://localhost:13133/health and curl -v http://localhost:13133/ on my local system, thinking the metrics came from these endpoints, but the generated metrics didn't exactly match: sometimes a 503 was generated that didn't match the ones in CI, and sometimes the request returned 200. In CI, I believe these metrics are produced by a minor timing mismatch between the service startup and the OTEL Collector, which is difficult to replicate locally.

The second is deciding which metrics are genuinely different and which are flaky. We could use a rule-based approach for that, just like the one in this PR, but the generated metrics would have to match the expected format exactly if we generate them manually.

Given this, I feel that the approach you mentioned above, of just reporting these excluded metrics, would be more appropriate. Can you please have a look and suggest how I should move forward, or whether I should try a different approach?
