
[issue-5336] [BE] Fix OTEL GenAI mapping gaps #5348

Draft

vincentkoc wants to merge 1 commit into main from vincentkoc-code/NA-otel-genai-mapping-fixes

Conversation

@vincentkoc
Member

@vincentkoc vincentkoc commented Feb 20, 2026

Details

This PR addresses span-level OTEL GenAI mapping correctness in opik-backend after PR #5336. This area is cross-cutting because Opik ingests OTEL spans and normalizes semantic-convention attributes into trace/span fields.

What changed

  • thread_id handling is now deterministic: when both thread_id and gen_ai.conversation.id exist, explicit thread_id is treated as canonical.
  • Malformed JSON inside gen_ai.usage.* is now treated as non-fatal metadata parse noise instead of aborting the whole span mapping.
  • Added compatibility mappings for gen_ai.openai.* deprecated keys used by older instrumentations.
  • Extended unit tests for mapping behavior and edge cases.
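The thread_id precedence rule from the first bullet can be sketched as follows (hypothetical helper name; the actual mapper code may be structured differently):

```java
import java.util.Map;

// Hypothetical sketch: an explicit thread_id attribute is treated as
// canonical; gen_ai.conversation.id is only used as a fallback.
class ThreadIdResolver {
    static String resolveThreadId(Map<String, String> attributes) {
        String explicit = attributes.get("thread_id");
        if (explicit != null && !explicit.isBlank()) {
            return explicit; // explicit thread_id wins when both are present
        }
        return attributes.get("gen_ai.conversation.id"); // may be null
    }
}
```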

Why this is necessary

  • OpenTelemetry is evolving, and both stable and deprecated GenAI attributes can appear in mixed payloads from downstream clients.
  • Non-fatal handling of malformed usage fields avoids total batch loss for otherwise valid trace ingestion.

References

Change checklist

  • User facing
  • Documentation update

Issues

Testing

  • Not run in this session.
  • Suggested commands:
    • mvn -f apps/opik-backend/pom.xml -Dtest=OpenTelemetryMapperTest,OpenTelemetryMappingUtilsTest test

Documentation

  • No user-facing documentation changes in this PR.
  • otel_issues.md remains in the metrics-focused draft PR.

@github-actions github-actions Bot added the java, Backend, and tests labels Feb 20, 2026
Comment on lines 212 to 214
} catch (UncheckedIOException ex) {
log.warn("Failed to parse JSON string for usage field {}: {}. Skipping usage extraction.", key,
ex.getMessage());
Contributor

log.warn currently embeds ex.getMessage() into the message string, so the stack trace is discarded and the log breaks the "pass exception as last parameter" rule; can we keep the descriptive text at the start and pass ex as the final argument (e.g. log.warn("Failed to parse JSON string for usage field {} and skip usage extraction.", key, ex);) so the exception is captured properly?

Finding type: Structure log messages properly



Comment on lines +23 to +27
OpenTelemetryMappingRule.builder()
.rule("error.type").source(SOURCE).outcome(OpenTelemetryMappingRule.Outcome.METADATA).build(),
OpenTelemetryMappingRule.builder()
.rule("error.message").source(SOURCE).outcome(OpenTelemetryMappingRule.Outcome.METADATA).build(),
OpenTelemetryMappingRule.builder()
Contributor

Mapping rules now persist error.type/error.message into metadata (and the following lines also add error.stack, exception.* and server.address/port), yet OpenTelemetryMapper.enrichSpanWithAttributes writes every metadata attribute via extractToJsonColumn directly into the span (see lines 123‑134) with no sanitizer/redaction; these newly captured fields can contain secrets, stack traces, or IPs and are now stored as-is. Can we introduce a centralized sanitizer before metadata is persisted?

Finding type: Basic Security Patterns


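A centralized sanitizer along the lines the reviewer suggests could look like this (hypothetical names and redaction rules; the real implementation would plug in before extractToJsonColumn persists metadata):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical sketch of a centralized metadata sanitizer: redact keys known
// to carry stack traces, and mask bare IPv4 addresses in remaining values,
// before any metadata attribute is written to the span.
class MetadataSanitizer {
    private static final Set<String> REDACTED_KEYS = Set.of("error.stack", "exception.stacktrace");
    private static final Pattern IPV4 = Pattern.compile("\\b\\d{1,3}(\\.\\d{1,3}){3}\\b");

    static Map<String, String> sanitize(Map<String, String> metadata) {
        Map<String, String> clean = new LinkedHashMap<>();
        metadata.forEach((key, value) -> {
            if (REDACTED_KEYS.contains(key)) {
                clean.put(key, "[REDACTED]"); // never persist stack traces verbatim
            } else {
                clean.put(key, IPV4.matcher(value).replaceAll("[IP]")); // mask bare IPs
            }
        });
        return clean;
    }
}
```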

Comment on lines +242 to +246
void testGenAiProviderNameMapsToProvider() {
var attributes = List.of(
KeyValue.newBuilder()
.setKey("gen_ai.provider.name")
.setValue(AnyValue.newBuilder().setStringValue("openai"))
Contributor

New tests around here (e.g. testGenAiProviderNameMapsToProvider) repeat the same span setup/assert pattern as the dozen that follow but with only key/value differences; .agents/skills/opik-backend/testing.md's Parameterized Tests rule says such similar inputs should be consolidated into @ParameterizedTest cases to avoid duplication and keep the suite maintainable.

Finding type: AI Coding Guidelines


Prompt for AI Agents:

In apps/opik-backend/src/test/java/com/comet/opik/domain/OpenTelemetryMapperTest.java
around lines 242 to 246, the test method testGenAiProviderNameMapsToProvider (and the
many tests that follow) repeat the same spanBuilder setup and assertion pattern with
only differing attribute keys/values. Refactor by extracting the common Span builder
setup and OpenTelemetryMapper.enrichSpanWithAttributes call into a private helper method
that accepts a List<KeyValue> and returns the built Span, then replace the repetitive
single-case tests with one or more @ParameterizedTest methods (using CsvSource or
MethodSource) that supply tuples of (attributeKey, attributeValue, expectedTarget,
expectedValueType/Value). Update each parameterized case to perform the appropriate
assertion (e.g., provider equals, metadata has key and value, input/output presence) so
all similar tests are consolidated and duplication removed.
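The consolidation the prompt describes can be illustrated framework-free (in the real suite this would be a JUnit @ParameterizedTest with a MethodSource, and mapAttribute would be the actual OpenTelemetryMapper call; all names here are hypothetical):

```java
import java.util.List;

// Hypothetical sketch: a single data-driven loop replaces many
// near-identical single-case test methods that differ only in key/value.
class MappingTestSketch {
    record Case(String key, String value, String expectedField) {}

    // Stand-in for the real mapper: routes known keys to span fields.
    static String mapAttribute(String key) {
        return switch (key) {
            case "gen_ai.provider.name" -> "provider";
            case "gen_ai.conversation.id" -> "thread_id";
            default -> "metadata";
        };
    }

    // One assertion loop covers every (key, value, expectedField) tuple.
    static boolean runAll(List<Case> cases) {
        return cases.stream().allMatch(c -> mapAttribute(c.key()).equals(c.expectedField()));
    }
}
```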

Comment on lines +198 to +202
if (cacheReadTokens != null) {
computedInputTokens += cacheReadTokens;
}
if (cacheWriteTokens != null) {
computedInputTokens += cacheWriteTokens;
Contributor

per the Otel GenAI semantic conventions for Anthropic (https://opentelemetry.io/docs/specs/semconv/gen-ai/anthropic/), gen_ai.usage.input_tokens is already computed as base input_tokens + cache_read.input_tokens + cache_creation.input_tokens, so the new enrichUsageWithAnthropicCacheTokens adds the cache counts once more whenever those cache_* fields are present, double-counting cached tokens in prompt_tokens for spec-compliant spans; can we skip this accumulation and rely on the upstream total?

Finding type: Logical Bugs


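The double-count the reviewer describes can be sketched side by side (hypothetical method names; the real code lives in enrichUsageWithAnthropicCacheTokens):

```java
// Hypothetical sketch: per the Anthropic GenAI semantic conventions,
// gen_ai.usage.input_tokens already includes cache read/creation tokens,
// so the mapper should use the reported total as-is.
class UsageTokens {
    // Spec-compliant: cache counts are an informational breakdown of the
    // total, not an addition to it.
    static long promptTokens(long reportedInputTokens, Long cacheReadTokens, Long cacheWriteTokens) {
        return reportedInputTokens;
    }

    // The behavior the reviewer flags: cache tokens get counted twice.
    static long doubleCounted(long reportedInputTokens, Long cacheReadTokens, Long cacheWriteTokens) {
        long total = reportedInputTokens;
        if (cacheReadTokens != null) {
            total += cacheReadTokens;  // already inside reportedInputTokens
        }
        if (cacheWriteTokens != null) {
            total += cacheWriteTokens; // already inside reportedInputTokens
        }
        return total;
    }
}
```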

@github-actions
Contributor

Backend Tests Results

  432 files    432 suites   57m 1s ⏱️
6 840 tests 6 827 ✅ 13 💤 0 ❌
6 657 runs  6 644 ✅ 13 💤 0 ❌

Results for commit 5623274.

@andrescrz
Copy link
Copy Markdown
Member

Hi @vincentkoc

Thanks again for the PR and your contribution.

Before I proceed with the review, could you please address the automated comments from Baz? Also, please run the backend formatting so the CI formatting check passes.

Once those are resolved, I’ll continue with the review.

Thank you!
