[issue-5336] [BE] Fix OTEL GenAI mapping gaps#5348
Conversation
| } catch (UncheckedIOException ex) { | ||
| log.warn("Failed to parse JSON string for usage field {}: {}. Skipping usage extraction.", key, | ||
| ex.getMessage()); |
There was a problem hiding this comment.
log.warn currently embeds ex.getMessage() into the message string, so the stack trace is discarded and the log breaks the "pass exception as last parameter" rule; can we keep the descriptive text at the start and pass ex as the final argument (e.g. log.warn("Failed to parse JSON string for usage field {} and skip usage extraction.", key, ex);) so the exception is captured properly?
Finding type: Structure log messages properly
- Apply fix with Baz
| OpenTelemetryMappingRule.builder() | ||
| .rule("error.type").source(SOURCE).outcome(OpenTelemetryMappingRule.Outcome.METADATA).build(), | ||
| OpenTelemetryMappingRule.builder() | ||
| .rule("error.message").source(SOURCE).outcome(OpenTelemetryMappingRule.Outcome.METADATA).build(), | ||
| OpenTelemetryMappingRule.builder() |
There was a problem hiding this comment.
Mapping rules now persist error.type/error.message into metadata (and the following lines also add error.stack, exception.* and server.address/port), yet OpenTelemetryMapper.enrichSpanWithAttributes writes every metadata attribute via extractToJsonColumn directly into the span (see lines 123‑134) with no sanitizer/redaction; these newly captured fields can contain secrets, stack traces, or IPs and are now stored as-is. Can we introduce a centralized sanitizer before metadata is persisted?
Finding type: Basic Security Patterns
- Apply fix with Baz
| void testGenAiProviderNameMapsToProvider() { | ||
| var attributes = List.of( | ||
| KeyValue.newBuilder() | ||
| .setKey("gen_ai.provider.name") | ||
| .setValue(AnyValue.newBuilder().setStringValue("openai")) |
There was a problem hiding this comment.
New tests around here (e.g. testGenAiProviderNameMapsToProvider) repeat the same span setup/assert pattern as the dozen that follow but with only key/value differences; .agents/skills/opik-backend/testing.md's Parameterized Tests rule says such similar inputs should be consolidated into @ParameterizedTest cases to avoid duplication and keep the suite maintainable.
Finding type: AI Coding Guidelines
- Apply fix with Baz
Other fix methods
Prompt for AI Agents:
In apps/opik-backend/src/test/java/com/comet/opik/domain/OpenTelemetryMapperTest.java
around lines 242 to 246, the test method testGenAiProviderNameMapsToProvider (and the
many tests that follow) repeat the same spanBuilder setup and assertion pattern with
only differing attribute keys/values. Refactor by extracting the common Span builder
setup and OpenTelemetryMapper.enrichSpanWithAttributes call into a private helper method
that accepts a List<KeyValue> and returns the built Span, then replace the repetitive
single-case tests with one or more @ParameterizedTest methods (using CsvSource or
MethodSource) that supply tuples of (attributeKey, attributeValue, expectedTarget,
expectedValueType/Value). Update each parameterized case to perform the appropriate
assertion (e.g., provider equals, metadata has key and value, input/output presence) so
all similar tests are consolidated and duplication removed.
| if (cacheReadTokens != null) { | ||
| computedInputTokens += cacheReadTokens; | ||
| } | ||
| if (cacheWriteTokens != null) { | ||
| computedInputTokens += cacheWriteTokens; |
There was a problem hiding this comment.
per the Otel GenAI semantic conventions for Anthropic (https://opentelemetry.io/docs/specs/semconv/gen-ai/anthropic/), gen_ai.usage.input_tokens is already computed as base input_tokens + cache_read.input_tokens + cache_creation.input_tokens, so the new enrichUsageWithAnthropicCacheTokens adds the cache counts once more whenever those cache_* fields are present, double-counting cached tokens in prompt_tokens for spec-compliant spans; can we skip this accumulation and rely on the upstream total?
Finding type: Logical Bugs
- Apply fix with Baz
Backend Tests Results 432 files 432 suites 57m 1s ⏱️ Results for commit 5623274. |
|
Hi @vincentkoc Thanks again for the PR and your contribution. Before I proceed with the review, could you please address the automated comments from Baz? Also, please run the backend formatting so the CI formatting check passes. Once those are resolved, I’ll continue with the review. Thank you! |
Details
This PR addresses span-level OTEL GenAI mapping correctness in
opik-backendafter PR #5336. This area is cross-cutting because Opik ingests OTEL spans and normalizes semantic-convention attributes into trace/span fields.What changed
thread_idhandling is now deterministic: when boththread_idandgen_ai.conversation.idexist, explicitthread_idis treated as canonical.gen_ai.usage.*is now treated as non-fatal metadata parse noise instead of aborting the whole span mapping.gen_ai.openai.*deprecated keys used by older instrumentations.Why this is necessary
References
Change checklist
Issues
Testing
mvn -f apps/opik-backend/pom.xml -Dtest=OpenTelemetryMapperTest,OpenTelemetryMappingUtilsTest testDocumentation
otel_issues.mdremains in the metrics-focused draft PR.