Skip to content

Add OTel auto-instrumentation for ddtrace-equivalent fidelity#60

Merged
prim-8 merged 2 commits into
mainfrom
feature/otel-auto-instrumentation
Apr 29, 2026
Merged

Add OTel auto-instrumentation for ddtrace-equivalent fidelity#60
prim-8 merged 2 commits into
mainfrom
feature/otel-auto-instrumentation

Conversation

@prim-8
Copy link
Copy Markdown
Contributor

@prim-8 prim-8 commented Apr 29, 2026

Summary

The previous PR (#59) added OpenTelemetry as a parallel runtime tracing option to ddtrace, but the OTel path only emitted the three manual spans hand-coded in the milter (process_email, storage_upload, webhook_call). This PR closes the fidelity gap so the OTel path matches what ddtrace's auto-instrumentation provides.

  • Adds the four opentelemetry-instrumentation-* packages that match the libraries primitivemail uses (urllib3, botocore, logging, dnspython) plus opentelemetry-distro for the CLI wrapper.
  • Wraps the milter launch under opentelemetry-instrument when OTEL_TRACING_ENABLED=true so all installed instrumentations attach at process start.
  • Strictly opt-in (gated on both INSTALL_OTEL=true build arg and OTEL_TRACING_ENABLED=true runtime flag). Zero impact on ddtrace deployers and untraced deployers.

What this delivers

For an OTel deployer, the trace graph for an inbound email becomes:

milter.process_email                        # manual
├── HTTP GET / dnspython (SPF lookup)       # auto (dnspython)
├── HTTP GET / dnspython (DKIM lookup)      # auto
├── milter.storage_upload                    # manual
│   ├── S3.PutObject                         # auto (botocore)
│   └── HTTP POST /upload                    # auto (urllib3, if HTTP-style storage)
└── milter.webhook_call                      # manual
    └── HTTP POST {webhook_url}              # auto (urllib3)

Same depth ddtrace currently produces in Datadog. Manual spans become parents of the auto-instrumented child spans naturally.

Backwards compatibility

Build arg Runtime flag Effect
INSTALL_OTEL=false (default) any OTel packages absent; OTel branch noops
INSTALL_OTEL=true, OTEL_TRACING_ENABLED=false Packages installed but neither SDK nor wrapper activated
INSTALL_OTEL=true, OTEL_TRACING_ENABLED=true Wrapper activates, all instrumentations attach, full trace graph emitted
INSTALL_DDTRACE=true, DATADOG_TRACING_ENABLED=true Unchanged from today

Auto-instrumentations can be disabled per-library at runtime via OTEL_PYTHON_DISABLED_INSTRUMENTATIONS=urllib3,botocore,... for deployers who only want the manual spans.

Test plan

  • pytest — existing tests pass (auto-instrumentation only attaches when both gates are set; default-off path is undisturbed).
  • Build with INSTALL_DDTRACE=true INSTALL_OTEL=true, run with DATADOG_TRACING_ENABLED=true — confirm Datadog APM still shows existing trace structure (no behavior change on this path).
  • Same image, run with OTEL_TRACING_ENABLED=true + OTEL_EXPORTER_OTLP_ENDPOINT=... pointing at a local OTel collector or Tempo — confirm:
    • manual spans (milter.process_email, milter.storage_upload, milter.webhook_call) appear
    • urllib3 child spans appear under webhook + HTTP-storage paths
    • botocore child spans appear under S3 storage paths
    • log lines emitted during the trace carry otelTraceID / otelSpanID fields
    • dnspython spans appear when SPOOF_PROTECTION=enforce exercises SPF/DKIM/DMARC
  • OTEL_PYTHON_DISABLED_INSTRUMENTATIONS=urllib3,botocore,logging,dnspython — confirm only the manual spans land (escape hatch works).

When INSTALL_OTEL=true, the image now also installs:
  - opentelemetry-distro (provides the opentelemetry-instrument CLI)
  - opentelemetry-instrumentation-urllib3 (outbound HTTP — webhook +
    HTTP-style storage uploads)
  - opentelemetry-instrumentation-botocore (boto3 — S3 native uploads)
  - opentelemetry-instrumentation-logging (otelTraceID/otelSpanID on
    LogRecord, equivalent to DD_LOGS_INJECTION)
  - opentelemetry-instrumentation-dnspython (DNS during SPF/DKIM/DMARC)

entrypoint.sh wraps the milter launch with opentelemetry-instrument
when OTEL_TRACING_ENABLED=true, which auto-discovers and activates all
installed instrumentations. ddtrace path stays unchanged (its
auto-instrumentation activates via import, no wrapper needed).

Net effect on the OTel path: child spans for every outbound HTTP
request, S3 op, log line, and DNS lookup — same depth ddtrace
provides today. Manual spans (process_email, storage_upload,
webhook_call) become parents of the auto-instrumentation child spans.
No degradation for ddtrace deployers (gated on INSTALL_OTEL build arg
+ OTEL_TRACING_ENABLED runtime flag); auto-instrumentation can be
disabled per-library via OTEL_PYTHON_DISABLED_INSTRUMENTATIONS.
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 29, 2026

Greptile Summary

This PR adds four opentelemetry-instrumentation-* packages and opentelemetry-distro to the INSTALL_OTEL build stage, then wraps the milter launch under opentelemetry-instrument when OTEL_TRACING_ENABLED=true so all auto-instrumentations (urllib3, botocore, logging, dnspython) attach at process start. The previous review concern about the silent fallback is fully addressed with explicit stderr warnings when the wrapper binary is missing.

Confidence Score: 5/5

Safe to merge — strictly additive, doubly gated, and the prior reviewer concern is resolved.

No P0 or P1 findings. The implementation is clean: version ranges follow the OTel Python convention (0.48b0 ↔ api 1.27), opentelemetry-instrument exec's into the Python process so MILTER_PID tracking and signal propagation are unaffected, the silent-fallback gap from the prior review is closed with three explicit stderr warnings, and the change is completely opt-in behind two independent gates (build arg + runtime flag). All findings are P2 or lower — full marks.

No files require special attention.

Important Files Changed

Filename Overview
Dockerfile Adds five OTel contrib packages (opentelemetry-distro + four instrumentation packages) to the existing INSTALL_OTEL conditional block; version ranges follow the established OTel Python convention (0.48b0 ↔ 1.27).
milter/entrypoint.sh Replaces the single milter launch line with a three-branch conditional: OTel+wrapper present → wrapped launch; OTel+wrapper absent → explicit stderr warnings then undecorated launch; default → undecorated launch. MILTER_PID tracking, readiness check, and shutdown handler remain unchanged and are unaffected by the wrapper (opentelemetry-instrument exec's into the Python process).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[entrypoint.sh starts] --> B{OTEL_TRACING_ENABLED=true?}
    B -- No --> C[Launch milter directly]
    B -- Yes --> D{opentelemetry-instrument\nin PATH?}
    D -- Found --> E[opentelemetry-instrument\npython3 primitivemail_milter.py]
    D -- Missing --> F[Print 3 stderr WARNINGs]
    F --> G[Launch milter directly\nmanual spans only]
    E --> H[MILTER_PID captured]
    G --> H
    C --> H
    H --> I[Readiness check port 9900]
    I --> J[Postfix starts]

    subgraph OTel_auto_instrumentations
        E --> K[urllib3 spans]
        E --> L[botocore S3 spans]
        E --> M[logging correlation]
        E --> N[dnspython spans]
    end
Loading

Reviews (3): Last reviewed commit: "Greptile P2: warn loudly when openteleme..." | Re-trigger Greptile

Comment thread milter/entrypoint.sh
Previously, OTEL_TRACING_ENABLED=true with the wrapper absent fell
silently through to a plain milter launch — auto-instrumentation
didn't attach, but the earlier 'OpenTelemetry tracing enabled' log
made it look like everything was working.

Now an explicit three-line WARNING surfaces the fidelity-loss case
and points at the rebuild fix (INSTALL_OTEL=true). Manual spans
still emit, so this is a degradation alarm, not a hard failure.
@prim-8 prim-8 merged commit 94491ed into main Apr 29, 2026
8 checks passed
@prim-8 prim-8 deleted the feature/otel-auto-instrumentation branch April 29, 2026 02:47
prim-8 added a commit that referenced this pull request Apr 29, 2026
…61)

PR #60 listed it among the auto-instrumentation packages by analogy
with the others, but there's no upstream OTel auto-instrumentation
for dnspython on PyPI. The build fails with:

    ERROR: Could not find a version that satisfies the requirement
    opentelemetry-instrumentation-dnspython<1,>=0.48b0
    (from versions: none)
    ERROR: No matching distribution found for
    opentelemetry-instrumentation-dnspython<1,>=0.48b0

DNS lookups during SPF/DKIM/DMARC checks won't have per-call spans,
but the manual milter.process_email parent still wraps them — trace
context is preserved, only the per-DNS-call breakdown is lost.
Comment on the Dockerfile updated to call this out explicitly so the
next person who looks at the list knows why it's three packages, not
four.

Co-authored-by: prim-8 <prim-8@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant