Skip to content

Conversation

@lalitb
Copy link
Member

@lalitb lalitb commented Dec 5, 2025

Adds mutual TLS (mTLS) client certificate authentication to OTLP/OTAP receivers with parallel handshake processing and timeout protection.

What's New

1. mTLS Client Certificate Verification

  • Client CA loading from file, PEM, or system pool
  • Standards-compliant X.509 validation via WebPkiClientVerifier
  • Mandatory authentication when CA is configured

2. Parallel Handshake Processing

.buffer_unordered(MAX_CONCURRENT_HANDSHAKES)  // 64 concurrent handshakes
Prevents head-of-line blocking from slow/malicious clients.

3. Handshake timeout

  tokio::time::timeout(handshake_timeout, acceptor.accept(conn))
  Default 10s, configurable. DoS protection against slowloris-style attacks.

Configuration

Enable mTLS

  receivers:
    otlp:
      tls:
        cert_file: /path/to/server.crt
        key_file: /path/to/server.key
        client_ca_file: /path/to/client-ca.crt  # NEW
        handshake_timeout: "10s"  # NEW (default: 10s)

System CA Pool

  receivers:
    otlp:
      tls:
        include_system_ca_certs_pool: true  # NEW

What's Deferred

  • Client CA certificate reload (separate PR)
  • Client-side TLS for exporters (separate PR)
  • TLS version/cipher configuration

This PR adds mutual TLS (mTLS) support to the OTLP and OTAP gRPC receivers,
enabling client certificate authentication for enhanced security.

Features:
- Client CA certificate configuration (client_ca_file, client_ca_pem)
- Optional system CA certificate pool inclusion
- Configurable handshake timeout for DoS protection
- Concurrent TLS handshake support (up to 64 parallel handshakes)
- Certificate hot-reloading with configurable reload interval

Configuration options:
- tls.cert_file/cert_pem: Server certificate
- tls.key_file/key_pem: Server private key
- tls.client_ca_file/client_ca_pem: CA for client cert verification (enables mTLS)
- tls.include_system_ca_certs_pool: Include system CAs for client verification
- tls.handshake_timeout: Timeout for TLS handshake (default: 10s)
- tls.reload_interval: Interval for certificate reload checks (default: 5m)

Tests added:
- mTLS client certificate verification tests
- Handshake timeout enforcement test
- Concurrent handshake non-blocking test
- TLS reload integration test fixes
This PR adds mutual TLS (mTLS) support to the OTLP and OTAP gRPC receivers,
enabling client certificate authentication for enhanced security.

Features:
- Client CA certificate configuration (client_ca_file, client_ca_pem)
- Optional system CA certificate pool inclusion
- Configurable handshake timeout for DoS protection
- Concurrent TLS handshake support (up to 64 parallel handshakes)
- Certificate hot-reloading with configurable reload interval

Configuration options:
- tls.cert_file/cert_pem: Server certificate
- tls.key_file/key_pem: Server private key
- tls.client_ca_file/client_ca_pem: CA for client cert verification (enables mTLS)
- tls.include_system_ca_certs_pool: Include system CAs for client verification
- tls.handshake_timeout: Timeout for TLS handshake (default: 10s)
- tls.reload_interval: Interval for certificate reload checks (default: 5m)

Tests added:
- mTLS client certificate verification tests
- Handshake timeout enforcement test
- Concurrent handshake non-blocking test
- TLS reload integration test fixes
@lalitb lalitb requested a review from a team as a code owner December 5, 2025 23:42
@github-actions github-actions bot added the rust Pull requests that update Rust code label Dec 5, 2025
@codecov
Copy link

codecov bot commented Dec 5, 2025

Codecov Report

❌ Patch coverage is 78.03030% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.47%. Comparing base (dfca971) to head (69437a9).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1539      +/-   ##
==========================================
- Coverage   83.50%   83.47%   -0.04%     
==========================================
  Files         428      428              
  Lines      119410   119479      +69     
==========================================
+ Hits        99713    99733      +20     
- Misses      19163    19212      +49     
  Partials      534      534              
Components Coverage Δ
otap-dataflow 84.64% <78.03%> (-0.06%) ⬇️
query_abstraction 80.61% <ø> (ø)
query_engine 90.26% <ø> (ø)
syslog_cef_receivers ∅ <ø> (∅)
otel-arrow-go 53.50% <ø> (ø)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

arc-swap = { workspace = true, optional = true }
rustls = { workspace = true, optional = true }
rustls-pemfile = { workspace = true, optional = true }
rustls-pki-types = { workspace = true, optional = true }
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rustls-pemfile is no longer maintained, so replaced by rustls-pki-types, which is maintained by rustls project.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

match maybe_tls_acceptor {
Some(tls_acceptor) => {
let tls_stream = create_tls_stream(listener_stream, tls_acceptor);
let tls_stream = create_tls_stream(listener_stream, tls_acceptor, handshake_timeout);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

handshake_timeout is added to protect against slow/malicious clients that could hold connection slots indefinitely by not completing the TLS handshake.

Without a timeout, a malicious client could:

  • Open a TCP connection
  • Never send TLS ClientHello (or send it very slowly)
  • Hold the connection slot forever, preventing legitimate clients from connecting
  • Repeat this to exhaust server resources (DoS attack)

/// - Allows concurrent handshakes to prevent slow clients from blocking others
/// - Limits memory overhead for pending handshake state
/// - May need adjustment based on actual workload characteristics
const MAX_CONCURRENT_HANDSHAKES: usize = 64;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not mTLS-specific - this is a general TLS hardening to prevent DoS via connection flooding. Limits concurrent handshakes to 64, applying backpressure when exceeded. Without this limit, an attacker could overwhelm the server by initiating thousands of handshakes simultaneously.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

rust Pull requests that update Rust code

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

2 participants