@schahal
Contributor

@schahal schahal commented Nov 24, 2025

Description:

See #4441 for details

The root cause (from the 3 separate accounts in the thread) seems to be that all three TargetAllocator certificates (CA, server, client) were created at the same time with the same 90-day duration.

When they all came up for renewal simultaneously (around day 60), there was a race condition where, for example:

  • One of either the Server or Client cert gets renewed and signed by the original CA
  • The CA gets renewed second (new CA cert + key)
  • The other (e.g., Client) cert gets renewed and signed by the new CA
  • This causes "certificate signed by unknown authority" errors
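
The ordering above can be replayed with a small, simplified model. This is only a sketch: `renewInOrder` and the "CA generation" counter are hypothetical names used for illustration, not part of the operator or cert-manager. It tracks which CA generation signed each leaf certificate and shows that the two leaves end up on different CAs only when the CA renewal lands between them.

```go
package main

import "fmt"

// renewInOrder replays renewals in the given order and returns the CA
// generation that each leaf certificate ends up signed by.
// Hypothetical model: "ca" bumps the generation, any other name is a leaf
// that gets re-signed by whatever CA generation is current at that moment.
func renewInOrder(order []string) map[string]int {
	caGen := 1
	signedBy := map[string]int{"server": 1, "client": 1}
	for _, name := range order {
		if name == "ca" {
			caGen++
			continue
		}
		signedBy[name] = caGen
	}
	return signedBy
}

func main() {
	// Racy ordering from the description: server, then CA, then client.
	racy := renewInOrder([]string{"server", "ca", "client"})
	fmt.Println("racy order, same CA:", racy["server"] == racy["client"])

	// If the CA renews first, both leaves are signed by the new CA.
	safe := renewInOrder([]string{"ca", "server", "client"})
	fmt.Println("CA first, same CA:", safe["server"] == safe["client"])
}
```

In this model a mismatch in CA generation stands in for the "certificate signed by unknown authority" error; the real failure also depends on which CA bundle each side trusts at the time.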

By giving the CA a 1-year duration while client/server certs keep the default 90-day duration, we ensure:

  • The CA remains stable (e.g., longer period between renewals) while client/server certs renew
  • Client and server certs are always signed by the same CA (except in the unlikely case that they renew at the same time as the CA, around its 11-month mark)
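
The timing argument can be checked with a little arithmetic. The sketch below assumes a 30-day renewal window for every certificate, a value chosen only because it matches the "around day 60" and "11-month mark" figures in this description; it is not taken from the operator code. It also assumes each leaf is reissued for a full duration at every renewal, so leaf renewals fall on multiples of the leaf's renewal offset. `renewalOffset` and `collides` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// renewalOffset returns how long after issuance a certificate is renewed,
// given its duration and the renewal window before expiry.
func renewalOffset(duration, renewBefore time.Duration) time.Duration {
	return duration - renewBefore
}

// collides reports whether the CA's first renewal falls exactly on one of
// the leaf's periodic renewals (multiples of leafOffset).
func collides(caOffset, leafOffset time.Duration) bool {
	if leafOffset <= 0 {
		return false
	}
	return caOffset%leafOffset == 0
}

// days converts a duration to whole days for printing.
func days(d time.Duration) int {
	return int(d / day)
}

func main() {
	window := 30 * day // assumed renewal window, see lead-in

	leaf := renewalOffset(90*day, window)
	shortCA := renewalOffset(90*day, window) // before this change
	longCA := renewalOffset(365*day, window) // after this change

	fmt.Println("leaf renews on day", days(leaf))
	fmt.Println("90-day CA renews on day", days(shortCA), "collides:", collides(shortCA, leaf))
	fmt.Println("1-year CA renews on day", days(longCA), "collides:", collides(longCA, leaf))
}
```

Under these assumptions the 90-day CA renews on the same day as the leaves, while the 1-year CA renews on day 335, between two leaf renewals. Real renewals are not this exact, so this illustrates the reduced overlap rather than proving the race is gone.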

Link to tracking Issue(s):

Testing:

  • Updated test to validate duration
go test ./internal/manifests/targetallocator/ -run Certificate -v
=== RUN   TestCACertificate
=== RUN   TestCACertificate/Default_CA_Certificate
--- PASS: TestCACertificate (0.00s)
    --- PASS: TestCACertificate/Default_CA_Certificate (0.00s)
=== RUN   TestServingCertificate
=== RUN   TestServingCertificate/Default_Serving_Certificate
--- PASS: TestServingCertificate (0.00s)
    --- PASS: TestServingCertificate/Default_Serving_Certificate (0.00s)
=== RUN   TestClientCertificate
=== RUN   TestClientCertificate/Default_Client_Certificate
--- PASS: TestClientCertificate (0.00s)
    --- PASS: TestClientCertificate/Default_Client_Certificate (0.00s)
PASS
ok  	github.com/open-telemetry/opentelemetry-operator/internal/manifests/targetallocator	(cached)

Documentation:

Added a changelog gen entry

@github-actions
Contributor

github-actions bot commented Nov 24, 2025

E2E Test Results

 34 files  ±0  227 suites  ±0   1h 58m 24s ⏱️ - 1m 23s
 90 tests ±0   90 ✅ ±0  0 💤 ±0  0 ❌ ±0 
231 runs  ±0  231 ✅ ±0  0 💤 ±0  0 ❌ ±0 

Results for commit 1186ee2. ± Comparison against base commit a8ba309.

♻️ This comment has been updated with latest results.

Spec: cmv1.CertificateSpec{
IsCA: true,
CommonName: naming.CACertificate(params.TargetAllocator.Name),
// Set CA certificate to 1 year (much longer than the default 90-day duration of client/server certs)
Member

Is the client/server cert renewal configurable?

Would it make sense to make the CA renewal shorter? / What is the lowest safe value possible for the CA renewal?

Contributor Author

client/server certs renewal configurable?

No. When I looked at it, it was using cert-manager's defaults (I thought it was 90 days but had to look it up, hence the comments for reference).

I think that's fine, for simplicity

make the CA renewal shorter? / What is the lowest safe value possible for the CA renewal?

In general, I've seen CAs be pretty long-lived. 1 year seems reasonable. I think even 10 years would be OK, but I'm guessing the original author(s) were erring on a shorter lifecycle for the CA. I don't have that history, so I picked 1 year.

I think 1 year, combined with @swiatekm's suggestion of adding a renewBefore, would actually eliminate the race.

Will push a change.

@swiatekm
Contributor

Client and server certs are always signed by the same CA (unless there's a very tiny chance that they renew at same time as CA's 11-month mark)

Would it make sense to set the CA certificate grace period (spec.renewBefore) to be longer than the client certificate duration? Then it would be impossible to renew a client certificate with a CA certificate that has a shorter remaining duration.
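
A minimal sketch of the invariant suggested here, assuming the CA in use is replaced as soon as its remaining lifetime drops to its renewBefore window. Under that assumption, the CA's remaining lifetime at the moment it signs a leaf is never less than renewBefore, so the check reduces to one comparison. `caOutlivesLeaf` is a hypothetical helper and the 120-day window is an illustrative value, not one proposed in this PR.

```go
package main

import (
	"fmt"
	"time"
)

const day = 24 * time.Hour

// caOutlivesLeaf reports whether every leaf signed by the current CA is
// guaranteed to expire before that CA does. The CA is assumed to be
// replaced once its remaining lifetime reaches caRenewBefore, so that value
// is the minimum remaining CA lifetime at signing time.
func caOutlivesLeaf(caRenewBefore, leafDuration time.Duration) bool {
	return caRenewBefore > leafDuration
}

func main() {
	leafDuration := 90 * day

	// A window shorter than the leaf duration does not give the guarantee.
	fmt.Println("30-day window:", caOutlivesLeaf(30*day, leafDuration))

	// A window longer than the leaf duration does (illustrative value).
	fmt.Println("120-day window:", caOutlivesLeaf(120*day, leafDuration))
}
```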

@schahal schahal requested a review from pavolloffay November 26, 2025 22:28

Development

Successfully merging this pull request may close these issues.

failed to verify certificate: x509: certificate signed by unknown authority

3 participants