
ctrl_encrypt_cert_change: rotate ECDH cert without restart #1163

Merged: eriknordmark merged 2 commits into lf-edge:master from eriknordmark:ctrl-encrypt-cert-change on May 13, 2026

eriknordmark commented May 8, 2026

Summary

End-to-end test for rotation of the controller's ECDH encryption certificate (CERT_TYPE_CONTROLLER_ECDH_EXCHANGE, encrypt.pem), independent of the signing cert. Two commits:

  1. Plumbing: eden utils gen-encrypt-cert, eden adam change-encrypt-cert, eden pod deploy --use-encrypt-cert. The --use-encrypt-cert flag makes prepareCipherData derive ECDH from encrypt-key.pem and tag the cipher context with the encrypt cert hash, mirroring production controller layouts where signing and ECDH are distinct certs with independent lifetimes. A new cipher-context-cert-hash filter in reencryptConfigs (and the new reencryptConfigsForEncryptCert) lets the two rotation commands coexist: each only re-encrypts cipher blocks whose context references the cert it owns.

  2. Test: ctrl_encrypt_cert_change.txt, registered as smoke step 24/26. Deploys three eclient apps with --use-encrypt-cert, rotates encrypt + signing certs twice, verifies fresh-app deployment after each rotation, cross-checks that the rotated cert is advertised in pubsub, and confirms reboot survival.

Why each rotation rotates both signing and encrypt

EVE's only fast trigger for a /certs refetch is a signing-cert mismatch in the auth-envelope SenderCertHash (controllerconn/authen.go:152-162). The cipher decrypt path returns "Controller Certificate get fail" without triggering a refetch; adam doesn't populate EdgeDevConfig.controllercert_confighash, so handleControllerCertsSha never runs; and the periodic controllerCertsTask defaults to CertInterval = 24h. See lf-edge/eve#5926 for the gap analysis and lf-edge/adam#152 for the fix; once that adam patch ships in an Eden-tracked release, the test can be extended with a pure encrypt-cert rotation.

Verification approach

  1. Fresh-app deployment is the primary signal. After each rotation, a fresh app (eclient2 after the first rotation, eclient3 after the second) is deployed with --use-encrypt-cert, so its cipher block uses the just-rotated ECDH key. Reaching RUNNING means EVE pulled /certs (triggered by the paired signing rotation), published the new Type=3 cert into pubControllerCert, and decrypted the cipher block.

  2. Cert-chain cross-check. check_encrypt_cert.sh walks /run/zedagent/ControllerCert/ (or /persist/... on EVE where that pubsub is Persistent: true), filters for Type=3, and byte-matches the decoded PEM against encrypt-new.pem (normalized for adam's strings.TrimSpace).

  3. Reboot survival. After both rotations + a reboot, all three apps come back RUNNING and the cert chain still has the rotated encrypt cert.

Test plan

  • CI smoke matrix: 3 of 4 entries reach the test and pass; the fourth (Smoke (ext4, false)) hits the upstream log_test flake addressed by #1156 ("log_test flakes on Smoke (ext4, false) waiting for 'Disconnected' log") before reaching step 24.
  • gofmt -l pkg/ cmd/ clean
  • go build ./... clean
  • go test ./pkg/openevec/... passes
  • Existing ctrl_cert_change.txt continues to pass (the new reencryptConfigs filter is a no-op for it — its cipher contexts all reference the signing cert hash, which is what the filter selects).

Notes for reviewers

🤖 Generated with Claude Code

eriknordmark and others added 2 commits May 11, 2026 18:19
Add eden plumbing to rotate the controller's
CERT_TYPE_CONTROLLER_ECDH_EXCHANGE certificate (encrypt.pem /
encrypt-key.pem) independently of the signing cert, and to deploy
applications whose user-data cipher block is tagged with the encrypt
cert hash rather than the signing cert hash.

`eden utils gen-encrypt-cert` generates a fresh ECDSA P-256 key pair
and an encryption cert signed by the eden root CA, mirroring
gen-signing-cert.

`eden adam change-encrypt-cert` rotates encrypt.pem / encrypt-key.pem
on adam's disk, re-encrypts every cipher block whose CipherContext
references the old encrypt cert hash with the new ECDH key, and
pushes the updated configs to adam. The signing cert and key are left
untouched, so auth-container envelopes continue to verify against the
device's saved signing cert.

`eden pod deploy --use-encrypt-cert` makes prepareCipherData use
encrypt-key.pem (instead of signing-key.pem) for ECDH derivation and
tags the cipher context with the encrypt cert hash. With this flag,
app cipher blocks track the encrypt-cert rotation rather than the
signing-cert rotation, matching production controller layouts where
signing and ECDH are distinct certs with independent lifetimes.

The Controller interface gains EncryptCertGet alongside
SigningCertGet to fetch the controller's
CERT_TYPE_CONTROLLER_ECDH_EXCHANGE cert from
/api/v2/edgedevice/certs.

The cipher-context-cert-hash filter in reencryptConfigs (used by
change-signing-cert) and reencryptConfigsForEncryptCert ensures the
two rotation paths coexist: each command re-encrypts only the configs
whose CipherContext references the cert it owns. Apps deployed with
--use-encrypt-cert reference the encrypt cert hash and are skipped by
change-signing-cert; apps deployed without the flag (the historical
default) reference the signing cert hash and are skipped by
change-encrypt-cert. Without the filter, a signing rotation would
attempt to AES-decrypt encrypt-cert-tagged blocks with the wrong
symmetric key (signing-key-derived) and ReencryptConfigData would
fail outright on the sha256 plaintext check.
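The ownership rule can be sketched as a partition of configs by the cert hash in their cipher context. `appConfig` and `selectForRotation` are hypothetical; the real reencryptConfigs and reencryptConfigsForEncryptCert operate on eden's config types.

```go
package main

import (
	"bytes"
	"fmt"
)

// appConfig is a hypothetical stand-in for an app config carrying one cipher block.
type appConfig struct {
	Name            string
	ContextCertHash []byte // cert hash recorded in the block's CipherContext
}

// selectForRotation splits configs into those a rotation command owns (context
// references ownedCertHash) and those it must leave alone.
func selectForRotation(cfgs []appConfig, ownedCertHash []byte) (owned, skipped []appConfig) {
	for _, c := range cfgs {
		if bytes.Equal(c.ContextCertHash, ownedCertHash) {
			owned = append(owned, c)
		} else {
			skipped = append(skipped, c)
		}
	}
	return owned, skipped
}

func names(cfgs []appConfig) []string {
	out := make([]string, 0, len(cfgs))
	for _, c := range cfgs {
		out = append(out, c.Name)
	}
	return out
}

func main() {
	signingHash, encryptHash := []byte("sig-hash"), []byte("enc-hash")
	cfgs := []appConfig{
		{Name: "legacy-app", ContextCertHash: signingHash}, // deployed without the flag
		{Name: "eclient1", ContextCertHash: encryptHash},   // deployed with --use-encrypt-cert
	}
	bySigning, _ := selectForRotation(cfgs, signingHash) // change-signing-cert
	byEncrypt, _ := selectForRotation(cfgs, encryptHash) // change-encrypt-cert
	fmt.Println(names(bySigning), names(byEncrypt))      // [legacy-app] [eclient1]
}
```

Because each config lands in exactly one command's owned set, neither command ever tries to decrypt a block with a key derived from the other cert.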

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

End-to-end test that rotates the controller's ECDH encryption cert
twice and verifies that user-data cipher blocks tagged with the
encrypt cert hash keep decrypting on the device. Companion to
ctrl_cert_change.txt: that test rotates the signing cert and
incidentally exercises the encryption path because Eden historically
reused signing-key.pem for ECDH derivation; this test deploys apps
with --use-encrypt-cert so their cipher contexts reference the
controller's encrypt cert (Type=3) and rotates that cert specifically.

Each rotation rotates *both* signing and encrypt certs because EVE's
only fast trigger for /certs refetch is signing-cert mismatch in the
auth-envelope SenderCertHash. The cipher decrypt path fails silently
when the encrypt cert hash is unknown, without triggering a refetch;
adam doesn't populate EdgeDevConfig.controllercert_confighash, so
handleControllerCertsSha never runs; and the periodic
controllerCertsTask defaults to CertInterval=24h. See lf-edge/eve#5926
for the design gap and lf-edge/adam#152 for the fix; once that adam
patch ships in an Eden-tracked release, the test could be extended
with an encrypt-cert-only rotation step.

Order within each rotation: change-encrypt-cert first, then
change-signing-cert. change-encrypt-cert re-encrypts the
encrypt-tagged cipher blocks and writes the new encrypt files on
adam's disk. change-signing-cert is a no-op for those cipher blocks
(its reencryptConfigs filter skips them), but its on-disk signing-key
swap is what makes adam sign the next auth envelope with a key the
device hasn't seen, triggering SenderStatusCertMiss. By the time the
device refetches /certs, both new certs are on adam's disk and arrive
in a single round-trip.
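The ordering argument can be checked with a toy model of adam's on-disk certs and the device's cached view. Everything here is hypothetical: `certs`, `device`, and `onEnvelope` model only the rule that a SenderCertHash mismatch makes the device refetch everything currently on adam's disk.

```go
package main

import "fmt"

// certs is a hypothetical snapshot of the controller certs on adam's disk.
type certs struct{ signing, encrypt string }

// device holds what it last fetched from /certs.
type device struct{ known certs }

// onEnvelope models EVE's fast path: an unrecognised signing cert triggers a
// /certs refetch that returns every cert currently on adam's disk.
func (d *device) onEnvelope(adam certs) (refetched bool) {
	if adam.signing == d.known.signing {
		return false
	}
	d.known = adam
	return true
}

func main() {
	adam := certs{signing: "sig-v0", encrypt: "enc-v0"}
	dev := &device{known: adam}

	adam.encrypt = "enc-v1" // change-encrypt-cert: new ECDH cert, blocks re-encrypted
	refetched := dev.onEnvelope(adam)
	fmt.Println(refetched, dev.known.encrypt) // false enc-v0

	adam.signing = "sig-v1" // change-signing-cert: envelope signed with an unseen key
	refetched = dev.onEnvelope(adam)
	fmt.Println(refetched, dev.known.encrypt) // true enc-v1
}
```

Reversing the order would also converge, but the encrypt cert would then only arrive on a later refetch, after apps re-encrypted with the new ECDH key had already failed to decrypt.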

Verification proceeds in three layers:

1. Fresh-app deployment (eclient2 after first rotation, eclient3
   after second). The app encrypts with the just-rotated ECDH key;
   reaching RUNNING means EVE successfully fetched the new encrypt
   cert into pubControllerCert and decrypted the cipher block.

2. check_encrypt_cert.sh walks /run/zedagent/ControllerCert/ (or
   /persist/status/zedagent/ControllerCert/ on EVE versions where
   that pubsub is Persistent: true) and byte-matches a Type=3 entry
   against encrypt-new.pem. The script normalizes for adam's
   strings.TrimSpace before computing sha256 to match the bytes
   actually published.

3. Reboot survival: after both rotations and a reboot, all three
   apps come back RUNNING and check_encrypt_cert.sh re-confirms the
   latest rotated encrypt cert is still advertised.

Two rotations exercise both controllercerts.bak code paths inside
EVE's MaybeSaveControllerCerts: the first rotation runs with no .bak
yet, the second runs with .bak from the first rotation present.

Registered as test 24/26 in tests/workflow/smoke.tests.txt.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
eriknordmark force-pushed the ctrl-encrypt-cert-change branch from 278f819 to 74cbd30 on May 11, 2026 16:20
eriknordmark merged commit 5d41b7a into lf-edge:master on May 13, 2026
29 of 30 checks passed
