ctrl_encrypt_cert_change: rotate ECDH cert without restart #1163
Merged
eriknordmark merged 2 commits on May 13, 2026
Add eden plumbing to rotate the controller's CERT_TYPE_CONTROLLER_ECDH_EXCHANGE certificate (encrypt.pem / encrypt-key.pem) independently of the signing cert, and to deploy applications whose user-data cipher block is tagged with the encrypt cert hash rather than the signing cert hash.

`eden utils gen-encrypt-cert` generates a fresh ECDSA P-256 key pair and an encryption cert signed by the eden root CA, mirroring gen-signing-cert.

`eden adam change-encrypt-cert` rotates encrypt.pem / encrypt-key.pem on adam's disk, re-encrypts with the new ECDH key every cipher block whose CipherContext references the old encrypt cert hash, and pushes the updated configs to adam. The signing cert and key are left untouched, so auth-container envelopes continue to verify against the device's saved signing cert.

`eden pod deploy --use-encrypt-cert` makes prepareCipherData use encrypt-key.pem (instead of signing-key.pem) for ECDH derivation and tags the cipher context with the encrypt cert hash. With this flag, app cipher blocks track the encrypt-cert rotation rather than the signing-cert rotation, matching production controller layouts where signing and ECDH are distinct certs with independent lifetimes.

The Controller interface gains EncryptCertGet alongside SigningCertGet to fetch the controller's CERT_TYPE_CONTROLLER_ECDH_EXCHANGE cert from /api/v2/edgedevice/certs.

The cipher-context-cert-hash filter in reencryptConfigs (used by change-signing-cert) and in reencryptConfigsForEncryptCert ensures the two rotation paths coexist: each command re-encrypts only the configs whose CipherContext references the cert it owns. Apps deployed with --use-encrypt-cert reference the encrypt cert hash and are skipped by change-signing-cert; apps deployed without the flag (the historical default) reference the signing cert hash and are skipped by change-encrypt-cert.
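The filter described above amounts to selecting cipher blocks by the cert hash recorded in their CipherContext. A minimal Go sketch, assuming hypothetical trimmed-down `CipherContext`/`CipherBlock` types and a hypothetical `selectBlocksForCert` helper (this is not the actual eden code, and the types are not the real EVE API structs):

```go
package main

import (
	"bytes"
	"fmt"
)

// CipherContext and CipherBlock are hypothetical, trimmed-down stand-ins
// loosely modeled on the EVE API cipher types; only the fields the filter
// needs are included.
type CipherContext struct {
	ContextID          string
	ControllerCertHash []byte
}

type CipherBlock struct {
	CipherContextID string
	CipherData      []byte
}

// selectBlocksForCert returns the cipher blocks whose CipherContext
// references ownedCertHash. Blocks tied to any other cert hash are left
// alone, so each rotation command only touches the blocks it owns.
func selectBlocksForCert(ctxs []CipherContext, blocks []CipherBlock, ownedCertHash []byte) []CipherBlock {
	owned := make(map[string]bool)
	for _, c := range ctxs {
		if bytes.Equal(c.ControllerCertHash, ownedCertHash) {
			owned[c.ContextID] = true
		}
	}
	var selected []CipherBlock
	for _, b := range blocks {
		if owned[b.CipherContextID] {
			selected = append(selected, b)
		}
	}
	return selected
}

func main() {
	signingHash := []byte("sha256(signing.pem)")
	encryptHash := []byte("sha256(encrypt.pem)")
	ctxs := []CipherContext{
		// App deployed without the flag (historical default).
		{ContextID: "ctx-default", ControllerCertHash: signingHash},
		// App deployed with --use-encrypt-cert.
		{ContextID: "ctx-encrypt", ControllerCertHash: encryptHash},
	}
	blocks := []CipherBlock{{CipherContextID: "ctx-default"}, {CipherContextID: "ctx-encrypt"}}

	// change-signing-cert path: only the default-tagged block is selected.
	fmt.Println(len(selectBlocksForCert(ctxs, blocks, signingHash)))
	// change-encrypt-cert path: only the encrypt-tagged block is selected.
	fmt.Println(len(selectBlocksForCert(ctxs, blocks, encryptHash)))
}
```

Keying the lookup on the context ID means a block is re-encrypted only when its own context carries the hash of the cert being rotated.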
Without the filter, a signing rotation would attempt to AES-decrypt encrypt-cert-tagged blocks with the wrong (signing-key-derived) symmetric key, and ReencryptConfigData would fail outright on the sha256 plaintext check.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end test that rotates the controller's ECDH encryption cert twice and verifies that user-data cipher blocks tagged with the encrypt cert hash keep decrypting on the device.

Companion to ctrl_cert_change.txt: that test rotates the signing cert and incidentally exercises the encryption path because Eden historically reused signing-key.pem for ECDH derivation; this test deploys apps with --use-encrypt-cert so their cipher contexts reference the controller's encrypt cert (Type=3) and rotates that cert specifically.

Each rotation rotates *both* signing and encrypt certs because EVE's only fast trigger for a /certs refetch is a signing-cert mismatch in the auth-envelope SenderCertHash. The cipher decrypt path silently fails when the encrypt cert hash is unknown, without triggering a refetch; adam doesn't populate EdgeDevConfig.controllercert_confighash, so handleControllerCertsSha is dead; and the periodic controllerCertsTask defaults to CertInterval=24h. See lf-edge/eve#5926 for the design gap and lf-edge/adam#152 for the fix; once that adam patch ships in an Eden-tracked release, the test could be extended with an encrypt-cert-only rotation step.

Order within each rotation: change-encrypt-cert first, then change-signing-cert. change-encrypt-cert re-encrypts the encrypt-tagged cipher blocks and writes the new encrypt files on adam's disk. change-signing-cert is a no-op for those cipher blocks (its reencryptConfigs filter skips them), but its on-disk signing-key swap is what makes adam sign the next auth envelope with a key the device hasn't seen, triggering SenderStatusCertMiss. By the time the device refetches /certs, both new certs are on adam's disk and arrive in a single round-trip.

Verification proceeds in three layers:

1. Fresh-app deployment (eclient2 after the first rotation, eclient3 after the second). The app's cipher block is encrypted with the just-rotated ECDH key; reaching RUNNING means EVE successfully fetched the new encrypt cert into pubControllerCert and decrypted the cipher block.
2. check_encrypt_cert.sh walks /run/zedagent/ControllerCert/ (or /persist/status/zedagent/ControllerCert/ on EVE versions where that pubsub is Persistent: true) and byte-matches a Type=3 entry against encrypt-new.pem. The script normalizes for adam's strings.TrimSpace before computing sha256, so it compares against the bytes actually published.
3. Reboot survival: after both rotations and a reboot, all three apps come back RUNNING and check_encrypt_cert.sh re-confirms that the latest rotated encrypt cert is still advertised.

Two rotations exercise both controllercerts.bak code paths inside EVE's MaybeSaveControllerCerts: the first rotation runs with no .bak yet, the second runs with the .bak from the first rotation present.

Registered as test 24/26 in tests/workflow/smoke.tests.txt.

Signed-off-by: eriknordmark <erik@zededa.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
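The refetch argument in the second commit message can be checked with a toy model. The `certs` and `device` types and both methods below are hypothetical. They capture only the two behaviors the message describes (a SenderCertHash mismatch triggers a /certs refetch, while an unknown encrypt cert hash does not) and ignore the 24h controllerCertsTask.

```go
package main

import "fmt"

// certs is a hypothetical pair of cert hashes.
type certs struct {
	signing string
	encrypt string
}

// device models what EVE learned from its last /certs fetch.
type device struct{ known certs }

// onAuthEnvelope models the fast path: an unknown SenderCertHash (the
// signing cert) makes the device refetch /certs, which returns whatever
// certs are on adam's disk right now. It reports whether a refetch happened.
func (d *device) onAuthEnvelope(adam certs) bool {
	if adam.signing == d.known.signing {
		return false
	}
	d.known = adam
	return true
}

// canDecrypt models the cipher path: it only succeeds if the block's encrypt
// cert hash is already known, and a miss does not trigger a refetch.
func (d *device) canDecrypt(blockEncryptHash string) bool {
	return blockEncryptHash == d.known.encrypt
}

func main() {
	adam := certs{signing: "sign-v1", encrypt: "enc-v1"}
	d := &device{known: adam}

	// Encrypt-only rotation: the envelope hash is unchanged, so no refetch.
	adam.encrypt = "enc-v2"
	fmt.Println(d.onAuthEnvelope(adam), d.canDecrypt(adam.encrypt)) // false false

	// Paired rotation, as in the test: encrypt already swapped, now signing.
	adam.signing = "sign-v2"
	fmt.Println(d.onAuthEnvelope(adam), d.canDecrypt(adam.encrypt)) // true true
}
```

In this model, running the two commands in the opposite order could let the device refetch while adam still holds the old encrypt cert, which is consistent with the test running change-encrypt-cert first.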
Summary
End-to-end test for rotation of the controller's ECDH encryption certificate (`CERT_TYPE_CONTROLLER_ECDH_EXCHANGE`, `encrypt.pem`), independent of the signing cert. Two commits:

- Plumbing: `eden utils gen-encrypt-cert`, `eden adam change-encrypt-cert`, `eden pod deploy --use-encrypt-cert`. The `--use-encrypt-cert` flag makes `prepareCipherData` derive ECDH from `encrypt-key.pem` and tag the cipher context with the encrypt cert hash, mirroring production controller layouts where signing and ECDH are distinct certs with independent lifetimes. A new cipher-context-cert-hash filter in `reencryptConfigs` (and the new `reencryptConfigsForEncryptCert`) lets the two rotation commands coexist: each only re-encrypts cipher blocks whose context references the cert it owns.
- Test: `ctrl_encrypt_cert_change.txt`, registered as smoke step 24/26. Deploys three eclient apps with `--use-encrypt-cert`, rotates encrypt + signing certs twice, verifies fresh-app deployment after each rotation, cross-checks that the rotated cert is advertised in pubsub, and confirms reboot survival.

Why each rotation rotates both signing and encrypt
EVE's only fast trigger for a `/certs` refetch is a signing-cert mismatch in the auth-envelope `SenderCertHash` (`controllerconn/authen.go:152-162`). The cipher decrypt path returns "Controller Certificate get fail" without triggering a refetch, adam doesn't populate `EdgeDevConfig.controllercert_confighash` so `handleControllerCertsSha` is dead, and the periodic `controllerCertsTask` defaults to `CertInterval = 24h`. See lf-edge/eve#5926 for the gap analysis and lf-edge/adam#152 for the fix; once that adam patch ships in an Eden-tracked release, the test can be extended with a pure encrypt-cert rotation.

Verification approach
Fresh-app deployment is the primary signal. After each rotation, eclient2/eclient3 is deployed with `--use-encrypt-cert`, so its cipher block uses the just-rotated ECDH key. Reaching `RUNNING` means EVE pulled `/certs` (triggered by the paired signing rotation), published the new Type=3 cert into `pubControllerCert`, and decrypted the cipher block.

Cert-chain cross-check. `check_encrypt_cert.sh` walks `/run/zedagent/ControllerCert/` (or `/persist/...` on EVE versions where that pubsub is `Persistent: true`), filters for `Type=3`, and byte-matches the decoded PEM against `encrypt-new.pem` (normalized for adam's `strings.TrimSpace`).

Reboot survival. After both rotations and a reboot, all three apps come back `RUNNING` and the cert chain still has the rotated encrypt cert.

Test plan
- Smoke (ext4, false)) hits the upstream `log_test` flake that #1156 ("log_test flakes on Smoke (ext4, false) waiting for 'Disconnected' log") addresses, before reaching step 24.
- `gofmt -l pkg/ cmd/` clean
- `go build ./...` clean
- `go test ./pkg/openevec/...` passes
- `ctrl_cert_change.txt` continues to pass (the new `reencryptConfigs` filter is a no-op for it: its cipher contexts all reference the signing cert hash, which is what the filter selects).

Notes for reviewers
4b0a8cd). That commit will fall away naturally once #1156 ("log_test flakes on Smoke (ext4, false) waiting for 'Disconnected' log") merges and we rebase onto upstream master.

🤖 Generated with Claude Code