fix(kube): persist k3s node password across reboots#5970

Open
naiming-zededa wants to merge 1 commit into
lf-edge:master from
naiming-zededa:naiming-k3s-passwd

naiming-zededa commented May 18, 2026

Description

The k3s node password lives on a tmpfs overlay and is regenerated on
every reboot, causing NodePasswordValidationFailed errors against the
server-side etcd secret and potentially leaving the node stuck
NotReady.

Persist the password to /var/lib/k3s-node-password (inside the
TPM-sealed vault) so it survives reboots. Restore it before k3s
starts; save it inside check_start_k3s immediately after k3s launches,
covering both first-init and restart paths.
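The restore/save flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function names are hypothetical, and the live password path is assumed to be the k3s default /etc/rancher/node/password. Only /var/lib/k3s-node-password comes from the description.

```shell
#!/bin/sh
# Assumption: k3s keeps the live node password at its default location,
# which sits on the tmpfs overlay and is lost on reboot.
K3S_NODE_PASSWORD="${K3S_NODE_PASSWORD:-/etc/rancher/node/password}"
# Persistent copy named in the PR description (inside the vault).
PERSIST_NODE_PASSWORD="${PERSIST_NODE_PASSWORD:-/var/lib/k3s-node-password}"

# Hypothetical helper: call before k3s starts.
# Returns 1 if there is nothing persisted to restore.
restore_node_password() {
    [ -s "$PERSIST_NODE_PASSWORD" ] || return 1
    mkdir -p "$(dirname "$K3S_NODE_PASSWORD")"
    cp "$PERSIST_NODE_PASSWORD" "$K3S_NODE_PASSWORD"
    chmod 600 "$K3S_NODE_PASSWORD"
}

# Hypothetical helper: call right after k3s has launched.
# Writes to a temp file first so a crash cannot leave a truncated copy.
save_node_password() {
    [ -s "$K3S_NODE_PASSWORD" ] || return 1
    mkdir -p "$(dirname "$PERSIST_NODE_PASSWORD")"
    tmp="$PERSIST_NODE_PASSWORD.tmp.$$"
    cp "$K3S_NODE_PASSWORD" "$tmp" &&
        chmod 600 "$tmp" &&
        mv "$tmp" "$PERSIST_NODE_PASSWORD"
}
```

Calling save_node_password on every launch, rather than only on first init, is what lets a single helper cover both the first-init and restart paths.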

Also handle the brown-field case: if no k3s-node-password file is found
under the persistent /var/lib, the node is flagged and its own
node-password secret is deleted from the cluster.
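A rough sketch of the brown-field check, under stated assumptions: the function name and the marker file path are hypothetical, and the secret name follows the upstream k3s convention of <node>.node-password.k3s in the kube-system namespace, which may not match exactly what this PR does.

```shell
#!/bin/sh
PERSIST_NODE_PASSWORD="${PERSIST_NODE_PASSWORD:-/var/lib/k3s-node-password}"
# Hypothetical marker file recording that this node was brown-field.
BROWNFIELD_FLAG="${BROWNFIELD_FLAG:-/var/lib/k3s-node-password.brownfield}"
# Overridable so the helper can be exercised without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: $1 is this node's name.
handle_brownfield_node_password() {
    node_name="$1"
    # A persisted password means the node is already covered by the fix.
    [ -s "$PERSIST_NODE_PASSWORD" ] && return 0
    # No persisted copy: flag the node, then delete its stale
    # server-side secret so the next registration can recreate it.
    : > "$BROWNFIELD_FLAG"
    "$KUBECTL" -n kube-system delete secret \
        "${node_name}.node-password.k3s" --ignore-not-found
}
```

With --ignore-not-found the delete also succeeds when the secret is already gone, so the helper can safely run on every boot.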

PR dependencies

How to test and validate this PR

With this patch in eve-k, start a device in single-node mode and convert it to be part of a cluster.
After it has joined the cluster, restart the device and run kubectl describe node;
the events should not include NodePasswordValidationFailed.

Also test the brown-field case on an existing eve-k cluster.
After the first image upgrade with this patch, the node-password secret is removed from
the cluster. It takes a second reboot or a k3s restart for the fix to take effect.
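The event check above could be scripted along these lines. The helper name is hypothetical; it only wraps kubectl describe node and greps the output for the event reason named in this PR.

```shell
#!/bin/sh
# Overridable so the helper can be exercised without a cluster.
KUBECTL="${KUBECTL:-kubectl}"

# Hypothetical helper: $1 is the node name.
# Returns 0 if no NodePasswordValidationFailed event is listed,
# 1 if one is found, 2 if kubectl itself failed.
check_node_password_events() {
    node="$1"
    out=$("$KUBECTL" describe node "$node") || return 2
    if printf '%s\n' "$out" | grep -q 'NodePasswordValidationFailed'; then
        echo "FAIL: $node reports NodePasswordValidationFailed"
        return 1
    fi
    echo "OK: no NodePasswordValidationFailed events on $node"
}
```

Run it once after the post-join reboot and, for brown-field devices, again after the second reboot or k3s restart.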

Changelog notes

fix(kube): persist k3s node password across reboots

PR Backports

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've tested my PR on arm64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

For backport PRs (remove it if it's not a backport):

  • I've added a reference link to the original PR
  • PR's title follows the template

And the last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Please, check the boxes above after submitting the PR in interactive mode.

codecov Bot commented May 18, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 21.05%. Comparing base (2caf795) to head (ca89af2).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5970      +/-   ##
==========================================
+ Coverage   20.64%   21.05%   +0.41%     
==========================================
  Files         489      499      +10     
  Lines       90431    92129    +1698     
==========================================
+ Hits        18667    19399     +732     
- Misses      70187    70972     +785     
- Partials     1577     1758     +181     

Comment thread pkg/kube/cluster-utils.sh Outdated
# password rides along in the cluster -> single-node snapshot for free
# (no separate snapshot helper needed).
#
# NOTE: this is a green-field fix. Devices that have already booted under

You need to update these comments since I believe you fixed the brown field installs too.

Contributor Author

Sure. Updated.

zedi-pramodh left a comment

LGTM, just address the comments.

  The k3s node password lives on a tmpfs overlay and is regenerated on
  every reboot, causing NodePasswordValidationFailed errors against the
  server-side etcd secret and potentially leaving the node stuck
  NotReady.

  Persist the password to /var/lib/k3s-node-password (inside the
  TPM-sealed vault) so it survives reboots. Restore it before k3s
  starts; save it inside check_start_k3s immediately after k3s launches,
  covering both first-init and restart paths.

  Also handle the brown-field case: if no k3s-node-password file is found
  under the persistent /var/lib, the node is flagged and its own
  node-password secret is deleted from the cluster.

Signed-off-by: naiming-zededa <naiming@zededa.com>
github-actions Bot requested a review from zedi-pramodh May 18, 2026 20:10
rene commented May 19, 2026

@claude

eriknordmark left a comment

Run tests
