Skip to content

grubconfigs: Sync grub.cfg and bootuuid.cfg to all ESPs on multi-device setups#1077

Draft
ckyrouac wants to merge 1 commit intocoreos:mainfrom
ckyrouac:grub-multi-fix
Draft

grubconfigs: Sync grub.cfg and bootuuid.cfg to all ESPs on multi-device setups#1077
ckyrouac wants to merge 1 commit intocoreos:mainfrom
ckyrouac:grub-multi-fix

Conversation

@ckyrouac
Copy link
Copy Markdown
Contributor

On multi-device setups (e.g. RAID 1 with an ESP per disk), grubconfigs::install() only writes grub.cfg and bootuuid.cfg to the single ESP currently mounted at /boot/efi. This means that if the system boots from a different disk, GRUB drops to a shell because there is no grub.cfg on that ESP.

Add sync_grub_configs_to_all_esps() which runs after the primary grubconfigs::install(). It discovers all ESP partitions across the provided devices, mounts each one to a temporary directory, and calls the new grubconfigs::install_to_esp() helper to write the static EFI grub.cfg and bootuuid.cfg.

Also refactor the ESP config writing in grubconfigs::install() into a shared write_esp_configs() helper to avoid code duplication.

Fixes #1076

Assisted-by: Claude Code (Opus 4.6)

@openshift-ci
Copy link
Copy Markdown

openshift-ci bot commented Mar 27, 2026

Hi @ckyrouac. Thanks for your PR.

I'm waiting for a coreos member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work.

Tip

We noticed you've done this a few times! Consider joining the org to skip this step and gain /lgtm and other bot rights. We recommend asking approvers on your previous PRs to sponsor you.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ckyrouac ckyrouac marked this pull request as draft March 27, 2026 13:37
@ckyrouac
Copy link
Copy Markdown
Contributor Author

pushing up what claude generated in case anyone wants to review it while I test/validate locally and add a test.

Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds support for synchronizing GRUB configuration files across all detected EFI System Partitions (ESPs) on multi-device systems. It refactors the GRUB installation logic into reusable helpers and introduces a synchronization loop that mounts additional ESPs to apply updates. Feedback suggests improving error logging during ESP detection and implementing a panic-safe RAII pattern for the mount/unmount operations.

src/bootupd.rs Outdated
Comment on lines +271 to +306
for esp_path in &esp_parts {
log::info!("Syncing GRUB configs to ESP {esp_path}");

// Mount the ESP
std::process::Command::new("mount")
.arg(esp_path)
.arg(tmpmnt)
.run_inherited()
.with_context(|| format!("Mounting ESP {esp_path}"))?;

let result = (|| -> Result<()> {
// Ensure the vendor directory exists under EFI/
let efi_vendor = tmpmnt.join("EFI").join(vendor);
std::fs::create_dir_all(&efi_vendor)
.with_context(|| format!("Creating {:?}", efi_vendor))?;

let efidir = openat::Dir::open(tmpmnt.join("EFI"))
.context("Opening EFI directory on ESP")?;

crate::grubconfigs::install_to_esp(
source_root,
&boot_grub2,
&efidir,
vendor,
write_uuid,
)
})();

// Always unmount, even on error
std::process::Command::new("umount")
.arg(tmpmnt)
.run_inherited()
.with_context(|| format!("Unmounting ESP {esp_path}"))?;

result?;
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current implementation for mounting and unmounting the ESP is not panic-safe. If a panic occurs after mounting but before unmounting (for example, inside crate::grubconfigs::install_to_esp), the umount command will not be executed. This could leave a stale mount point on a temporary directory that is about to be removed.

A more robust approach is to use a RAII guard struct that mounts in its constructor and unmounts in its Drop implementation. This ensures that the ESP is always unmounted, even in the case of a panic.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run in a private mountns anyways which avoids global leaks.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes multi-device (multi-ESP) installs where only the currently-mounted ESP at /boot/efi received EFI/<vendor>/grub.cfg and EFI/<vendor>/bootuuid.cfg, causing boot failures when firmware selects a different disk’s ESP.

Changes:

  • Refactors ESP config writing in grubconfigs::install() into a shared write_esp_configs() helper.
  • Adds grubconfigs::install_to_esp() to install GRUB config files into an already-mounted ESP EFI/ directory.
  • Adds sync_grub_configs_to_all_esps() in bootupd::install() to discover ESP partitions across provided devices, mount each, and sync the GRUB config files.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
src/grubconfigs.rs Extracts shared ESP config writing and adds a helper to install configs to an already-mounted ESP.
src/bootupd.rs Adds multi-device ESP discovery + mount loop to sync GRUB config files to all ESPs after the primary install.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

src/bootupd.rs Outdated
// Collect all ESP partition device paths.
let esp_parts: Vec<String> = devices
.iter()
.filter_map(|dev| dev.find_partition_of_esp().ok().map(|p| p.path()))
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find_partition_of_esp() errors are silently ignored via .ok() here, which can cause the sync step to become a no-op (e.g. one ESP discovered, one skipped due to a transient probe error) without any indication to the user. Consider matching the earlier EFI-device filtering logic in this file: log a warning per-device when ESP discovery fails, and only skip when it truly has no ESP, so failures are visible and diagnosable.

Suggested change
.filter_map(|dev| dev.find_partition_of_esp().ok().map(|p| p.path()))
.filter_map(|dev| match dev.find_partition_of_esp() {
Ok(p) => Some(p.path()),
Err(e) => {
log::warn!("Failed to discover ESP on device {:?}: {e:#}", dev);
None
}
})

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, in the bootc PR I added a new API for this

src/bootupd.rs Outdated
Comment on lines +274 to +288
// Mount the ESP
std::process::Command::new("mount")
.arg(esp_path)
.arg(tmpmnt)
.run_inherited()
.with_context(|| format!("Mounting ESP {esp_path}"))?;

let result = (|| -> Result<()> {
// Ensure the vendor directory exists under EFI/
let efi_vendor = tmpmnt.join("EFI").join(vendor);
std::fs::create_dir_all(&efi_vendor)
.with_context(|| format!("Creating {:?}", efi_vendor))?;

let efidir = openat::Dir::open(tmpmnt.join("EFI"))
.context("Opening EFI directory on ESP")?;
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After mounting the ESP, the code doesn’t validate that the mount is actually a VFAT ESP and writable (unlike the existing logic in efi.rs which checks filesystem type / mountpoint and calls util::ensure_writable_mount). Adding an explicit writability + fstype check here (or reusing the existing EFI mount helper) would prevent confusing failures when the ESP is mounted read-only or the wrong partition is mounted.

Copilot uses AI. Check for mistakes.
src/bootupd.rs Outdated
Comment on lines +253 to +262
// Collect all ESP partition device paths.
let esp_parts: Vec<String> = devices
.iter()
.filter_map(|dev| dev.find_partition_of_esp().ok().map(|p| p.path()))
.collect();

if esp_parts.len() < 2 {
log::debug!("Fewer than 2 ESPs found, nothing to sync");
return Ok(());
}
Copy link

Copilot AI Apr 2, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

esp_parts may contain duplicate ESP paths (e.g. if the same device is passed twice or if multiple Device entries resolve to the same ESP). Consider de-duplicating the list before iterating to avoid redundant mount/copy/unmount cycles.

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Member

@cgwalters cgwalters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One existing problem we had is that I messed up the static configs stuff and didn't wire them in as something updatable in place. If we'd had that, this could have just been part of a general "sync content across ESP".

src/bootupd.rs Outdated
log::info!("Syncing GRUB configs to ESP {esp_path}");

// Mount the ESP
std::process::Command::new("mount")
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nonblocking but I'd like to use the mount-as-fd new mount API for stuff like this in the future

src/bootupd.rs Outdated
Comment on lines +271 to +306
for esp_path in &esp_parts {
log::info!("Syncing GRUB configs to ESP {esp_path}");

// Mount the ESP
std::process::Command::new("mount")
.arg(esp_path)
.arg(tmpmnt)
.run_inherited()
.with_context(|| format!("Mounting ESP {esp_path}"))?;

let result = (|| -> Result<()> {
// Ensure the vendor directory exists under EFI/
let efi_vendor = tmpmnt.join("EFI").join(vendor);
std::fs::create_dir_all(&efi_vendor)
.with_context(|| format!("Creating {:?}", efi_vendor))?;

let efidir = openat::Dir::open(tmpmnt.join("EFI"))
.context("Opening EFI directory on ESP")?;

crate::grubconfigs::install_to_esp(
source_root,
&boot_grub2,
&efidir,
vendor,
write_uuid,
)
})();

// Always unmount, even on error
std::process::Command::new("umount")
.arg(tmpmnt)
.run_inherited()
.with_context(|| format!("Unmounting ESP {esp_path}"))?;

result?;
}
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We run in a private mountns anyways which avoids global leaks.

src/bootupd.rs Outdated
// setups we need to sync these files to every other ESP so that
// any disk can boot independently.
if let Some(ref vendor) = installed_efi_vendor {
sync_grub_configs_to_all_esps(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd be a lot cleaner for us to get the list of explicit devices and then do an install on each one.

I think this would require changing the generic crate::grubconfigs::install invocation above instead.

Now I am not sure if we support multi-device BIOS. But basically WDYT about ensuring we gather all devices, and use that in the above?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated with this suggestion

…ce setups

On multi-device setups (e.g. RAID 1 with an ESP per disk),
grubconfigs::install() only wrote grub.cfg and bootuuid.cfg to the
single ESP mounted at /boot/efi. If the system booted from a
different disk, GRUB would drop to a shell because no grub.cfg
existed on that ESP.

Extend grubconfigs::install() with an `additional_esp_devices`
parameter. When non-empty, each listed ESP is temporarily mounted
and receives copies of grub.cfg and bootuuid.cfg via a new
write_esp_configs() helper (extracted from the existing inline
logic to avoid duplication).

In bootupd::install(), discover colocated ESPs via
find_colocated_esps() on the first device and pass them through,
consistent with the approach already used in the update and
validate paths in efi.rs. Callers that do not need multi-ESP
support (bios.rs, efi.rs update path) simply pass an empty slice.

Fixes coreos#1076

Assisted-by: Claude Code (Opus 4.6)
Signed-off-by: ckyrouac <ckyrouac@redhat.com>
jmarrero added a commit to jmarrero/bootc-install-to-raid-example that referenced this pull request Apr 3, 2026
Three issues prevented a successful Anaconda-based bootc install onto a
RAID 1 array with per-disk ESP partitions:

1. Bind mount failure during ostreecontainer payload install:
   Anaconda mounts the root FS at /mnt/sysimage and then tries to
   bind-mount /mnt/sysimage/boot/efi2 and /boot/efi3 into /mnt/sysroot.
   However, it does not create the mount point directories for the
   secondary ESPs before the bind mount, causing:
     mount: /mnt/sysroot/boot/efi2: special device
            /mnt/sysimage/boot/efi2 does not exist.
   Fix: add a %pre-install script that creates the directories after
   the root FS is mounted but before the payload runs.

2. Empty secondary ESPs after installation:
   Anaconda only installs the bootloader (shim, GRUB, grub.cfg) to the
   primary ESP at /boot/efi. The secondary ESPs (efi2, efi3) are
   formatted and mounted but receive no bootloader files. Additionally,
   the 'noauto' fsoption causes Anaconda to unmount them before %post.
   Fix: add a %post --nochroot script that explicitly mounts each
   secondary ESP by device path and copies the full EFI directory tree
   from the primary ESP.

3. GRUB cannot find the kernel on the RAID root filesystem:
   The GRUB EFI binary shipped with CentOS Stream 10 does not include
   the mdraid1x module. GRUB's grub.cfg uses 'search --fs-uuid' to
   locate the root filesystem and load the kernel from the ostree
   deployment path, but it cannot access the md array.
   Fix: add a separate /boot partition (ext4, 1G) on the primary disk.
   GRUB can read ext4 natively without mdraid support. The kernel and
   initramfs live on /boot, and the initramfs handles RAID assembly
   for the root filesystem during boot.

Note: /boot is only on vda (not mirrored), so only disk 1 can boot
independently. Full boot redundancy from any disk would require either
whole-disk RAID (where GRUB reads from the raw mirror) or a GRUB build
with mdraid1x support. This is a known limitation tracked in:
  - bootc-dev/bootc#1911
  - coreos/bootupd#1077
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

grub.cfg and bootuuid.cfg not synced to secondary ESPs on multi-device setups

3 participants