-
Notifications
You must be signed in to change notification settings - Fork 379
feat: Support ProposeToBlessAlternativeGuestOsVersionCmd in ic-admin #8626
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
The new command allows proposing a recovery GuestOS for some nodes or for a subnet. The code looks up chip ids from the registry in order to populate the proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):
-
Update
unreleased_changelog.md(if there are behavior changes, even if they are
non-breaking). -
Are there BREAKING changes?
-
Is a data migration needed?
-
Security review?
How to Satisfy This Automatic Review
-
Go to the bottom of the pull request page.
-
Look for where it says this bot is requesting changes.
-
Click the three dots to the right.
-
Select "Dismiss review".
-
In the text entry box, respond to each of the numbered items in the previous
section, declare one of the following:
-
Done.
-
$REASON_WHY_NO_NEED. E.g. for
unreleased_changelog.md, "No
canister behavior changes.", or for item 2, "Existing APIs
behave as before.".
Brief Guide to "Externally Visible" Changes
"Externally visible behavior change" is very often due to some NEW canister API.
Changes to EXISTING APIs are more likely to be "breaking".
If these changes are breaking, make sure that clients know how to migrate, how to
maintain their continuity of operations.
If your changes are behind a feature flag, then, do NOT add entrie(s) to
unreleased_changelog.md in this PR! But rather, add entrie(s) later, in the PR
that enables these changes in production.
Reference(s)
For a more comprehensive checklist, see here.
GOVERNANCE_CHECKLIST_REMINDER_DEDUP
| ic-nervous-system-root = { path = "../../nervous_system/root" } | ||
| ic-nns-common = { path = "../../nns/common" } | ||
| ic-nns-constants = { path = "../../nns/constants" } | ||
| ic-nns-governance = { path = "../../nns/governance" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might want to avoid this, because it could have a (very) detrimental impact on build performance.
| impl ProposeToReviseElectedGuestsOsVersionsCmd { | ||
| /// Reads and returns the SEV-SNP measurements from the input file. | ||
| fn read_guest_launch_measurements(&self) -> Option<GuestLaunchMeasurements> { | ||
| self.guest_launch_measurements_path.as_ref().map(|path| { | ||
| let guest_launch_measurements_json = | ||
| std::fs::read(path).unwrap_or_else(|_| panic!("Failed to read {}", path.display())); | ||
| serde_json::from_slice::<GuestLaunchMeasurements>(&guest_launch_measurements_json) | ||
| .expect("Could not decode GuestLaunchMeasurements") | ||
| }) | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems orthogonal to the main point of this PR (per the title).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved and refactored this code into read_from_json_file so it can be reused.
| let guest_launch_measurements = self | ||
| .guest_launch_measurements_path | ||
| .as_ref() | ||
| .map(|path| read_from_json_file::<GuestLaunchMeasurements>(path)); | ||
|
|
||
| let payload = ReviseElectedGuestosVersionsPayload { | ||
| replica_version_to_elect: self.guestos_version_to_elect.clone(), | ||
| release_package_sha256_hex: self.release_package_sha256_hex.clone(), | ||
| release_package_urls: self.release_package_urls.clone(), | ||
| guest_launch_measurements: self.read_guest_launch_measurements(), | ||
| guest_launch_measurements, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems orthogonal? Also, I kind of like how it used to be spun out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made the original read_guest_launch_measurements into a more generic read_from_json_file.
| /// Subnet to query for chip IDs and launch measurements. | ||
| /// If specified, chip_ids and base_launch_measurements are automatically fetched from the registry. | ||
| /// Either --subnet or (--chip-ids/--node-ids and --base-launch-measurements) must be provided. | ||
| #[clap( | ||
| long, | ||
| conflicts_with_all = &["chip_ids", "base_launch_measurements", "node_ids"], | ||
| required_unless_present_any = &["chip_ids", "node_ids"] | ||
| )] | ||
| subnet: Option<SubnetDescriptor>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, if this is used, then all nodes in a subnet would be allowed to run this altnerative guest OS, right? Otherwise, if you want more precise targetting, you would use either --chip-ids or --node-ids for targetting, right? How come when using --chip-ids (or --node-ids), --base-launch-measurements MUST be supplied? Can't it also be inferred? Or is inference just an unimplemented feature that COULD be added later?
| // Query the SubnetRecord | ||
| let subnet_record_key = make_subnet_record_key(subnet_id); | ||
| let (subnet_record_bytes, _version) = registry_canister | ||
| .get_value_with_update(subnet_record_key.as_bytes().to_vec(), None) | ||
| .await | ||
| .unwrap_or_else(|err| { | ||
| panic!( | ||
| "Failed to fetch SubnetRecord for subnet {}: {}", | ||
| subnet_id, err | ||
| ) | ||
| }); | ||
| let subnet_record = | ||
| SubnetRecordProto::decode(&subnet_record_bytes[..]).unwrap_or_else(|err| { | ||
| panic!( | ||
| "Failed to decode SubnetRecord for subnet {}: {}", | ||
| subnet_id, err | ||
| ) | ||
| }); | ||
| let node_ids: Vec<NodeId> = subnet_record | ||
| .membership | ||
| .iter() | ||
| .map(|id| { | ||
| let principal_id = PrincipalId::try_from(id.as_slice()).unwrap(); | ||
| NodeId::from(principal_id) | ||
| }) | ||
| .collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this could be spun out.
| ) | ||
| .await; | ||
|
|
||
| let chip_ids = get_chip_ids_from_subnet_nodes(®istry_canister, &node_ids).await; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this should be moved up to where node_ids is calculated.
|
|
||
| // Execute all futures in parallel and collect results | ||
| let chip_ids = futures::future::join_all(futures).await; | ||
| let chip_ids: Vec<_> = chip_ids.into_iter().flatten().collect(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, flatten causes failures to be swept under the rug, which seems like something you wouldn't want. This seems to require that the caller detect problems by comparing the len of the output to the len of the input, which seems like an oblique way to detect problems, which isn't ideal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the usecase, I may want that. If a user specifies a subnet, not every node in the subnet may have a chip id, in that case we want to skip those nodes (while logging them to the user).
The new command allows proposing a recovery GuestOS for some nodes or for a subnet. The code looks up chip ids from the registry in order to populate the proposal.
Example: