MCO-2032: Avoid MCPs duplication at install time #5518
openshift-merge-bot[bot] merged 1 commit into openshift:main
Conversation
@pablintino: This pull request references MCO-2032 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/jira refresh
@pablintino: This pull request references MCO-2032 which is a valid jira issue.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
	return false
}
for _, generated := range generatedPools {
	if reflect.DeepEqual(generated, pool) {
question (non-blocking): Why a DeepEqual() here? Is going by name / UUID insufficient to determine uniqueness here?
I'm most likely missing some greater context here 😄
Based on how this is used, I think Pablo is using this to check for the exact generated MCP from our manifests, if we find two pools of the same name (i.e. a user provided an MCP override with the same name).
This does account for all metadata, including timestamps, etc. which... may cause unintended false negatives, but I think since this is before any controller/api modification of the MCP objects, in practice that shouldn't happen, so probably fine as is.
Based on how this is used, I think Pablo is using this to check for the exact generated MCP from our manifests, if we find two pools of the same name (i.e. a user provided an MCP override with the same name).
Correct, and it "works" only because we control the generated data and we know there's nothing dynamic in it. It won't work with MCPs that have already been applied and that carry, as you said, timestamps, generation, etc.
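For reference, a minimal sketch of what such an equality check could look like. The package name, import path, and signature are assumptions inferred from the manifests.ContainsMachineConfigPool call site shown later in this review; the actual implementation in the PR may differ.

```go
package manifests

import (
	"reflect"

	mcfgv1 "github.com/openshift/api/machineconfiguration/v1"
)

// ContainsMachineConfigPool reports whether pool is exactly equal to one of
// the generated manifest pools. DeepEqual is only safe here because both
// sides are decoded straight from static manifests; objects already applied
// to a cluster would carry server-set metadata (timestamps, generation, UID)
// and would never match.
func ContainsMachineConfigPool(generatedPools []*mcfgv1.MachineConfigPool, pool *mcfgv1.MachineConfigPool) bool {
	for _, generated := range generatedPools {
		if reflect.DeepEqual(generated, pool) {
			return true
		}
	}
	return false
}
```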
yuqi-zhang left a comment:
Some questions/comments inline
// Group the pools by name
for _, pool := range pools {
	if _, ok := groupedPools[pool.Name]; !ok {
I think this if condition is redundant, and groupedPools[pool.Name] = append(groupedPools[pool.Name], pool) will append properly to a nil slice
Good one, I forgot appending to nil was a safe operation in Go!
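For illustration, the simplified grouping loop could then be just the following. This is a sketch; the exact type of groupedPools (assumed here to be map[string][]*mcfgv1.MachineConfigPool) comes from the surrounding code, not from this hunk.

```go
// Group the pools by name; append allocates the slice for a first-seen key,
// so no explicit existence check is needed.
groupedPools := map[string][]*mcfgv1.MachineConfigPool{}
for _, pool := range pools {
	groupedPools[pool.Name] = append(groupedPools[pool.Name], pool)
}
```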
var lastPool *mcfgv1.MachineConfigPool
for _, poolObj := range poolObjs {
	if manifests.ContainsMachineConfigPool(generatedPools, poolObj) {
		// Handle a dev error in case the same pool is
This is interesting. Basically, these are the scenarios you're accounting for here (G = generated, U = user provided, assuming the same pool name):
G, U - this is the easiest, we want U
G, G, U - basically same as above
G, G - we want G
G, U, U - we just take the last match, based on however the slice got sorted (I guess it's maybe based on manifest name?)
U, U - same scenario
And all of them are "valid".
I want to say anything except scenario 1 (G, U) should actually be a failure, because either the user somehow provided conflicting manifests, or we somehow generated duplicates, both of which should be... fails? Would U, U or G, U, U work in the cluster? Or does the manifest parsing reject the duplicate elsewhere?
G, U, U - we just take the last match, based on however the slice got sorted (I guess it's maybe based on manifest name?)
Yes, the order we had, and still have, in this case is based on the file name.
I want to say anything except scenario 1 (G, U) should actually be a failure, because either the user somehow provided conflicting manifests, or we somehow generated duplicates, both of which should be... fails? Would U, U or G, U, U work in the cluster? Or does the manifest parsing reject the duplicate elsewhere?
And I would agree, but this exact scenario is how jobs are now configured... https://github.com/openshift/release/pull/73173/changes#diff-4dfd4f4e866cda920dd45858843286a4a93ae1318705fc5b0e5bdc66842afdbfR11 (forget about the manifest_ prefix, the prow job does some mangling with the name and the prefix is removed before the manifest is copied to the installer manifests). I wouldn't like to break the jobs we already have.
Or does the manifest parsing reject the duplicate elsewhere
There's no other filtering. This new function is all we have as a filter. The installer doesn't filter either.
Hmm, ok, so the installer is doing G, G instead of G, U. Weird, but I guess there's no harm in us filtering for the additional configs for now.
Just a random followup thought: is this always deterministic? I ask because I wonder if, in the case of (default manifest is rhel9, installer provides rhel10), we think both of them to be the generated case, and we take the "last object in the list", but what's guaranteeing the last object to then be the rhel10 override one?
@yuqi-zhang So, the default manifest doesn't set the stream at all; we leave it blank and let the render controller pick the default stream when rendering the MCs. So, in theory, that case shouldn't happen. Anyway, the order of the files is deterministic and it's based on the filename.
we think both of them to be the generated case,
That shouldn't be possible, as the DeepEqual should properly identify the generated one (the rhel-9 one), and the rhel-10 one should therefore always be handled as user provided.
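To make the precedence discussed above concrete, here is a rough sketch of the per-name selection, assuming poolObjs holds every manifest for a single pool name in file-name order. This is a reconstruction of the intent from the hunk and the scenarios above, not the exact code in the PR.

```go
// Pick one pool per name: a generated manifest is only kept as a fallback,
// and the last user-provided manifest in file-name order wins. That yields
// U for G,U / G,U,U / U,U and the generated pool for G or G,G.
var lastPool *mcfgv1.MachineConfigPool
for _, poolObj := range poolObjs {
	if manifests.ContainsMachineConfigPool(generatedPools, poolObj) {
		if lastPool == nil {
			lastPool = poolObj
		}
		continue
	}
	lastPool = poolObj
}
```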
This change is the preparatory work to allow the installer to provide its own set of MCPs, including the managed ones. Without this change, user- or installer-provided MCPs collide with the MCO's own if the file names are identical. This change also allows users to reset the values of an MCP's spec to zero, which the old merge logic didn't allow. Signed-off-by: Pablo Rodriguez Nava <git@amail.pablintino.com>
yuqi-zhang left a comment:
/lgtm
Looks good after the changes, will let CI validate this doesn't break anything (maybe we should run some RHEL10 jobs if we can invoke them here)
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: pablintino, yuqi-zhang
The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/label acknowledge-critical-fixes-only
Pre-merge verification steps:
Pre-requisites
Templates:
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
name: master
labels:
"operator.machineconfiguration.openshift.io/required-for-upgrade": ""
"machineconfiguration.openshift.io/mco-built-in": ""
"pools.operator.machineconfiguration.openshift.io/master": ""
spec:
machineConfigSelector:
matchLabels:
"machineconfiguration.openshift.io/role": "master"
nodeSelector:
matchLabels:
node-role.kubernetes.io/master: ""
osImageStream:
name: rhel-10
Case 1: added the below files in the openshift/ folder of the manifests in
/label qe-approved
@ptalgulk01: This PR has been marked as verified.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.
/test unit
/test e2e-hypershift
/retest-required
@pablintino: all tests passed! Full PR test history. Your PR dashboard.
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
- What I did
Allow the installer to provide MachineConfigPool manifests at bootstrap time. When present, the operator uses them as-is instead of generating its own.
For any managed pool not provided by the installer, the operator generates one with the OSImageStream from the install-config.
On the sync path, if a managed MCP is missing (e.g. deleted by the user), the operator recreates it using the original install-config.
- How to verify it
For all tests, be sure to have a recent openshift-installer, one that contains this change.
Case 1
Automated CI tests for regular installer deployments and hypershift should pass to make sure the default MCPs are still properly generated and consumed.
Case 2
Apply a user-given MCP for the worker pool that doesn't use the filename of the MCO built-ins
Case 3
Apply a user-given MCP for the worker pool that uses the same filename the MCO uses
- Description for the changelog
Allow user/installer-provided MachineConfigPool manifests by removing duplicated manifests and ensuring the user can reset built-in MCP specs.