[proto]: bolt a DRA driver frontend on the topology-aware policy. #536
klihub wants to merge 15 commits into containers:main from
Conversation
    cs.sharable = cs.sharable.Union(all.SharableCPUs().Intersection(cpus))
    cs.reserved = cs.reserved.Union(all.ReservedCPUs().Intersection(cpus))
    cs.claimed = cs.claimed.Difference(cpus)
}
After a Pod with a DRA claim is terminated, generic workloads do not regain access to the released, now unclaimed CPUs. Consider a scenario where a DRA-claiming pod has access to core 1 and the other generic shared workloads have access to cores 2-16. After the claiming Pod is terminated, the internal shared pool gets updated, but the generic workloads still only have access to cores 2-16.
During Pod termination, StopContainer is invoked first, followed by NodeUnprepareResources. This unclaimCPUs function is part of the NodeUnprepareResources flow; it unclaims the claimed CPUs and releases them to the remaining workloads through updateSharedAllocation. That update happens internally, but it is not reflected in the end containers.
Releasing the claimed CPUs back to the shared/reserved pool in ReleaseCPU (part of the StopContainer flow) seems to work: the generic workloads can access the full shared pool of 0-16 once a claiming pod terminates. But I am not sure whether that would be a valid placement.
@vigashinikesavan It was a long while ago when we rolled this early prototype, so it took me a while to pick up where we left off. I tried a few simple cases and indeed, once you release a claim, a burstable or best-effort container that would be eligible to run on the freed-up CPUs (because it runs in a shared pool the CPUs were returned to) might not get its effective cgroup cpuset updated immediately.
But based on a quick look at the code and my few simple tests, this does not happen for the reason you describe above. What happens instead is that we do update the remaining containers correctly, but because the update did not happen in the context of a normal resource allocation or release (CreateContainer or StopContainer), the cpuset/resource updates we made for the containers do not get propagated back to the runtime as part of an NRI response. Instead they stay cached in the plugin until the next container creation or release. You can verify this easily when you are in the incorrect state by creating, for instance, a new best-effort container: its response will then flush all the collected pending updates, including the ones gathered during the release/unprepare of the claim, and the containers running in the shared pools will finally get correctly updated. What we should do in this situation is update the affected containers using an unsolicited container update, but the current code does not do that.
Anyway, this was an early prototype to see how far we could get with what was available at the time we rolled this draft PR. It needs to be reworked quite a bit, because a lot has happened on the DRA front since, and the approach taken in this proto is now outdated.
Got it. Thanks for the clarification.
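For reference, a minimal sketch of what such an unsolicited update could look like through the NRI stub API. This is an illustration, not code from this PR; it assumes the stub's UpdateContainers call and the api.ContainerUpdate setter helpers:

    // Illustrative helper, not part of this PR: push cached cpuset changes
    // to the runtime right away instead of waiting for the next NRI event.
    package example

    import (
        "github.com/containerd/nri/pkg/api"
        "github.com/containerd/nri/pkg/stub"
    )

    // flushPendingUpdates sends the collected per-container cpuset changes
    // (container ID -> cpuset string) as an unsolicited NRI update.
    func flushPendingUpdates(s stub.Stub, pending map[string]string) error {
        var updates []*api.ContainerUpdate
        for id, cpus := range pending {
            u := &api.ContainerUpdate{}
            u.SetContainerId(id)
            u.SetLinuxCPUSetCPUs(cpus)
            updates = append(updates, u)
        }
        // UpdateContainers asks the runtime to apply these updates immediately.
        _, err := s.UpdateContainers(updates)
        return err
    }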
A dear child has many names... Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
AsYaml can be used to produce YAML-formatted log blocks with
something like this:
logger.DebugBlock(" <myObj> ", "%s", log.AsYaml(myObj))
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Split out K8s client setup code from agent to make it more generally available to any kind of plugin, not just resource management ones. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Allow setting the content types a client accepts and the type it uses on the wire. Provide constants for JSON and protobuf content types. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
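As a rough illustration of what such content-type settings correspond to in client-go terms (a sketch only; the option and constant names used by this commit may differ, though the media-type values below are the standard Kubernetes ones):

    // Sketch only, assuming plain client-go; not the wrapper added here.
    package example

    import (
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    const (
        ContentTypeJSON     = "application/json"
        ContentTypeProtobuf = "application/vnd.kubernetes.protobuf"
    )

    func newClient(kubeconfig string) (*kubernetes.Clientset, error) {
        cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
        if err != nil {
            return nil, err
        }
        // Accept both content types, but ask the apiserver for protobuf on the wire.
        cfg.AcceptContentTypes = ContentTypeProtobuf + "," + ContentTypeJSON
        cfg.ContentType = ContentTypeProtobuf
        return kubernetes.NewForConfig(cfg)
    }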
Split out K8s watch wrapper setup code from agent to make it generally available to any kind of plugin, not just resource management ones. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add the necessary RBAC rules (access to resource slices and claims) and kubelet host mounts (plugin and plugin registry directories) to the topology-aware policy Helm chart. Add a device class for the native.cpu DRA driver. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add an interface for caching policy agnostic data, similar to
but simpler than {Get,Set}PolicyData(). Add an interface for
querying the full container environment variable list.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
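A hypothetical sketch of the shape of these two interfaces, for illustration only; the actual names in the patch may differ:

    // Hypothetical shape of the two cache interfaces, for illustration only.
    package example

    // CustomDataCache stores policy-agnostic data in the cache, similar to
    // but simpler than {Get,Set}PolicyData().
    type CustomDataCache interface {
        SetCustomData(key string, value interface{})
        GetCustomData(key string) (interface{}, bool)
    }

    // EnvQuerier exposes the full environment variable list of a container.
    type EnvQuerier interface {
        GetEnvList() []string
    }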
Sort caches by level, kind, and id to enumerate caches globally. Add a common DRA device representation for CPUs. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add the necessary plumbing to allow implementations to act as a CPU DRA driver. In practice, add an option for publishing CPUs as DRA devices and interfaces to allocate and release CPUs as claims are prepared and unprepared. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
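A hypothetical sketch of what such allocation/release hooks could look like on the policy side (names invented here for illustration, not taken from the patch):

    // Hypothetical policy-side hooks, for illustration only; the real
    // interface added by this patch may be shaped differently.
    package example

    // CPUClaimAllocator is implemented by a policy that can back DRA CPU claims.
    type CPUClaimAllocator interface {
        // AllocateClaimCPUs reserves CPUs for a claim as it is prepared
        // (NodePrepareResources) and returns the allocated CPU IDs.
        AllocateClaimCPUs(claimUID string, count int) ([]int, error)
        // ReleaseClaimCPUs returns a claim's CPUs to the shared/reserved
        // pools as the claim is unprepared (NodeUnprepareResources).
        ReleaseClaimCPUs(claimUID string) error
    }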
Register as a 'native.cpu' DRA resource driver/plugin. Provide DRA CPU device publishing for policy implementations. Generate a CDI Spec for the published CPU devices. Hook the active policy in for CPU allocation and release during claim preparation and unpreparation. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Flush pending changes immediately using an unsolicited update to containers instead of waiting for the next NRI event to do so. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Publish CPUs as DRA devices. Use a sledgehammer to bolt allocation and release of DRA-claimed CPUs on top of the current implementation. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
This prototype patch set bolts a DRA allocation frontend on top of the existing topology-aware resource policy plugin. The main intention of this patch set is to see how far we could get with what was available at the time of rolling this draft PR.
Notes:
This patched NRI plugin, especially in its current state and form, is not a proposal for a first real DRA-based CPU driver.
If you want to play around with this (for instance modify the exposed CPU abstraction), the easiest way is to

    helm install --devel -n kube-system test oci://ghcr.io/$YOUR_GITHUB_USERID/nri-plugins/helm-charts/nri-resource-policy-topology-aware --version v0.9-dra-driver-unstable

You can then test if things work with something like
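the following claim and a pod consuming it. This is an illustrative sketch only, not taken from this PR: the device class name is assumed to be native.cpu, and the exact resource.k8s.io API version and claim schema depend on your Kubernetes version:

    apiVersion: resource.k8s.io/v1beta1
    kind: ResourceClaim
    metadata:
      name: cpu-claim
    spec:
      devices:
        requests:
        - name: cpu
          deviceClassName: native.cpu
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: dra-cpu-test
    spec:
      containers:
      - name: test
        image: busybox
        command: ["sh", "-c", "grep Cpus_allowed_list /proc/self/status; sleep 3600"]
        resources:
          claims:
          - name: cpu
      resourceClaims:
      - name: cpu
        resourceClaimName: cpu-claim

If the claim gets prepared, the container's allowed cpuset should reflect the CPUs allocated for the claim.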