[proto]: bolt a DRA driver frontend on the topology-aware policy. #536
klihub wants to merge 15 commits into containers:main from
Conversation
    cs.sharable = cs.sharable.Union(all.SharableCPUs().Intersection(cpus))
    cs.reserved = cs.reserved.Union(all.ReservedCPUs().Intersection(cpus))
    cs.claimed = cs.claimed.Difference(cpus)
}
After a Pod with a DRA claim is terminated, generic workloads do not regain access to the released, now unclaimed CPUs. Consider a scenario where a DRA-claiming pod has access to core 1 and the other generic shared workloads have access to cores 2-16. After the claiming Pod is terminated, the internal shared pool gets updated, but the generic workloads still only have access to cores 2-16.
During Pod termination, StopContainer is invoked first, followed by NodeUnprepareResources. This unclaimCPUs function is part of the NodeUnprepareResources flow; it unclaims the claimed CPUs and releases them to the remaining workloads through updateSharedAllocation. That update happens internally, but it is not reflected in the end containers.
Releasing the claimed CPUs back to the shared/reserved pool in ReleaseCPU (part of the StopContainer flow) seems to work: the generic workloads can access the full shared pool of 0-16 once a claiming pod terminates. But I am not sure whether that would be a valid placement.
@vigashinikesavan It was a long while ago when we rolled this early prototype, so it took me a while to pick up where we left off. I tried a few simple cases and indeed, once you release a claim, a burstable or best-effort container that would be eligible to run on the freed-up CPUs (because it runs in a shared pool the CPUs were returned to) might not get its effective cgroup cpuset updated immediately.
But based on a quick look at the code and my few simple tests, this does not happen for the reason you describe above. What happens instead is that we do update the remaining containers correctly, but because the update did not happen in the context of a normal resource allocation or release (CreateContainer or StopContainer), the cpuset/resource updates we made for the containers do not get propagated back to the runtime as part of an NRI response. Instead they stay cached in the plugin until the next container creation or release. You can verify this easily when you are in the incorrect state by creating, for instance, a new best-effort container: its response will then flush all the collected pending updates, including the ones gathered during the release/unprepare of the claim, and the containers running in the shared pools will finally get correctly updated. What we should do in this situation is update the affected containers using an unsolicited container update, but the current code does not do that.
Anyway, this was an early prototype to see how far we could get with what was available at the time we rolled this draft PR. It needs to be reworked quite a bit, because a lot has happened on the DRA front since, and the approach taken in this proto is now outdated.
Got it. Thanks for the clarification.
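For reference, a minimal sketch of what such an unsolicited update could look like through the NRI stub API. This is an illustration, not code from this PR; it assumes the stub's UpdateContainers call and the api.ContainerUpdate setter helpers:

    // Illustrative helper, not part of this PR: push cached cpuset changes
    // to the runtime right away instead of waiting for the next NRI event.
    package example

    import (
        "github.com/containerd/nri/pkg/api"
        "github.com/containerd/nri/pkg/stub"
    )

    // flushPendingUpdates sends the collected per-container cpuset changes
    // (container ID -> cpuset string) as an unsolicited NRI update.
    func flushPendingUpdates(s stub.Stub, pending map[string]string) error {
        var updates []*api.ContainerUpdate
        for id, cpus := range pending {
            u := &api.ContainerUpdate{}
            u.SetContainerId(id)
            u.SetLinuxCPUSetCPUs(cpus)
            updates = append(updates, u)
        }
        // UpdateContainers asks the runtime to apply these updates immediately.
        _, err := s.UpdateContainers(updates)
        return err
    }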
A dear child has many names... Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
AsYaml can be used to produce YAML-formatted log blocks with
something like this:
logger.DebugBlock(" <myObj> ", "%s", log.AsYaml(myObj))
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Split out K8s client setup code from agent to make it more generally available to any kind of plugin, not just resource management ones. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Allow setting the content types a client accepts and the type it uses on the wire. Provide constants for JSON and protobuf content types. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
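As a rough illustration of what such content-type settings correspond to in client-go terms (a sketch only; the option and constant names used by this commit may differ, though the media-type values below are the standard Kubernetes ones):

    // Sketch only, assuming plain client-go; not the wrapper added here.
    package example

    import (
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    const (
        ContentTypeJSON     = "application/json"
        ContentTypeProtobuf = "application/vnd.kubernetes.protobuf"
    )

    func newClient(kubeconfig string) (*kubernetes.Clientset, error) {
        cfg, err := clientcmd.BuildConfigFromFlags("", kubeconfig)
        if err != nil {
            return nil, err
        }
        // Accept both content types, but ask the apiserver for protobuf on the wire.
        cfg.AcceptContentTypes = ContentTypeProtobuf + "," + ContentTypeJSON
        cfg.ContentType = ContentTypeProtobuf
        return kubernetes.NewForConfig(cfg)
    }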
Split out K8s watch wrapper setup code from agent to make it generally available to any kind of plugin, not just resource management ones. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add the necessary RBAC rules (access to resource slices and claims) and kubelet host mounts (plugin and plugin registry directories) to the topology-aware policy Helm chart. Add a device class for the native.cpu DRA driver. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add an interface for caching policy agnostic data, similar to
but simpler than {Get,Set}PolicyData(). Add an interface for
querying the full container environment variable list.
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
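A hypothetical sketch of the shape of these two interfaces, for illustration only; the actual names in the patch may differ:

    // Hypothetical shape of the two cache interfaces, for illustration only.
    package example

    // CustomDataCache stores policy-agnostic data in the cache, similar to
    // but simpler than {Get,Set}PolicyData().
    type CustomDataCache interface {
        SetCustomData(key string, value interface{})
        GetCustomData(key string) (interface{}, bool)
    }

    // EnvQuerier exposes the full environment variable list of a container.
    type EnvQuerier interface {
        GetEnvList() []string
    }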
Sort caches by level, kind, and id to enumerate caches globally. Add a common DRA device representation for CPUs. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Add the necessary plumbing to allow implementations to act as a CPU DRA driver. In practice, add an option for publishing CPUs as DRA devices and interfaces to allocate and release CPUs as claims are prepared and unprepared. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
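A hypothetical sketch of what such allocation/release hooks could look like on the policy side (names invented here for illustration, not taken from the patch):

    // Hypothetical policy-side hooks, for illustration only; the real
    // interface added by this patch may be shaped differently.
    package example

    // CPUClaimAllocator is implemented by a policy that can back DRA CPU claims.
    type CPUClaimAllocator interface {
        // AllocateClaimCPUs reserves CPUs for a claim as it is prepared
        // (NodePrepareResources) and returns the allocated CPU IDs.
        AllocateClaimCPUs(claimUID string, count int) ([]int, error)
        // ReleaseClaimCPUs returns a claim's CPUs to the shared/reserved
        // pools as the claim is unprepared (NodeUnprepareResources).
        ReleaseClaimCPUs(claimUID string) error
    }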
Register as a 'native.cpu' DRA resource driver/plugin. Provide DRA CPU device publishing for policy implementations. Generate a CDI Spec for the published CPU devices. Hook the active policy in for CPU allocation and release during claim preparation and unpreparation. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Flush pending changes immediately using an unsolicited update to containers instead of waiting for the next NRI event to do so. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Publish CPUs as DRA devices. Use a sledgehammer to bolt allocation and release of DRA-claimed CPUs on top of the current implementation. Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
Signed-off-by: Krisztian Litkey <krisztian.litkey@intel.com>
This prototype patch set bolts a DRA allocation frontend on top of the existing topology-aware resource policy plugin. The main intention of this patch set is to see how far we could get with what was available at the time of rolling this draft PR.
Notes:
This patched NRI plugin, especially in its current state and form, is not a proposal for a first real DRA-based CPU driver.
If you want to play around with this (for instance modify the exposed CPU abstraction), the easiest way is to

    helm install --devel -n kube-system test oci://ghcr.io/$YOUR_GITHUB_USERID/nri-plugins/helm-charts/nri-resource-policy-topology-aware --version v0.9-dra-driver-unstable

You can then test if things work with something like
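the following claim and a pod consuming it. This is an illustrative sketch only, not taken from this PR: the device class name is assumed to be native.cpu, and the exact resource.k8s.io API version and claim schema depend on your Kubernetes version:

    apiVersion: resource.k8s.io/v1beta1
    kind: ResourceClaim
    metadata:
      name: cpu-claim
    spec:
      devices:
        requests:
        - name: cpu
          deviceClassName: native.cpu
    ---
    apiVersion: v1
    kind: Pod
    metadata:
      name: dra-cpu-test
    spec:
      containers:
      - name: test
        image: busybox
        command: ["sh", "-c", "grep Cpus_allowed_list /proc/self/status; sleep 3600"]
        resources:
          claims:
          - name: cpu
      resourceClaims:
      - name: cpu
        resourceClaimName: cpu-claim

If the claim gets prepared, the container's allowed cpuset should reflect the CPUs allocated for the claim.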