Skip to content

hypervisor: refactor VMM overhead into shared library and use in eve-k#5681

Open
zedi-pramodh wants to merge 1 commit into
lf-edge:masterfrom
zedi-pramodh:vmm-overhead
Open

hypervisor: refactor VMM overhead into shared library and use in eve-k#5681
zedi-pramodh wants to merge 1 commit into
lf-edge:masterfrom
zedi-pramodh:vmm-overhead

Conversation

@zedi-pramodh
Copy link
Copy Markdown

This commit adds VMM Overhead support to eve-k

Extract VMM overhead calculation functions from kvm.go into a new shared file vmm_overhead.go (no build tag) so they are available to all hypervisor backends:

  • vmmOverhead: top-level function respecting global config, per-app override, and automatic estimation priority
  • estimatedVMMOverhead: combines all overhead components
  • ramVMMOverhead: 2.5% of domain RAM for page table overhead
  • qemuVMMOverhead: 20 MB for QEMU binaries/libraries
  • mmioVMMOverhead: 1% of total MMIO size for GPU/VGA passthrough devices
  • cpuVMMOverhead: 3 MB per vCPU
  • undefinedVMMOverhead: 350 MB base overhead for QEMU internal use

kvm.go: remove the above functions (now in vmm_overhead.go); CountMemOverhead wrapper is unchanged.

kubevirt.go: add CountMemOverhead method on kubevirtContext that delegates to the shared vmmOverhead function, replacing the inherited zero-overhead ctrdContext implementation. Add uuid import.

How to test and validate this PR

  1. Create a VM with VMM overhead value
  2. Verify that value is calculated and published in zedmanager AppInstanceStatus
  3. VM still starts with actual RAM value provided, verified that getRemainingMemory(ctx) uses this overhead in remaining memory calculation

Changelog notes

Users can now use VMM overhead value in AppInstance config for eve-k

Checklist

  • I've provided a proper description
  • I've added the proper documentation
  • I've tested my PR on amd64 device
  • I've written the test verification instructions
  • I've set the proper labels to this PR

And the last but not least:

  • I've checked the boxes above, or I've provided a good reason why I didn't
    check them.

Please, check the boxes above after submitting the PR in interactive mode.

@zedi-pramodh
Copy link
Copy Markdown
Author

Verified domain metrics to see overhead got added..
[kube] root@c2c8906f-4230-4fd6-99c7-4456fcebc6d4:/run/domainmgr/DomainMetric$ cat 5454a0a1-7da4-4278-bb5a-a6c91c09fbb3.json | jq
{
"UUIDandVersion": {
"UUID": "5454a0a1-7da4-4278-bb5a-a6c91c09fbb3",
"Version": "1"
},
"CPUTotalNs": 0,
"CPUScaled": 2,
"AllocatedMB": 2648, --> 600MB overhead
"UsedMemory": 305,
"MaxUsedMemory": 305,
"AvailableMemory": 1598,
"UsedMemoryPercent": 0.11518126888217523,
"LastHeard": "2026-03-17T00:17:13.383722193Z",
"Activated": true,
"NodeName": ""
}

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 17, 2026

Codecov Report

❌ Patch coverage is 32.53968% with 85 lines in your changes missing coverage. Please review.
✅ Project coverage is 21.08%. Comparing base (2caf795) to head (58203bc).

Files with missing lines Patch % Lines
pkg/pillar/hypervisor/vmm_overhead.go 38.46% 43 Missing and 5 partials ⚠️
pkg/pillar/hypervisor/kubevirt.go 30.55% 24 Missing and 1 partial ⚠️
pkg/pillar/cmd/zedmanager/updatestatus.go 0.00% 12 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5681      +/-   ##
==========================================
+ Coverage   20.64%   21.08%   +0.44%     
==========================================
  Files         489      500      +11     
  Lines       90431    92160    +1729     
==========================================
+ Hits        18667    19432     +765     
- Misses      70187    70963     +776     
- Partials     1577     1765     +188     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors KVM VMM memory overhead estimation into a shared hypervisor helper and applies the same overhead calculation to the KubeVirt (“eve-k”) backend so that AppInstance memory accounting can include hypervisor overhead.

Changes:

  • Extracted VMM overhead calculation helpers from kvm.go into a new shared vmm_overhead.go.
  • Updated KubeVirt backend to report/track VMM memory overhead (including adding it into AllocatedMB metrics output).
  • Removed the duplicated overhead helpers from kvm.go (KVM continues to call the same vmmOverhead wrapper).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File Description
pkg/pillar/hypervisor/vmm_overhead.go New shared VMM overhead estimation implementation (RAM/QEMU/MMIO/CPU/base components + global/per-app override priority).
pkg/pillar/hypervisor/kvm.go Removes overhead helper implementations now moved into the shared file.
pkg/pillar/hypervisor/kubevirt.go Adds CountMemOverhead for kubevirt and propagates overhead into domain/pod metrics accounting via stored metadata.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment thread pkg/pillar/hypervisor/kubevirt.go
Comment thread pkg/pillar/hypervisor/kubevirt.go Outdated
Comment on lines +515 to +521
overhead, err := vmmOverhead(domainName, config.UUIDandVersion.UUID,
int64(config.Memory), int64(config.VMMMaxMem),
int64(config.MaxCpus), int64(config.VCpus), config.IoAdapterList, aa, nil)
if err != nil {
logrus.Warnf("CreateReplicaVMIConfig: vmmOverhead failed for %s: %v, using 0", domainName, err)
overhead = 0
}
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will fix this

Comment thread pkg/pillar/hypervisor/kubevirt.go Outdated
Comment thread pkg/pillar/hypervisor/kubevirt.go Outdated
// Add VMM overhead to AllocatedMB so it reflects total host memory
// consumption (guest RAM + hypervisor overhead), consistent with KVM.
if vmis, ok := ctx.vmiList[n]; ok && vmis.memOverhead > 0 {
r.AllocatedMB += uint32(vmis.memOverhead / (1024 * 1024))
Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

valid point, will fix this.

Comment thread pkg/pillar/hypervisor/kubevirt.go Outdated
// overhead so it reflects total host memory consumption, consistent with KVM.
allocatedMB := uint32(memoryLimits.Value()) / BytesInMegabyte
if vmis != nil && vmis.memOverhead > 0 {
allocatedMB += uint32(vmis.memOverhead / uint64(BytesInMegabyte))
Comment thread pkg/pillar/hypervisor/vmm_overhead.go
Comment thread pkg/pillar/hypervisor/vmm_overhead.go Outdated
}

// memory allocated by QEMU for its own purposes.
// statistical analysis did not revile any correlation between
Copy link
Copy Markdown
Contributor

@eriknordmark eriknordmark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The copilot comments seems relevant. Please review them and see what might need to be changed.

@zedi-pramodh
Copy link
Copy Markdown
Author

The copilot comments seems relevant. Please review them and see what might need to be changed.

Looks like Claude and copilot does not like each other :) let me review and get back. thanks

@github-actions github-actions Bot requested a review from eriknordmark March 23, 2026 21:33
@eriknordmark
Copy link
Copy Markdown
Contributor

@zedi-pramodh can you rebase this on master so we can get the eve-k build to work?

@zedi-pramodh
Copy link
Copy Markdown
Author

Rebased and pushed the commits

Copy link
Copy Markdown
Contributor

@rucoder rucoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zedi-pramodh squash both commits together. it doesn't make sens to have "fix review comments" on master

var err error
overhead, err = vmmOverhead(domainName, config.UUIDandVersion.UUID,
int64(config.Memory), int64(config.VMMMaxMem),
int64(config.MaxCpus), int64(config.VCpus), config.IoAdapterList, aa, ctx.globalConfig)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, you should check extra carefully that the adapters list is up to date: the adapters have already been reserved, and the domain knows what it owns. We had an issue with that in the past: when we called the function before it was resolved, it just silently returned zero.
You can check the fix here and decide if the code in the kubevirt version is already fine. https://github.com/lf-edge/eve/pull/5413/changes

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I blindly moved all your fixes to common file. So those are already considered.

Copy link
Copy Markdown
Member

@OhmSpectator OhmSpectator left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some doubts about making this code generic. The functions we use to estimate vmmOverhead were tuned exactly for KVM + QEMU VMs. We did a set of experiments and code research to figure them out. Other setups (not pure KVM) most probably have other overhead. So, I would say we should not blindly copy the logic and expect it to work the same. Even if kubevirt or any other hyp also uses a similar setup, I guess they can have some other overheads that might be reflected in THEIR formula.

@zedi-pramodh
Copy link
Copy Markdown
Author

I have some doubts about making this code generic. The functions we use to estimate vmmOverhead were tuned exactly for KVM + QEMU VMs. We did a set of experiments and code research to figure them out. Other setups (not pure KVM) most probably have other overhead. So, I would say we should not blindly copy the logic and expect it to work the same. Even if kubevirt or any other hyp also uses a similar setup, I guess they can have some other overheads that might be reflected in THEIR formula.

Good question. kubevirt is a Kubernetes API wrapper for kvm + qemu. So all the work done for eve-kvm is valid here too. Coming to kubevirt itself having the overhead, they handle that inside their implementation and user need not know about that AFAIU. https://kubevirt.io/user-guide/compute/virtual_hardware/#memory-overhead

@zedi-pramodh
Copy link
Copy Markdown
Author

Merged the commits and rebased to master.

@zedi-pramodh
Copy link
Copy Markdown
Author

Rebased to latest master

@zedi-pramodh zedi-pramodh force-pushed the vmm-overhead branch 2 times, most recently from dd84610 to 726553a Compare April 28, 2026 17:58
@zedi-pramodh
Copy link
Copy Markdown
Author

Rebased to latest master.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zedi-pramodh I know you didn't touch this part, but it doesn't hurt to fix a codespell and get 100% Yetus passing. Appreciate if you could fix it.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok will do that

Copy link
Copy Markdown
Contributor

@rene rene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. @rucoder , pls, take another look and remove your request to changes if you are ok with the current changes.

@github-actions github-actions Bot requested a review from rene May 7, 2026 03:52
@zedi-pramodh
Copy link
Copy Markdown
Author

rebased to master again :)

@github-actions github-actions Bot requested a review from rene May 17, 2026 02:41
@zedi-pramodh
Copy link
Copy Markdown
Author

rebased to master again :)

…nto KubeVirt

Extract VMM overhead calculation functions from kvm.go into a new shared
file vmm_overhead.go (no build tag) so they are available to all hypervisor
backends:
- vmmOverhead: top-level function respecting global config, per-app override,
  and automatic estimation priority
- estimatedVMMOverhead: combines all overhead components
- ramVMMOverhead: 2.5% of domain RAM for page table overhead
- qemuVMMOverhead: 20 MB for QEMU binaries/libraries
- mmioVMMOverhead: 1% of total MMIO size for GPU/VGA passthrough devices
- cpuVMMOverhead: 3 MB per vCPU
- undefinedVMMOverhead: 350 MB base overhead for QEMU internal use

kvm.go: remove the above functions (now in vmm_overhead.go); CountMemOverhead
wrapper is unchanged.

kubevirt.go: add CountMemOverhead method on kubevirtContext that delegates to
the shared vmmOverhead function, replacing the inherited zero-overhead
ctrdContext implementation. Add uuid import.

include VMM overhead in DomainMetric.AllocatedMB

Previously AllocatedMB in KubeVirt DomainMetric was set to the Kubernetes
resource limit (guest RAM only), so reported metrics and device-level
AllocatedAppsMB undercounted total host memory consumption by the full
VMM overhead.

Calculate and store the VMM overhead (via the shared vmmOverhead function)
in vmiMetaData when the domain config is created — both for VMI replicasets
(CreateReplicaVMIConfig) and pod replicasets (CreateReplicaPodConfig).

In GetDomsCPUMem (VMI virt-handler metrics path), add the stored overhead
to AllocatedMB in the post-processing loop after all metrics are filled.

In getPodMetrics (pod replicaset path), add the stored overhead to
AllocatedMB before constructing the DomainMetric.

This makes KubeVirt consistent with KVM, where AllocatedMB is read from
the cgroup HierarchicalMemoryLimit which naturally includes QEMU overhead.

- updatestatus.go: skip CountMemOverhead call when VirtualizationMode is NOHYPER
- kubevirt.go: skip vmmOverhead in CreateReplicaVMIConfig and
  CreateReplicaPodConfig when VirtualizationMode is NOHYPER

Store globalConfig on kubevirtContext in Setup() so that CreateReplicaVMIConfig
and CreateReplicaPodConfig can pass it to vmmOverhead instead of nil.

This ensures the global VmmMemoryLimitInMiB override is respected when
calculating VMM overhead during domain config creation, consistent with
the CountMemOverhead path.

Replace truncating integer division (memOverhead / (1024 * 1024)) with
roundFromBytesToMbytes() when adding VMM overhead to AllocatedMB. This
avoids under-reporting overhead by up to ~1 MB and is consistent with
how other memory conversions are done in the codebase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Pramodh Pallapothu <pramodh@zededa.com>
@zedi-pramodh
Copy link
Copy Markdown
Author

rebased to master again !!! This has been sitting here for 2.5 months now :)

Copy link
Copy Markdown
Contributor

@andrewd-zededa andrewd-zededa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory limit cast will cause issues with larger VMs

}
// AllocatedMB is the Kubernetes resource limit (guest RAM only); add VMM
// overhead so it reflects total host memory consumption, consistent with KVM.
allocatedMB := uint32(memoryLimits.Value()) / BytesInMegabyte
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The memory limits value is an int64, this division must be inside the cast, otherwise the uint32 cast will wrap for a VM using 4GB+

// CountMemOverhead returns the memory overhead estimation for a KubeVirt VM domain.
func (ctx kubevirtContext) CountMemOverhead(domainName string, domainUUID uuid.UUID, domainRAMSize int64, vmmMaxMem int64,
domainMaxCpus int64, domainVCpus int64, domainIoAdapterList []types.IoAdapter, aa *types.AssignableAdapters,
globalConfig *types.ConfigItemValueMap) (uint64, error) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

globalConfig *types.ConfigItemValueMap is passed in as a func parameter but already available as ctx.globalConfig. Can this be consolidated?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants