A complete, real-world hands-on guide to running two CNCF sandbox projects (kagent and HAMi) on Nebius AI Cloud, with Nebius Token Factory as the LLM backend. Every command was executed live. Every output is real.
What follows is not LLM-generated filler; it is a record of what actually happened on a live system.
kagent makes AI agents actual K8s resources: declarative, version-controlled, RBAC-governed, and inspectable with kubectl.
HAMi virtualizes GPU resources at the K8s scheduler level, allowing multiple pods to share a single physical GPU with hard memory and compute isolation.
Running these two together on Nebius AI Cloud creates something genuinely new.
An AI agent system that manages GPU-virtualized infrastructure, all without depending on any closed-source AI provider.
This is not a tutorial assembled from documentation alone.
Every command in this article was run on my live Nebius VM (tarantula-instance) with a real NVIDIA L40S GPU.
The outputs are real. The errors I hit are documented and explained. The fixes are verified.
┌─────────────────────────────────────────────────────────────────┐
│ Nebius AI Cloud │
│ │
│ ┌─────────────────────────────────────────────────────────┐ │
│ │ my tarantula-instance (L40S GPU VM) │ │
│ │ │ │
│ │ ┌──────────────────────────────────────────────────┐ │ │
│ │ │ k3s K8s Cluster │ │ │
│ │ │ │ │ │
│ │ │ ┌─────────────────────┐ ┌──────────────────┐ │ │ │
│ │ │ │ kagent namespace │ │ kube-system ns │ │ │ │
│ │ │ │ │ │ │ │ │ │
│ │ │ │ ┌───────────────┐ │ │ ┌────────────┐ │ │ │ │
│ │ │ │ │ k8s-agent │ │ │ │ hami- │ │ │ │ │
│ │ │ │ │ helm-agent │ │ │ │ scheduler │ │ │ │ │
│ │ │ │ │ sre- │ │ │ │ │ │ │ │ │
│ │ │ │ │ orchestrator │ │ │ │ hami- │ │ │ │ │
│ │ │ │ │ + 8 more │ │ │ │ device- │ │ │ │ │
│ │ │ │ └───────────────┘ │ │ │ plugin │ │ │ │ │
│ │ │ │ │ │ └────────────┘ │ │ │ │
│ │ │ │ ┌───────────────┐ │ └──────────────────┘ │ │ │
│ │ │ │ │ ModelConfig │ │ │ │ │
│ │ │ │ │ (Nebius TF) │ │ ┌──────────────────┐ │ │ │
│ │ │ │ └───────────────┘ │ │ Physical L40S │ │ │ │
│ │ │ │ │ │ GPU (46GB VRAM) │ │ │ │
│ │ │ │ ┌───────────────┐ │ │ → 10x vGPU │ │ │ │
│ │ │ │ │ kagent-tools │ │ │ via HAMi │ │ │ │
│ │ │ │ │ MCP Server │ │ └──────────────────┘ │ │ │
│ │ │ │ └───────────────┘ │ │ │ │
│ │ │ └─────────────────────┘ │ │ │
│ │ └──────────────────────────────────────────────────┘ │ │
│ └─────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────┐ │
│ │ Nebius Token Factory │ │
│ │ (OpenAI-compatible API) │ │
│ │ Model: Llama 3.3 70B │ │
│ │ Endpoint: api.studio.nebius.ai│ │
│ └──────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
Data flow for every agent interaction:
- User sends a task via the CLI (kagent invoke) or the UI (http://89.169.110.178:8080)
- The kagent controller routes it to the appropriate agent pod
- The agent pod calls the Nebius Token Factory API with the task + system prompt + tool definitions
- Llama 3.3 70B decides which tool to call (e.g., k8s_get_resources, k8s_describe_resource)
- The kagent-tools MCP server executes the K8s API call
- The result is returned to the LLM for interpretation
- The LLM generates a human-readable response
- The response streams back to the user
GitHub: https://github.com/kagent-dev/kagent
Doc: https://kagent.dev
kagent introduces a fundamentally different model for AI agent deployment. Rather than treating agents as app-layer concerns, it treats them as infrastructure. Every agent is a K8s Custom Resource. Every tool server is a K8s resource. Every model config is a K8s resource.
GitHub: https://github.com/Project-HAMi/HAMi
Doc: https://project-hami.io
HAMi is a K8s-native GPU virtualization solution.
Originally known as k8s-vGPU-scheduler, it was accepted into the CNCF sandbox in late 2024.
HAMi works by:
- Installing a custom scheduler (hami-scheduler) that intercepts GPU resource requests
- Deploying a device plugin (hami-device-plugin) on each GPU node that registers virtual GPU resources with the kubelet
- Injecting a libvgpu.so library into GPU workload containers via volume mounts; it intercepts CUDA API calls to enforce memory and compute limits
A single physical GPU appears as multiple virtual GPUs to K8s, with hard isolation enforced at the CUDA driver level.
In my test:
- 1x NVIDIA L40S (46GB VRAM) → 10x virtual GPUs (nvidia.com/gpu: 10)
- Each virtual GPU can be allocated specific memory limits via annotations (see the sketch after this list)
- The HAMi scheduler enforces these limits: requesting more than available results in Pending pods
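As a minimal sketch of that pattern (the same shape is deployed for real later in the article), a pod asks for one virtual GPU via the normal resource limit and caps its VRAM with a HAMi annotation:

apiVersion: v1
kind: Pod
metadata:
  name: vgpu-demo                  # illustrative name, not one of the test pods
  annotations:
    nvidia.com/gpumem: "20000"     # cap this pod at roughly 20 GiB of VRAM
spec:
  containers:
  - name: cuda
    image: nvidia/cuda:12.3.0-base-ubuntu22.04
    command: ["sleep", "infinity"]
    resources:
      limits:
        nvidia.com/gpu: 1          # one HAMi virtual GPU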
For this article, I have used a single VM instance (tarantula-instance) with:
GPU: 1x NVIDIA L40S (46GB GDDR6 VRAM)
CPU: 8 vCPUs (Intel Ice Lake)
RAM: 32GB
Disk: 80GB SSD
OS: Ubuntu 24.04 LTS for NVIDIA GPUs (CUDA 13)
Region: eu-north1
Nebius Token Factory provides an OpenAI-compatible API endpoint at:
https://api.studio.nebius.ai/v1
Models available at time of testing (selected):
| Model ID | Type |
|---|---|
| meta-llama/Llama-3.3-70B-Instruct | Chat/Instruction |
| meta-llama/Meta-Llama-3.1-8B-Instruct | Chat/Instruction (fast) |
| Qwen/Qwen3-32B | Chat/Instruction |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | Large MoE |
| deepseek-ai/DeepSeek-V3.2-fast | Chat/Code |
| nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 | Large |
| openai/gpt-oss-120b | GPT OSS |
I used meta-llama/Llama-3.3-70B-Instruct for all kagent tests.
k3s is a CNCF-certified, lightweight K8s distribution ideal for single-node and edge deployments. It bundles everything into a single binary and starts in seconds. For my test, k3s gives us a full K8s cluster on a single VM without the overhead of kubeadm or kind.
Before installing anything, I verified the VM's hardware:
ubuntu@nebius-tarantula:~$ uname -a
Linux nebius-tarantula 6.11.0-1016-nvidia #16-Ubuntu SMP PREEMPT_DYNAMIC Sun Sep 21 17:06:33 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@nebius-tarantula:~$ nvidia-smi
Sat Apr 18 06:29:18 2026
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 580.126.09 Driver Version: 580.126.09 CUDA Version: 13.0 |
+-----------------------------------------------------------------------------------------+
| 0 NVIDIA L40S On | 00000000:8D:00.0 Off | 0 |
| N/A 31C P8 34W / 325W | 0MiB / 46068MiB | 0% Default |
+-----------------------------------------------------------------------------------------+
ubuntu@nebius-tarantula:~$ free -h
total used free shared buff/cache available
Mem: 31Gi 3.7Gi 25Gi 115Mi 2.2Gi 27Gi

What this tells us:
- The NVIDIA driver (580.126.09) is pre-installed on the Nebius GPU image
- The L40S has 46,068 MiB (approximately 45GB) of VRAM available
- 27GB RAM is available for K8s workloads
- The kernel is an NVIDIA-optimized build
curl -sfL https://get.k3s.io | sh -

Output:
[INFO] Finding release for channel stable
[INFO] Using v1.34.6+k3s1 as release
[INFO] Downloading binary https://github.com/k3s-io/k3s/releases/download/v1.34.6%2Bk3s1/k3s
[INFO] Installing k3s to /usr/local/bin/k3s
[INFO] Creating /usr/local/bin/kubectl symlink to k3s
[INFO] Creating /usr/local/bin/crictl symlink to k3s
[INFO] systemd: Creating service file /etc/systemd/system/k3s.service
[INFO] systemd: Enabling k3s unit
[INFO] systemd: Starting k3s
k3s installs as a systemd service and starts automatically.
It also creates symlinks for kubectl and crictl.
k3s stores its kubeconfig at /etc/rancher/k3s/k3s.yaml, which is owned by root.
I copy it to the user's home directory:
mkdir -p ~/.kube
sudo cp /etc/rancher/k3s/k3s.yaml ~/.kube/config
sudo chown ubuntu:ubuntu ~/.kube/config
export KUBECONFIG=~/.kube/config

ubuntu@nebius-tarantula:~$ kgn
NAME               STATUS   ROLES           AGE   VERSION
nebius-tarantula   Ready    control-plane   48s   v1.34.6+k3s1

curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Output:
Downloading https://get.helm.sh/helm-v3.20.2-linux-amd64.tar.gz
helm installed into /usr/local/bin/helm
Helm is the package manager for K8s; I'll use it to install both kagent and HAMi.
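One convenience note: I use short shell aliases for kubectl throughout (kgn already appeared above). If you are following along, these definitions match how they are used in this article:

alias k='kubectl'
alias kgp='kubectl get pods'
alias kgn='kubectl get nodes'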
k create namespace kagent

kagent uses a split Helm chart strategy:
- kagent-crds: Installs only the Custom Resource Definitions. CRDs must be installed before the main chart because the main chart creates instances of those CRDs.
- kagent: Installs the actual kagent components (controller, engine, UI, agents, tool servers).
This separation is a Helm best practice: if you upgrade kagent, the CRDs can be updated independently without risk of accidentally deleting custom resources.
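Under this split, a later upgrade would look roughly like this (a sketch; check the chart's release notes for the exact flags before upgrading a real cluster):

helm upgrade kagent-crds oci://ghcr.io/kagent-dev/kagent/helm/kagent-crds -n kagent
helm upgrade kagent oci://ghcr.io/kagent-dev/kagent/helm/kagent -n kagent --reuse-values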
helm install kagent-crds \
oci://ghcr.io/kagent-dev/kagent/helm/kagent-crds \
  --namespace kagent

Output:
Pulled: ghcr.io/kagent-dev/kagent/helm/kagent-crds:0.8.6
Digest: sha256:768aec3722e15470af7d976c769a465dc164dbef9ccca6eb1912a7dfb2747db1
NAME: kagent-crds
STATUS: deployed
REVISION: 1
This is where my setup differs from every other kagent tutorial. Instead of providing an OpenAI API key, I provide:
- A Nebius Token Factory API key
- The Nebius Token Factory base URL
- A Llama model available on Nebius
export NEBIUS_API_KEY="v1.fdfdfdfdfdfdfdfdfdf..." # Your Nebius Token Factory API key
helm install kagent \
oci://ghcr.io/kagent-dev/kagent/helm/kagent \
--namespace kagent \
--set providers.default=openAI \
--set providers.openAI.apiKey=$NEBIUS_API_KEY \
--set providers.openAI.endpoint=https://api.studio.nebius.ai/v1 \
  --set providers.openAI.model=meta-llama/Llama-3.3-70B-Instruct

Why providers.default=openAI?
kagent uses openAI as the provider type because Nebius Token Factory implements the OpenAI-compatible API specification.
The provider type determines which API client code to use.
Since Nebius implements the same /v1/chat/completions endpoint as OpenAI, the openAI provider works perfectly.
Output (abbreviated):
Pulled: ghcr.io/kagent-dev/kagent/helm/kagent:0.8.6
NAME: kagent
STATUS: deployed
REVISION: 1
DEPLOYED COMPONENTS:
- Controller: kagent-controller
- Engine: kagent-engine (AI agent runtime)
- UI: kagent-ui (web interface)
- KMCP Controller: kagent-kmcp-manager-controller
ENABLED AGENTS:
- argo-rollouts-agent
- cilium-debug-agent
- cilium-manager-agent
- cilium-policy-agent
- helm-agent
- istio-agent
- k8s-agent
- kgateway-agent
- observability-agent
- promql-agent
10 agents are deployed by default. Each is a specialized AI agent with its own system prompt, tool set, and deployment. This is one of kagent's strongest features: a prod-ready agent catalog out of the box.
After installation, I discovered that the providers.openAI.endpoint Helm value doesn't map to the openAI.baseUrl field in the ModelConfig CRD.
I needed to patch it directly:
# First, verify the API key is in the secret
k create secret generic kagent-openai \
--from-literal=OPENAI_API_KEY=$NEBIUS_API_KEY \
-n kagent --dry-run=client -o yaml | k apply -f -
# Then patch the ModelConfig to use Nebius endpoint
k patch modelconfig default-model-config --type=merge -p '{
"spec": {
"openAI": {
"baseUrl": "https://api.studio.nebius.ai/v1"
},
"model": "meta-llama/Llama-3.3-70B-Instruct"
}
}'

The Helm --set providers.openAI.endpoint flag creates the secret correctly but doesn't configure the baseUrl in the ModelConfig.
Always verify with k get modelconfig default-model-config -o yaml after installation.
ubuntu@nebius-tarantula:~$ kgp -n kagent
NAME READY STATUS RESTARTS AGE
argo-rollouts-conversion-agent-7b77468d59-x2lb7 1/1 Running 0 2m18s
cilium-debug-agent-5b786d455d-jq29s 1/1 Running 0 2m18s
cilium-manager-agent-ff5cd9578-pxwtf 1/1 Running 0 2m18s
cilium-policy-agent-8448976f56-jvz9q 1/1 Running 0 2m18s
helm-agent-57479bd6c6-jvfm2 1/1 Running 0 2m18s
istio-agent-5d5bfc8ff-7fbdz 1/1 Running 0 2m18s
k8s-agent-f86f9f96f-7lv7r 1/1 Running 0 2m18s
kagent-controller-7f8c569d89-p6br9 1/1 Running 0 3m30s
kagent-grafana-mcp-696d674dff-2vlvf 1/1 Running 0 3m30s
kagent-kmcp-controller-manager-777746db7c-xxkpg 1/1 Running 0 3m30s
kagent-postgresql-5d75cf4d86-lnd5f 1/1 Running 0 3m30s
kagent-querydoc-74d9f7bbdc-gk8lg 1/1 Running 0 3m30s
kagent-tools-77c6f575c8-vd2s7 1/1 Running 0 3m30s
kagent-ui-7fcd65ff6d-hcdph 1/1 Running 0 3m30s
kgateway-agent-6674fd5f79-8knkx 1/1 Running 0 2m17s
observability-agent-79f8bfb465-8sn4p 1/1 Running 0 2m18s
promql-agent-68bb8c45d6-7hcwt 1/1 Running 0 2m17s

17 pods, all Running:
- 10 agent pods (one per agent type)
- kagent-controller (the main K8s controller)
- kagent-postgresql (session/conversation storage)
- kagent-tools (the MCP tool server)
- kagent-ui (the web dashboard)
- kagent-grafana-mcp (Grafana MCP server)
- kagent-querydoc (doc query MCP server)
- kagent-kmcp-controller-manager (MCP server lifecycle manager)
- Log in to studio.nebius.ai or tokenfactory.nebius.com
- Navigate to API Keys in the left sidebar
- Click Create API Key
- Set an expiration date (I used 5 years for this test)
- Copy the key securely. Treat it like a password.
The key follows a Nebius-specific format but is used identically to an OpenAI API key in API calls.
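Using the key is the same as with any OpenAI-compatible endpoint; a minimal smoke test (a sketch, independent of kagent):

curl https://api.studio.nebius.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEBIUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "Reply with the single word: pong"}]
  }'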
To see which models are available on your Nebius Token Factory account:
curl https://api.studio.nebius.ai/v1/models \
-H "Authorization: Bearer $NEBIUS_API_KEY" | python3 -m json.tool | grep '"id"'Output (selected):
"id": "meta-llama/Meta-Llama-3.1-8B-Instruct",
"id": "nvidia/Llama-3_1-Nemotron-Ultra-253B-v1",
"id": "meta-llama/Llama-3.3-70B-Instruct",
"id": "Qwen/Qwen3-235B-A22B-Instruct-2507",
"id": "Qwen/Qwen3-32B",
"id": "deepseek-ai/DeepSeek-V3.2-fast",
"id": "openai/gpt-oss-120b",
"id": "MiniMaxAI/MiniMax-M2.5-fast",Why I chose meta-llama/Llama-3.3-70B-Instruct: This model has strong instruction-following capabilities and handles structured tool-use (function calling) reliably.
For K8s operations where the agent must decide which kubectl-equivalent tool to call with specific parameters, a 70B model provides the reasoning depth needed.
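Structured tool-use goes over the same endpoint in the standard OpenAI function-calling shape. A hedged sketch of the kind of request kagent builds on the agent's behalf (the real tool schemas come from the MCP server; this one is abbreviated):

curl https://api.studio.nebius.ai/v1/chat/completions \
  -H "Authorization: Bearer $NEBIUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "List all pods in all namespaces"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "k8s_get_resources",
        "description": "Get Kubernetes resources (kubectl get equivalent)",
        "parameters": {
          "type": "object",
          "properties": {
            "resource_type": {"type": "string"},
            "all_namespaces": {"type": "string"},
            "output": {"type": "string"}
          }
        }
      }
    }]
  }'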
After patching, the ModelConfig should look like this:
k get modelconfig default-model-config -o yaml

apiVersion: kagent.dev/v1alpha2
kind: ModelConfig
metadata:
name: default-model-config
namespace: kagent
spec:
apiKeySecret: kagent-openai
apiKeySecretKey: OPENAI_API_KEY
model: meta-llama/Llama-3.3-70B-Instruct
openAI:
baseUrl: https://api.studio.nebius.ai/v1
provider: OpenAI
status:
conditions:
- message: Model config accepted
reason: ModelConfigReconciled
status: "True"
type: Accepted

The provider: OpenAI field doesn't mean I'm using OpenAI.
It means I'm using the OpenAI-compatible API protocol.
The actual requests go to https://api.studio.nebius.ai/v1 with a Nebius API key.
This is the power of the OpenAI-compatible API standard.
Any provider implementing it can be swapped in.
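Swapping the model is a one-line patch against the same ModelConfig; for example, pointing it at Qwen/Qwen3-32B from the model list above (a sketch, using the same merge-patch mechanism as before):

k patch modelconfig default-model-config -n kagent --type=merge \
  -p '{"spec": {"model": "Qwen/Qwen3-32B"}}'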
k get crds | grep kagent

agents.kagent.dev 2026-04-18T06:32:21Z
mcpservers.kagent.dev 2026-04-18T06:32:21Z
memories.kagent.dev 2026-04-18T06:32:21Z
modelconfigs.kagent.dev 2026-04-18T06:32:21Z
modelproviderconfigs.kagent.dev 2026-04-18T06:32:21Z
remotemcpservers.kagent.dev 2026-04-18T06:32:21Z
toolservers.kagent.dev 2026-04-18T06:32:21Z
7 CRDs. Each one represents a different concern in the kagent architecture. Let's explore each.
An Agent resource defines a complete AI agent. Let's look at the k8s-agent:
k describe agent k8s-agent

The key fields are:
- spec.declarative.modelConfig: References the ModelConfig to use (default-model-config)
- spec.declarative.systemMessage: The full system prompt that defines agent behavior
- spec.declarative.tools: List of MCP tools the agent can access
- spec.declarative.runtime: python or go (the agent runtime engine)
- spec.declarative.a2aConfig.skills: Agent capabilities advertised via the A2A protocol
- status.conditions: Current state - Accepted and Ready
The k8s-agent's system prompt defines it as "KubeAssist" with access to 19 tools covering everything from k8s_get_resources to k8s_apply_manifest to k8s_execute_command.
k get remotemcpservers

NAME PROTOCOL URL ACCEPTED
kagent-grafana-mcp STREAMABLE_HTTP http://kagent-grafana-mcp.kagent:8000/mcp True
kagent-tool-server STREAMABLE_HTTP http://kagent-tools.kagent:8084/mcp True
Two MCP servers are registered:
- kagent-tool-server: The main K8s tool server with 175 tools across K8s, Argo, Cilium, Helm, Istio, and more
- kagent-grafana-mcp: A Grafana integration server for metrics and dashboards
By using MCP, kagent tools can be consumed by any MCP-compatible AI system, not just kagent.
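Because the tool server speaks Streamable HTTP MCP, you can point any other MCP-capable client at it directly; a sketch (the service name and port come from the table above):

k port-forward svc/kagent-tools 8084:8084 -n kagent --address 0.0.0.0 &
# any Streamable HTTP MCP client can now connect to http://<vm-ip>:8084/mcp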
k get agents,modelconfigs,toolservers,memories,remotemcpservers

NAME TYPE RUNTIME READY ACCEPTED
agent.kagent.dev/argo-rollouts-conversion-agent Declarative python True True
agent.kagent.dev/cilium-debug-agent Declarative python True True
agent.kagent.dev/cilium-manager-agent Declarative python True True
agent.kagent.dev/cilium-policy-agent Declarative python True True
agent.kagent.dev/helm-agent Declarative python True True
agent.kagent.dev/istio-agent Declarative python True True
agent.kagent.dev/k8s-agent Declarative python True True
agent.kagent.dev/kgateway-agent Declarative python True True
agent.kagent.dev/observability-agent Declarative python True True
agent.kagent.dev/promql-agent Declarative python True True
NAME PROVIDER MODEL
modelconfig.kagent.dev/default-model-config OpenAI meta-llama/Llama-3.3-70B-Instruct
NAME PROTOCOL URL ACCEPTED
remotemcpserver.kagent.dev/kagent-grafana-mcp STREAMABLE_HTTP http://kagent-grafana-mcp.kagent:8000/mcp True
remotemcpserver.kagent.dev/kagent-tool-server STREAMABLE_HTTP http://kagent-tools.kagent:8084/mcp True
All 10 agents are READY=True and ACCEPTED=True.
The controller has reconciled every resource to its desired state.
Before installing HAMi, I verified that the NVIDIA Container Toolkit was working correctly:
nvidia-container-cli --version

cli-version: 1.18.2
lib-version: 1.18.2
build date: 2026-01-21T10:48+00:00
nvidia-container-cli info

NVRM version: 580.126.09
CUDA version: 13.0
Device Index: 0
Device Minor: 0
Model: NVIDIA L40S
Brand: Nvidia
GPU UUID: GPU-fdfdf-fdfd-4545-4545-gfgfg
Bus Location: 00000000:8d:00.0
Architecture: 8.9
The NVIDIA container runtime is installed and can communicate with the GPU driver.
The GPU UUID (GPU-fdfdf-fdfd-4545-4545-gfgfg) will appear throughout HAMi's allocation logs, providing a consistent identifier for the physical GPU.
k3s uses its own embedded containerd instance, not the system containerd. This is crucial for HAMi integration. k3s's containerd config lives at:
sudo cat /var/lib/rancher/k3s/agent/etc/containerd/config.toml

# File generated by k3s. DO NOT EDIT. Use config-v3.toml.tmpl instead.
version = 3
imports = ["/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d/*.toml"]
root = "/var/lib/rancher/k3s/agent/containerd"
state = "/run/k3s/containerd"
[grpc]
address = "/run/k3s/containerd/containerd.sock"

Two observations:
- k3s uses a version 3 containerd config (the newer format with plugins.'io.containerd.cri.v1.runtime')
- k3s uses a different containerd socket (/run/k3s/containerd/containerd.sock) than system containerd
The config also already has an nvidia runtime section:
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia']
runtime_type = "io.containerd.runc.v2"
[plugins.'io.containerd.cri.v1.runtime'.containerd.runtimes.'nvidia'.options]
BinaryName = "/usr/bin/nvidia-container-runtime"
SystemdCgroup = true

This means k3s already knows about the nvidia runtime. I just need to ensure HAMi can use it.
The default nvidia-ctk runtime configure --runtime=containerd command creates a config file at /etc/containerd/conf.d/99-nvidia.toml using the version 1 containerd format.
This conflicts with k3s's version 3 format.
The fix is to remove this conflicting config:
sudo rm /etc/containerd/conf.d/99-nvidia.toml
sudo systemctl restart containerd
sudo systemctl restart k3s

This allows k3s to use its own built-in nvidia runtime config without interference.
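A quick sanity check after the restarts: the nvidia runtime block should still be present in the k3s-generated config (a plain grep, nothing HAMi-specific yet):

sudo grep -B1 -A3 nvidia /var/lib/rancher/k3s/agent/etc/containerd/config.toml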
helm repo add hami-charts https://project-hami.github.io/HAMi/
helm repo update

k label node nebius-tarantula gpu=on

Why label the node? HAMi's device plugin DaemonSet uses a node selector for gpu=on.
Without this label, the DaemonSet won't schedule on the node and HAMi won't see the GPU.
helm install hami hami-charts/hami \
--set scheduler.kubeScheduler.imageTag=v1.34.6 \
--set devicePlugin.nvidiaDriverRoot=/usr \
  -n kube-system

- scheduler.kubeScheduler.imageTag=v1.34.6: Must match the k3s K8s version exactly. HAMi's scheduler extends the default K8s scheduler, so version compatibility is critical.
- devicePlugin.nvidiaDriverRoot=/usr: Tells HAMi where to find the NVIDIA libraries on the host. On Ubuntu with standard NVIDIA drivers, libraries are in /usr/lib and binaries in /usr/bin, so /usr is the correct root.
kgp -n kube-system | grep hami

hami-device-plugin-cw6t5 2/2 Running 0 2m58s
hami-scheduler-84f8888fc5-fqrp8 2/2 Running 0 3m40s
- hami-device-plugin: Contains device-plugin (registers vGPU resources) and vgpu-monitor (metrics collection)
- hami-scheduler: Contains the HAMi scheduler extender and a webhook for pod mutation
Understanding exactly how HAMi integrates with containerd and K8s is key to troubleshooting and prod deployments.
When a pod requests nvidia.com/gpu: 1, the following happens:
- The admission webhook intercepts the pod creation request and sets schedulerName: hami-scheduler
- hami-scheduler evaluates available virtual GPU resources across nodes
- HAMi adds allocation annotations to the pod: hami.io/bind-phase: success, hami.io/vgpu-devices-allocated: GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7,NVIDIA,46068,100:;, hami.io/vgpu-node: nebius-tarantula
- hami-device-plugin receives the Allocate call from the kubelet and injects:
  - the NVIDIA_VISIBLE_DEVICES env variable (the physical GPU UUID)
  - CUDA_DEVICE_MEMORY_LIMIT_0 (memory limit in MiB)
  - CUDA_DEVICE_SM_LIMIT (compute core percentage)
  - volume mounts for libvgpu.so (the CUDA intercept library)
  - /etc/ld.so.preload pointing to libvgpu.so
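The log shown below can be retrieved directly from the device plugin; a sketch, assuming the DaemonSet keeps the hami-device-plugin name and using the container name from the pod description above:

k logs -n kube-system ds/hami-device-plugin -c device-plugin --tail=50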
The HAMi device plugin log reveals exactly what gets injected:
Allocate Response [&ContainerAllocateResponse{
Envs: map[string]string{
CUDA_DEVICE_MEMORY_LIMIT_0: 46068m,
CUDA_DEVICE_MEMORY_SHARED_CACHE: /usr/local/vgpu/4fgff375-fgfg-fgfg-fgfg-1e78b8cd030f.cache,
CUDA_DEVICE_SM_LIMIT: 100,
NVIDIA_VISIBLE_DEVICES: GPU-745dfdfd-dfdf-dfdf-dfdf-dfdfdfdfdfdf,
},
Mounts: []*Mount{
&Mount{
ContainerPath: /usr/local/vgpu/libvgpu.so,
HostPath: /usr/local/vgpu/libvgpu.so.v2.8.1,
ReadOnly: true,
},
&Mount{
ContainerPath: /etc/ld.so.preload,
HostPath: /usr/local/vgpu/ld.so.preload,
ReadOnly: true,
},
},
}]
/etc/ld.so.preload causes the Linux dynamic linker to load libvgpu.so before any other library in every process. libvgpu.so intercepts CUDA API calls between the CUDA runtime (libcudart.so) and the CUDA driver (libcuda.so), enforcing memory and compute limits transparently.
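You can observe this plumbing from inside any HAMi-scheduled GPU pod; a sketch (substitute the name of one of the GPU test pods created later in this article):

k exec -n kagent <gpu-pod-name> -- cat /etc/ld.so.preload
k exec -n kagent <gpu-pod-name> -- sh -c 'env | grep CUDA_DEVICE'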
kgn nebius-tarantula -o jsonpath='{.status.allocatable}' | python3 -m json.tool

{
"cpu": "8",
"ephemeral-storage": "77982375671",
"hugepages-1Gi": "0",
"hugepages-2Mi": "0",
"memory": "32865164Ki",
"nvidia.com/gpu": "10",
"pods": "110"
}

This is the proof that HAMi is working.
Before HAMi, nvidia.com/gpu was absent from the allocatable resources.
Now it shows 10.
HAMi has virtualized the single physical L40S into 10 virtual GPUs.
The node annotation confirms the GPU registration:
k describe node nebius-tarantula | grep "hami.io/node-nvidia-register" -A1

hami.io/node-nvidia-register:
[{"id":"GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7","count":10,"devmem":46068,"devcore":100,"type":"NVIDIA L40S","mode":"hami-core","health":true}]
Breaking down this JSON:
- id: The physical GPU UUID
- count: 10 virtual GPUs registered
- devmem: 46,068 MiB total VRAM
- devcore: 100% of GPU compute cores available
- mode: hami-core, HAMi's software-based CUDA interception mode
- health: true - the GPU is healthy
# Start port-forward for the controller API
k port-forward svc/kagent-controller 8083:8083 -n kagent --address 0.0.0.0 &
# Invoke the k8s-agent
kagent invoke --agent k8s-agent \
--task "What is the status of this cluster? List all running pods." \
--stream

The agent's response stream shows the complete reasoning process:
- Tool call decision: The LLM decides to call k8s_get_resources with parameters {"all_namespaces":"true","output":"wide","resource_type":"pods"}
- Tool execution: The MCP server executes the equivalent of kgp -A -o wide
- All 25 running pods with their IPs, nodes, and status are returned
- Synthesis: The LLM summarizes: "The cluster has 25 running pods across different namespaces, including kagent, and kube-system."
Token usage: 6,131 total tokens (4,655 prompt + 1,476 completion)
kagent invoke --agent k8s-agent \
--task "Check if there are any pods with high restart counts or in error state. Give me a health summary." \
--stream

Agent response:
Based on the provided output, there are no pods with high restart counts or in error state.
Health summary:
- Total pods: 24
- Running pods: 24
- Pods with restarts: 0
- Error pods: 0
The cluster appears to be in a good state.
Before HAMi was installed, I asked the agent about GPU availability:
kagent invoke --agent k8s-agent \
--task "Describe the node nebius-tarantula in detail. What GPUs are available?" \
--stream

Agent response (before HAMi):
"The node 'nebius-tarantula' does not have any GPUs available. The output of the describe node command shows no indication of any GPU devices."
This confirms that without HAMi (or the NVIDIA device plugin), K8s has no visibility into the GPU. It's just a CPU node from K8s' perspective.
kagent invoke --agent k8s-agent \
--task "Check the node nebius-tarantula allocatable resources. How many GPUs are available and what type?" \
--stream

Agent response (after HAMi):
"The node 'nebius-tarantula' has 10 GPUs available, and they are of type 'NVIDIA L40S'."
The agent correctly reads the HAMi-registered virtual GPU count and identifies the GPU model from the node annotations.
One of the most interesting tests: asking the agent to describe itself using the K8s API.
kagent invoke --agent k8s-agent \
--task "Describe yourself: what CRD defines you, what tools do you have, and what is your modelconfig? Use kubectl to inspect your own agent resource." \
--stream

The agent:
- Tried k8s_describe_resource for agent k8s_agent -n default, which failed (wrong namespace and name format)
- Tried k8s_get_resources for agents and found all agents with their namespaces
- Tried k8s_describe_resource for agent k8s-agent -n kagent, which succeeded
- Read its own CRD definition including system prompt, tool list, and model config
- Synthesized a response explaining its own architecture
Final response:
"The K8s AI Agent System Prompt is defined by the CustomResourceDefinition (CRD) named 'Agent' in the 'kagent.dev' API group. The agent has access to various tools for diagnosing and solving K8s issues, including informational tools like GetResources, DescribeResource, and GetEvents, as well as modification tools like CreateResource, ApplyManifest, and PatchResource. The agent's model config is set to 'default-model-config'."
This demonstrates genuine meta-awareness: the agent can read and understand its own K8s definition.
kagent invoke --agent k8s-agent \
--task "Create a simple nginx deployment with 2 replicas in the default namespace. Name it 'test-nginx'." \
--stream

The agent generated and applied this manifest:
apiVersion: apps/v1
kind: Deployment
metadata:
name: test-nginx
spec:
replicas: 2
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx:latest
ports:
- containerPort: 80

The agent deployed to the kagent namespace (its own namespace) rather than default.
This is because the agent pod runs in kagent and the tool server's kubectl context defaults to that namespace.
In prod, you would configure the tool server's kubeconfig with appropriate namespace defaults.
kagent invoke --agent k8s-agent \
--task "Delete the deployment named test-nginx in the kagent namespace." \
--stream

The complete create → verify → delete cycle demonstrates that kagent agents can manage the full K8s resource lifecycle through natural language instructions.
kagent invoke --agent helm-agent \
--task "List all helm releases installed in this cluster with their status and namespaces." \
--stream

The helm-agent first tried the --namespace all flag, which failed (it is not a valid Helm flag), then retried without it:
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
kagent kagent 1 2026-04-18 07:27:17.040520124 +0000 UTC deployed kagent-0.8.6
kagent-crds kagent 1 2026-04-18 07:27:12.176735317 +0000 UTC deployed kagent-crds-0.8.6
The agent automatically retried with different parameters when the first attempt failed. This retry-and-adapt behavior is emergent from the LLM's reasoning. It wasn't explicitly programmed. The agent recognized the error message and adjusted its approach.
Beyond the built-in agents, kagent allows creating custom agents declaratively. I created an SRE orchestrator agent that can delegate to both the k8s-agent and promql-agent:
apiVersion: kagent.dev/v1alpha2
kind: Agent
metadata:
name: sre-orchestrator
namespace: kagent
spec:
type: Declarative
description: "SRE Orchestrator that can delegate to k8s-agent and promql-agent"
declarative:
modelConfig: default-model-config
systemMessage: |
You are an SRE orchestrator. You have access to K8s tools and can
delegate to specialized agents. When asked about metrics, use promql-agent.
When asked about cluster state, use k8s tools directly.
tools:
- type: McpServer
mcpServer:
name: kagent-tool-server
kind: RemoteMCPServer
apiGroup: kagent.dev
toolNames:
- k8s_get_resources
- k8s_describe_resource
- k8s_get_pod_logs
- k8s_get_events
- type: Agent
agent:
name: promql-agent

k apply -f sre-orchestrator.yaml

The kagent controller immediately:
- Detected the new Agent resource
- Created a Deployment for the sre-orchestrator pod
- Set up the A2A endpoint at http://sre-orchestrator.kagent:8080
- Registered promql-agent as a callable tool
k get agent sre-orchestrator

NAME TYPE RUNTIME READY ACCEPTED
sre-orchestrator Declarative python True True
Ready in under 2 minutes.
The Agent-to-Agent protocol is one of kagent's most powerful features. It allows agents to call other agents as tools, enabling complex multi-agent workflows.
When an agent is listed as a type: Agent tool, kagent:
- Creates an MCP server endpoint for that agent at /.well-known/agent-card.json
- Exposes the agent's skills as callable tools
- Routes inter-agent calls through the kagent controller
kagent invoke --agent sre-orchestrator \
--task "Check the status of all pods in kagent namespace, then ask promql-agent for the CPU usage query for these pods." \
--stream

Watching the streaming output reveals the multi-agent workflow:
Step 1 > sre-orchestrator calls k8s tool directly:
{"name": "k8s_get_resources", "args": {"namespace": "kagent", "resource_type": "pod"}}Step 2 > sre-orchestrator delegates to promql-agent:
{"name": "kagent__NS__promql_agent", "args": {"request": "CPU usage query for pods in kagent namespace"}}Step 3 > promql-agent receives the request in its own session:
{"subagent_session_id": "2ab4bf39-379d-404c-8fda-0e6c9e28c5da"}What the session ID tells us:
The subagent_session_id is different from the parent session ID.
Each agent runs in its own isolated session with its own context window.
The sre-orchestrator doesn't see promql-agent's internal reasoning, only its final response.
Token accounting:
- sre-orchestrator used: 2,438 tokens
- promql-agent sub-session used: 1,869 tokens
- Total: 4,307 tokens for the multi-agent interaction
This demonstrates true A2A delegation: the orchestrator doesn't need to know how to write PromQL queries. It just delegates to a specialist.
This is the centerpiece of the entire test: using kagent AI agents to manage HAMi GPU-virtualized workloads.
HAMi-annotated node details returned by the agent
kagent invoke --agent k8s-agent \
--task "Create a pod named 'gpu-test-1' in default namespace that requests 1 nvidia.com/gpu with 20000 GPU memory limit using HAMi annotations (nvidia.com/gpumem: 20000). Use nvidia/cuda:12.3.0-base-ubuntu22.04 image and run sleep infinity." \
--stream

The agent generated the correct HAMi-annotated manifest:
apiVersion: v1
kind: Pod
metadata:
name: gpu-test-1
namespace: default
annotations:
nvidia.com/gpumem: "20000"
spec:
containers:
- name: gpu-test
image: nvidia/cuda:12.3.0-base-ubuntu22.04
command: ["sleep", "infinity"]
resources:
limits:
nvidia.com/gpu: 1

HAMi immediately picked up the pod and annotated it with allocation details:
hami.io/bind-phase: success
hami.io/vgpu-devices-allocated: GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7,NVIDIA,46068,100:;
nvidia.com/gpumem: 20000
I then created a second pod requesting 15,000 MiB of GPU memory:
apiVersion: v1
kind: Pod
metadata:
name: gpu-test-3
namespace: kagent
annotations:
nvidia.com/gpumem: "15000"
spec:
containers:
- name: gpu-test-3
image: ubuntu:22.04
command: ["sleep", "infinity"]
resources:
limits:
nvidia.com/gpu: 1

Both pods were allocated to the same physical GPU (GPU-745c1c2c).
HAMi's scheduler assigned them successfully because:
- gpu-test-2: 20,000 MiB requested
- gpu-test-3: 15,000 MiB requested
- Total: 35,000 MiB, well within the 46,068 MiB physical limit
kagent invoke --agent k8s-agent \
--task "Look at pods gpu-test-2 and gpu-test-3 in kagent namespace. Check their HAMi annotations and explain how HAMi is sharing the GPU between these two pods. What GPU UUID is being used?" \
--stream

Agent response:
"The HAMi annotations are being used to manage the GPU resources between the two pods. The 'hami.io/vgpu-devices-allocated: GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7,NVIDIA,46068,100:;' annotation shows that the GPU device with UUID 'GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7' was allocated to the pod.
The 'nvidia.com/gpumem: 20000' annotation in the gpu-test-2 pod and the 'nvidia.com/gpumem: 15000' annotation in the gpu-test-3 pod indicate that the GPU memory is being shared between the two pods. In summary, the GPU device with UUID 'GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7' is being shared between the two pods."
The agent correctly identified the GPU UUID, understood the memory allocation annotations, and explained the sharing mechanism. All by reading K8s resource YAML through the MCP tool interface.
I created a pod requesting 11 virtual GPUs, more than the 10 available:
apiVersion: v1
kind: Pod
metadata:
name: gpu-overcommit-test
namespace: kagent
annotations:
nvidia.com/gpumem: "5000"
spec:
containers:
- name: gpu-overcommit
image: ubuntu:22.04
command: ["sleep", "infinity"]
resources:
limits:
nvidia.com/gpu: 11

k apply -f gpu-overcommit-test.yaml
k describe pod gpu-overcommit-test | tail -10

Events:
Warning FailedScheduling 3m10s hami-scheduler 0/1 nodes are available: 1 NodeUnfitPod. no new claims to deallocate, preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Warning FailedScheduling 28s (x2 over 3m10s) hami-scheduler 0/1 nodes are available: 1 NodeUnfitPod. no new claims to deallocate, preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Warning FilteringFailed 16s (x4 over 3m10s) hami-scheduler no available node, 1 nodes do not meet
The pod remains Pending indefinitely.
HAMi's scheduler correctly refuses to schedule a pod that requests more virtual GPUs than are available.
The error message NodeUnfitPod and FilteringFailed indicate that the HAMi scheduler's filtering phase rejected all available nodes.
This is precisely how a prod GPU virtualization system should behave: hard enforcement of resource limits, not best-effort.
HAMi exposes GPU metrics in Prometheus format at the monitoring service:
k get svc -n kube-system | grep hami

hami-device-plugin-monitor NodePort 10.43.107.36 <none> 31992:31992/TCP 64m
hami-scheduler NodePort 10.43.242.176 <none> 443:31998/TCP,31993:31993/TCP 64m
curl http://localhost:31992/metrics

# HELP HostCoreUtilization GPU core utilization
# TYPE HostCoreUtilization gauge
HostCoreUtilization{deviceidx="0",devicetype="NVIDIA-NVIDIA L40S",deviceuuid="GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7",zone="vGPU"} 0
# HELP HostGPUMemoryUsage GPU device memory usage
# TYPE HostGPUMemoryUsage gauge
HostGPUMemoryUsage{deviceidx="0",devicetype="NVIDIA-NVIDIA L40S",deviceuuid="GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7",zone="vGPU"} 6.40090112e+08
# HELP hami_build_info hami build metadata
# TYPE hami_build_info gauge
hami_build_info{build_date="20260417-04:25:02",compiler="gc",go_version="go1.25.5",platform="linux/amd64",revision="713ad31",version="v2.8.1"} 1
Reading these metrics:
- HostCoreUtilization: Currently 0%; my test pods are running sleep infinity, not using GPU compute
- HostGPUMemoryUsage: 640,090,112 bytes (~610 MiB); this is CUDA runtime overhead, not workload memory
- hami_build_info: Confirms HAMi v2.8.1 is running, built with Go 1.25.5
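If Prometheus runs in the cluster, a static scrape job against the monitor NodePort is enough to collect these; a sketch (adjust the target to your node address):

scrape_configs:
  - job_name: hami-vgpu
    static_configs:
      - targets: ['<node-ip>:31992']   # hami-device-plugin-monitor NodePort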
I pointed the k8s-agent at the HAMi monitoring service and asked it to analyze the available metrics:
kagent invoke --agent k8s-agent \
--task "The HAMi GPU monitor is running at http://10.43.107.36:31992/metrics. Check the service 'hami-device-plugin-monitor' in kube-system namespace and tell me what GPU metrics are available." \
--stream

The agent:
- Described the hami-device-plugin-monitor service
- Checked service connectivity (got a 404 at the root path, which is correct; the metrics are at /metrics)
- Listed all hami pods in kube-system
- Retrieved device plugin logs and analyzed the GPU allocation events
From the device plugin logs, the agent extracted:
CUDA_DEVICE_MEMORY_LIMIT_0: 46068m
CUDA_DEVICE_MEMORY_SHARED_CACHE: /usr/local/vgpu/[uuid].cache
CUDA_DEVICE_SM_LIMIT: 100
NVIDIA_VISIBLE_DEVICES: GPU-745c1c2c-56d7-168a-ca56-97e3dd8e0db7
These are the env variables that HAMi injects into each GPU-using container. The agent understood and explained their purpose.
The kagent CLI (version 0.9.0-beta5) provides several resource management commands beyond invoke:
kagent get agent

+----+---------------------------------------+----------------------+------------------+----------+
| # | NAME | CREATED | DEPLOYMENT_READY | ACCEPTED |
+----+---------------------------------------+----------------------+------------------+----------+
| 1 | kagent/kgateway-agent | 2026-04-18T07:27:18Z | true | true |
| 2 | kagent/observability-agent | 2026-04-18T07:27:18Z | true | true |
| 3 | kagent/promql-agent | 2026-04-18T07:27:18Z | true | true |
| 4 | kagent/sre-orchestrator | 2026-04-18T07:45:56Z | true | true |
| 5 | kagent/cilium-manager-agent | 2026-04-18T07:27:18Z | true | true |
| 6 | kagent/cilium-policy-agent | 2026-04-18T07:27:18Z | true | true |
| 7 | kagent/istio-agent | 2026-04-18T07:27:18Z | true | true |
| 8 | kagent/k8s-agent | 2026-04-18T07:27:18Z | true | true |
| 9 | kagent/argo-rollouts-conversion-agent | 2026-04-18T07:27:18Z | true | true |
| 10 | kagent/cilium-debug-agent | 2026-04-18T07:27:18Z | true | true |
| 11 | kagent/helm-agent | 2026-04-18T07:27:18Z | true | true |
+----+---------------------------------------+----------------------+------------------+----------+
Note that sre-orchestrator (row 4) shows a later creation timestamp.
This is my custom agent, created during the session.
kagent get session

+----+------------------------------------------+-------------------------+------------------------------+----------------------+
| # | ID | NAME | AGENT | CREATED |
+----+------------------------------------------+-------------------------+------------------------------+----------------------+
| 1 | ctx-31dcebe5-... | List all running pod... | kagent__NS__k8s_agent | 2026-04-18T07:34:56Z |
| 2 | 94aca692-... | List all helm releas... | kagent__NS__helm_agent | 2026-04-18T07:39:46Z |
| 3 | 84249c1c-... | Create a simple ngin... | kagent__NS__k8s_agent | 2026-04-18T07:40:50Z |
| 4 | a862ac3e-... | Delete the deploymen... | kagent__NS__k8s_agent | 2026-04-18T07:42:45Z |
| 5 | aaa28834-... | What PromQL query wo... | kagent__NS__promql_agent | 2026-04-18T07:44:18Z |
| 6 | 6e4e0146-... | Check the status of ... | kagent__NS__sre_orchestrator | 2026-04-18T07:46:36Z |
| 7 | 2ab4bf39-... | CPU usage query for ... | kagent__NS__promql_agent | 2026-04-18T07:46:40Z |
| 8 | 1368a699-... | Describe yourself: w... | kagent__NS__k8s_agent | 2026-04-18T07:48:03Z |
+----+------------------------------------------+-------------------------+------------------------------+----------------------+
Critical observation about session 6 and 7: Session 6 is the sre-orchestrator session. Session 7 is a promql-agent session created at nearly the same time.
This is the A2A sub-session.
The orchestrator delegated to promql-agent, creating a separate session stored in PostgreSQL.
The session history provides a complete audit trail of all agent interactions.
kagent get tool

This returns all 175 tools organized by category:
- Argo (8 tools): rollout management, plugin verification
- Cilium (15 tools): network policy, endpoint management, debugging
- Helm (12 tools): release management, chart operations
- Istio (18 tools): service mesh config, traffic management
- K8s (19 tools): full kubectl-equivalent operations
- Gateway (12 tools): kgateway/Gateway API management
- Prometheus (8 tools): PromQL execution, metric queries
- Grafana (various): dashboard management via MCP
Every kagent agent exposes a standard A2A agent card at /.well-known/agent-card.json.
This is how agents discover each other's capabilities.
curl http://10.42.0.68:8080/.well-known/agent-card.json | python3 -m json.tool

{
"capabilities": {
"pushNotifications": false,
"stateTransitionHistory": true,
"streaming": true
},
"defaultInputModes": ["text"],
"defaultOutputModes": ["text"],
"description": "An K8s Expert AI Agent specializing in cluster operations, troubleshooting, and maintenance.",
"name": "k8s_agent",
"preferredTransport": "JSONRPC",
"protocolVersion": "0.3.0",
"skills": [
{
"description": "The ability to analyze and diagnose K8s Cluster issues.",
"examples": [
"What is the status of my cluster?",
"How can I troubleshoot a failing pod?",
"What are the resource limits for my nodes?"
],
"id": "cluster-diagnostics",
"name": "Cluster Diagnostics",
"tags": ["cluster", "diagnostics"]
},
{
"id": "resource-management",
"name": "Resource Management",
"tags": ["resource", "management"]
},
{
"id": "security-audit",
"name": "Security Audit",
"tags": ["security", "audit"]
}
],
"url": "http://k8s-agent.kagent:8080",
"version": ""
}

curl http://10.42.0.81:8080/.well-known/agent-card.json | python3 -m json.tool

{
"capabilities": {
"pushNotifications": false,
"stateTransitionHistory": true,
"streaming": true
},
"name": "sre_orchestrator",
"protocolVersion": "0.3.0",
"skills": [],
"url": "http://sre-orchestrator.kagent:8080"
}

The sre-orchestrator has an empty "skills": [] array.
This is because I didn't define any a2aConfig.skills in my custom agent YAML.
The k8s-agent has 3 skills defined in its Helm chart values.
In prod, you'd want to define skills for your custom agents to make them discoverable.
The protocolVersion: "0.3.0" confirms both agents use the same A2A protocol version, ensuring compatibility.
The kagent UI is accessible at http://<node-ip>:8080 after port-forwarding:
k port-forward svc/kagent-ui 8080:8080 -n kagent --address 0.0.0.0 &

k get crd memories.kagent.dev -o yaml | grep -A 10 "provider:"

provider:
default: Pinecone
description: The provider of the memory
enum:
- Pinecone
type: string

The Memory CRD in kagent v0.8.6 supports only Pinecone as the memory provider. Pinecone is a managed vector database service.
This means long-term agent memory requires:
- A Pinecone account and API key
- A Pinecone index configured for the agent's embeddings
In my test env, I didn't have Pinecone configured, so the Memory CRD returned no resources. When an agent says "I will remember that...", it's only remembering within the current session (stored in PostgreSQL). Cross-session memory requires Pinecone.
Conversations are persisted in the bundled PostgreSQL instance.
The kagent get session command shows all historical sessions.
Within a session, the full conversation history is available to the agent (subject to context window limits).
kagent supports context compaction. When enabled, older messages in a long conversation are summarized to reduce token usage while preserving key information:
spec:
declarative:
context:
compaction:
compactionInterval: 5 # Summarize every 5 turns

kagent:
- The CRD-based architecture is genuinely powerful for prod deployments
- 10 prod-ready agents out of the box
- The A2A protocol works reliably for multi-agent delegation
- The kagent CLI provides a clean interface for both management and invocation
- The UI is polished and functional
- Session persistence in PostgreSQL gives full audit capability
- Connecting to Nebius Token Factory (or any OpenAI-compatible API) works perfectly
HAMi:
- GPU virtualization works as advertised: 1 L40S → 10 virtual GPUs
- The scheduler correctly enforces limits (overcommit protection)
- Prometheus metrics are available and informative
- The device plugin log provides detailed allocation tracing
- Integration with k3s works after the containerd config conflict is resolved
The combination:
- kagent agents can read, understand, and explain HAMi GPU allocations
- Agent-driven GPU workload deployment with HAMi annotations works
- The full stack requires zero external AI provider dependencies
kagent v0.8.6:
- Memory CRD only supports Pinecone (no built-in vector DB option)
- The kmcp mcp init command creates empty projects without templates
- Some tool calls produce JSON function call syntax as text rather than executing (intermittent)
- The ModelConfig endpoint Helm value requires a manual baseUrl patch post-install
- Pod log retrieval requires the exact pod name (including hash suffix)
- The spec.declarative.memory.enabled field is not recognized (schema mismatch in the docs)
HAMi v2.8.1:
- HAMi WebUI requires separate installation (not included in main Helm chart)
- libvgpu.so injection requires compatible base images; pure ubuntu:22.04 without CUDA causes container crashes
- The conflicting /etc/containerd/conf.d/99-nvidia.toml (version 1 format) must be removed before installation on k3s
- hami-scheduler showing Pending status after an upgrade (the old scheduler pod is slow to terminate) is cosmetic
The combination is not just technically sound.
It's architecturally coherent.
Everything is K8s-native, everything is declarative, everything is auditable.
The same kubectl that manages your deployments manages your AI agents.
The same K8s scheduler that places your pods places your GPU workloads, with HAMi providing the virtualization layer.
Both kagent and HAMi are actively developed CNCF sandbox projects.
The issues I encountered are known and being worked on.
The future of AI infrastructure is K8s-native.
kagent + HAMi + Nebius shows what that looks like.
- kagent GitHub: https://github.com/kagent-dev/kagent
- kagent Doc: https://kagent.dev
- HAMi GitHub: https://github.com/Project-HAMi/HAMi
- HAMi Doc: https://project-hami.io
- Nebius Token Factory: https://tokenfactory.nebius.com
- Nebius AI Cloud: https://nebius.com
- CNCF Sandbox Projects: https://landscape.cncf.io
- Model Context Protocol: https://modelcontextprotocol.io
- A2A Protocol: https://google.github.io/A2A