Merged
18 changes: 9 additions & 9 deletions blog/kcd-beijing-2026-dra-gpu-scheduling/index.md
@@ -69,7 +69,7 @@ The problem with the traditional model is its limited expressiveness:
- Multi-card combinations
- Topology (NUMA / NVLink)

-👉 This directly leads to:
+This directly leads to:

- Scheduling logic leakage (extender / sidecar)
- Increased system complexity
@@ -91,7 +91,7 @@ Key change:

A key slide in the PPT, often overlooked:

-### 👉 DRA request looks like this
+### DRA request looks like this

```yaml
spec:
@@ -119,15 +119,15 @@ resources:
nvidia.com/gpu: 1
```

-👉 The conclusion is clear:
+The conclusion is clear:

> **DRA is an upgrade in capability, but UX is clearly degraded.**

## HAMi-DRA's Key Breakthrough: Automation

One of the most valuable parts of this talk:

-### 👉 Webhook Automatically Generates ResourceClaim
+### Webhook Automatically Generates ResourceClaim

HAMi's approach is not to have users "use DRA directly", but:

@@ -178,7 +178,7 @@ DRA driver is not just "registering resources", but full lifecycle management:
- Environment variable management
- Temporary directories (cache / lock)

-👉 This means:
+This means:

> **GPU scheduling has entered the runtime orchestration layer, not just simple resource allocation.**

@@ -191,7 +191,7 @@ A key benchmark from the PPT:
- HAMi (traditional): up to ~42,000
- HAMi-DRA: significantly reduced (~30%+ improvement)

-👉 This shows:
+This shows:

> **DRA's resource pre-binding mechanism can reduce scheduling conflicts and retries**

@@ -211,7 +211,7 @@ An underestimated change:
- ResourceClaim: resource allocation
- → **Resource perspective is first-class**

-👉 The change:
+The change:

> **Observability shifts from "inference" to "direct modeling"**

@@ -227,7 +227,7 @@ For example:
- PCI bus ID
- GPU attributes

-👉 This is a bigger narrative:
+This is a bigger narrative:

> **DRA is the starting point for heterogeneous compute abstraction**

@@ -249,7 +249,7 @@ Connecting these points reveals a bigger trend:

- Scheduling logic → resource declaration

-👉 Essentially:
+Essentially:

> **Kubernetes is evolving into the AI Infra Control Plane**
