This repository was archived by the owner on Jan 29, 2025. It is now read-only.

Commit e3132cc (1 parent: 416bf0c)

uniemimu authored and tkatila committed

tile resource support and refactoring

Code cleanups and support for tile resource and related telemetry originated node labels.

Co-authored-by: Tuomas Katila <[email protected]>
Co-authored-by: Ukri Niemimuukko <[email protected]>

24 files changed: +2713 −561 lines

.github/workflows/end-to-end-test.yaml
Lines changed: 4 additions & 2 deletions

@@ -11,10 +11,12 @@ jobs:
       - name: Set up Go version
         uses: actions/setup-go@v1
         with:
-          go-version: 1.16
+          go-version: 1.17
       - name: Get tools for cluster installation
         run: ./.github/scripts/e2e_get_tools.sh
       - name: Set up cluster with TAS and custom metrics
         run: ./.github/scripts/e2e_setup_cluster.sh
       - name: Run end to end tests
-        run: cd .github/e2e/&& go test -v e2e_test.go
+        run: cd .github/e2e/&& go test -v e2e_test.go
+      - name: Clean up
+        run: ./.github/scripts/e2e_teardown_cluster.sh && ./.github/scripts/e2e_cleanup.sh

.github/workflows/go-build-and-test.yml
Lines changed: 2 additions & 2 deletions

@@ -15,7 +15,7 @@ jobs:
       - name: Set up Go
         uses: actions/setup-go@v2
         with:
-          go-version: 1.16
+          go-version: 1.17
 
       - name: Build
         run: make test
@@ -27,7 +27,7 @@ jobs:
     runs-on: ubuntu-latest
     strategy:
       matrix:
-        go-version: [ 1.16.x]
+        go-version: [ 1.17.x]
     steps:
       - uses: actions/checkout@v2

.github/workflows/static-analysis.yaml
Lines changed: 4 additions & 4 deletions

@@ -19,16 +19,16 @@ jobs:
     name: Hadolint
     steps:
       - uses: actions/checkout@v2
-      - run: wget -q https://github.com/hadolint/hadolint/releases/download/v2.7.0/hadolint-Linux-x86_64 -O hadolint; chmod +x hadolint ; find . -type f \( -name "Dockerfile*" \) -print0 | xargs -n 1 -0 ./hadolint ;
+      - run: wget -q https://github.com/hadolint/hadolint/releases/download/v2.8.0/hadolint-Linux-x86_64 -O hadolint; chmod +x hadolint ; find . -type f \( -name "Dockerfile*" \) -print0 | xargs -n 1 -0 ./hadolint ;
   gofmt-imports:
     runs-on: ubuntu-latest
     name: Go Fmt and Go Import
     steps:
       - uses: actions/checkout@v2
       - uses: actions/setup-go@v2
         with:
-          go-version: 1.16
-      - run: go get golang.org/x/tools/cmd/goimports; test -z $(goimports -l .) && test -z $(gofmt -l .)
+          go-version: 1.17
+      - run: go install golang.org/x/tools/cmd/goimports@v0.1.9; test -z $(goimports -l .) && test -z $(gofmt -l .)
 
   golangci-TAS:
     strategy:
@@ -40,7 +40,7 @@ jobs:
       - uses: actions/checkout@v2
       - uses: actions/setup-go@v2
         with:
-          go-version: 1.16
+          go-version: 1.17
       - name: golangci-lint-TAS
         uses: golangci/golangci-lint-action@v2
         with:

gpu-aware-scheduling/README.md
Lines changed: 21 additions & 10 deletions

@@ -1,7 +1,7 @@
 # GPU Aware Scheduling
 GPU Aware Scheduling (GAS) allows using GPU resources such as memory amount for scheduling decisions in Kubernetes. It is used to optimize scheduling decisions when the POD resource requirements include the use of several GPUS or fragments of GPUs on a node, instead of traditionally mapping a GPU to a pod.
 
-GPU Aware Scheduling is deployed in a single pod on a Kubernetes Cluster.
+GPU Aware Scheduling is deployed in a single pod on a Kubernetes Cluster.
 
 **This software is a pre-production alpha version and should not be deployed to production servers.**
 
@@ -22,8 +22,9 @@ GAS tries to be agnostic about resource types. It doesn't try to have an underst
 
 GAS heavily utilizes annotations. It itself annotates PODs after making filtering decisions on them, with a precise timestamp at annotation named "gas-ts". The timestamp can then be used for figuring out the time-order of the GAS-made scheduling decision for example during the GPU-plugin resource allocation phase, if the GPU-plugin wants to know the order of GPU-resource consuming POD deploying inside the node. Another annotation which GAS adds is "gas-container-cards". It will have the names of the cards selected for the containers. Containers are separated by "|", and card names are separated by ",". Thus a two-container POD in which both containers use 2 GPUs, could get an annotation "card0,card1|card2,card3". These annotations are then consumed by the Intel GPU device plugin.
 
-GAS also expects labels to be in place for the nodes, in order to be able to keep book of the cluster GPU resource status. Nodes with GPUs shall be labeled with label name "gpu.intel.com/cards" and value shall be in form "card0.card1.card2.card3"... where the card names match with the intel GPUs which are currently found under /sys/class/drm folder, and the dot serves as separator. GAS expects all GPUs of the same node to be homogeneous in their resource capacity, and calculates the GPU extended resource capacity as evenly distributed to the GPUs listed by that label.
+Along with the "gas-container-cards" annotation there can be a "gas-container-tiles" annotation. This annotation is created when a container requests tile resources (gpu.intel.com/tiles). The gtX marking for tiles follows the sysfs entries under /sys/class/drm/cardX/gt/ where the "cardX" can be any card in the system. "gas-container-tiles" annotation marks the card+tile combos assigned to each container. For example a two container pod's annotation could be "card0:gt0+gt1|card0:gt2+gt3" where each container gets two tiles from the same GPU. The tile annotation is then converted to corresponding environment variables by the GPU plugin.
 
+GAS also expects labels to be in place for the nodes, in order to be able to keep book of the cluster GPU resource status. Nodes with GPUs shall be labeled with label name "gpu.intel.com/cards" and value shall be in form "card0.card1.card2.card3"... where the card names match with the intel GPUs which are currently found under /sys/class/drm folder, and the dot serves as separator. GAS expects all GPUs of the same node to be homogeneous in their resource capacity, and calculates the GPU extended resource capacity as evenly distributed to the GPUs listed by that label.
 
 ## Usage with NFD and the GPU-plugin
 A worked example for GAS is available [here](docs/usage.md)
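The annotation formats described in the README diff above ("card0,card1|card2,card3" and "card0:gt0+gt1|card0:gt2+gt3") are simple to decode. The following is a minimal illustrative sketch, not the scheduler's or GPU plugin's actual parsing code; the function names `splitCards` and `splitTiles` are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// splitCards parses a "gas-container-cards" annotation such as
// "card0,card1|card2,card3". Containers are separated by "|",
// card names within a container by ",".
func splitCards(annotation string) [][]string {
	containers := [][]string{}
	for _, c := range strings.Split(annotation, "|") {
		containers = append(containers, strings.Split(c, ","))
	}
	return containers
}

// splitTiles parses a "gas-container-tiles" annotation such as
// "card0:gt0+gt1|card0:gt2+gt3" into one card->tiles map per container.
// Tiles of a card are joined with "+" after the "card:" prefix.
func splitTiles(annotation string) []map[string][]string {
	result := []map[string][]string{}
	for _, container := range strings.Split(annotation, "|") {
		m := map[string][]string{}
		for _, cardTiles := range strings.Split(container, ",") {
			parts := strings.SplitN(cardTiles, ":", 2)
			if len(parts) != 2 {
				continue // skip malformed entries
			}
			m[parts[0]] = strings.Split(parts[1], "+")
		}
		result = append(result, m)
	}
	return result
}

func main() {
	fmt.Println(splitCards("card0,card1|card2,card3"))
	fmt.Println(splitTiles("card0:gt0+gt1|card0:gt2+gt3"))
}
```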
@@ -37,17 +38,18 @@ You should follow extender configuration instructions from the
 use GPU Aware Scheduling configurations, which can be found in the [deploy/extender-configuration](deploy/extender-configuration) folder.
 
 #### Deploy GAS
-GPU Aware Scheduling uses go modules. It requires Go 1.16 with modules enabled in order to build. GAS has been tested with Kubernetes 1.22.
+GPU Aware Scheduling uses go modules. It requires Go 1.17 with modules enabled in order to build. GAS has been tested with Kubernetes 1.22.
 A yaml file for GAS is contained in the deploy folder along with its service and RBAC roles and permissions.
 
-**Note:** If run without the unsafe flag a secret called extender-secret will need to be created with the cert and key for the TLS endpoint.
-GAS will not deploy if there is no secret available with the given deployment file.
+A secret called extender-secret will need to be created with the cert and key for the TLS endpoint. GAS will not deploy if there is no
+secret available with the given deployment file.
 
 A secret can be created with:
 
 ``
-kubectl create secret tls extender-secret --cert /etc/kubernetes/<PATH_TO_CERT> --key /etc/kubernetes/<PATH_TO_KEY>
+kubectl create secret tls extender-secret --cert /etc/kubernetes/<PATH_TO_CERT> --key /etc/kubernetes/<PATH_TO_KEY>
 ``
+
 In order to build in your host:
 
 ``make build``

@@ -75,15 +77,18 @@ name |type | description| usage | default|
 |cert| string | location of the cert file for the TLS endpoint | --cert=/root/cert.txt| /etc/kubernetes/pki/ca.key
 |key| string | location of the key file for the TLS endpoint| --key=/root/key.txt | /etc/kubernetes/pki/ca.key
 |cacert| string | location of the ca certificate for the TLS endpoint| --key=/root/cacert.txt | /etc/kubernetes/pki/ca.crt
-|unsafe| bool | whether or not to listen on a TLS endpoint with the scheduler extender | --unsafe=true| false
 |enableAllowlist| bool | enable POD-annotation based GPU allowlist feature | --enableAllowlist| false
 |enableDenylist| bool | enable POD-annotation based GPU denylist feature | --enableDenylist| false
+|balancedResource| string | enable named resource balancing between GPUs | --balancedResource| ""
+
+#### Balanced resource (optional)
+GAS can be configured to balance named resources so that the resource requests are distributed as evenly as possible between the GPUs. For example if the balanced resource is set to "tiles" and the containers request 1 tile each, the first container could get tile from "card0", the second from "card1", the third again from "card0" and so on.
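The balancing behavior described in that paragraph can be approximated by always picking the GPU with the least of the balanced resource already consumed. This is only a sketch of the idea, not the actual GAS implementation; `pickCard` and the tie-breaking rule (first card wins a tie) are assumptions:

```go
package main

import "fmt"

// pickCard returns the card with the fewest units of the balanced
// resource (e.g. tiles) already in use. On a tie the earlier card in
// the slice wins, which yields the card0, card1, card0, ... pattern
// from the README example.
func pickCard(used map[string]int, cards []string) string {
	best := cards[0]
	for _, c := range cards[1:] {
		if used[c] < used[best] {
			best = c
		}
	}
	return best
}

func main() {
	cards := []string{"card0", "card1"}
	used := map[string]int{} // card -> tiles consumed so far
	// Three containers each requesting one tile alternate between GPUs.
	for i := 0; i < 3; i++ {
		c := pickCard(used, cards)
		used[c]++
		fmt.Println(c)
	}
}
```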
 
 ## Adding the resource to make a deployment use GAS Scheduler Extender
 
-For example, in a deployment file:
+For example, in a deployment file:
 ```
-apiVersion: extensions/v1beta1
+apiVersion: apps/v1
 kind: Deployment
 metadata:
   name: demo-app
@@ -93,7 +98,7 @@ spec:
   replicas: 1
   selector:
     matchLabels:
-      app: demo
+      app: demo
   template:
     metadata:
       labels:
@@ -123,6 +128,12 @@ GAS Scheduler Extender is set up to use in-Cluster config in order to access the
 Additionally GAS Scheduler Extender listens on a TLS endpoint which requires a cert and a key to be supplied.
 These are passed to the executable using command line flags. In the provided deployment these certs are added in a Kubernetes secret which is mounted in the pod and passed as flags to the executable from there.
 
+## License
+
+[Apache License, Version 2.0](./LICENSE). All of the source code required to build the GPU Aware Scheduling is available under Open Source
+licenses. The source code files identify external Go modules used. The binary is distributed as a container image on
+[DockerHub](https://hub.docker.com/r/intel/gpu-extender). The container image contains license texts under folder `/licenses`.
+
 ## Communication and contribution
 
 Report a bug by [filing a new issue](https://github.com/intel/platform-aware-scheduling/issues).

gpu-aware-scheduling/cmd/gas-scheduler-extender/main.go
Lines changed: 8 additions & 6 deletions

@@ -2,6 +2,7 @@ package main
 
 import (
 	"flag"
+	"os"
 
 	"github.com/intel/platform-aware-scheduling/extender"
 	"github.com/intel/platform-aware-scheduling/gpu-aware-scheduling/pkg/gpuscheduler"
@@ -10,28 +11,29 @@ import (
 
 func main() {
 	var (
-		kubeConfig, port, certFile, keyFile, caFile string
-		unsafe, enableAllowlist, enableDenylist     bool
+		kubeConfig, port, certFile, keyFile, caFile, balancedRes string
+		enableAllowlist, enableDenylist                          bool
 	)
 
 	flag.StringVar(&kubeConfig, "kubeConfig", "/root/.kube/config", "location of kubernetes config file")
 	flag.StringVar(&port, "port", "9001", "port on which the scheduler extender will listen")
 	flag.StringVar(&certFile, "cert", "/etc/kubernetes/pki/ca.crt", "cert file extender will use for authentication")
 	flag.StringVar(&keyFile, "key", "/etc/kubernetes/pki/ca.key", "key file extender will use for authentication")
 	flag.StringVar(&caFile, "cacert", "/etc/kubernetes/pki/ca.crt", "ca file extender will use for authentication")
-	flag.BoolVar(&unsafe, "unsafe", false, "unsafe instances of GPU aware scheduler will be served over simple http.")
 	flag.BoolVar(&enableAllowlist, "enableAllowlist", false, "enable allowed GPUs annotation (csv list of names)")
 	flag.BoolVar(&enableDenylist, "enableDenylist", false, "enable denied GPUs annotation (csv list of names)")
+	flag.StringVar(&balancedRes, "balancedResource", "", "enable resource balacing within a node")
 	klog.InitFlags(nil)
 	flag.Parse()
 
 	kubeClient, _, err := extender.GetKubeClient(kubeConfig)
 	if err != nil {
-		panic(err)
+		klog.Error("couldn't get kube client, cannot continue: ", err.Error())
+		os.Exit(1)
 	}
 
-	gasscheduler := gpuscheduler.NewGASExtender(kubeClient, enableAllowlist, enableDenylist)
+	gasscheduler := gpuscheduler.NewGASExtender(kubeClient, enableAllowlist, enableDenylist, balancedRes)
 	sch := extender.Server{Scheduler: gasscheduler}
-	sch.StartServer(port, certFile, keyFile, caFile, unsafe)
+	sch.StartServer(port, certFile, keyFile, caFile, false)
 	klog.Flush()
 }

gpu-aware-scheduling/deploy/gas-deployment.yaml
Lines changed: 3 additions & 0 deletions

@@ -33,6 +33,9 @@ spec:
           readOnlyRootFilesystem: true
           runAsNonRoot: true
           runAsUser: 10001
+          allowPrivilegeEscalation: false
+          seccompProfile:
+            type: RuntimeDefault
         volumeMounts:
         - name: certs
           mountPath: /gas/cert

gpu-aware-scheduling/deploy/images/Dockerfile_gpu-extender
Lines changed: 8 additions & 30 deletions

@@ -1,33 +1,11 @@
-#
-# Copyright (c) 2021 Intel Corporation
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-#
-FROM golang:1.16-alpine as user_builder
-RUN adduser -D -u 10001 gas
-
-FROM golang:1.16-alpine as builder
-ARG DIR=gpu-aware-scheduling
-ARG SRC_ROOT=/src_root
-COPY . ${SRC_ROOT}
-
-RUN mkdir -p /install_root/etc
-COPY --from=user_builder /etc/passwd /install_root/etc/passwd
-
-WORKDIR ${SRC_ROOT}/${DIR}
-RUN CGO_ENABLED=0 GO111MODULE=on go build -ldflags="-s -w" -o /install_root/extender ./cmd/gas-scheduler-extender \
-    && install -D ${SRC_ROOT}/${DIR}/LICENSE /install_root/usr/local/share/package-licenses/gpu-aware-scheduling/LICENSE \
-    && scripts/copy-modules-licenses.sh ./cmd/gas-scheduler-extender /install_root/usr/local/share/
+# SPDX-License-Identifier: Apache-2.0
+
+FROM golang:1.17.7-alpine as builder
+COPY . /src_root
+WORKDIR /src_root/gpu-aware-scheduling
+RUN mkdir -p /install_root/etc && adduser -D -u 10001 gas && tail -1 /etc/passwd > /install_root/etc/passwd \
+    && CGO_ENABLED=0 GO111MODULE=on go build -ldflags="-s -w" -o /install_root/extender ./cmd/gas-scheduler-extender \
+    && GO111MODULE=on go run github.com/google/[email protected] save "./cmd/gas-scheduler-extender" --save_path /install_root/licenses
 
 FROM scratch
 WORKDIR /
New file
Lines changed: 14 additions & 0 deletions

@@ -0,0 +1,14 @@
+GPU Aware Scheduling (GAS) is a K8s extender which allows using GPU resources such as memory amount for
+scheduling decisions in Kubernetes. It also supports telemetry based node labels for controlling GPU usage.
+
+For further information check github at:
+
+https://github.com/intel/platform-aware-scheduling
+
+https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling
+
+LEGAL NOTICE: By accessing, downloading or using this software and any required dependent software (the
+"Software Package"), you agree to the terms and conditions of the software license agreements for the
+Software Package, which may also include notices, disclaimers, or license terms for third party software
+included with the Software Package. Please refer to the "third-party-programs.txt" or other similarly-named
+text file for additional details.

gpu-aware-scheduling/docs/usage.md
Lines changed: 10 additions & 2 deletions

@@ -16,10 +16,10 @@ Basically all versions starting with [v0.6.0](https://github.com/kubernetes-sigs
 
 For picking up the labels printed by the hook installed by the GPU-plugin initcontainer, deploy nfd master with this kind of command in its yaml:
 ```
-command: ["nfd-master", "--resource-labels=gpu.intel.com/memory.max,gpu.intel.com/millicores", "--extra-label-ns=gpu.intel.com"]
+command: ["nfd-master", "--resource-labels=gpu.intel.com/memory.max,gpu.intel.com/millicores,gpu.intel.com/tiles", "--extra-label-ns=gpu.intel.com"]
 ```
 
-The above would promote two labels, "memory.max" and "millicores" to extended resources of the node that produces the labels.
+The above would promote three labels, "memory.max", "millicores" and "tiles" to extended resources of the node that produces the labels.
 
 If you want to enable i915 capability scanning, the nfd worker needs to read debugfs, and therefore it needs to run as privileged, like this:
 ```
@@ -63,6 +63,14 @@ Your PODs then, needs to ask for some GPU-resources. Like this:
       gpu.intel.com/memory.max: 10M
 ```
 
+Or like this for tiles:
+```
+  resources:
+    limits:
+      gpu.intel.com/i915: 1
+      gpu.intel.com/tiles: 2
+```
+
 A complete example pod yaml is located in [docs/example](./example)
 
 ## Node Label support
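Combining the resource fragments shown in the usage.md diff above, a minimal pod requesting tile resources might look like the following sketch. The pod name and image are illustrative placeholders; the repository's docs/example folder contains the authoritative example:

```
apiVersion: v1
kind: Pod
metadata:
  name: tile-demo          # hypothetical name
spec:
  containers:
  - name: demo
    image: example/image   # placeholder image
    resources:
      limits:
        gpu.intel.com/i915: 1
        gpu.intel.com/tiles: 2
```

With this spec, GAS would pick a node and GPU with two free tiles and record its choice in the "gas-container-cards" and "gas-container-tiles" annotations for the GPU plugin to consume.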

gpu-aware-scheduling/go.mod
Lines changed: 45 additions & 10 deletions

@@ -1,18 +1,53 @@
 module github.com/intel/platform-aware-scheduling/gpu-aware-scheduling
 
-go 1.16
+go 1.17
 
 require (
-	github.com/intel/platform-aware-scheduling/extender v0.0.0-00010101000000-000000000000
-	github.com/smartystreets/goconvey v1.7.0
+	github.com/intel/platform-aware-scheduling/extender v0.1.0
+	github.com/smartystreets/goconvey v1.7.2
 	github.com/stretchr/testify v1.7.0
-	k8s.io/api v0.22.2
-	k8s.io/apimachinery v0.22.2
-	k8s.io/client-go v0.22.2
-	k8s.io/klog/v2 v2.30.0
+	k8s.io/api v0.23.3
+	k8s.io/apimachinery v0.23.3
+	k8s.io/client-go v0.23.3
+	k8s.io/klog/v2 v2.40.1
 )
 
-replace (
-	github.com/intel/platform-aware-scheduling/extender => ../extender
-	github.com/intel/platform-aware-scheduling/gpu-aware-scheduling => ../gpu-aware-scheduling
+require (
+	github.com/davecgh/go-spew v1.1.1 // indirect
+	github.com/evanphx/json-patch v5.6.0+incompatible // indirect
+	github.com/go-logr/logr v1.2.2 // indirect
+	github.com/gogo/protobuf v1.3.2 // indirect
+	github.com/golang/protobuf v1.5.2 // indirect
+	github.com/google/go-cmp v0.5.7 // indirect
+	github.com/google/gofuzz v1.2.0 // indirect
+	github.com/googleapis/gnostic v0.5.5 // indirect
+	github.com/gopherjs/gopherjs v0.0.0-20220104163920-15ed2e8cf2bd // indirect
+	github.com/imdario/mergo v0.3.12 // indirect
+	github.com/json-iterator/go v1.1.12 // indirect
+	github.com/jtolds/gls v4.20.0+incompatible // indirect
+	github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
+	github.com/modern-go/reflect2 v1.0.2 // indirect
+	github.com/pkg/errors v0.9.1 // indirect
+	github.com/pmezard/go-difflib v1.0.0 // indirect
+	github.com/smartystreets/assertions v1.2.1 // indirect
+	github.com/spf13/pflag v1.0.5 // indirect
+	github.com/stretchr/objx v0.3.0 // indirect
+	golang.org/x/net v0.0.0-20220127074510-2fabfed7e28f // indirect
+	golang.org/x/oauth2 v0.0.0-20211104180415-d3ed0bb246c8 // indirect
+	golang.org/x/sys v0.0.0-20220114195835-da31bd327af9 // indirect
+	golang.org/x/term v0.0.0-20210927222741-03fcf44c2211 // indirect
+	golang.org/x/text v0.3.7 // indirect
+	golang.org/x/time v0.0.0-20211116232009-f0f3c7e86c11 // indirect
+	google.golang.org/appengine v1.6.7 // indirect
+	google.golang.org/protobuf v1.27.1 // indirect
+	gopkg.in/inf.v0 v0.9.1 // indirect
+	gopkg.in/yaml.v2 v2.4.0 // indirect
+	gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b // indirect
+	k8s.io/kube-openapi v0.0.0-20220124234850-424119656bbf // indirect
+	k8s.io/utils v0.0.0-20220127004650-9b3446523e65 // indirect
+	sigs.k8s.io/json v0.0.0-20211208200746-9f7c6b3444d2 // indirect
+	sigs.k8s.io/structured-merge-diff/v4 v4.2.1 // indirect
+	sigs.k8s.io/yaml v1.3.0 // indirect
 )
+
+replace github.com/intel/platform-aware-scheduling/gpu-aware-scheduling => ../gpu-aware-scheduling
