This repository was archived by the owner on Sep 9, 2024. It is now read-only.
Utilizing a container orchestration engine like Kubernetes, CPU resources are allocated from a pool of platforms
entirely based on availability, without taking into account specific features like Intel Speed Select Technology (SST).
@@ -29,16 +29,17 @@ specifically Intel SST.
 
 ### Use Cases:
 
-- *High performance workload known at peak times.*
-May want to pre-schedule nodes to move to a performance profile during peak times to minimize spin up.
-At times not during peak, may want to move to a power saving profile.
+- Users may want to pre-schedule nodes to move to a performance profile during peak times to minimize spin up.
+At times not during peak, they may want to move to a power saving profile.
 - *Unpredictable machine use.*
-May use machine learning through monitoring to determine profiles that predict a peak need for a compute, to spin up
+Users may use machine learning through monitoring to determine profiles that predict a peak need for compute, to spin up
 ahead of time.
 - *Power Optimization over Performance.*
-A cloud may be interested in fast response time, but not in maximal response time, so may choose to spin up cores on
+A user may be interested in fast response time, but not in maximal response time, so may choose to spin up cores on
 demand and only those cores used but want to remain in power-saving mode the rest of the time.
 
+### Further Info:
+Please see the _diagrams-docs_ directory for diagrams with a visual breakdown of the power manager and its components.
 ## Functionality of the Kubernetes Power Manager
 
 - **SST-BF - (Speed Select Technology - Base Frequency)**
@@ -76,18 +77,25 @@ specifically Intel SST.
 Time of Day is designed to allow the user to select a specific time of day that they can put all their unused CPUs
 into “sleep” state and then reverse the process and select another time to return to an “active” state.
 
-- **P-State**
-
-Modern Intel CPUs automatically employ the Intel P_State CPU power scaling driver. This driver is integrated rather
-than a module, giving it precedence over other drivers. For Sandy Bridge and newer CPUs, this driver is currently used
-automatically. The BIOS P-State settings might be disregarded by Intel P-State.
-The Intel P-State driver utilizes the "Performance" and "Powersave" governors.
-***Performance***
-The CPUfreq governor "performance" sets the CPU statically to the highest frequency within the borders of
-scaling_min_freq and scaling_max_freq.
-***Powersave***
-The CPUfreq governor "powersave" sets the CPU statically to the lowest frequency within the borders of
-scaling_min_freq and scaling_max_freq.
+- **Scaling Drivers**
+* **P-State**
+
+Modern Intel CPUs automatically employ the Intel P_State CPU power scaling driver. This driver is integrated rather
+than a module, giving it precedence over other drivers. For Sandy Bridge and newer CPUs, this driver is currently used
+automatically. The BIOS P-State settings might be disregarded by Intel P-State.
+The Intel P-State driver utilizes the "Performance" and "Powersave" governors.
+***Performance***
+The CPUfreq governor "performance" sets the CPU statically to the highest frequency within the borders of
+scaling_min_freq and scaling_max_freq.
+***Powersave***
+The CPUfreq governor "powersave" sets the CPU statically to the lowest frequency within the borders of
+scaling_min_freq and scaling_max_freq.
+* **acpi-cpufreq**
+
+The acpi-cpufreq driver setting operates much like the P-state driver but has a different set of available governors. For more information see [here](https://www.kernel.org/doc/html/v4.12/admin-guide/pm/cpufreq.html).
+Note that acpi-cpufreq reports the base clock as the hardware frequency limit, whereas the P-state driver reports the turbo frequency as the limit.
+Both drivers can make use of turbo frequency; however, acpi-cpufreq can exceed hardware frequency limits when using turbo frequency.
+This is important to take into account when setting frequencies for profiles.
 
 - **Uncore**
 The largest part of modern CPUs is outside the actual cores. On Intel CPUs this part is called the "Uncore" and has
@@ -121,7 +129,7 @@ specifically Intel SST.
 Note: NFD is recommended, but not essential. Node labels can also be applied manually. See
 the [NFD repo](https://github.com/kubernetes-sigs/node-feature-discovery#feature-labels) for a full list of feature
 labels.
-* In the kubelet configuration file the cpuManagerPolicy has to set to "static", and the reservedSystemCPUs are set to
+* **Important**: In the kubelet configuration file the cpuManagerPolicy has to be set to "static", and the reservedSystemCPUs must be set to
 the desired value:
 
 ````yaml
@@ -173,6 +181,44 @@ syncFrequency: 0s
 volumeStatsAggPeriod: 0s
 ````
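The kubelet `reservedSystemCPUs` value is written as a CPU list string (for example `0-1`), while the Time Of Day examples in this README list reserved CPUs as `[0,1]`. The following is a minimal sketch, not part of the Power Manager, of how such a string could be expanded into a list of CPU IDs. The helper name `parse_cpu_list` is hypothetical, and it is an assumption that the two settings are meant to name the same CPUs.

```python
def parse_cpu_list(cpu_list: str) -> list[int]:
    """Expand a CPU list string such as "0-1,4" into a sorted list of CPU IDs."""
    cpus: set[int] = set()
    for part in cpu_list.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty segments such as a trailing comma
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid CPU range: {part!r}")
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


if __name__ == "__main__":
    # Hypothetical value copied by hand from a kubelet configuration file.
    print(parse_cpu_list("0-1"))
```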
 
+## Deploying the Kubernetes Power Manager using Helm
+
+The Kubernetes Power Manager includes a helm chart for the latest releases, allowing the user to easily deploy
+everything that is needed for the overarching operator and the node agent to run. The following versions are
+supported with helm charts:
+
+* v2.0.0
+* v2.1.0
+* v2.2.0
+* v2.3.0
+
+When set up using the provided helm charts, the following will be deployed:
+
+* The intel-power namespace
+* The RBAC rules for the operator and node agent
+* The operator deployment itself
+* The operator's power config
+* A shared power profile
+
+To change any of the values the above are deployed with, edit the values.yaml file of the relevant helm chart.
+
+To deploy the Kubernetes Power Manager using Helm, you must have Helm installed. For more information on installing
+Helm, see the installation guide at https://helm.sh/docs/intro/install/.
+
+The Kubernetes Power Manager has make targets for each version available. To deploy the latest version, use the following command:
+
+`make helm-install`
+
+To uninstall the latest version, use the following command:
+
+`make helm-uninstall`
+
+Or you can use the following commands to deploy a specific version of the Kubernetes Power Manager:
+
+`make helm-install-v2.2.0`
+`make helm-install-v2.1.0`
+`make helm-install-v2.0.0`
+
 ## Working environments
 
 The Kubernetes Power Manager has been tested in different environments.
@@ -538,49 +584,6 @@ spec:
 C1: true
 C6: false
 ````
-### Time Of Day
-The TIme Of Day feature allows users to change the configuration of their system at a given time each day. This is done through the use of a `timeofdaycronjob`
-which schedules itself for a specific time each day and gives users the option of tuning cstates, the shared pool profile as well as the profile used by individual pods.
-#### Example
-````yaml
-apiVersion: power.intel.com/v1
-kind: TimeOfDay
-metadata:
-  name: timeofday-sample
-  namespace: intel-power
-spec:
-  timeZone: "Eire"
-  schedule:
-    - time: "14:24"
-      # this sets the profile for the shared pool
-      powerProfile: balance-performance
-      # this transitions a pod with the given name from one profile to another
-      pods:
-        four-pod-v2:
-          performance: balance-performance
-        four-pod-v4:
-          balance-power: performance
-      # this field simply takes a cstate spec
-      cState:
-        sharedPoolCStates:
-          C1: false
-          C6: true
-    - time: "14:26"
-      powerProfile: performance
-      cState:
-        sharedPoolCStates:
-          C1: true
-          C6: false
-    - time: "14:28"
-      powerProfile: balance-power
-  reservedCPUs: [0,1]
-````
-When applying changes to the shared pool, users must specify the CPUs reserved by the system. Additionally the user must specify a timezone to schedule with.
-The configuration for Time Of Day consists of a schedule list. Each item in the list consists of a time and any desired changes to the system.
-The `profile` field specifies the desired profile for the shared pool.
-The `pods` field is used to change the profile associated with a specific pod.
-It should be noted that when changing the profile of an individual pod, the user must specify its name and current profile at the time as well as the desired new profile.
-Finally the `cState` field accepts the spec values from a CStates configuration and applies them to the system.
 
 ### intel-pstate CPU Performance Scaling Driver
 The intel_pstate is a part of the CPU performance scaling subsystem in the Linux kernel (CPUFreq).
@@ -597,6 +600,11 @@ Finally the `cState` field accepts the spec values from a CStates configuration
 ##### Performance Governor
 The CPUfreq governor "performance" sets the CPU statically to the highest frequency within the borders of scaling_min_freq and scaling_max_freq.
 
+### acpi-cpufreq scaling driver
+An alternative to the P-state driver is the acpi-cpufreq driver.
+It operates in a similar fashion to the P-state driver but offers a different set of governors which can be seen [here](https://www.kernel.org/doc/html/v4.12/admin-guide/pm/cpufreq.html).
+One notable difference between the P-state and acpi-cpufreq drivers is that, under acpi-cpufreq, the aforementioned scaling_max_freq value is limited to base clock frequencies rather than turbo frequencies.
+When turbo is enabled, the core frequency can still exceed the base clock frequency and the value of scaling_max_freq.
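To see which scaling driver a node uses and what frequency limits it reports, the standard Linux cpufreq sysfs attributes can be read directly. The following is a minimal sketch, not part of the Power Manager: the helper name `read_cpufreq` is hypothetical, and it assumes the usual `/sys/devices/system/cpu/cpuN/cpufreq/` layout, where frequencies are reported in kHz.

```python
from pathlib import Path

# Standard sysfs location of per-CPU cpufreq attributes on Linux.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu")

ATTRIBUTES = (
    "scaling_driver",
    "scaling_governor",
    "scaling_min_freq",
    "scaling_max_freq",
    "cpuinfo_max_freq",
)


def read_cpufreq(cpu: int = 0, root: Path = CPUFREQ_ROOT) -> dict[str, str]:
    """Return the cpufreq attributes found for one CPU; missing files are skipped."""
    policy_dir = root / f"cpu{cpu}" / "cpufreq"
    values: dict[str, str] = {}
    for name in ATTRIBUTES:
        path = policy_dir / name
        if path.is_file():
            values[name] = path.read_text().strip()
    return values


if __name__ == "__main__":
    # Prints nothing on systems without cpufreq support.
    for name, value in read_cpufreq(0).items():
        print(f"{name}: {value}")
```

Comparing `scaling_driver` with `cpuinfo_max_freq` on a node is one way to check whether the reported limit is a base or a turbo frequency before choosing frequencies for a profile.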
 
 ### Time Of Day
 
@@ -616,29 +624,44 @@ metadata:
 spec:
   timeZone: "Eire"
   schedule:
-    - time: "14:24"
+    - time: "14:56"
       # this sets the profile for the shared pool
-      powerProfile: balance-performance
-      # this transitions a pod with the given name from one profile to another
+      powerProfile: balance-power
+      # this transitions exclusive pods matching a given label from one profile to another
+      # please ensure that only pods to be used by power manager have this label
       pods:
-        four-pod-v2:
-          performance: balance-performance
-        four-pod-v4:
-          balance-power: performance
+        - labels:
+            matchLabels:
+              power: "true"
+          target: balance-performance
+        - labels:
+            matchLabels:
+              special: "false"
+          target: balance-performance
       # this field simply takes a cstate spec
       cState:
         sharedPoolCStates:
           C1: false
           C6: true
-    - time: "14:26"
-      powerProfile: performance
+    - time: "23:57"
+      powerProfile: shared
       cState:
         sharedPoolCStates:
           C1: true
           C6: false
-    - time: "14:28"
+      pods:
+        - labels:
+            matchLabels:
+              power: "true"
+          target: performance
+        - labels:
+            matchLabels:
+              special: "false"
+          target: balance-power
+    - time: "14:35"
       powerProfile: balance-power
   reservedCPUs: [ 0,1 ]
+
 ```
 
 When applying changes to the shared pool, users must specify the CPUs reserved by the system. Additionally the user must
@@ -647,8 +670,8 @@ The configuration for Time Of Day consists of a schedule list. Each item in the
 changes to the system.
 The `powerProfile` field specifies the desired profile for the shared pool.
 The `pods` field is used to change the profile associated with a specific pod.
-It should be noted that when changing the profile of an individual pod, the user must specify its name and current
-profile at the time as well as the desired new profile.
+To change the profile of specific pods, users must provide a set of labels and profiles. When a pod matching a label is found, it will be placed in a workload that matches the requested profile.
+Please note that all pods matching a provided label must be configured for use with the power manager by requesting an initial profile and dedicated cores.
 Finally the `cState` field accepts the spec values from a CStates configuration and applies them to the system.
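The label matching described above can be illustrated with a short sketch. It is not the Power Manager's implementation: it assumes the usual Kubernetes `matchLabels` semantics (every listed key/value pair must be present on the pod) and, as a further assumption, that the first matching entry wins. The names `matches`, `target_profile` and `RULES` are hypothetical; the rule values are taken from the `23:57` schedule entry in the example.

```python
from typing import Optional


def matches(pod_labels: dict[str, str], match_labels: dict[str, str]) -> bool:
    """True when every key/value pair in match_labels is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


def target_profile(pod_labels: dict[str, str], pod_rules: list[dict]) -> Optional[str]:
    """Return the target of the first rule matching the pod, or None if none match."""
    for rule in pod_rules:
        if matches(pod_labels, rule["labels"]["matchLabels"]):
            return rule["target"]
    return None


# The `pods` list of the 23:57 schedule entry, written as Python data.
RULES = [
    {"labels": {"matchLabels": {"power": "true"}}, "target": "performance"},
    {"labels": {"matchLabels": {"special": "false"}}, "target": "balance-power"},
]

if __name__ == "__main__":
    print(target_profile({"power": "true", "app": "demo"}, RULES))
    print(target_profile({"app": "demo"}, RULES))
```

Because any pod carrying a matching label is selected, the sketch also shows why the label should only be applied to pods that already request a profile and dedicated cores.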
 
 ### Uncore Frequency
@@ -675,37 +698,6 @@ spec:
 max: 2400000
 ````
 
-### intel-pstate CPU Performance Scaling Driver
-
-The intel_pstate is a part of the CPU performance scaling subsystem in the Linux kernel (CPUFreq).
-
-In some situations it is desirable or even necessary to run the program as fast as possible and then there is no reason
-to use any P-states different from the highest one (i.e. the highest-performance frequency/voltage configuration
-available). In some other cases, however, it may not be necessary to execute instructions so quickly and maintaining the
-highest available CPU capacity for a relatively long time without utilizing it entirely may be regarded as wasteful. It
-also may not be physically possible to maintain maximum CPU capacity for too long for thermal or power supply capacity
-reasons or similar. To cover those cases, there are hardware interfaces allowing CPUs to be switched between different
-frequency/voltage configurations or (in the ACPI terminology) to be put into different P-states.
-
-#### P-State Governors
-
-In order to offer dynamic frequency scaling, the cpufreq core must be able to tell these drivers of a "target
-frequency". So these specific drivers will be transformed to offer a "->target/target_index/fast_switch()" call instead
-of the "->setpolicy()" call. For set_policy drivers, all stays the same, though.
-
-The cpufreq governors decide what frequency within the CPUfreq policy should be used. The P-state driver utilizes the "
-powersave" and "performance" governors.
-
-##### Powersave Governor
-
-The CPUfreq governor "powersave" sets the CPU statically to the lowest frequency within the borders of scaling_min_freq
-and scaling_max_freq.
-
-##### Performance Governor
-
-The CPUfreq governor "performance" sets the CPU statically to the highest frequency within the borders of