Commit 02afdee

Update docs related to 0.7.0 release
This commit will:
- update descheduler versions used for demos
- update power docs with handling for collectd missing read permissions for energy files

Signed-off-by: Madalina Lazar <[email protected]>
1 parent 767e0c6 commit 02afdee

3 files changed, +31 -4 lines changed


telemetry-aware-scheduling/docs/health-metric-example.md

Lines changed: 1 addition & 0 deletions
@@ -155,6 +155,7 @@ We can see that only two of our three nodes were deemed suitable because it didn't
Descheduler can be installed as a binary with the instructions from the [project repo](https://github.com/kubernetes-sigs/descheduler). This demo has been tested with the following Descheduler versions:
1. **v0.23.1** and older
2. **v0.27.1**
+3. **v0.28.1**

Telemetry Aware Scheduler and the following Descheduler versions **v0.24.x** to **v0.26.x** seem to have compatibility issues (https://github.com/intel/platform-aware-scheduling/issues/90#issuecomment-1169012485 links to an issue filed with the Descheduler project team). The problem seems to have been fixed in Descheduler **v0.27.1**.
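
If you are unsure which Descheduler version you are running, a quick check along the lines below can help (a sketch only: it assumes a binary install as described above, and the deployment name and namespace used for the in-cluster variant are placeholders to adjust to your setup):

```
# Binary install: the descheduler binary reports its own version
descheduler version

# In-cluster install: read the image tag of the Descheduler workload
# (deployment name and namespace here are assumptions; adjust to your cluster)
kubectl get deployment descheduler -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```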

telemetry-aware-scheduling/docs/power/README.md

Lines changed: 29 additions & 4 deletions
@@ -37,6 +37,24 @@ The agent should soon be up and running on each node in the cluster. This can be

`kubectl get pods -nmonitoring -lapp.kubernetes.io/name=collectd`

+Ensure the collectd pods (one per node in your cluster) have been set up properly. Open the pod logs with `kubectl logs collectd-**** -n monitoring` and make sure there are no errors or warnings and that you see something similar to:
+
+```
+plugin_load: plugin "python" successfully loaded.
+plugin_load: plugin "write_prometheus" successfully loaded.
+write_prometheus plugin: Listening on [::]:9103.
+write_prometheus plugin: MHD_OPTION_EXTERNAL_LOGGER is not the first option specified for the daemon. Some messages may be printed by the standard MHD logger.
+
+Initialization complete, entering read-loop.
+```
+
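
As a convenience, the same label selector used above can pull the logs of every node's collectd pod in a single command (a sketch only; the `--tail` and `--prefix` options are just suggestions):

```
# Tail the last 20 lines from every collectd pod, prefixed with the pod name
# (add --max-log-requests if you run more than five nodes)
kubectl logs -n monitoring -l app.kubernetes.io/name=collectd --tail=20 --prefix
```
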
+If you encounter errors like the one below in your collectd pod, check the read permissions for the file in question. From running this demo so far, it seems that collectd requires at least read permission for the ***'others'*** permission group in order to access the sysfs energy files.
+
+```
+Unhandled python exception in read callback: PermissionError: [Errno 13] Permission denied: '/sys/devices/virtual/powercap/intel-rapl/intel-rapl:1/energy_uj'
+read-function of plugin `python.pkgpower' failed. Will suspend it for 20.000 seconds.
+```
+
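
One way to resolve this is to widen the read permission on the affected file directly on the node (a minimal sketch, assuming the RAPL path from the error above and that relaxing the permission is acceptable in your environment; note the setting may revert after a reboot):

```
# On the node named in the error, inspect the current permissions
ls -l /sys/devices/virtual/powercap/intel-rapl/intel-rapl:1/energy_uj

# Grant read access to 'others' so the collectd python plugin can read the file
sudo chmod o+r /sys/devices/virtual/powercap/intel-rapl/intel-rapl:1/energy_uj
```
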
Additionally we can check that collectd is indeed serving metrics by running `collectd_ip=$(kubectl get svc -n monitoring -o json | jq -r '.items[] | select(.metadata.name=="collectd").spec.clusterIP'); curl -x "" $collectd_ip:9103/metrics`. The output should look similar to the lines below:

```
@@ -208,6 +226,13 @@ This command should return some json objects containing pods and associated powe
```
*Note* If the above command returns an error, there is some issue in the metrics pipeline that requires resolution before continuing.

+If the `node_package_power_per_pod` metric is visible but has no value ("items" is an empty array, as in the response below), please check both the `node_package_power_per_pod` and `package_power_avg` metrics in the Prometheus UI. Assuming the collectd and kube-state-metrics pods have been set up correctly, you should see `node_package_power_per_pod` and `package_power_avg` metrics for each node in your cluster. Expect similar behaviour for collectd metrics like `collectd_package_*_TDP_power`.
+If you notice that metrics are missing for some nodes, start by investigating which nodes are affected and check the logs of the pods in question.
+
+```json
+{"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/node_package_power_per_pod"},"items":[]}
+```
+
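
For a quick spot-check outside the Prometheus UI, both metrics can also be queried through the Prometheus HTTP API (a sketch only; the `prometheus-k8s` service name and the `monitoring` namespace are assumptions, so substitute whatever your Prometheus deployment uses):

```
prom_ip=$(kubectl get svc -n monitoring -o json | jq -r '.items[] | select(.metadata.name=="prometheus-k8s").spec.clusterIP')
# Expect one result per node for each metric; nodes missing from the output are
# the ones whose collectd or kube-state-metrics pods need investigating.
curl -s "http://$prom_ip:9090/api/v1/query?query=package_power_avg" | jq '.data.result'
curl -s "http://$prom_ip:9090/api/v1/query?query=node_package_power_per_pod" | jq '.data.result'
```
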
### 4: Create TAS Telemetry Policy for power
On the Kubernetes BMRA setup, Telemetry Aware Scheduling is already installed and integrated with the Kubernetes control plane. In order to allow our cluster to make scheduling decisions based on current power usage, we simply need to create a telemetry policy and associate the appropriate pods with that policy.

@@ -219,7 +244,7 @@ then

We can see our policy by running:

-``kubectl describe taspolicy power-sensitive-scheduling-policy``
+``kubectl describe taspolicy power-sensitive-scheduling-policy -n power-demo``
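
If the describe call reports that the policy cannot be found, it may simply have been created in a different namespace; listing across all namespaces shows where it actually lives (a small sketch, assuming the `taspolicy` resource is namespaced, which the `-n power-demo` flag above implies):

```
kubectl get taspolicy --all-namespaces
```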

### 5: Create horizontal autoscaler for power
The Horizontal Pod Autoscaler is an in-built part of Kubernetes which allows scaling decisions to be made based on arbitrary metrics. In this case we're going to scale our workload based on the average power usage on nodes with current instances of our workload.
@@ -230,7 +255,7 @@ To create this Autoscaler:

We can see the details of our autoscaler with:

-``kubectl describe hpa power-autoscaler``
+``kubectl describe hpa power-autoscaler -n power-demo``

Don't worry if the autoscaler shows errors at this point. It won't work until we've deployed an application it's interested in - based on the "scaleTargetRef" field in its specification file.

@@ -280,7 +305,7 @@ TAS will schedule it to the node with the lowest power usage, and will avoid sch

```

-kubectl logs -l app=tas
+kubectl logs -l app=tas -n telemetry-aware-scheduling

2020/XX/XX XX:XX:XX filter request recieved
2020/XX/XX XX:XX:XX node1 package_power_avg = 72.133
@@ -297,7 +322,7 @@ Once the stressng pod starts to run the horizontal pod autoscaler will notice an

```

-watch kubectl describe hpa power-autoscaler
+watch kubectl describe hpa power-autoscaler -n power-demo

Name: power-autoscaler
Namespace: default

telemetry-aware-scheduling/docs/strategy-labeling-example.md

Lines changed: 1 addition & 0 deletions
@@ -266,6 +266,7 @@ Once the metric changes for a given node, and it returns to a schedulable condit
This demo has been tested with the following Descheduler versions:
1. **v0.23.1** and older
2. **v0.27.1**
+3. **v0.28.0**

Telemetry Aware Scheduler and the following Descheduler versions **v0.24.x** to **v0.26.x** seem to have compatibility issues (https://github.com/intel/platform-aware-scheduling/issues/90#issuecomment-1169012485 links to an issue filed with the Descheduler project team). The problem seems to have been fixed in Descheduler **v0.27.1**.
