This repository was archived by the owner on Jan 29, 2025. It is now read-only.
This commit will:
- update descheduler versions used for demos
- update power docs with handling for collectd missing read permissions for energy files
Signed-off-by: Madalina Lazar <[email protected]>
telemetry-aware-scheduling/docs/health-metric-example.md (1 addition & 0 deletions)
@@ -155,6 +155,7 @@ We can see that only two of our three nodes was deemed suitable because it didn'
Descheduler can be installed as a binary with the instructions from the [project repo](https://github.com/kubernetes-sigs/descheduler). This demo has been tested with the following Descheduler versions:

1. **v0.23.1** and older
2. **v0.27.1**
3. **v0.28.1**

Telemetry Aware Scheduler and Descheduler versions **v0.24.x** to **v0.26.x** appear to have compatibility issues (https://github.com/intel/platform-aware-scheduling/issues/90#issuecomment-1169012485 links to an issue raised with the Descheduler project team). The problem appears to have been fixed in Descheduler **v0.27.1**.
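For readers installing the binary by hand, a minimal sketch of driving it with a policy file is shown below. The policy file name, the kubeconfig path, and the use of the `RemovePodsViolatingNodeTaints` strategy are assumptions for illustration; the policy follows the v1alpha1 format used by older releases, while v0.27+ also supports a profile-based v1alpha2 format, so check the project repo for the schema matching your version.

```bash
# Sketch only (assumed file names/paths): a v1alpha1 policy that evicts pods from
# nodes that TAS has tainted as violating a telemetry policy.
cat > descheduler-policy.yaml <<'EOF'
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  "RemovePodsViolatingNodeTaints":
    enabled: true
EOF

# Run the binary against the cluster; check `descheduler --help` on your release
# for the exact flag set.
descheduler \
  --kubeconfig "$HOME/.kube/config" \
  --policy-config-file descheduler-policy.yaml \
  --descheduling-interval 5m
```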
telemetry-aware-scheduling/docs/power/README.md (29 additions & 4 deletions)
@@ -37,6 +37,24 @@ The agent should soon be up and running on each node in the cluster. This can be
`kubectl get pods -nmonitoring -lapp.kubernetes.io/name=collectd`
Ensure the collectd pod(s) - one per node in your cluster - have been set up properly. Using `kubectl logs collectd-**** -n monitoring`, open the pod logs and ensure there are no errors or warnings and that you see something similar to:

```
write_prometheus plugin: MHD_OPTION_EXTERNAL_LOGGER is not the first option specified for the daemon. Some messages may be printed by the standard MHD logger.

Initialization complete, entering read-loop.
```

If you encounter errors like the one below in your collectd pod, check the read permissions for the file in question. From running this demo so far, it appears that collectd requires at least read permission for the ***'others'*** permission group in order to access the sys energy files.

```
read-function of plugin `python.pkgpower' failed. Will suspend it for 20.000 seconds.
```
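As a concrete illustration, the sketch below inspects and relaxes permissions on the RAPL energy counters that package-power readers typically use. The `/sys/class/powercap/intel-rapl*` path is an assumption; confirm the actual file from the error in your collectd logs before changing anything, and note that permissions set this way do not survive a reboot.

```bash
# Run on the affected node. The RAPL sysfs path below is an assumption; adjust it
# to whichever file your collectd plugin reports it cannot read.
for f in /sys/class/powercap/intel-rapl*/energy_uj; do
  ls -l "$f"            # inspect current permissions
  sudo chmod o+r "$f"   # grant read access to 'others' so collectd can read it
done
```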
Additionally, we can check that collectd is indeed serving metrics by running `collectd_ip=$(kubectl get svc -n monitoring -o json | jq -r '.items[] | select(.metadata.name=="collectd").spec.clusterIP'); curl -x "" $collectd_ip:9103/metrics`. The output should look similar to the lines below:
```
@@ -208,6 +226,13 @@ This command should return some json objects containing pods and associated powe
```
*Note* If the above command returns an error, there is some issue in the metrics pipeline that requires resolution before continuing.
If the `node_package_power_per_pod` metric is visible but it doesn't have a value ("items" is an empty array, as in the response below), please check both the `node_package_power_per_pod` and `package_power_avg` metrics in the Prometheus UI. Assuming the collectd and kube-state-metrics pods have been set up correctly, you should see `node_package_power_per_pod` and `package_power_avg` metrics for each node in your cluster. Expect similar behaviour for collectd metrics such as `collectd_package_*_TDP_power`.
If you notice metrics are missing for some nodes, start by investigating which nodes are affected and check logs for the pods in question.
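To narrow this down, one option is to query Prometheus directly for the underlying per-node series. The sketch below assumes a Prometheus service named `prometheus` in the `monitoring` namespace listening on port 9090; adjust the names to match your deployment.

```bash
# Assumed service name/namespace/port; adjust to your Prometheus deployment.
prom_ip=$(kubectl get svc -n monitoring -o json \
  | jq -r '.items[] | select(.metadata.name=="prometheus").spec.clusterIP')

# Which nodes currently report the averaged package power metric?
curl -s "http://$prom_ip:9090/api/v1/query?query=package_power_avg" | jq '.data.result[].metric'

# And the per-pod metric consumed further down the pipeline.
curl -s "http://$prom_ip:9090/api/v1/query?query=node_package_power_per_pod" | jq '.data.result[].metric'
```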
On the Kubernetes BMRA setup, Telemetry Aware Scheduling is already installed and integrated with the Kubernetes control plane. In order to allow our cluster to make scheduling decisions based on current power usage, we simply need to create a telemetry policy and associate the appropriate pods with that policy.
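For illustration, such a policy might look like the sketch below. The policy name, the choice of `package_power_avg` as the metric, and the threshold are assumptions; the demo's own policy file in the repo is the authoritative version.

```bash
# Minimal TASPolicy sketch (assumed name, metric thresholds). Pods opt in by
# carrying a 'telemetry-policy' label that references the policy name.
kubectl apply -f - <<'EOF'
apiVersion: telemetry.intel.com/v1alpha1
kind: TASPolicy
metadata:
  name: power-sensitive-policy
  namespace: default
spec:
  strategies:
    scheduleonmetric:
      rules:
      - metricname: package_power_avg
        operator: LessThan
    dontschedule:
      rules:
      - metricname: package_power_avg
        operator: GreaterThan
        target: 100
EOF
```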
The Horizontal Pod Autoscaler is a built-in part of Kubernetes that allows scaling decisions to be made based on arbitrary metrics. In this case, we're going to scale our workload based on the average power usage on nodes that currently run instances of our workload.
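A sketch of what such an autoscaler could look like is shown below. The workload name, replica bounds, and target value are assumptions for illustration; the demo's own HPA manifest is the one to follow.

```bash
# Hypothetical HPA sketch: scale a Deployment on the per-pod power metric served
# through the custom metrics API (assumed names and target value).
kubectl apply -f - <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: power-autoscaler
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-power-workload
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Pods
    pods:
      metric:
        name: node_package_power_per_pod
      target:
        type: AverageValue
        averageValue: "80"
EOF
```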
Don't worry if the autoscaler shows errors at this point. It won't work until we've deployed an application it's interested in - based on the "scaleTargetRef" field in its specification file.
@@ -280,7 +305,7 @@ TAS will schedule it to the node with the lowest power usage, and will avoid sch
telemetry-aware-scheduling/docs/strategy-labeling-example.md (1 addition & 0 deletions)
@@ -266,6 +266,7 @@ Once the metric changes for a given node, and it returns to a schedulable condit
This demo has been tested with the following Descheduler versions:

1. **v0.23.1** and older
2. **v0.27.1**
3. **v0.28.0**

Telemetry Aware Scheduler and Descheduler versions **v0.24.x** to **v0.26.x** appear to have compatibility issues (https://github.com/intel/platform-aware-scheduling/issues/90#issuecomment-1169012485 links to an issue raised with the Descheduler project team). The problem appears to have been fixed in Descheduler **v0.27.1**.