adding a But this expression uses subquery (maybe there is a better way to add We cannot create expression like this via Pyrra right now as I need
I am trying to provide an SLO for platform services like istio, nginx-ingress-controller, etc. None of the existing SLO types (ratio, latency, etc.) seem to help, since I want to evaluate the uptime of nginx as a service, istiod as a service, and so on.
So I attempted to use BoolGauge, which is supposed to work for blackbox-exporter-style situations.
Here is my SLO config
What I observe is that if the pods and the svc connected to those pods are up, I get
pyrra_availability = 100%
and also
error budget = 100%
But once I shut down the pods to test error budget depletion, the availability metric as well as the budget crash to zero. I would have expected the budget to burn down slowly. If I change the time range to a period where the pods were up, pyrra_availability is reported as 100%.
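
The slow burn-down expected above can be sketched numerically. This is only an illustration of the usual error-budget arithmetic, not Pyrra's actual implementation; the 99% target, the 30-second scrape interval, and the helper name `remaining_budget` are assumptions, since the SLO config is not shown here.

```python
# Hypothetical values: the real target and scrape interval are not given in the post.
SCRAPE_INTERVAL_S = 30          # assumed scrape interval
WINDOW_S = 7 * 24 * 3600        # 1w window, matching the up:sum1w / up:count1w rules
TARGET = 0.99                   # assumed SLO target


def remaining_budget(good_samples: int, total_samples: int, target: float) -> float:
    """Fraction of the error budget still left for the window."""
    availability = good_samples / total_samples
    allowed_errors = 1.0 - target          # error rate the SLO tolerates
    actual_errors = 1.0 - availability     # error rate actually observed
    return 1.0 - actual_errors / allowed_errors


def burn_down_table(downtimes_min):
    """Remaining budget after each amount of downtime, assuming `up` records 0 while down."""
    total = WINDOW_S // SCRAPE_INTERVAL_S
    rows = []
    for minutes in downtimes_min:
        bad = minutes * 60 // SCRAPE_INTERVAL_S
        rows.append((minutes, remaining_budget(total - bad, total, TARGET)))
    return rows


if __name__ == "__main__":
    for minutes, budget in burn_down_table([0, 10, 30, 60]):
        print(f"{minutes:>3} min down -> {budget:.1%} of budget left")
```

Under these assumptions the budget shrinks in proportion to the downtime instead of dropping to zero at once, which is the behaviour I expected to see.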

Any idea what is misconfigured here? Or is this a bug in Pyrra's recording rule expressions?
My analysis showed that the two expressions involved are
sum(up:sum1w{job="kubernetes-pods",namespace="app4",slo="sample-svc-uptime-slo"})
and
sum(up:count1w{job="kubernetes-pods",namespace="app4",slo="sample-svc-uptime-slo"})
which in turn use
sum by (__name__, job, namespace) (sum_over_time(up{job="kubernetes-pods",namespace="app4"}[1w]))
and
sum by (__name__, job, namespace) (count_over_time(up{job="kubernetes-pods",namespace="app4"}[1w]))
Both of these expressions, sum_over_time and count_over_time, have identical graphs, which is why the availability plummets to zero, I think.
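
What identical graphs for the two expressions imply can be sketched with plain lists standing in for the `up` samples in the window. This is a toy model, not PromQL or Pyrra code; the sample counts are made up, and the idea that stopped pods leave gaps instead of 0-valued samples is an assumption, not something confirmed here.

```python
def sum_over_time(samples):
    """Toy stand-in for PromQL sum_over_time: adds up the sample values in the window."""
    return sum(samples)


def count_over_time(samples):
    """Toy stand-in for PromQL count_over_time: counts the samples in the window."""
    return len(samples)


def availability(samples):
    """Bool-gauge style availability: share of samples that were 1."""
    count = count_over_time(samples)
    if count == 0:
        return None  # no samples at all, so nothing can be computed
    return sum_over_time(samples) / count


# Case A: the target keeps being scraped while down, so `up` records 0.
with_zeros = [1] * 90 + [0] * 10

# Case B (assumed): the target is no longer scraped at all, so the downtime
# leaves no samples and only the 1-valued samples remain in the window.
with_gaps = [1] * 90

for name, samples in (("zeros recorded", with_zeros), ("gaps only", with_gaps)):
    print(name, sum_over_time(samples), count_over_time(samples), availability(samples))
```

In case B the sum and the count are equal by construction, which matches the identical graphs: downtime that never appears as a 0 sample cannot lower the sum relative to the count. The sketch does not try to reproduce how Pyrra derives the final availability figure from these series.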
Is my usage of up as the metric wrong for this kind of SLO evaluation?