Add EWMA and load biasing crates for failure-aware P2C balancing by unleashed · Pull Request #4537 · linkerd/linkerd2-proxy

unleashed · 2026-05-21T18:36:26Z

Today the proxy's P2C load balancer uses Tower's PeakEwma, which tracks
only round-trip time. An endpoint returning fast 503s or 429s looks
"fast" to PeakEwma, so P2C keeps routing traffic to it. This is exactly the
opposite of what operators want.

This PR adds the building blocks to make P2C failure-aware, but does not
wire anything into the proxy stack yet, to keep the review scope manageable.
Follow-up PRs will use these building blocks to activate this code
and implement related features in the circuit breaker.

Here are the main components:

- linkerd-ewma. A standalone EWMA crate that supports non-mutating
  time-projected reads and dual-metric tracking (RTT + penalty) under a
  single lock. Tower's internal RttEstimate is private, mutates on read,
  and cannot support the penalty dimension.
- retry_after module in linkerd-http-classify. Parsers for HTTP
  Retry-After (delay-seconds and HTTP-date per RFC 7231) and gRPC
  grpc-retry-pushback-ms (per gRPC A6 spec), so the load biaser and the
  upcoming circuit breaker can honor server backoff hints.
- linkerd-load-biaser. A Tower Service wrapper implementing
  tower::load::Load that tracks per-endpoint RTT via EWMA and injects
  temporary load penalties on failure responses (HTTP 429/503/5xx, gRPC
  RESOURCE_EXHAUSTED/UNAVAILABLE). When a Retry-After hint is present the
  penalty is amplified to remain meaningful through the server-requested
  backoff window. The load metric is max(rtt * (pending + 1), penalty),
  giving P2C the ability to steer traffic away from unhealthy endpoints while
  preserving the same behavior as PeakEwma when all of them are healthy.

Introduce linkerd-ewma, a general-purpose exponentially-weighted moving average crate. The crate provides five public methods on an Ewma struct: new (initializes with INFINITY sentinel), get (returns stored value), add (blends a new sample using exponential decay), add_peak (replaces stored value when the new sample exceeds it), and add_rate (derives a rate from the inverse of the elapsed interval and feeds it through add).

This crate is added despite the existence of tower::PeakEwma because it is not limited to middleware-based RTT computation. We specifically plan to use this implementation for a load biasing feature and a success-rate circuit breaker policy, which would otherwise not be possible.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
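To make the five methods concrete, here is a minimal sketch of an EWMA with that shape. It is not the crate's code: the field names, the signatures (explicit `now` arguments, `std::time::Instant` rather than tokio's clock), and the fallback in `add_peak` when the sample is not a new peak are all assumptions made for illustration.

```rust
use std::time::{Duration, Instant};

/// Hypothetical EWMA sketch; method names follow the commit message,
/// signatures and internals are assumptions.
#[derive(Clone, Debug)]
pub struct Ewma {
    decay_secs: f64, // assumed nonzero in this sketch
    value: f64,
    stamp: Instant,
}

impl Ewma {
    /// Starts at the INFINITY sentinel, meaning "no sample yet".
    pub fn new(decay: Duration, now: Instant) -> Self {
        Self { decay_secs: decay.as_secs_f64(), value: f64::INFINITY, stamp: now }
    }

    /// Returns the stored value without projecting it in time.
    pub fn get(&self) -> f64 {
        self.value
    }

    /// Blends `sample` into the average; older state is weighted by
    /// exp(-elapsed / decay). The first sample replaces the sentinel.
    pub fn add(&mut self, sample: f64, now: Instant) -> f64 {
        let elapsed = now.saturating_duration_since(self.stamp).as_secs_f64();
        if self.value.is_infinite() {
            self.value = sample;
        } else {
            let w = (-elapsed / self.decay_secs).exp();
            self.value = self.value * w + sample * (1.0 - w);
        }
        self.stamp = now;
        self.value
    }

    /// Replaces the stored value when `sample` exceeds it. Blending
    /// otherwise is an assumption of this sketch.
    pub fn add_peak(&mut self, sample: f64, now: Instant) -> f64 {
        if sample > self.value {
            self.value = sample;
            self.stamp = now;
            self.value
        } else {
            self.add(sample, now)
        }
    }

    /// Derives a rate (events per second) from the inverse of the time
    /// since the last update and feeds it through `add`.
    pub fn add_rate(&mut self, now: Instant) -> f64 {
        let elapsed = now.saturating_duration_since(self.stamp).as_secs_f64();
        if elapsed == 0.0 {
            return self.value; // avoid dividing by zero
        }
        self.add(1.0 / elapsed, now)
    }
}
```

Passing the timestamp in explicitly keeps the sketch deterministic to test; the real crate may read the clock itself.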

Extend linkerd-ewma with the API surface needed for success-rate circuit breaking. A MIN_DECAY constant (1 ms) is now applied in both constructors so that a zero-duration decay never produces division-by-zero or NaN results in downstream arithmetic.

New methods: new_with_value sets an explicit initial sample instead of the INFINITY sentinel, reset overwrites both value and timestamp for breaker recovery, and get_at projects the stored value forward through exponential decay without mutating internal state.

In addition, add_peak is now decay-aware: it projects the stored value to the candidate timestamp before deciding whether to replace it, and it unconditionally replaces INFINITY so that the first real sample always takes effect even at the construction timestamp.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
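The additions in this commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the crate's implementation: the projection is modeled as `value * exp(-elapsed / decay)`, `add_peak` leaves the state untouched when the sample does not exceed the projected value, and all signatures are hypothetical.

```rust
use std::time::{Duration, Instant};

/// Lower bound on the decay window, so a zero decay cannot cause a
/// division by zero (1 ms, as described in the commit message).
pub const MIN_DECAY: Duration = Duration::from_millis(1);

/// Hypothetical sketch; internals are assumptions.
#[derive(Clone, Debug)]
pub struct Ewma {
    decay: Duration,
    value: f64,
    stamp: Instant,
}

impl Ewma {
    pub fn new(decay: Duration, now: Instant) -> Self {
        Self::new_with_value(decay, f64::INFINITY, now)
    }

    /// Starts from an explicit sample instead of the INFINITY sentinel.
    pub fn new_with_value(decay: Duration, value: f64, now: Instant) -> Self {
        Self { decay: decay.max(MIN_DECAY), value, stamp: now }
    }

    /// Overwrites both the value and the timestamp.
    pub fn reset(&mut self, value: f64, now: Instant) {
        self.value = value;
        self.stamp = now;
    }

    /// Projects the stored value to `at` without mutating state.
    /// Assumed model: value * exp(-elapsed / decay).
    pub fn get_at(&self, at: Instant) -> f64 {
        if self.value.is_infinite() {
            return self.value; // the sentinel does not decay
        }
        let elapsed = at.saturating_duration_since(self.stamp).as_secs_f64();
        self.value * (-elapsed / self.decay.as_secs_f64()).exp()
    }

    /// Decay-aware peak: compare against the projected value, and always
    /// replace the INFINITY sentinel.
    pub fn add_peak(&mut self, sample: f64, at: Instant) {
        if self.value.is_infinite() || sample > self.get_at(at) {
            self.reset(sample, at);
        }
    }
}
```

Because `get_at` takes `&self`, readers can compare endpoints at a common timestamp without any of them changing state as a side effect.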

Add a retry_after module to linkerd-http-classify with shared parsing functions for extracting backoff hints from HTTP and gRPC responses.

parse_retry_after handles 429/503 responses with both delay-seconds and HTTP-date formats per RFC 7231, capping the returned duration at a caller-specified maximum. parse_grpc_retry_pushback reads the grpc-retry-pushback-ms header per the gRPC A6 spec, rejecting negative values and capping positive ones. We use the httpdate crate for the actual RFC 7231 HTTP-date parsing.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
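The capping and rejection rules can be illustrated with a std-only sketch. It deliberately differs from the module described above: it takes raw header values as `&str` rather than a header map, it handles only the delay-seconds form of Retry-After (the HTTP-date form, which the real module parses via httpdate, is simply rejected here), and the function names and signatures are hypothetical.

```rust
use std::time::Duration;

/// Hypothetical helper: parses a Retry-After delay-seconds value on a
/// 429 or 503 response, capped at `max`. HTTP-date values are not
/// handled in this sketch and yield None.
pub fn parse_retry_after_delay(status: u16, value: &str, max: Duration) -> Option<Duration> {
    if status != 429 && status != 503 {
        return None;
    }
    let value = value.trim();
    // delay-seconds is one or more ASCII digits; reject signs and dates.
    if value.is_empty() || !value.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let secs: u64 = value.parse().ok()?;
    Some(Duration::from_secs(secs).min(max))
}

/// Hypothetical helper: parses a grpc-retry-pushback-ms value, rejecting
/// negative or malformed values and capping positive ones at `max`.
pub fn parse_grpc_retry_pushback(value: &str, max: Duration) -> Option<Duration> {
    let ms: i64 = value.trim().parse().ok()?;
    if ms < 0 {
        return None;
    }
    Some(Duration::from_millis(ms as u64).min(max))
}
```

Taking the cap as a parameter lets each caller choose its own upper bound for the same header value.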

…re penalties

Introduce the linkerd-load-biaser crate, which wraps any tower::Service to provide per-endpoint load metrics for P2C balancing. The crate tracks request latency via EWMA and injects penalties when failure responses are detected, steering traffic away from unhealthy endpoints.

Penalty injection covers HTTP 429/503/5xx and gRPC RESOURCE_EXHAUSTED/UNAVAILABLE trailers-only responses (not streaming gRPC failures, since we can only access headers here). For responses with backoff hints (Retry-After on HTTP 429/503, or grpc-retry-pushback-ms on gRPC trailers-only errors), the penalty is amplified so that the EWMA value remains meaningful through the server-requested backoff window. The amplification is clamped to prevent infinity from permanently disabling the endpoint.

The load metric is computed as `max(rtt * (pending + 1), penalty)`, where `rtt` is the peak-EWMA latency and `pending` is the number of in-flight requests. This is returned via tower::load::Load for direct P2C integration.

The load biaser is disabled by default, preserving RTT-only behavior (PeakEwma equivalent), unless explicitly activated.

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>
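As a worked illustration of the metric, the sketch below computes `max(rtt * (pending + 1), penalty)` and shows one plausible way to amplify a penalty for a backoff hint. The amplification formula is an assumption, not taken from the PR: if a penalty decays as `p * exp(-t / decay)`, then starting it at `base * exp(hint / decay)` leaves it at `base` when the hint expires. The function names and the `cap` parameter are hypothetical.

```rust
use std::time::Duration;

/// Load metric from the commit message: the in-flight-scaled RTT,
/// unless an active penalty is larger.
pub fn load(rtt: f64, pending: u32, penalty: f64) -> f64 {
    (rtt * (f64::from(pending) + 1.0)).max(penalty)
}

/// Assumed amplification: scale `base` so that, after decaying for
/// `hint`, the penalty is still `base`. Clamped to `cap` so the result
/// is always finite.
pub fn amplified_penalty(base: f64, hint: Duration, decay: Duration, cap: f64) -> f64 {
    let factor = (hint.as_secs_f64() / decay.as_secs_f64()).exp();
    // f64::min ignores NaN, so a degenerate 0/0 ratio also yields `cap`.
    (base * factor).min(cap)
}
```

With no penalty the metric reduces to `rtt * (pending + 1)`, which matches the RTT-only behavior described for the disabled case.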

These cover the complete load biasing lifecycle, including penalty injection, hint parsing, cancellation safety via PinnedDrop, and backwards-compatible behavior when disabled (i.e. RTT-only behavior equivalent to PeakEwma).

Signed-off-by: Alejandro Martinez Ruiz <amr@buoyant.io>

raykroeker

@unleashed Thanks for the documentation. It really helps understand the intent.
+100

cratelyn · 2026-05-21T20:51:04Z

 futures = { version = "0.3", default-features = false }
 http = { workspace = true }
 http-body = { workspace = true }
+httpdate.workspace = true


nit: just to follow convention with the rest of the file

Suggested change

-httpdate.workspace = true
+httpdate = { workspace = true }

cratelyn · 2026-05-22T20:24:31Z

A standalone EWMA crate that supports non-mutating
time-projected reads and dual-metric tracking (RTT + penalty) under a
single lock. Tower's internal RttEstimate is private, mutates on read,
and cannot support the penalty dimension.

i'm having trouble connecting the dots from this note in the pull request description to this code.

i don't see an rtt_estimate value, like the tower code in question. is that just value here?

it would also be helpful if we could record some of this rationale in a crate-level //! doc comment, so that we retain some record about why we're not using tower's equivalent EWMA.

copy-pasting the excerpt quoted above is fine by me :)

cratelyn · 2026-05-22T20:29:17Z

+publish = { workspace = true }
+
+[dependencies]
+tokio = { version = "1", features = ["time"] }


maybe default-features = false would be nice here too, since we're not setting up a runtime or anything else in this crate. time interfaces seem like all we're using here!

cratelyn · 2026-05-22T20:31:20Z

+    #[test]
+    fn parse_grpc_pushback_positive() {
+        let mut headers = HeaderMap::new();
+        headers.insert("grpc-retry-pushback-ms", HeaderValue::from_static("5000"));


should this and the tests below use GRPC_RETRY_PUSHBACK_MS? tests above use http::header::RETRY_AFTER, so that'd be consistent.

cratelyn · 2026-05-22T20:32:51Z

+linkerd-ewma = { path = "../ewma" }
+futures = { version = "0.3", default-features = false }
+http = { workspace = true }
+linkerd-http-classify = { path = "../http/classify" }
+linkerd-stack = { path = "../stack" }
+parking_lot = "0.12"
+pin-project = "1"
+tokio = { version = "1", features = ["io-util", "net", "time"] }
+tokio-test = { version = "0.4", optional = true }
+tower = { workspace = true, features = ["load"] }
+tower-service = { workspace = true }
+tracing = { workspace = true }


Suggested change

-linkerd-ewma = { path = "../ewma" }
-futures = { version = "0.3", default-features = false }
-http = { workspace = true }
-linkerd-http-classify = { path = "../http/classify" }
-linkerd-stack = { path = "../stack" }
-parking_lot = "0.12"
-pin-project = "1"
-tokio = { version = "1", features = ["io-util", "net", "time"] }
-tokio-test = { version = "0.4", optional = true }
-tower = { workspace = true, features = ["load"] }
-tower-service = { workspace = true }
-tracing = { workspace = true }
+futures = { version = "0.3", default-features = false }
+http = { workspace = true }
+parking_lot = "0.12"
+pin-project = "1"
+tokio = { version = "1", features = ["io-util", "net", "time"] }
+tokio-test = { version = "0.4", optional = true }
+tower = { workspace = true, features = ["load"] }
+tower-service = { workspace = true }
+tracing = { workspace = true }
+
+linkerd-ewma = { path = "../ewma" }
+linkerd-http-classify = { path = "../http/classify" }
+linkerd-stack = { path = "../stack" }

nit: alphabetize, and pull path-based dependencies into a separate block

cratelyn

thanks for breaking these additions out into a standalone pull request, separate from the changes we'll be making in our proxy stack(s). that really helped expedite review of this.

cratelyn · 2026-05-22T20:49:04Z

+    fn attach_parsed_rate_limit_hint(&mut self, _max: Duration) {
+        // Store the uncapped value. Each consumer applies their own cap via
+        // rate_limit_hint(max).
+        if let Some(d) = linkerd_http_classify::retry_after::parse_retry_after(
+            self.status(),
+            self.headers(),
+            Duration::MAX,
+        ) {
+            self.extensions_mut().insert(CachedRateLimitHint(d));
+            return;
+        }


i'm a little confused about why this _max isn't used. do we have any implementations of this trait where we do use the max parameter in this method? i don't see it ever used.

cratelyn · 2026-05-22T20:51:58Z

+/// via `rate_limit_hint(max)`, so different callers (e.g. load biaser vs
+/// circuit breaker) can use different maximums from the same cached value.
+#[derive(Clone, Copy, Debug)]
+pub struct CachedRateLimitHint(pub Duration);


does this need to be pub?

the uncapped value, along with the fact that this is intended for use with rate_limit_hint, makes me wonder if a constructor pub fn new could work for creating these, while preventing accidents with an uncapped duration in the future.
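for illustration, the constructor idea might look something like the sketch below. this is only an assumption about the shape: the field becomes private, `new` is the sole way to build the value, and `capped` is a hypothetical accessor standing in for whatever `rate_limit_hint(max)` does with the cached duration.

```rust
use std::time::Duration;

/// Sketch of the suggestion: the inner duration is private, so callers
/// can only read it through an accessor that applies a cap.
#[derive(Clone, Copy, Debug)]
pub struct CachedRateLimitHint(Duration);

impl CachedRateLimitHint {
    /// Stores the uncapped hint as parsed from the response.
    pub fn new(uncapped: Duration) -> Self {
        Self(uncapped)
    }

    /// Hypothetical accessor: returns the hint limited to `max`, so each
    /// consumer applies its own maximum to the same cached value.
    pub fn capped(self, max: Duration) -> Duration {
        self.0.min(max)
    }
}
```

with the field private, code outside the defining module has no way to read the uncapped duration directly.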

unleashed added 5 commits May 21, 2026 20:25

unleashed requested a review from cratelyn May 21, 2026 18:36

unleashed requested a review from a team as a code owner May 21, 2026 18:36

raykroeker reviewed May 22, 2026

cratelyn assigned unleashed May 22, 2026

cratelyn reviewed May 22, 2026

cratelyn approved these changes May 22, 2026

cratelyn reviewed May 22, 2026

unleashed wants to merge 5 commits into main from amr/load-biaser


3 participants
