Skip to content

[DB-1872] Improve pinned persistent subscription performance under burst load#5576

Draft
timothycoleman wants to merge 1 commit intoshaan1337/fix-wrong-ps-checkpointfrom
timothycoleman/persistent-subscriptions
Draft

[DB-1872] Improve pinned persistent subscription performance under burst load#5576
timothycoleman wants to merge 1 commit intoshaan1337/fix-wrong-ps-checkpointfrom
timothycoleman/persistent-subscriptions

Conversation

@timothycoleman
Copy link
Copy Markdown
Contributor

@timothycoleman timothycoleman commented Apr 13, 2026

For pinned strategies when events arrive in a burst, each NotifyLiveSubscriptionMessage call was scanning the entire buffer to push events to clients. Client acks are queued behind the event flood in the same FIFO queue, so clients appear full and every scan is fruitless. With N events, this results in O(N²) total work.

Instead of pushing synchronously on event arrival, we now schedule a push message on the PS queue. This lets events accumulate in the buffer cheaply, gives acks a chance to interleave, and batches the buffer scan.

All PushToClients call sites use the deferred path for consistency, since the buffer scan cost applies equally to acks, nacks, timeouts, and client changes.

When events arrive in a burst, each NotifyLiveSubscriptionMessage call
was scanning the entire buffer to push events to clients. Client acks
are queued behind the event flood in the same FIFO queue, so clients
appear full and every scan is fruitless. With N events, this results
in O(N²) total work.

Instead of pushing synchronously on event arrival, we now schedule a
push message on the PS queue. This lets events accumulate in the
buffer cheaply, gives acks a chance to interleave, and batches the
buffer scan.

All PushToClients call sites use the deferred path for consistency,
since the buffer scan cost applies equally to acks, nacks, timeouts,
and client changes.
@timothycoleman timothycoleman changed the title Improve persistent subscription performance under burst load Improve pinned persistent subscription performance under burst load Apr 13, 2026
@timothycoleman timothycoleman changed the title Improve pinned persistent subscription performance under burst load [DB-1872] Improve pinned persistent subscription performance under burst load Apr 13, 2026
@linear
Copy link
Copy Markdown

linear bot commented Apr 13, 2026

@timothycoleman
Copy link
Copy Markdown
Contributor Author

/review

@qodo-code-review
Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Concurrency

The new deferred push mechanism relies on _pushScheduled guarded by _lock, plus an external scheduler callback into PushToClientsFromSchedule(). Validate that the scheduler cannot invoke PushToClientsFromSchedule() concurrently, out-of-order, or after disposal, and that no push requests can be lost (e.g., schedule requested, callback delayed, more events arrive, _pushScheduled prevents re-scheduling, then callback is dropped).

private void SchedulePushToClients() {
	Debug.Assert(Monitor.IsEntered(_lock));
	if (_pushScheduled)
		return;

	_pushScheduled = true;
	_settings.PushScheduler.SchedulePush();
}

public void PushToClientsFromSchedule() {
	lock (_lock) {
		_pushScheduled = false;
		PushToClients();
	}
}

private void PushToClients() {
	lock (_lock) {
Deadlock risk

SchedulePushToClients() asserts Monitor.IsEntered(_lock) and then calls _settings.PushScheduler.SchedulePush() while still under _lock. If the production scheduler implementation synchronously triggers the push callback (directly or indirectly), it could re-enter the subscription and contend on _lock, risking deadlock or priority inversions. Consider ensuring SchedulePush() is non-blocking / does not call back inline, or releasing the lock before invoking the scheduler.

private void SchedulePushToClients() {
	Debug.Assert(Monitor.IsEntered(_lock));
	if (_pushScheduled)
		return;

	_pushScheduled = true;
	_settings.PushScheduler.SchedulePush();
}
Behavior change

Changing OutstandingMessage to a readonly struct can change copying semantics and performance characteristics (more defensive copying if used via interfaces/boxing, and inability to mutate fields may surface hidden assumptions). Validate all call sites still behave correctly and there are no unintended extra copies in hot paths.

public readonly struct OutstandingMessage {
	public readonly ResolvedEvent ResolvedEvent;
	public readonly int RetryCount;
	public readonly Guid EventId;
📄 References
  1. No matching references available

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant