Skip to content

Conversation

@unixslayer
Copy link
Member

@unixslayer unixslayer commented Apr 16, 2025

Why is this change proposed?

due Prooph may change database projections status to idle after first cycle when triggered with keepRunning: false, non-polling asynchronous database projections should be triggered with keepRunning: true

Description of Changes

Issue occurs when resetting asynchronous projection. With keepRunning: false Prooph will run it once for amount of events defined with load_count option and change its status to idle. Ecotone checks projection status after first run and runs it again while status remains either as rebuilding or running.

I made following assumptions:

  • InMemory projections are not managed by locks like database projections
  • running polling projection makes application responsible for triggering projection
  • resetting polling projection acts like it was triggered
  • asynchronous, ES projection should "listen" for incoming events

Pull Request Contribution Terms

  • I have read and agree to the contribution terms outlined in CONTRIBUTING.

@unixslayer
Copy link
Member Author

CC: @dgafka @jlabedo @lifinsky

@unixslayer unixslayer requested a review from dgafka April 16, 2025 18:41
@unixslayer unixslayer force-pushed the keep-projections-running branch from 14197db to eae0245 Compare April 16, 2025 18:58
@unixslayer
Copy link
Member Author

I just ran into an idea to pass 'keepRunning' as a running option with false for default. Let me know your opinion.

@unixslayer unixslayer marked this pull request as draft April 16, 2025 21:47
@lifinsky
Copy link
Contributor

If the projection is synchronous, then after a successful projection run triggered by the event stream, its status should always transition to idle. For polling projections, the status should also be idle when there are no new events to process. In the case of projections handled asynchronously via a message queue (i.e., through asynchronous event handlers), it may be necessary to define an activity timeout — for example, if no relevant events are received within a given period (X), the projection status should also become idle. Additionally, it’s important to consider that a projection may be subscribed to a stream, but the incoming events might not modify the projection state — for instance, when the projection listens to multiple aggregates and the received events are unrelated.

@unixslayer
Copy link
Member Author

unixslayer commented Apr 17, 2025

@lifinsky so asynchronous projections could be safely run with keepRunning: true due they can be stopped by a stop signal either sent manually calling ecotone:es:stop-projection or by activity timeout.

Although the problem is actually in the way Prooph runs projections and shows up when you reset or start new projection with existing stream with lower load_count.

Ecotone runs projection with keepRunning: false depending on a Prooph's bug (prooph/pdo-event-store#246) which kept projection with running status (packages/PdoEventSourcing/src/Config/InboundChannelAdapter/ProjectionEventHandler.php:52). But that bug was solved and now, projections with keepRunning: false will stop and change to idle when:

  • loaded events are handled
  • stop signal was sent in the meantime

I managed to prepare a PR in Prooph prooph/pdo-event-store#257 so the only change required may be to increase minimum version of prooph/pdo-event-store once released.

@unixslayer
Copy link
Member Author

@dgafka @lifinsky

I've changed my approach. Instead of running projections with keepRunning : true, Ecotone projection manager can check if projection has reached the end of streams it is using.

I'm almost done, changes should be ready tomorrow...or right after Easter ;)

@lifinsky
Copy link
Contributor

@dgafka @lifinsky

I've changed my approach. Instead of running projections with keepRunning : true, Ecotone projection manager can check if projection has reached the end of streams it is using.

I'm almost done, changes should be ready tomorrow...or right after Easter ;)

If a gap occurs in a synchronous projection and it cannot be resolved within the retry window, the projection will lag behind the stream — in this case, the status should not be idle.

@unixslayer
Copy link
Member Author

@lifinsky I agree and that is the reason I've opened the PR in Prooph repo

@unixslayer unixslayer force-pushed the keep-projections-running branch from eae0245 to 259a736 Compare April 22, 2025 19:07
@unixslayer unixslayer marked this pull request as ready for review April 22, 2025 19:07
@unixslayer
Copy link
Member Author

unixslayer commented Apr 22, 2025

@dgafka prooph projections will now run until the end of stream

@lifinsky
Copy link
Contributor

@unixslayer Hi. If the number of retries during gap detection has been exhausted, we should not wait for the end of the stream, and the projection status should remain running.
Is there a test case for this scenario after the Prooph update?

@unixslayer
Copy link
Member Author

@lifinsky projection will always try to reach the end of the stream. I'll check that tomorrow but you're right, GapDetection should stop projection with status running if gap remains despite retries were exhausted and detection window is still in account. I've noticed that but have fixed on resetting projection.

Anyway, this is another issue to be addressed in Prooph.

@lifinsky
Copy link
Contributor

@lifinsky projection will always try to reach the end of the stream. I'll check that tomorrow but you're right, GapDetection should stop projection with status running if gap remains despite retries were exhausted and detection window is still in account. I've noticed that but have fixed on resetting projection.

Anyway, this is another issue to be addressed in Prooph.

Thanks. This is critical for synchronous projection execution, especially within transactional blocks or during the processing of an HTTP request (when execution time is limited, it's especially important). On the other hand, it’s possible to periodically check synchronous projections stuck in the running status and trigger their update if no new events are present — similar to how pooling projections work.

@dgafka
Copy link
Member

dgafka commented Apr 23, 2025

If the number of retries during gap detection has been exhausted, we should not wait for the end of the stream, and the projection status should remain running.

Can you elaborate more?
My expectation would be that if the gap detection was exhausted it means we can't decide whatever event was lost or not.
In that situation, I would expect us to continue till end of the stream, as otherwise if for example transaction was rolled back leaving a gap in sequence we would never continue projecting.

@dgafka dgafka merged commit 8448ec1 into ecotoneframework:main Apr 23, 2025
57 checks passed
@unixslayer
Copy link
Member Author

@lifinsky GapDetection gives you two detection windows:

  • retryConfig - for incoming events. If you expect e.g. 10 minutes gap, you may configure retries so it sums up to 10 minutes
  • detectionWindow- for resetting projection. detectionWindow is checked as first due when projection handles "old" events, which was recorded more than defined interval, there is no need to retry yet

I'd suggest, if you are expecting gaps and projection is not running only once (like one-time reports), to configure retryConfig so it exceeds time expected for gap to happen and detectionWindow with few minutes interval so resetting won't be delayed.

@lifinsky
Copy link
Contributor

I'd suggest, if you are expecting gaps and projection is not running only once (like one-time reports), to configure retryConfig so it exceeds time expected for gap to happen and detectionWindow with few minutes interval so resetting won't be delayed.

@unixslayer I’m primarily concerned about ensuring that our synchronization waiting time does not exceed acceptable timeouts. If, despite retries, synchronous projections still periodically enter the running status (gap is not resolved by retries), it’s a clear indication that they should be transitioned to asynchronous execution.

@unixslayer
Copy link
Member Author

@lifinsky Some time ago my client insisted that all projections should be synchronous due users wants to see the results of their actions immediately. And we've faced that problem, when having multiple users accessing the same model, delays were causing timeouts due to GapDetection.

IMO any projection, any read model should be considered eventually consistent by default. Therefore it is safer to make them asynchronous.

@lifinsky
Copy link
Contributor

@unixslayer Correct, if we do not have time to synchronize with the current GapDetection settings, then the projection should have the running status and lag behind, but not lead to a timeout or transaction rollbacks

@unixslayer
Copy link
Member Author

@lifinsky the case with projection status is that idle means projection had nothing left for handling. running will remain as long there are events loading. If exception occur during handling, projection will remain running with lock removed.

Projection won't be able to determine why gap occurred. You can read the comments in \Prooph\EventStore\Pdo\Projection\GapDetection to get more context. There is also a link to a thread about implementing gap detection.

@lifinsky
Copy link
Contributor

I’ve reviewed the Prooph code more carefully — and it’s actually much worse. Events will continue to play forward, and once the retry limit is exhausted, the event will effectively be skipped, causing the projection status to switch to idle.

This is likely one of the key reasons why Prooph is unsuitable for both synchronous projections and event-sourced projections with more than one consumer in high-load systems.

@lifinsky
Copy link
Contributor

But I agree that there can be rollback scenarios where it clearly makes sense to define a maximum wait time, which is generally addressed by the gap detection period. Unfortunately, though, this doesn't solve the issue of intensive parallel inserts.

In my opinion, falling behind is preferable to skipping events. However, if falling behind is acceptable, then it might actually be better to switch to a pooling projection implemented as a single worker per projection.

I’d be really interested to hear your production experience on this.

@unixslayer
Copy link
Member Author

unixslayer commented Apr 24, 2025

@lifinsky this is our default projection running configuration

ProjectionRunningConfiguration::createEventDriven($projection)
            ->withOption(
                PdoEventStoreReadModelProjector::OPTION_GAP_DETECTION,
                new GapDetection(
                    retryConfig: [0, 5000, 5000, 5000, 5000, 5000, 5000, 30000, 30000, 30000],
                    detectionWindow: new GapDetection\DateInterval('PT2M'),
                )
            )

with following assumptions:

  1. all projections are asynch - sync projections handling a model with concurrent access cause timeouts
  2. there is only one runner for each projection - running more than one is asking for concurrency at the projection level
  3. projection reset does not remove data if it is not necessary, rather overwrite it

I don't remember how many issues we've had reported from users in last 3 years regarding anything related to projections but I'm sure I could count them using single hand. The only issue we have, and this is 100% on our side, is that resetting projection can take more time than we'd like it to take but this has no impact on client.

Projections are eventually consisted by design therefore, anything that results from them cannot be considered as business invariant. We are using them only for listings, reports, notifications, etc.

However, just because above works for me it doesn't have to work for you. I don't know what kind of actual problems you're facing and how your architecture looks like. And what is the most important, how you model your domain.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants