Change client IO pipeline to be in line with Java client design: IO thread pool, response threads, lock-free writes [HZ-5387] #1412
Merged
ihsandemir merged 59 commits into hazelcast:master on Apr 10, 2026
Conversation
JackPGreen (Contributor) requested changes on Mar 13, 2026:
I've done a very limited review to the best of my abilities and added some comments.
JackPGreen requested changes on Mar 17, 2026
JackPGreen previously approved these changes on Mar 26, 2026
Add two new client property constants to control threading:
- IO_THREAD_COUNT (default 3): number of IO threads for networking
- RESPONSE_THREAD_COUNT (default 2): number of response processing threads

Includes static string definitions, constructor initializers, and accessor methods following the existing client_properties pattern.
When no explicit executor_pool_size is set in client_config, the user_executor_ thread pool now reads its size from the RESPONSE_THREAD_COUNT client property (default: 2), matching the Java client behavior. Previously it created an unbounded pool.
Instead of storing a single io_context& and resolver& as member fields, SocketFactory now receives them as parameters to create(). This enables callers to pass different io_context instances per connection, which is needed for multi-threaded IO support.
…ignment Replace single io_context/io_thread with vectors sized by IO_THREAD_COUNT property. New connections are assigned to IO contexts via atomic round-robin index, distributing network load across multiple threads.
Required for boost::concurrent_flat_map (global invocation registry) and boost::lockfree::queue (write queue MPSC).
Property: hazelcast.client.response.thread.count
Default: 2 threads
Follows the IO_THREAD_COUNT pattern for configuration.
…rviceImpl Uses boost::concurrent_flat_map for thread-safe invocation tracking. Registers invocations in global map before writing to connection, mirroring Java client's registerInvocation pattern.
Introduce ClientResponseHandler with configurable thread count (hazelcast.client.response.thread.count, default 2) that processes responses on dedicated threads instead of IO threads. Uses per-thread MPSC queues with condition_variable for efficient distribution by correlation_id. Mirrors Java client's ClientResponseHandlerSupplier pattern.
Route regular responses through ClientResponseHandler.enqueue() instead of processing them directly on the IO thread. Backup events and listener events remain on the IO thread as they are lightweight. This offloads response deserialization and invocation completion to dedicated response threads.
Match the socket buffer size (DEFAULT_BUFFER_SIZE_BYTE = 128KB) for the ReadHandler to reduce the number of read syscalls and improve throughput for larger responses.
Replace per-message strand posting and individual async_write calls with a lock-free MPSC queue (boost::lockfree::queue). User threads push write entries without strand involvement. A single flush on the IO strand drains all queued messages into scatter-gather buffers for one batched async_write syscall. Also moves correlation ID generation to send() using an atomic counter on Connection, ensuring the global invocation map is populated with the correct ID before writing.

Key changes:
- BaseSocket: lock-free write_queue_, flush_write_queue() batch drain
- Connection: allocate_call_id() with atomic counter
- ClientInvocationServiceImpl::send(): generate + register correlation ID before write
Co-authored-by: Jack Green <JackPGreen@Gmail.com>
Add connection liveness check in ClientInvocationServiceImpl::send() to prevent invocations from being registered on already-closed connections. Without this check, notify_connection_closed() runs before the invocation is registered, and the subsequent async_write failure cannot re-trigger close (idempotent), leaving the invocation promise unfulfilled forever. A double-check after set_send_connection closes the remaining race window where the connection closes between the initial alive check and invocation registration.
BaseSocket contains boost::lockfree::queue which requires 64-byte alignment. In C++14 builds without -faligned-new, the default operator new cannot satisfy over-aligned allocations. Add class-level operator new/delete using posix_memalign/_aligned_malloc to handle the alignment requirement portably.
This reverts commit 38d9b90.
Enable C++17 over-aligned new support in C++14 mode instead of disabling it. This fixes the compile error on GCC where boost::lockfree::queue requires 64-byte alignment and -Werror promotes the aligned-new warning to an error.
**Bug 1 – synchronous future continuation inside catch handlers**

`invoke()` and `invoke_urgent()` had a fast path, taken when the lifecycle was no longer running, that attached a future continuation without an executor. On Boost.Thread, a no-executor continuation runs synchronously on the thread that resolves the promise. During shutdown, `notify_exception_with_owned_permission` resolves the promise while inside two nested catch handlers; the continuation called `f.get()`, which re-threw while those handlers were still active. On Windows (SEH), re-throwing inside active catch handlers corrupts the exception chain, producing an ACCESS_VIOLATION. On POSIX (Itanium ABI) the same pattern is safe.

Fix: remove the no-executor shortcut entirely. Both paths now always schedule the continuation on the user executor, so `f.get()` runs on a separate thread, outside any catch handler. `id_seq->complete()` is always called; this is safe for both backpressure-enabled and no-backpressure CallIdSequence implementations.

**Bug 2 – listener invocations erased after initial registration response**

`ClientResponseHandler::process_response` called `invocation->notify(msg)` with the default `erase=true` for every successful response, including the initial registration confirmation for listener invocations. After that erase, subsequent server-pushed events (IS_EVENT_FLAG) arrived with the same correlation ID, `get_invocation()` returned null, and `process_event_message` dereferenced the null pointer (crash at `invocation->get_event_handler()`). This is a regression introduced when response processing was moved to `ClientResponseHandler`: the previous inline path preserved the distinction between ordinary invocations (erase after response) and listener invocations (keep in map for event routing).

Fix: in `process_response`, check whether the invocation carries an event handler; if it does, pass `erase=false` to `notify()` so the invocation remains in the map.
Listener invocations are still removed when explicitly deregistered (`deregister_invocation`) or when their connection closes (`notify_exception` with `erase=true`). Also add a defensive null check at the top of `process_event_message` to silently drop events whose invocation has already been removed (the normal race between deregistration and an in-flight event).
…tion so that we use the user executor only when available; otherwise, complete using boost::launch::sync on the calling thread.
…tions/coverage-report (hazelcast#1425)
…zelcast#1427) See (and is blocked by) hazelcast/hazelcast-tpm#71
## Summary
- The `enforce-code-formatting` CI step installs the default `clang-format` package on Ubuntu 24.04, which is version 18
- I locally use clang-format 22, which produces different formatting for the same `.clang-format` config
- This causes false failures on PRs (e.g., hazelcast#1412) where code is correctly formatted per v22 but differs under v18
- Fix: install `clang-format-22` from the official LLVM apt repository to match the developer version
emreyigit requested changes on Apr 8, 2026
Align listener event dispatch with the Java client. Previously, listener registration invocations lived in the global invocations_ map forever because erase_invocation() skipped deregistration whenever an event handler was present. Every incoming event looked up the full ClientInvocation from that global map just to reach its event handler. Now, event handlers are stored in a per-Connection boost::concurrent_flat_map<int64_t, EventHandler> keyed by correlation id. Listener invocations are removed from the global map as soon as the registration response arrives, and connection close automatically scopes event handler cleanup. This mirrors TcpClientConnection's eventHandlerMap in the Java client.

Changes:
- Connection: add event_handler_map_ with add/get/remove_event_handler methods. Clear the map in close() before inner_close(), matching the Java ordering.
- ClientInvocationServiceImpl::send(): after register_invocation(), install the invocation's event handler on the target connection.
- ClientInvocation::erase_invocation(): drop the 'if (!this->event_handler_)' guard so every invocation is deregistered on completion.
- Connection::handle_client_message(): for IS_EVENT_FLAG messages, look up the handler from this connection's event_handler_map_ and forward to listener_service_impl.
- listener_service_impl: replace the two old handle_client_message overloads (which looked up a ClientInvocation by correlation id) with a single overload taking the EventHandler directly. A matching process_event_message overload follows. remove_event_handler() now erases via connection->remove_event_handler() instead of the global invocations map.
- ClientInvocation: drop the 'bool erase' parameter from the entire notify / complete / notify_exception / notify_exception_with_owned_permission / complete_with_pending_response / notify_response chain. With listener invocations no longer pinned in the global map, no caller needs to keep an entry alive past notification. complete() and notify_exception_with_owned_permission() now unconditionally call erase_invocation(), matching ClientInvocation.complete() in Java.
- ClientResponseHandler::process_response: call invocation->notify() without the erase flag.

Thread safety: boost::concurrent_flat_map supports concurrent insert_or_assign / erase / cvisit / clear without external synchronization, the same pattern already used by the invocations_ map. IO threads read via cvisit, send() writes via insert_or_assign, close() clears.

Edge cases:
- Connection closes before registration response: clear() drops the handler; re-registration via connection_added_internal re-invokes on the new connection and re-installs the handler.
- Backup event handler: dispatched via BACKUP_EVENT_FLAG through response_handler_.accept(), not through event_handler_map_, so it is unaffected.
- connection_removed_internal only touches listener-level tracking (registrations_); connection close handles event handler cleanup.
emreyigit approved these changes on Apr 10, 2026
JackPGreen added a commit to JackPGreen/hazelcast-cpp-client that referenced this pull request on Apr 10, 2026
hazelcast#1412 inadvertently reverted hazelcast#1431, which caused the [PR builder to stop working again](https://github.com/hazelcast/hazelcast-cpp-client/actions/runs/24265361516/job/70859003057#step:2:32): > gh: User does not exist or is not a public member of the organization (HTTP 404) This was extra confusing because hazelcast#1431 was itself (partially) reverting hazelcast#1427. Because of `pull_request_target`, this will require a manual force-merge.
ihsandemir pushed a commit that referenced this pull request on Apr 13, 2026
#1412 inadvertently reverted #1431, which caused the [PR builder to stop working again](https://github.com/hazelcast/hazelcast-cpp-client/actions/runs/24265361516/job/70859003057#step:2:32): > gh: User does not exist or is not a public member of the organization (HTTP 404) This was extra confusing because #1431 was itself (partially) reverting #1427. Because of `pull_request_target`, this will require a manual force-merge.
Summary
We would like to improve the throughput and performance of the client, following the IO thread PR #1410. We hope to gain improvements in throughput and latency by making the IO pipeline design as similar to the current Java design as possible.
- Responses are processed on dedicated ClientResponseHandler threads (RESPONSE_THREAD_COUNT property, default 2), preventing IO thread stalls
- Writes use a lock-free boost::lockfree::queue and scatter-gather async_write, coalescing multiple outbound messages into single syscalls
- Invocations are tracked in a global boost::concurrent_flat_map for O(1) correlation ID lookup, replacing per-connection maps
- ClientInvocation::notify() is called after storing pending_response_ to prevent invocations from getting stuck when backup acks arrive concurrently with primary responses (mirrors Java BaseInvocation.notifyResponse pattern)

Test plan
- ClientMapTest.testGet with 1000 entries — previously hung due to backup ack race condition
- Existing test suites (HazelcastTests1-8)