60 changes: 44 additions & 16 deletions docs/products/kafka/concepts/follower-fetching.md
@@ -6,8 +6,9 @@ sidebar_label: Follower fetching
import Followerfetching from "@site/static/images/content/figma/follower-fetching.png";
import RelatedPages from "@site/src/components/RelatedPages";

[Follower fetching](/docs/products/kafka/howto/enable-follower-fetching) in Aiven for Apache Kafka allows consumers to fetch data from the nearest replica instead of the leader.
This feature optimizes data fetching by leveraging Apache Kafka's rack awareness, which treats each availability zone (AZ) as a rack.
[Follower fetching](/docs/products/kafka/howto/enable-follower-fetching) in Aiven for Apache Kafka allows consumers to retrieve data from the nearest replica instead of always fetching from the partition leader.
This feature optimizes data fetching by leveraging Apache Kafka's rack awareness, which
treats each availability zone (AZ) as a rack.

## Benefits

@@ -23,25 +24,31 @@ Google Cloud.

## How it works

Aiven for Apache Kafka uses rack awareness to optimize data fetching and ensure
high availability.
Aiven for Apache Kafka uses rack awareness to optimize data fetching and maintain
availability. Each availability zone (AZ) is treated as a rack.

<img src={Followerfetching} className="centered" alt="Follow fetching" width="100%" />
<img src={Followerfetching} className="centered" alt="Follower fetching" width="100%" />

- **Rack awareness**: Rack awareness distributes data across different physical racks,
also known as availability zones (AZs), within a data center. This distribution
ensures data replication for fault tolerance. Each Apache Kafka broker has
a `broker.rack` setting corresponding to its specific AZ:
### Rack awareness

- **AWS**: Uses AZ IDs (for example, `use1-az1`) instead of AZ names.
- **Google Cloud**: Uses AZ names directly (for example, `europe-west1-b`).
Rack awareness distributes data across different physical racks, also known as
availability zones, within a data center. This distribution ensures data replication for
fault tolerance. Each Apache Kafka broker has a `broker.rack` setting corresponding to
its specific AZ:

- **Follower fetching**: Uses rack awareness to allow consumers to fetch data from
the nearest replica, reducing latency and costs. Apache Kafka consumers use
the `client.rack` setting to specify their AZ, ensuring they fetch data from the
closest replica.
- **AWS:** Uses AZ IDs such as `use1-az1`
- **Google Cloud:** Uses AZ names such as `europe-west1-b`

### `broker.rack` and `client.rack` settings
Aiven for Apache Kafka automatically manages the `broker.rack` setting, eliminating the
need for manual configuration.

### Follower fetching mechanism

Follower fetching builds on rack awareness to allow consumers to fetch data from the
nearest replica. Apache Kafka consumers use the `client.rack` setting to specify their
AZ, ensuring they fetch data from the closest replica when possible.

### Configuration settings

- `broker.rack`: This setting corresponds to the AZ where each Apache Kafka broker
is deployed and helps manage data replication efficiently. Apache Kafka brokers in
@@ -56,6 +63,27 @@ high availability.
same AZ. [Configure](/docs/products/kafka/howto/enable-follower-fetching#client-side-configuration)
this setting to retrieve data from the closest replica.

## Follower fetching in Kafka Connect and MirrorMaker 2

Aiven for Apache Kafka® Connect and Aiven for Apache Kafka® MirrorMaker 2 apply rack
awareness when follower fetching is [enabled](/docs/products/kafka/howto/enable-follower-fetching)

Review comment (Contributor): Is it enabled by default? If so, can you confirm that? I was left wondering whether it was.

on the Aiven for Apache Kafka service. Each service
sets a rack value based on the node’s availability zone (AZ) so that traffic is routed
to the closest replica.

### Kafka Connect

Kafka Connect assigns a rack value based on the AZ where each node runs. All source
connectors on that node inherit this value unless a connector specifies its own
consumer rack configuration.

### MirrorMaker 2

MirrorMaker 2 assigns a rack value based on the node’s AZ when
`follower_fetching_enabled=true` in the service configuration. A custom `rack_id` in the
integration configuration overrides the AZ-based value. Rack awareness is not applied to
integrations that connect to external Kafka clusters.
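The precedence described above can be sketched in a few lines of Python. This is an illustration only, not Aiven's implementation: the function name `effective_rack` and its parameters are hypothetical, and it assumes that a custom `rack_id` is honored even when follower fetching is disabled, which the text does not state.

```python
from typing import Optional


def effective_rack(
    follower_fetching_enabled: bool,
    node_az: str,
    rack_id: Optional[str] = None,
    external_cluster: bool = False,
) -> Optional[str]:
    """Return the rack value a MirrorMaker 2 node would use (hypothetical helper)."""
    # Rack awareness is not applied to integrations with external Kafka clusters.
    if external_cluster:
        return None
    # A custom rack_id overrides the AZ-based value.
    # Assumption: this holds regardless of follower_fetching_enabled.
    if rack_id:
        return rack_id
    # Otherwise the node's AZ is used only when follower fetching is enabled.
    if follower_fetching_enabled:
        return node_az
    return None
```

For example, `effective_rack(True, "europe-west1-b", rack_id="custom-rack")` resolves to the custom value, while the same call with `external_cluster=True` resolves to no rack at all.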

<RelatedPages/>

- [Enable follower fetching in Aiven for Apache Kafka](/docs/products/kafka/howto/enable-follower-fetching)
33 changes: 31 additions & 2 deletions docs/products/kafka/howto/enable-follower-fetching.md
@@ -25,6 +25,9 @@ Follower fetching is supported on AWS (Amazon Web Services) and Google Cloud.

## Identify availability zone

Review comment (Contributor): Nice to have: a one-line suggestion on how to do so.

Review comment (Contributor): Note that we use the logical name, which maps randomly to the physical AZ name per AWS account. This is fine within the same account, but I believe it is a limitation for cross-account use.

Before configuring client-side rack awareness, identify the AZs where your Kafka brokers
run.

- **AWS**: Availability zone (AZ) names can vary across different accounts.
The same physical location might have different AZ names in different accounts. To
ensure consistency when configuring `client.rack`, use the AZ ID, which remains the same
@@ -39,8 +42,8 @@ Follower fetching is supported on AWS (Amazon Web Services) and Google Cloud.

## Enable follower fetching

Use either of the following methods to enable follower fetching on your
Aiven for Apache Kafka service:
Use one of the following methods to enable follower fetching on your Aiven for
Apache Kafka service.

<Tabs groupId="config-methods">
<TabItem value="console" label="Console" default>
@@ -164,6 +167,32 @@ client.rack=europe-west1-d
| Google Cloud | `europe-west1-c` | Fetch from the nearest replica in their AZ | Reduced latency and network costs |
| Google Cloud | `europe-west1-d` | Fetch from the leader (no matching `broker.rack`) | No follower fetching possible |
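The fallback behavior in the last two table rows can be illustrated with a simplified Python sketch. It is not Apache Kafka's actual replica selector: the `Replica` type and `select_replica` function are hypothetical, and the sketch ignores details such as choosing the most caught-up replica when several share a rack.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    broker_id: int
    rack: str  # corresponds to the broker's broker.rack value
    is_leader: bool = False
    in_sync: bool = True


def select_replica(client_rack: Optional[str], replicas: List[Replica]) -> Replica:
    """Pick the replica a consumer reads from (simplified, hypothetical logic)."""
    leader = next(r for r in replicas if r.is_leader)
    # Without client.rack, the consumer always reads from the leader.
    if not client_rack:
        return leader
    # Prefer an in-sync replica whose rack matches client.rack.
    for replica in replicas:
        if replica.in_sync and replica.rack == client_rack:
            return replica
    # No matching broker.rack: fall back to the leader.
    return leader


replicas = [
    Replica(broker_id=1, rack="europe-west1-b", is_leader=True),
    Replica(broker_id=2, rack="europe-west1-c"),
]
same_az = select_replica("europe-west1-c", replicas)   # follower in the same AZ
no_match = select_replica("europe-west1-d", replicas)  # falls back to the leader
```

The broker IDs and rack assignments here are made up for the example; only the AZ names come from the table.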

## Use follower fetching with Kafka Connect and MirrorMaker 2

Aiven for Apache Kafka® Connect and Aiven for Apache Kafka® MirrorMaker 2 use follower
fetching when it is enabled on your Aiven for Apache Kafka service.

### Kafka Connect

Kafka Connect sets `consumer.client.rack` based on each node’s availability zone.
To disable rack awareness for a specific connector, set:

```json
{
"consumer.override.client.rack": "noop"
}
```
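As a rough illustration, the override can be merged into an existing connector configuration before it is submitted. The sketch below is an assumption-laden example: the helper `with_rack_override`, the connector name, and the connector class are hypothetical, and only the `consumer.override.client.rack` key and the `noop` value come from the text above.

```python
import json


def with_rack_override(connector_config: dict, rack: str = "noop") -> dict:
    """Return a copy of the connector config with the rack override added."""
    merged = dict(connector_config)  # copy so the caller's dict is not mutated
    merged["consumer.override.client.rack"] = rack
    return merged


# Hypothetical connector configuration used only for illustration.
base_config = {
    "name": "example-connector",
    "connector.class": "com.example.ExampleConnector",
    "topics": "orders",
}

print(json.dumps(with_rack_override(base_config), indent=2))
```

Copying the dictionary keeps the base configuration reusable for connectors that should keep the AZ-based rack value.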

### MirrorMaker 2

MirrorMaker 2 uses a rack ID based on the node’s availability zone when
`follower_fetching_enabled=true` (default). You can override this by setting a `rack_id`
in the integration configuration.

Rack awareness does not apply to external Kafka clusters.

See [Configure rack awareness in MirrorMaker 2](/docs/products/kafka/kafka-mirrormaker/howto/mm2-rack-awareness).

## Verify follower fetching

After configuring follower fetching, monitor for a decrease in cross-availability zone
@@ -2,94 +2,98 @@
title: Configuration parameters for Aiven for Apache Kafka® MirrorMaker 2
---

Learn about the configuration layers in Aiven for Apache Kafka® MirrorMaker 2, including service, replication flow, and integration settings.
Optimize data replication and performance in your Kafka ecosystem.
Learn about the service, replication-flow, and integration configuration layers in Aiven for Apache Kafka® MirrorMaker 2 and how they affect replication performance.

## Configuration layers

Aiven for Apache Kafka® MirrorMaker 2 configurations are organized into three layers:
**service**, **replication flow**, and **integration**. Each layer controls a specific
aspect of the replication process.
Aiven for Apache Kafka® MirrorMaker 2 uses three configuration layers:

- **Service configurations**
- **Replication-flow configurations**
- **Integration configurations**

Each layer manages a specific part of the replication process.

### Service configurations

Service configurations control the behavior of nodes and workers in the
Aiven for Apache Kafka® MirrorMaker 2 cluster.
Service configurations control the behavior of nodes and workers in the MirrorMaker 2 cluster.

**Example of a service configuration**:
**Example**

- Parameter: [`kafka_mirrormaker.emit_checkpoints_enabled`](https://aiven.io/docs/products/kafka/kafka-mirrormaker/reference/advanced-params#kafka_mirrormaker_emit_checkpoints_enabled)
- Description: Enables or disables periodically emitting consumer group offset
checkpoints to the target cluster.
- Impact:
- Automatically restarts the workers.
- Restarts all connectors and tasks.
- **Parameter:** [`kafka_mirrormaker.emit_checkpoints_enabled`](https://aiven.io/docs/products/kafka/kafka-mirrormaker/reference/advanced-params#kafka_mirrormaker_emit_checkpoints_enabled)
- **Description:** Enables or disables periodic emission of consumer group offset checkpoints to the target cluster
- **Impact:**
- Restarts workers
- Restarts all connectors and tasks

### Replication-flow configurations

Replication-flow configurations manage the behavior of connectors, such as Source, Sink,
Checkpoint, and Heartbeat.
Replication-flow configurations control the behavior of connectors such as Source, Sink, Checkpoint, and Heartbeat connectors.

**Example of a replication-flow configuration**:
**Example**

- Parameter: [`topics`](https://registry.terraform.io/providers/aiven/aiven/latest/docs/resources/mirrormaker_replication_flow)
- Description: Specifies a list of topics or regular expressions to replicate.
For more information, see the [topics included in a replication flow](/docs/products/kafka/kafka-mirrormaker/concepts/replication-flow-topics-regex).
- Impact:
- Automatically restarts the affected connectors.
- Restarts their associated tasks.
- **Parameter:** [`topics`](https://registry.terraform.io/providers/aiven/aiven/latest/docs/resources/mirrormaker_replication_flow)
- **Description:** Specifies a list of topics or regular expressions to replicate.
For details, see
[Topics included in a replication flow](/docs/products/kafka/kafka-mirrormaker/concepts/replication-flow-topics-regex).
- **Impact:**
- Restarts the affected connectors
- Restarts their tasks

### Integration configurations

Integration configurations fine-tune the interaction between producers and consumers
within connectors.
Integration configurations refine how producers and consumers behave within connectors.

**Example of an integration configuration**:
**Example**

- Parameter: [`consumer_fetch_min_bytes`](https://registry.terraform.io/providers/aiven/aiven/latest/docs/resources/service_integration#nested-schema-for-kafka_mirrormaker_user_configkafka_mirrormaker)
- Description: Sets the minimum amount of data the server should return for a fetch
request.
- Impact:
- Automatically restarts the workers.
- Restarts all connectors and tasks.
- **Parameter:** [`consumer_fetch_min_bytes`](https://registry.terraform.io/providers/aiven/aiven/latest/docs/resources/service_integration#nested-schema-for-kafka_mirrormaker_user_configkafka_mirrormaker)
- **Description:** Sets the minimum amount of data the server returns for a fetch request
- **Impact:**
- Restarts workers
- Restarts all connectors and tasks

:::note
Most configuration parameters are derived from
[KIP-382: MirrorMaker 2.0 - Configuration Properties](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650722#KIP382:MirrorMaker2.0-ConnectorConfigurationProperties). Refer to this resource for additional details.
Many configuration parameters originate from
[KIP-382: MirrorMaker 2.0 configuration properties](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95650722#KIP382:MirrorMaker2.0-ConnectorConfigurationProperties).
:::

## Common parameters

This section describes common parameters that can be adjusted to optimize the performance
and behavior of Aiven for Apache Kafka MirrorMaker 2 replication.

1. **Optimize task allocation**:
Increase the value of
[`kafka_mirrormaker.tasks_max_per_cpu`](/docs/products/kafka/kafka-mirrormaker/reference/advanced-params#kafka_mirrormaker_tasks_max_per_cpu)
in the advanced configuration.
Setting this to match the number of partitions can improve performance.

1. **Align interval settings**:
Ensure the following interval settings match to achieve more frequent and synchronized
data replication:
- **Advanced configurations**:
- [`kafka_mirrormaker.emit_checkpoints_interval_seconds`](/docs/products/kafka/kafka-mirrormaker/reference/advanced-params#kafka_mirrormaker_emit_checkpoints_interval_seconds)
- [`kafka_mirrormaker.sync_group_offsets_interval_seconds`](/docs/products/kafka/kafka-mirrormaker/reference/advanced-params#kafka_mirrormaker_sync_group_offsets_interval_seconds)
- **Replication flow**:
- `Sync interval in seconds`

1. **Exclude internal topics**:
Add these patterns to your topic blacklist to exclude internal topics:
- `.*[\-\.]internal`
- `.*\.replica`
- `__.*`
- `connect.*`

1. **Adjust integration parameters**:
Modify these integration parameters based on your use case to improve producer and
consumer performance:
- `consumer_fetch_min_bytes`
- `producer_batch_size`
- `producer_buffer_memory`
- `producer_linger_ms`
- `producer_max_request_size`
Use these commonly adjusted parameters to improve replication performance and consistency.

### Optimize task allocation

Increase the value of
[`kafka_mirrormaker.tasks_max_per_cpu`](/docs/products/kafka/kafka-mirrormaker/reference/advanced-params#kafka_mirrormaker_tasks_max_per_cpu)
in the advanced configuration.

Setting this value close to the number of partitions can improve throughput.

### Align interval settings

Align interval-based settings to keep replication activity consistent.

- **Advanced configurations:**
- [`kafka_mirrormaker.emit_checkpoints_interval_seconds`](/docs/products/kafka/kafka-mirrormaker/reference/advanced-params#kafka_mirrormaker_emit_checkpoints_interval_seconds)
- [`kafka_mirrormaker.sync_group_offsets_interval_seconds`](/docs/products/kafka/kafka-mirrormaker/reference/advanced-params#kafka_mirrormaker_sync_group_offsets_interval_seconds)
- **Replication flow:**
- Sync interval in seconds
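A quick way to confirm that the intervals are aligned is to compare the three values before applying them. The sketch below is illustrative only: the key `replication_flow_sync_interval_seconds` is a hypothetical stand-in for the replication flow's "Sync interval in seconds" field, and the value of 60 seconds is an arbitrary example.

```python
from typing import Dict


def intervals_aligned(settings: Dict[str, int]) -> bool:
    """Return True when every interval setting has the same value."""
    return len(set(settings.values())) <= 1


intervals = {
    "kafka_mirrormaker.emit_checkpoints_interval_seconds": 60,
    "kafka_mirrormaker.sync_group_offsets_interval_seconds": 60,
    # Hypothetical key for the replication flow's "Sync interval in seconds".
    "replication_flow_sync_interval_seconds": 60,
}

if not intervals_aligned(intervals):
    raise ValueError(f"Interval settings differ: {intervals}")
```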

### Exclude internal topics

Add these patterns to the topic blacklist to avoid replicating internal or system topics:

- `.*[\-\.]internal`
- `.*\.replica`
- `__.*`
- `connect.*`
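To check which topic names these patterns exclude, they can be tested locally. The following sketch assumes that each pattern must match the whole topic name, and it uses Python's `re` module rather than the Java regex engine that Apache Kafka uses; these simple patterns behave the same in both. The topic names are made up for the example.

```python
import re

# Patterns copied from the list above.
EXCLUDE_PATTERNS = [
    r".*[\-\.]internal",
    r".*\.replica",
    r"__.*",
    r"connect.*",
]


def is_excluded(topic: str) -> bool:
    """Return True when the full topic name matches any exclusion pattern."""
    return any(re.fullmatch(pattern, topic) for pattern in EXCLUDE_PATTERNS)


for topic in ["__consumer_offsets", "connect-offsets", "orders.replica", "orders", "connections"]:
    print(topic, is_excluded(topic))
```

Under this assumption, `connect.*` also excludes any topic whose name merely starts with `connect`, such as a hypothetical `connections` topic, so it is worth checking your own topic names against the list.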

### Adjust integration parameters

Tune these integration parameters based on your workload to optimize producer and consumer behavior:

- `consumer_fetch_min_bytes`
- `producer_batch_size`
- `producer_buffer_memory`
- `producer_linger_ms`
- `producer_max_request_size`