Remove iceberg.connect.group.id config and derive it from consumer group ID #15234

Open
kumarpritam863 wants to merge 19 commits into apache:main from kumarpritam863:take_connect_id_from_consumer_group_id

@kumarpritam863
Contributor

Problem

The current implementation requires users to set a separate iceberg.connect.group.id configuration that must match the source consumer group ID (consumer.override.group.id / group.id) for the Kafka Connect coordinator to be elected correctly.

This requirement has been a recurring source of confusion and bugs for developers and users, including:

  • Developers forgetting to set connect.group.id or setting it incorrectly
  • Not realizing that the two values must be identical
  • Subtle, hard-to-debug coordination issues

Problematic scenario example:

  1. Job A is running with consumer.override.group.id = "x-1"
  2. Job B is submitted with:
    • consumer.override.group.id = "cg"
    • connect.group.id = "x-1"
  3. Both jobs consume from the same topic

Result:
Even though job B's actual consumer group ID (cg) differs from job A's (x-1), job B's coordinator election is still based on connect.group.id = "x-1". This leads to:

  • Wrong group being used for coordination
  • Incorrect offset commits
  • Potential data loss/duplication or reprocessing
  • Very confusing behavior that violates the principle of least surprise
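
The scenario above can be modeled with two plain property maps. This is only an illustrative sketch: `MismatchSketch` and its helper methods are hypothetical names, not classes from the connector, and the lookup simply mirrors the pre-PR behavior as described in this PR (coordination follows the explicit setting rather than the consumer group actually in use).

```java
import java.util.Map;

// Hypothetical model of the pre-PR behavior described above; not connector code.
final class MismatchSketch {

    // Pre-PR (as described): coordination uses the explicitly configured value.
    static String oldCoordinationGroup(Map<String, String> props) {
        return props.get("iceberg.connect.group.id");
    }

    // The consumer group the job actually reads with.
    static String consumerGroup(Map<String, String> props) {
        return props.get("consumer.override.group.id");
    }

    public static void main(String[] args) {
        Map<String, String> jobA = Map.of("consumer.override.group.id", "x-1");
        Map<String, String> jobB = Map.of(
            "consumer.override.group.id", "cg",
            "iceberg.connect.group.id", "x-1");

        // Job B consumes as "cg" but coordinates as "x-1".
        boolean mismatch = !consumerGroup(jobB).equals(oldCoordinationGroup(jobB));
        // Job B's coordination group is job A's consumer group.
        boolean collides = oldCoordinationGroup(jobB).equals(consumerGroup(jobA));

        System.out.println("job B consumer/coordination mismatch: " + mismatch);
        System.out.println("job B coordinates on job A's group: " + collides);
    }
}
```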

Solution

This PR:

  • Removes the connect.group.id configuration completely
  • Always derives the Connect coordinator group ID from the actual source consumer group ID (group.id / consumer.override.group.id)
  • Renames the internal concept/reference from connectGroupId to sourceConsumerGroupId for clarity (in code/comments where applicable)
  • Updates documentation and configuration validation accordingly
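
A minimal sketch of the derivation, under stated assumptions: `GroupIdSketch` and `sourceConsumerGroupId` are hypothetical names used for illustration, not the code in this PR. The fallback to `connect-<connector name>` reflects Kafka Connect's default consumer group for sink connectors; whether the PR resolves the default the same way is an assumption here.

```java
import java.util.Map;
import java.util.logging.Logger;

// Illustrative only: hypothetical helper, not the connector's implementation.
final class GroupIdSketch {
    private static final Logger LOG = Logger.getLogger(GroupIdSketch.class.getName());

    static final String REMOVED_KEY = "iceberg.connect.group.id";
    static final String OVERRIDE_KEY = "consumer.override.group.id";

    static String sourceConsumerGroupId(Map<String, String> props) {
        // The removed setting is ignored; warn so stale configs are noticed.
        if (props.containsKey(REMOVED_KEY)) {
            LOG.warning(REMOVED_KEY + " is ignored; using the source consumer group ID");
        }

        // Prefer the explicit consumer group override when present.
        String override = props.get(OVERRIDE_KEY);
        if (override != null && !override.isBlank()) {
            return override;
        }

        // Assumed fallback: Kafka Connect's default sink group, "connect-" + name.
        String name = props.get("name");
        if (name == null) {
            throw new IllegalArgumentException("Connector name must be set");
        }
        return "connect-" + name;
    }

    public static void main(String[] args) {
        Map<String, String> jobB = Map.of(
            "name", "iceberg-sink",
            OVERRIDE_KEY, "cg",
            REMOVED_KEY, "x-1");
        System.out.println(sourceConsumerGroupId(jobB));
    }
}
```

With this shape, job B from the scenario above coordinates on "cg", the group it actually consumes with, regardless of any leftover iceberg.connect.group.id value.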

Benefits

  • Eliminates an entire class of misconfiguration bugs
  • Removes a redundant and confusing configuration option
  • Makes behavior more predictable and intuitive
  • Prevents the problematic scenario described above
  • Reduces cognitive load for users and maintainers

Breaking Change

No. Even if "iceberg.connect.group.id" is set, it is ignored and the coordinator group ID is derived from the consumer group ID.

@kumarpritam863
Contributor Author

@bryanck can you please take a look? I think this change is needed. Please let me know your thoughts on this.

@kumarpritam863
Contributor Author

@danielcweeks @ajantha-bhat could you please review this?
