Conversation

@shidayang
Contributor

Purpose

Support data evolution for append-only tables with fixed hash buckets.

Tests

API and Format

Documentation

@shidayang force-pushed the support-data-evolution-for-bucket branch from 6e05843 to 8c335f2 on November 21, 2025 10:02
@leaves12138
Contributor

@JingsongLi

@shidayang changed the title from "[#6794711734] Support data evolution for bucket table" to "[spark] Support data evolution for bucket table" on Nov 21, 2025
@shidayang force-pushed the support-data-evolution-for-bucket branch from 8c335f2 to 635b51e on November 21, 2025 11:06
private static void validateRowTracking(TableSchema schema, CoreOptions options) {
    boolean rowTrackingEnabled = options.rowTrackingEnabled();
    if (rowTrackingEnabled) {
        checkArgument(
Contributor

Remove the whole `checkArgument`.

Notice that:
- Row tracking is only supported for unaware append tables, not for primary key tables. Which means you can't define `bucket` and `bucket-key` for the table.
- Row tracking is only supported for unaware or hash_fixed bucket append tables, not for primary key tables.
Contributor

supported for append tables
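
The rule being worded in this review (row tracking for append tables of any bucket mode, but not for primary key tables) can be illustrated with a minimal sketch. This is not Paimon's validation code: the class name, the method signature, and the error message are hypothetical, and the thread does not show whether Paimon enforces the primary key restriction in `validateRowTracking` or elsewhere.

```java
import java.util.List;

/** Hypothetical sketch of the rule under discussion; not Paimon's actual validation code. */
final class RowTrackingValidationSketch {

    /**
     * Throws if row tracking is enabled on a table that declares primary keys.
     * The bucket setting is accepted as a parameter only to show that it is ignored.
     */
    static void validateRowTracking(
            boolean rowTrackingEnabled, List<String> primaryKeys, int bucket) {
        if (rowTrackingEnabled && !primaryKeys.isEmpty()) {
            throw new IllegalArgumentException(
                    "Row tracking is only supported for append tables, not primary key tables.");
        }
        // No check on bucket: -1 (bucket-unaware) and a fixed positive bucket count both pass.
    }
}
```

Under these assumptions, an append table with `bucket = 4` passes the check exactly like a bucket-unaware one, which is the behavior the shortened wording "supported for append tables" describes.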

@JingsongLi
Contributor

And maybe you can add more tests for bucketed tables.

@shidayang force-pushed the support-data-evolution-for-bucket branch 2 times, most recently from 02df75f to 14d57f6, on November 27, 2025 04:41
@shidayang
Contributor Author

> And maybe you can add more tests for bucketed tables.

@JingsongLi I've made some modifications as required and consolidated the unit tests for both bucket and non-bucket cases. PTAL

@shidayang force-pushed the support-data-evolution-for-bucket branch from 14d57f6 to a483524 on November 27, 2025 07:30
checkArgument(
        entry.file().fileSource().isPresent(),
        "This is a bug, file source field for row-tracking table must present.");
if (entry.file().fileSource().get().equals(FileSource.APPEND)
Contributor

Why is this modified?

Contributor Author

Because when the writer performs compaction, the file is rewritten by the compaction before it is committed and before the row ID is generated. As a result, the file produced by the compaction has no row ID.

Contributor

For row-tracking-only tables, compacted files should not be assigned new row IDs; the row IDs are already in the data file.

Contributor

Maybe we should have a new bucketed append table mode here, one that does not consider sequence numbers or compaction, just like normal append tables.

Contributor Author

> Maybe we should have a new bucketed append table mode here. Without considering sequence and compaction. Just like normal Append tables.

Good idea.
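
To make the two cases in this thread concrete, here is a minimal sketch of commit-time row ID assignment. It is not Paimon's implementation: `FileMeta`, `assignRowIds`, and the nullable `firstRowId` field are hypothetical stand-ins, and only the `FileSource.APPEND` value appears in the diff above (`COMPACT` is assumed). The sketch assigns a row ID range to any file that lacks one, which covers both freshly appended files and the scenario where a file was compacted in the writer before it was committed, while leaving files that already carry row IDs untouched.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of commit-time row ID assignment; not Paimon's actual classes. */
final class RowIdAssignmentSketch {

    enum FileSource { APPEND, COMPACT }

    /** Simplified stand-in for a data file's metadata. */
    static final class FileMeta {
        final String name;
        final FileSource source;
        final long rowCount;
        final Long firstRowId; // null means no row ID has been assigned yet

        FileMeta(String name, FileSource source, long rowCount, Long firstRowId) {
            this.name = name;
            this.source = source;
            this.rowCount = rowCount;
            this.firstRowId = firstRowId;
        }
    }

    /**
     * Assigns a contiguous row ID range to every file that lacks one, starting at nextRowId.
     * Files that already carry a first row ID are returned unchanged.
     */
    static List<FileMeta> assignRowIds(List<FileMeta> files, long nextRowId) {
        List<FileMeta> result = new ArrayList<>();
        long next = nextRowId;
        for (FileMeta file : files) {
            if (file.firstRowId != null) {
                // Row IDs already live in the file, e.g. a compaction of committed files.
                result.add(file);
            } else {
                // Covers fresh APPEND files and COMPACT files that were rewritten
                // before their inputs were ever committed.
                result.add(new FileMeta(file.name, file.source, file.rowCount, next));
                next += file.rowCount;
            }
        }
        return result;
    }
}
```

A dedicated bucketed append mode without writer-side compaction, as proposed above, would presumably make the second branch unnecessary for `COMPACT` files, since every compacted file would then be built from committed inputs that already have row IDs.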

3 participants