[spark] Support data evolution for bucket table #6649
base: master
Conversation
```java
private static void validateRowTracking(TableSchema schema, CoreOptions options) {
    boolean rowTrackingEnabled = options.rowTrackingEnabled();
    if (rowTrackingEnabled) {
        checkArgument(
```
Remove the whole `checkArgument`.
```diff
 Notice that:
-- Row tracking is only supported for unaware append tables, not for primary key tables. Which means you can't define `bucket` and `bucket-key` for the table.
+- Row tracking is only supported for unaware or hash_fixed bucket append tables, not for primary key tables.
```
Just say "supported for append tables".
And maybe you can add more tests for bucketed tables.
@JingsongLi I've made the requested modifications and consolidated the unit tests for both bucket and non-bucket cases. PTAL
```java
checkArgument(
    entry.file().fileSource().isPresent(),
    "This is a bug, file source field for row-tracking table must present.");
if (entry.file().fileSource().get().equals(FileSource.APPEND)
```
Why modify here?
Because when compaction runs in the writer, the file is rewritten by compaction before it is committed, which is when the row id is generated. In that scenario, the files produced by compaction would end up without row ids.
For row-tracking-only tables, compacted files should not be assigned new row ids; the row ids are already stored in the data files.
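The distinction being debated here can be sketched as follows. This is a toy model, not the PR's actual code; the class, enum, and method names are made up for illustration. The point it demonstrates: files whose source is APPEND receive a fresh contiguous row-id range at commit time, while files produced by compaction keep the row ids already written into them.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of commit-time row-id assignment for a row-tracking table.
// Only APPEND files get new row ids; COMPACT files merely rewrite rows
// that already carry row ids, so they are skipped.
public class RowIdAssignmentSketch {

    enum FileSource { APPEND, COMPACT }

    record DataFile(String name, FileSource source, long rowCount) {}

    /** Assigns a contiguous row-id range to each APPEND file, starting at nextRowId. */
    static long assignRowIds(List<DataFile> committed, long nextRowId, List<String> log) {
        for (DataFile file : committed) {
            if (file.source() == FileSource.APPEND) {
                log.add(file.name() + " -> rowIds [" + nextRowId + ", "
                        + (nextRowId + file.rowCount() - 1) + "]");
                nextRowId += file.rowCount();
            } else {
                // Compacted files keep the row ids already stored in the data files.
                log.add(file.name() + " -> keeps existing row ids");
            }
        }
        return nextRowId;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        long next = assignRowIds(
                List.of(new DataFile("f1", FileSource.APPEND, 100),
                        new DataFile("f2", FileSource.COMPACT, 150),
                        new DataFile("f3", FileSource.APPEND, 50)),
                1000, log);
        log.forEach(System.out::println);
        System.out.println("nextRowId=" + next);
    }
}
```

Under this model, a writer-side compaction that runs before commit never consumes new row ids, which matches the reviewer's point above.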
Maybe we should have a new bucketed append table mode here that ignores sequence and compaction, just like normal append tables.
> Maybe we should have a new bucketed append table mode here. Without considering sequence and compaction. Just like normal Append tables.

Good idea.
Purpose
Support data evolution for append-only fixed hash bucket tables.
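For reference, the table option combination this PR targets might look like the following. This is a sketch shown as a plain options map rather than through Paimon's API; the key name `row-tracking.enabled` is an assumption for illustration, while `bucket` and `bucket-key` are the options discussed in the review above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the option combination enabled by this PR: before the change,
// row tracking required an unaware append table (no bucket / bucket-key);
// with the change, a fixed hash bucket append table may enable it too.
public class DataEvolutionOptionsSketch {

    public static Map<String, String> bucketedRowTrackingOptions() {
        Map<String, String> options = new LinkedHashMap<>();
        options.put("bucket", "4");       // fixed hash bucket count
        options.put("bucket-key", "id");  // hash key; no primary key defined
        options.put("row-tracking.enabled", "true"); // assumed key name
        return options;
    }

    public static void main(String[] args) {
        bucketedRowTrackingOptions().forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```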
Tests
API and Format
Documentation