-
Notifications
You must be signed in to change notification settings - Fork 12
export part - skip or throw on pending mutations and patch parts #1294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: antalya-25.8
Are you sure you want to change the base?
export part - skip or throw on pending mutations and patch parts #1294
Conversation
|
One sec and I will adapt it for export partition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 28e124685d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
|
|
||
| const auto parts = getDataPartsVectorInPartitionForInternalUsage(MergeTreeDataPartState::Active, partition_id, &data_parts_lock); | ||
| { | ||
| auto data_parts_lock = lockParts(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Either getMutationsSnapshot or getAlterConversionsForPart grabs a lock. Putting this in a scope to avoid dead locks
|
QA team has requested a report of why I think this is ready to be reviewed / merged. I have written several tests and performed some manual tests that cover the following cases for both part and partition export:
I have also tested that I have also manually tested that outdated parts can be exported if the setting is set to true. It does not work for partition export, and it is expected. -- Honestly, maybe this setting should not even exist. |
|
Messed up with the tests, need to fix it Fixed |
|
@codex review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d1d1f865b6
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
docs/en/engines/table-engines/mergetree-family/partition_export.md
Outdated
Show resolved
Hide resolved
| const auto alter_conversions = getAlterConversionsForPart(part, mutations_snapshot, query_context); | ||
|
|
||
| /// re-check `throw_on_pending_mutations` because `pending_mutations` might have been filled due to `throw_on_pending_patch_parts` | ||
| if (alter_conversions->hasMutations() && throw_on_pending_mutations) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NIT: suggestion to check boolean flag at left and call method at right from &&. hasMutations is a fast method, just a check if list empty or not, but current order calls question "is hasMutations fast or not".
| { | ||
| .metadata_version = source_metadata_ptr->getMetadataVersion(), | ||
| .min_part_metadata_version = part->getMetadataVersion(), | ||
| .need_data_mutations = throw_on_pending_mutations, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does ClickHouse throw some exception when this flag is set? made only fast check, but looks like only some parts are skipped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does ClickHouse throw some exception when this flag is set
It should not, at least in this context. The code you are reviewing is the code that process the export query request, not the code that actually exports the data. The mutations_snapshot_params is used to create mutations_snapshot and alter_conversions. None is used for anything but to check if there are pending mutations within this function.
but looks like only some parts are skipped.
I did not understand this, can you elaborate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unclean. why flags "need_something" are filled with values from "throw_on_some_condition".
When I see "throw_on_some_condition", I expect that setting causes exception, but here it changes other behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR: To throw, I need to know if it exists. To know if it exists, I need to set it to true.
Ah, I see. The need_something flags dictacte if the mutation snapshot contains the mutations / patch parts and etc. If I hard code it to false, the snapshot will never include information about pending mutations / patch parts. In this case, I will not be able to check if they exist and will not be able to throw.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expect that setting causes exception, but here it changes other behavior.
It does throw, look at https://github.com/Altinity/ClickHouse/pull/1294/changes#diff-b7d33a7abda5f3365d5e4fb58ca98b756f6401372b2e2d5eb449a2bd3444fb5eR6286 and https://github.com/Altinity/ClickHouse/pull/1294/changes#diff-b7d33a7abda5f3365d5e4fb58ca98b756f6401372b2e2d5eb449a2bd3444fb5eR6293
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that it throw exception. I mean, setting does something more than only throw exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What else does it do in this case? I don't think so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I say about naming, not about code logic :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Setting export_merge_tree_part_throw_on_pending_mutations change value of need_data_mutations field in mutations_snapshot_params. need_data_mutations changes behavior. So setting name is confusing, something like "export_merge_tree_part_do_not_export_on_pending_mutations" (I'm not good in naming) is more clean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
need_data_mutations changes behavior.
Not in this case. Like I said here, in this code snippet the only effect of setting need_data_mutations is that the snapshot and alter conversions object will contain information about mutations.
And if it does, we throw. Because the only reason for that object to have mutations is: 1. there are pending mutations; 2. the user set throw = true
| part_name); | ||
| } | ||
|
|
||
| if (alter_conversions->hasPatches()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to check export_merge_tree_part_throw_on_pending_patch_parts here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No because of #1294 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add the check if you think it is more clear, but it is not needed at all
Closes #1241
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Add
export_merge_tree_part_throw_on_pending_mutationsandexport_merge_tree_part_throw_on_pending_patch_partsto control the behavior of pending mutations / patch parts on the export part feature. By default it will throw. It is important to note that when exporting an outdated it will never throw.Also add
export_merge_tree_part_allow_outdated_partsthat allows outdated parts to be exported. Defaults to false.Documentation entry for user-facing changes
Add
export_merge_tree_part_throw_on_pending_mutationsandexport_merge_tree_part_throw_on_pending_patch_partsto control the behavior of pending mutations / patch parts on the export part feature. By default it will throw. It is important to note that when exporting an outdated it will never throw.Also add
export_merge_tree_part_allow_outdated_partsthat allows outdated parts to be exported. Defaults to false.CI/CD Options
Exclude tests:
Regression jobs to run: