Skip to content

Conversation

@arthurpassos
Copy link
Collaborator

@arthurpassos arthurpassos commented Jan 13, 2026

Closes #1241

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Add export_merge_tree_part_throw_on_pending_mutations and export_merge_tree_part_throw_on_pending_patch_parts to control the behavior of pending mutations / patch parts on the export part feature. By default it will throw. It is important to note that when exporting an outdated it will never throw.

Also add export_merge_tree_part_allow_outdated_parts that allows outdated parts to be exported. Defaults to false.

Documentation entry for user-facing changes

Add export_merge_tree_part_throw_on_pending_mutations and export_merge_tree_part_throw_on_pending_patch_parts to control the behavior of pending mutations / patch parts on the export part feature. By default it will throw. It is important to note that when exporting an outdated it will never throw.

Also add export_merge_tree_part_allow_outdated_parts that allows outdated parts to be exported. Defaults to false.

CI/CD Options

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • All Regression
  • Disable CI Cache

Regression jobs to run:

  • Fast suites (mostly <1h)
  • Aggregate Functions (2h)
  • Alter (1.5h)
  • Benchmark (30m)
  • ClickHouse Keeper (1h)
  • Iceberg (2h)
  • LDAP (1h)
  • Parquet (1.5h)
  • RBAC (1.5h)
  • SSL Server (1h)
  • S3 (2h)
  • Tiered Storage (2h)

@arthurpassos
Copy link
Collaborator Author

One sec and I will adapt it for export partition

@github-actions
Copy link

github-actions bot commented Jan 13, 2026

Workflow [PR], commit [3ee9ce3]

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 28e124685d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".


const auto parts = getDataPartsVectorInPartitionForInternalUsage(MergeTreeDataPartState::Active, partition_id, &data_parts_lock);
{
auto data_parts_lock = lockParts();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either getMutationsSnapshot or getAlterConversionsForPart grabs a lock. Putting this in a scope to avoid dead locks

@arthurpassos
Copy link
Collaborator Author

QA team has requested a report of why I think this is ready to be reviewed / merged.

I have written several tests and performed some manual tests that cover the following cases for both part and partition export:

  1. Mutations and patch parts created before the export. Skip and throw should behave as expected.
  2. Mutations and patch parts created after the export has been issued, but not processed yet. Tests were implemented with iptable rules to simulate s3 outage. Should not throw or apply patches, should simply skip.

I have also tested that IN PARTITION works for replicated engines, they don't for plain merge tree tho.

I have also manually tested that outdated parts can be exported if the setting is set to true. It does not work for partition export, and it is expected. -- Honestly, maybe this setting should not even exist.

@arthurpassos
Copy link
Collaborator Author

arthurpassos commented Jan 15, 2026

Messed up with the tests, need to fix it

Fixed

@arthurpassos arthurpassos added the port-antalya PRs to be ported to all new Antalya releases label Jan 15, 2026
@arthurpassos
Copy link
Collaborator Author

@codex review

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d1d1f865b6

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

const auto alter_conversions = getAlterConversionsForPart(part, mutations_snapshot, query_context);

/// re-check `throw_on_pending_mutations` because `pending_mutations` might have been filled due to `throw_on_pending_patch_parts`
if (alter_conversions->hasMutations() && throw_on_pending_mutations)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: suggestion to check boolean flag at left and call method at right from &&. hasMutations is a fast method, just a check if list empty or not, but current order calls question "is hasMutations fast or not".

{
.metadata_version = source_metadata_ptr->getMetadataVersion(),
.min_part_metadata_version = part->getMetadataVersion(),
.need_data_mutations = throw_on_pending_mutations,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ClickHouse throw some exception when this flag is set? made only fast check, but looks like only some parts are skipped.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ClickHouse throw some exception when this flag is set

It should not, at least in this context. The code you are reviewing is the code that process the export query request, not the code that actually exports the data. The mutations_snapshot_params is used to create mutations_snapshot and alter_conversions. None is used for anything but to check if there are pending mutations within this function.

but looks like only some parts are skipped.

I did not understand this, can you elaborate?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unclean. why flags "need_something" are filled with values from "throw_on_some_condition".
When I see "throw_on_some_condition", I expect that setting causes exception, but here it changes other behavior.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR: To throw, I need to know if it exists. To know if it exists, I need to set it to true.

Ah, I see. The need_something flags dictacte if the mutation snapshot contains the mutations / patch parts and etc. If I hard code it to false, the snapshot will never include information about pending mutations / patch parts. In this case, I will not be able to check if they exist and will not be able to throw.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that it throw exception. I mean, setting does something more than only throw exception.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What else does it do in this case? I don't think so.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say about naming, not about code logic :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting export_merge_tree_part_throw_on_pending_mutations change value of need_data_mutations field in mutations_snapshot_params. need_data_mutations changes behavior. So setting name is confusing, something like "export_merge_tree_part_do_not_export_on_pending_mutations" (I'm not good in naming) is more clean.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

need_data_mutations changes behavior.

Not in this case. Like I said here, in this code snippet the only effect of setting need_data_mutations is that the snapshot and alter conversions object will contain information about mutations.

And if it does, we throw. Because the only reason for that object to have mutations is: 1. there are pending mutations; 2. the user set throw = true

part_name);
}

if (alter_conversions->hasPatches())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to check export_merge_tree_part_throw_on_pending_patch_parts here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No because of #1294 (comment)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add the check if you think it is more clear, but it is not needed at all

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants