Avoid 500x500 hash shuffles

cluster: https://nda.ya.ru/t/BDBskrlm7P3mCa

query: https://paste.yandex-team.ru/0c9fa072-4f68-44fb-9e75-316176fef304

# Problem

In this query, 560 early aggregation tasks and 420 final aggregation tasks are created. This results in a hash shuffle of 420 × 560 -> about 250,000 channels.

Since we have almost no control over the memory consumed by channels, we hit ooms almost immediately.

I added a filter to reduce the amount of data being processed, and the query succeeded. Here is the plan of the successful run:

![Image](https://github.com/user-attachments/assets/f4a066eb-c243-4bec-a49d-e7f85225c8ad)

query with filter:
https://paste.yandex-team.ru/3c535fd8-33f1-47e1-8c32-9e59da269d3b

!Note:
The number of tasks may we overwritten for every stage manually using this pragma:

```
#pragma ydb.OverridePlanner = @@ [
                        { "tx": 0, "stage": 1, "tasks": 100 }
                    ] @@;
```

# Proposal

When planning the number of tasks, we should consider for the number of channels that will be created. And plan the number of tasks so that we can fit all the channels into memory.



reference:
https://github.com/ydb-platform/ydb/issues/28520#issuecomment-3527358273

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Avoid 500x500 hash shuffles #30015

Problem

Proposal

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Avoid 500x500 hash shuffles #30015

Description

Problem

Proposal

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions