Commit d50a794
authored
[opt](ann index) Make chunk size of index train configurable (#58645)
### What problem does this PR solve?
Previous PR: #57623
The current granularity for index training and data ingestion is set to
1M and is hard-coded, which makes index construction unnecessarily slow
in some scenarios. This should be made configurable and reduced when
appropriate.
For example, when there are 1M vectors to add and the stream load batch
size is set to 0.3M, the load is split into roughly 3 stream load
requests. If a request carrying 0.3M vectors ends up with only 1 thread
doing the add, the whole load becomes very slow. Typical CPU usage looks
like this:
<img width="1902" height="552" alt="image"
src="https://github.com/user-attachments/assets/65728e56-f333-4bd5-a54a-8c12d01668f1"
/>
We need to make the batch size configurable so that it can be adjusted
when necessary.
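
As a rough illustration of why the chunk size matters, the sketch below splits a number of rows into consecutive chunks of a configurable size. `split_into_chunks` is a hypothetical helper written for this note, not the code changed by this commit, and it assumes that each chunk is one unit of work handed to the index.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not part of this commit): split `total_rows` vectors
// into consecutive chunks of at most `chunk_size` rows each.
// Returns (offset, row_count) pairs in order.
std::vector<std::pair<std::size_t, std::size_t>> split_into_chunks(
        std::size_t total_rows, std::size_t chunk_size) {
    assert(chunk_size > 0);  // a zero chunk size would never make progress
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    for (std::size_t offset = 0; offset < total_rows; offset += chunk_size) {
        // The last chunk may be shorter than chunk_size.
        chunks.emplace_back(offset, std::min(chunk_size, total_rows - offset));
    }
    return chunks;
}
```

Under that assumption, a 0.3M-row request with a 1M chunk size yields a single chunk, while the same request with a 30K chunk size yields ten chunks that can be handled separately.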
For example, when we set the batch size to 30K, the average CPU usage is
noticeably higher, like this:
<img width="1890" height="554" alt="image"
src="https://github.com/user-attachments/assets/7d664b0e-b017-4a2e-bed8-e40f56ff97b7"
/>
**The default value is still 1M; a small batch size hurts the recall of
the HNSW index.**

1 parent 7ce7925 commit d50a794
File tree

4 files changed, +18 -6 lines changed
- be/src
  - common
  - olap/rowset/segment_v2/ann_index
Lines changed: 6 additions & 0 deletions
Lines changed: 2 additions & 0 deletions
Lines changed: 6 additions & 4 deletions
Lines changed: 4 additions & 2 deletions