@yujun777
Contributor

@yujun777 yujun777 commented Nov 19, 2025

What problem does this PR solve?

When sampling, the analyze job uses table.getRowCount() as rowsCount, but table.getRowCount() may be stale because it depends on the BE's report, so rowsCount < ndv can occur.

If 10 * rowsCount < ndv, the analyze SQL fails.

As a result, the regression test statistics/analyze_stats.groovy is unstable and fails with:

Exception:
java.sql.SQLException: errCode = 2, detailMessage = Failed to analyze following columns:[id] Reasons: java.lang.RuntimeException: ColStatsData is invalid, skip analyzing. ('1763112020393--1-id',0,1763112019723,1763112020393,-1,'id',null,1,16,0,'1','201',64,'2025-11-14 17:41:14','105 :0.06 ;104 :0.06 ;103 :0.06 ;102 :0.06 ;101 :0.06 ;10 :0.06 ;9 :0.06 ;8 :0.06 ;7 :0.06 ;6 :0.06')
  at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:129)
  at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122)
  at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:953)
  at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:371)
  at org.codehaus.groovy.vmplugin.v8.IndyInterface.fromCache(IndyInterface.java:321)
  at org.apache.doris.regression.util.JdbcUtils$_executeToList_closure1.doCall(JdbcUtils.groovy:47)
  at sun.reflect.GeneratedMethodAccessor33.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:343)
  at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:328)
  at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:279)

So when a sample analyze scans the whole table, we use count(1) as rowsCount.

Note that this replacement does not increase the execution cost, because the statistics SQL already contains count(1).
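
A minimal sketch of the idea, not the actual Doris change: the class and method names below are hypothetical. The check mirrors the 10 * rowsCount < ndv condition described above. The values 1 and 16 are read from the failing ColStatsData tuple as rowsCount and ndv, which is an assumption about the field order. The count(1) value of 16 is chosen only for illustration.

```java
// Illustrative sketch only: names are hypothetical and are not Doris APIs.
final class SampleRowCountSketch {

    // Mirrors the condition quoted in the description:
    // statistics are rejected when 10 * rowsCount < ndv.
    static boolean isPlausible(long rowsCount, long ndv) {
        return ndv <= 10 * rowsCount;
    }

    // Picks the row count to store with the column statistics.
    // If the sample scan covered the whole table, the count(1) value that the
    // statistics SQL already returns is exact, so it is preferred over the
    // cached (possibly stale) table row count.
    static long chooseRowCount(boolean scannedWholeTable, long cachedRowCount, long countFromSql) {
        return scannedWholeTable ? countFromSql : cachedRowCount;
    }

    public static void main(String[] args) {
        long staleRowCount = 1;  // assumed to be the count field of the failing tuple
        long ndv = 16;           // assumed to be the ndv field of the failing tuple
        long countFromSql = 16;  // hypothetical count(1) result, for illustration only

        // Before: the stale cached count is used even though the whole table was scanned.
        System.out.println("stale count accepted: " + isPlausible(staleRowCount, ndv));

        // After: count(1) replaces the cached count when the whole table was scanned.
        long rowsCount = chooseRowCount(true, staleRowCount, countFromSql);
        System.out.println("count(1) accepted: " + isPlausible(rowsCount, ndv));
    }
}
```

Because count(1) over a full scan can never be smaller than the number of distinct values in the same scan, the check cannot fail for this reason once count(1) is used.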

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas
Contributor

Thearas commented Nov 19, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yujun777
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34374 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ee5b8de6a586dcbad6877d67fbe47497a8c7e16b, data reload: false

------ Round 1 ----------------------------------
q1	17615	5053	4867	4867
q2	2062	325	209	209
q3	10229	1295	718	718
q4	10225	933	373	373
q5	7529	2344	2352	2344
q6	186	174	137	137
q7	898	737	605	605
q8	9344	1360	1139	1139
q9	6982	5409	5388	5388
q10	6817	2237	1848	1848
q11	495	299	285	285
q12	345	363	241	241
q13	17787	3619	3001	3001
q14	236	229	212	212
q15	565	500	500	500
q16	1030	1025	947	947
q17	577	861	355	355
q18	7569	7192	7167	7167
q19	1098	955	547	547
q20	354	343	230	230
q21	3696	3108	2277	2277
q22	1072	1029	984	984
Total cold run time: 106711 ms
Total hot run time: 34374 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4892	4885	4907	4885
q2	308	383	327	327
q3	2150	2645	2322	2322
q4	1391	1777	1322	1322
q5	4192	4354	4721	4354
q6	229	179	140	140
q7	2038	1973	1835	1835
q8	2601	2604	2564	2564
q9	7606	7521	7555	7521
q10	3029	3381	2813	2813
q11	596	515	501	501
q12	702	774	640	640
q13	3486	4147	3239	3239
q14	291	298	279	279
q15	551	486	483	483
q16	1038	1104	1079	1079
q17	1231	1593	1393	1393
q18	7933	7580	7501	7501
q19	783	780	902	780
q20	2029	2016	1926	1926
q21	5079	4455	4327	4327
q22	1093	1039	974	974
Total cold run time: 53248 ms
Total hot run time: 51205 ms

@doris-robot

TPC-DS: Total hot run time: 188035 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ee5b8de6a586dcbad6877d67fbe47497a8c7e16b, data reload: false

query1	1037	410	389	389
query2	6552	1674	1678	1674
query3	6784	231	232	231
query4	25779	23393	22595	22595
query5	4919	679	532	532
query6	360	263	244	244
query7	4655	512	311	311
query8	319	277	259	259
query9	8694	2922	2935	2922
query10	514	361	320	320
query11	15933	15023	14778	14778
query12	202	126	148	126
query13	1683	579	438	438
query14	12465	9173	9067	9067
query15	242	190	176	176
query16	7700	667	515	515
query17	1588	764	621	621
query18	2028	428	325	325
query19	218	197	178	178
query20	129	129	123	123
query21	220	133	116	116
query22	4020	4240	4185	4185
query23	33856	33113	33178	33113
query24	7778	2360	2363	2360
query25	685	596	495	495
query26	1223	281	174	174
query27	2704	511	372	372
query28	4382	2272	2241	2241
query29	832	674	544	544
query30	302	234	203	203
query31	936	793	726	726
query32	99	87	88	87
query33	597	405	366	366
query34	790	865	524	524
query35	838	854	778	778
query36	964	994	912	912
query37	134	122	100	100
query38	3479	3620	3431	3431
query39	1533	1455	1407	1407
query40	244	138	143	138
query41	70	69	66	66
query42	131	121	120	120
query43	474	482	454	454
query44	1261	789	806	789
query45	194	189	176	176
query46	895	1008	649	649
query47	1794	1823	1757	1757
query48	416	429	339	339
query49	792	512	436	436
query50	647	688	410	410
query51	3909	3908	3843	3843
query52	119	124	114	114
query53	247	265	203	203
query54	350	340	317	317
query55	96	97	91	91
query56	368	366	357	357
query57	1194	1193	1103	1103
query58	371	298	288	288
query59	2554	2689	2570	2570
query60	383	368	351	351
query61	166	163	166	163
query62	784	708	658	658
query63	232	194	194	194
query64	4440	1184	899	899
query65	4075	3923	3977	3923
query66	1109	441	365	365
query67	15388	15136	14876	14876
query68	8477	954	628	628
query69	507	354	306	306
query70	1315	1333	1264	1264
query71	524	355	325	325
query72	6032	4948	4907	4907
query73	700	596	364	364
query74	9241	9145	8656	8656
query75	4093	3231	2796	2796
query76	3817	1135	761	761
query77	813	398	339	339
query78	9411	10209	8868	8868
query79	2670	869	627	627
query80	687	584	527	527
query81	482	252	231	231
query82	654	164	136	136
query83	287	267	247	247
query84	298	118	95	95
query85	909	486	456	456
query86	358	326	325	325
query87	3673	3703	3656	3656
query88	3264	2196	2235	2196
query89	391	332	295	295
query90	2055	240	228	228
query91	170	164	133	133
query92	87	79	76	76
query93	1170	988	687	687
query94	711	438	332	332
query95	421	336	335	335
query96	494	589	277	277
query97	2982	2924	2870	2870
query98	258	226	221	221
query99	1420	1406	1264	1264
Total cold run time: 278247 ms
Total hot run time: 188035 ms

@doris-robot

ClickBench: Total hot run time: 27.27 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ee5b8de6a586dcbad6877d67fbe47497a8c7e16b, data reload: false

query1	0.05	0.04	0.05
query2	0.08	0.04	0.04
query3	0.25	0.08	0.08
query4	1.61	0.12	0.11
query5	0.29	0.25	0.25
query6	1.20	0.65	0.63
query7	0.03	0.02	0.03
query8	0.06	0.04	0.05
query9	0.57	0.53	0.52
query10	0.57	0.57	0.57
query11	0.17	0.11	0.11
query12	0.15	0.12	0.11
query13	0.62	0.60	0.61
query14	1.00	1.00	1.00
query15	0.84	0.82	0.83
query16	0.39	0.38	0.40
query17	1.01	1.02	1.01
query18	0.22	0.20	0.21
query19	1.87	1.88	1.87
query20	0.02	0.01	0.01
query21	15.45	0.20	0.13
query22	5.01	0.07	0.05
query23	15.69	0.26	0.11
query24	2.22	1.30	0.31
query25	0.07	0.07	0.06
query26	0.14	0.13	0.13
query27	0.06	0.06	0.05
query28	3.27	1.15	0.95
query29	12.56	3.83	3.19
query30	0.29	0.14	0.12
query31	2.82	0.58	0.38
query32	3.24	0.54	0.47
query33	3.03	2.99	3.05
query34	15.86	5.10	4.54
query35	4.57	4.51	4.62
query36	0.67	0.51	0.49
query37	0.09	0.07	0.07
query38	0.07	0.04	0.04
query39	0.03	0.03	0.03
query40	0.18	0.14	0.14
query41	0.09	0.04	0.04
query42	0.04	0.03	0.03
query43	0.05	0.03	0.03
Total cold run time: 96.5 s
Total hot run time: 27.27 s

@yujun777
Contributor Author

run feut

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@yujun777
Contributor Author

run p0

@yujun777
Contributor Author

run feut

Jibing-Li previously approved these changes Nov 20, 2025
@github-actions github-actions bot added the approved label Nov 20, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@yujun777
Contributor Author

run buildall

@github-actions github-actions bot removed the approved label Nov 20, 2025
@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@yujun777
Contributor Author

run buildall

@doris-robot

TPC-H: Total hot run time: 34210 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit ef1ef61265574c28bb73d7d676f48e22c2f4f97c, data reload: false

------ Round 1 ----------------------------------
q1	17684	5056	4894	4894
q2	2293	323	227	227
q3	10244	1260	699	699
q4	10254	928	376	376
q5	7523	2312	2519	2312
q6	180	168	138	138
q7	910	787	634	634
q8	9484	1352	1143	1143
q9	7187	5482	5393	5393
q10	7033	2234	1830	1830
q11	574	300	277	277
q12	340	368	230	230
q13	17981	3619	2993	2993
q14	229	237	216	216
q15	588	518	508	508
q16	1010	988	933	933
q17	624	860	360	360
q18	7537	7014	7016	7014
q19	1117	940	536	536
q20	359	344	236	236
q21	3669	2506	2289	2289
q22	1066	1019	972	972
Total cold run time: 107886 ms
Total hot run time: 34210 ms

----- Round 2, with runtime_filter_mode=off -----
q1	5106	4924	4892	4892
q2	322	405	322	322
q3	2159	2646	2298	2298
q4	1332	1755	1344	1344
q5	4231	4468	4401	4401
q6	216	189	133	133
q7	1977	2008	1820	1820
q8	2707	2682	2597	2597
q9	7496	7557	7455	7455
q10	3118	3230	2774	2774
q11	570	516	498	498
q12	665	748	576	576
q13	3451	3870	3324	3324
q14	304	297	270	270
q15	542	495	514	495
q16	1067	1201	1077	1077
q17	1141	1595	1402	1402
q18	7927	7735	7753	7735
q19	794	804	832	804
q20	2063	2114	1911	1911
q21	4917	4336	4217	4217
q22	1084	1046	989	989
Total cold run time: 53189 ms
Total hot run time: 51334 ms

@doris-robot

TPC-DS: Total hot run time: 188041 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit ef1ef61265574c28bb73d7d676f48e22c2f4f97c, data reload: false

query1	1134	409	389	389
query2	6558	1688	1655	1655
query3	6843	232	223	223
query4	26388	23256	22943	22943
query5	4436	655	504	504
query6	361	252	243	243
query7	4659	506	302	302
query8	307	264	281	264
query9	8640	2921	2930	2921
query10	525	342	308	308
query11	15731	15485	14797	14797
query12	183	129	120	120
query13	1697	578	465	465
query14	10804	9193	9065	9065
query15	195	188	170	170
query16	7345	660	528	528
query17	1263	789	638	638
query18	2002	437	342	342
query19	208	206	180	180
query20	132	131	120	120
query21	515	138	115	115
query22	4177	4236	4082	4082
query23	33888	33188	33338	33188
query24	8213	2404	2393	2393
query25	636	573	490	490
query26	1261	317	164	164
query27	2706	496	363	363
query28	4423	2226	2214	2214
query29	843	636	512	512
query30	337	219	200	200
query31	900	791	714	714
query32	89	79	85	79
query33	589	397	357	357
query34	790	854	531	531
query35	810	843	743	743
query36	971	998	913	913
query37	163	121	103	103
query38	3499	3550	3425	3425
query39	1447	1434	1396	1396
query40	228	137	124	124
query41	68	60	61	60
query42	129	116	118	116
query43	492	483	488	483
query44	1255	800	797	797
query45	186	189	170	170
query46	878	990	640	640
query47	1779	1810	1697	1697
query48	392	415	327	327
query49	777	488	420	420
query50	658	684	409	409
query51	3836	3992	3796	3796
query52	108	115	116	115
query53	237	266	203	203
query54	365	312	302	302
query55	91	97	88	88
query56	357	348	354	348
query57	1180	1185	1127	1127
query58	302	285	286	285
query59	2528	2628	2514	2514
query60	359	356	350	350
query61	163	161	155	155
query62	765	703	650	650
query63	229	192	195	192
query64	4555	1172	887	887
query65	4033	3932	3921	3921
query66	1186	426	349	349
query67	15550	15231	14950	14950
query68	8409	949	629	629
query69	491	339	302	302
query70	1392	1273	1291	1273
query71	477	345	330	330
query72	5846	4830	4796	4796
query73	697	554	369	369
query74	9245	9025	8815	8815
query75	3982	3274	2745	2745
query76	3690	1174	751	751
query77	812	421	331	331
query78	9348	9717	8869	8869
query79	2159	837	602	602
query80	664	593	525	525
query81	487	264	240	240
query82	424	158	140	140
query83	255	266	253	253
query84	249	116	100	100
query85	922	500	453	453
query86	345	297	299	297
query87	3678	3746	3621	3621
query88	3140	2227	2217	2217
query89	395	333	289	289
query90	2007	235	229	229
query91	170	164	135	135
query92	88	73	75	73
query93	1114	1010	686	686
query94	696	440	340	340
query95	429	339	339	339
query96	473	585	293	293
query97	2922	2971	2869	2869
query98	244	221	218	218
query99	1674	1412	1278	1278
Total cold run time: 275278 ms
Total hot run time: 188041 ms

@doris-robot

ClickBench: Total hot run time: 27.42 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit ef1ef61265574c28bb73d7d676f48e22c2f4f97c, data reload: false

query1	0.06	0.05	0.04
query2	0.11	0.05	0.05
query3	0.28	0.08	0.08
query4	1.61	0.11	0.10
query5	0.26	0.25	0.25
query6	1.18	0.64	0.63
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.59	0.53	0.52
query10	0.58	0.57	0.58
query11	0.18	0.11	0.11
query12	0.15	0.12	0.12
query13	0.62	0.61	0.60
query14	1.00	1.00	1.01
query15	0.85	0.83	0.84
query16	0.39	0.42	0.39
query17	1.03	0.99	1.02
query18	0.21	0.19	0.20
query19	1.92	1.80	1.78
query20	0.02	0.01	0.01
query21	15.48	0.21	0.14
query22	4.92	0.10	0.04
query23	15.64	0.25	0.10
query24	3.85	0.91	0.44
query25	0.06	0.05	0.07
query26	0.15	0.14	0.13
query27	0.06	0.05	0.06
query28	4.08	1.19	0.96
query29	12.61	3.80	3.20
query30	0.29	0.13	0.11
query31	2.82	0.58	0.39
query32	3.23	0.55	0.48
query33	2.97	3.16	3.03
query34	15.90	5.14	4.57
query35	4.53	4.58	4.60
query36	0.66	0.50	0.49
query37	0.13	0.07	0.06
query38	0.12	0.04	0.04
query39	0.05	0.02	0.02
query40	0.19	0.15	0.13
query41	0.11	0.04	0.03
query42	0.05	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 99.06 s
Total hot run time: 27.42 s

@github-actions github-actions bot added the approved label Nov 20, 2025
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

@Jibing-Li
Contributor

run cloud_p0

@Jibing-Li Jibing-Li merged commit 9f5b4c6 into apache:master Nov 20, 2025
31 of 32 checks passed
github-actions bot pushed a commit that referenced this pull request Nov 20, 2025
@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100.00% (1/1) 🎉
Increment coverage report
Complete coverage report

Labels

approved, dev/3.1.x, dev/3.1.x-conflict, dev/4.0.x, reviewed

6 participants