@deardeng (Contributor) commented Nov 18, 2025

…er versions.

What problem does this PR solve?

Prior to PR #42986, the backend's `getLastUpdateMs` field was not written to the FE image. After upgrading from such an older version, the tablet mapping in cloud mode therefore lost its PrimaryBackend information: an incorrect PrimaryBackend was regenerated, and tablets ended up unevenly distributed.

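Details: see the attached screenshot (https://github.com/user-attachments/assets/23d62adc-0bbb-49cb-b5b0-d1768f6422c7).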
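The upgrade failure described above follows a general pattern: a field that an older version never wrote to its persisted image comes back as a default value (here 0) when a newer version loads that image. The Java sketch illustrates only that pattern under stated assumptions; it is not the Doris FE image format and not the change made in this PR. The image is modeled as a plain `Map<String, String>`, and `ImageUpgradeSketch`, `writeOldImage`, `writeNewImage`, `readNaive` and `readDefensive` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalLong;

// Hypothetical sketch, not Doris FE code: an "image" is modeled as a plain
// key/value map so that an old writer and a new writer can be compared.
public class ImageUpgradeSketch {

    // Old writer: persists the backend id but drops lastUpdateMs,
    // mimicking an image written before the field was persisted.
    public static Map<String, String> writeOldImage(long backendId, long lastUpdateMs) {
        Map<String, String> image = new HashMap<>();
        image.put("id", Long.toString(backendId));
        return image; // lastUpdateMs is intentionally not written
    }

    // New writer: persists lastUpdateMs as well.
    public static Map<String, String> writeNewImage(long backendId, long lastUpdateMs) {
        Map<String, String> image = writeOldImage(backendId, lastUpdateMs);
        image.put("lastUpdateMs", Long.toString(lastUpdateMs));
        return image;
    }

    // Naive reader: a missing key silently becomes 0, which looks like a real value.
    public static long readNaive(Map<String, String> image) {
        return Long.parseLong(image.getOrDefault("lastUpdateMs", "0"));
    }

    // Defensive reader: reports "absent" so the caller can choose an
    // upgrade-safe fallback instead of acting on a made-up timestamp.
    public static OptionalLong readDefensive(Map<String, String> image) {
        String value = image.get("lastUpdateMs");
        return value == null ? OptionalLong.empty() : OptionalLong.of(Long.parseLong(value));
    }

    public static void main(String[] args) {
        long ts = 1_700_000_000_000L;
        Map<String, String> oldImage = writeOldImage(10001L, ts);
        Map<String, String> newImage = writeNewImage(10001L, ts);
        System.out.println(readNaive(oldImage));                 // 0
        System.out.println(readNaive(newImage));                 // 1700000000000
        System.out.println(readDefensive(oldImage).isPresent()); // false
    }
}
```

Returning an `OptionalLong` makes the caller decide explicitly what to do when the value is absent, instead of treating 0 as a real timestamp. How the FE actually restores the PrimaryBackend mapping is not modeled here.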

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

@Thearas (Contributor) commented Nov 18, 2025

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@deardeng (Contributor, Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 34392 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 114729de3f41c3e3dc6f052f29d9b89fb6725f6f, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17597	4975	4884	4884
q2	2010	329	199	199
q3	10260	1258	704	704
q4	10427	896	356	356
q5	7614	2535	2327	2327
q6	194	179	141	141
q7	943	783	634	634
q8	9360	1421	1094	1094
q9	6803	4984	5076	4984
q10	6840	2237	1771	1771
q11	516	312	281	281
q12	339	364	231	231
q13	17799	3669	3036	3036
q14	228	234	206	206
q15	578	508	511	508
q16	1019	999	935	935
q17	562	870	359	359
q18	7321	7432	7768	7432
q19	1513	1006	553	553
q20	340	326	241	241
q21	4132	3545	2490	2490
q22	1145	1146	1026	1026
Total cold run time: 107540 ms
Total hot run time: 34392 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5230	5215	5212	5212
q2	238	319	245	245
q3	2399	2951	2496	2496
q4	1475	1930	1474	1474
q5	4539	4472	4420	4420
q6	210	167	123	123
q7	1937	1949	1823	1823
q8	2865	2519	2433	2433
q9	7270	7253	6763	6763
q10	2931	3135	2648	2648
q11	579	510	487	487
q12	642	718	562	562
q13	3228	3642	3060	3060
q14	263	287	261	261
q15	524	497	502	497
q16	1010	1048	996	996
q17	1111	1528	1308	1308
q18	7207	7082	7130	7082
q19	747	705	734	705
q20	1923	1956	1786	1786
q21	4731	4325	4271	4271
q22	1117	1054	999	999
Total cold run time: 52176 ms
Total hot run time: 49651 ms
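Judging from the numbers in the TPC-H tables above, each row gives the query, the cold run, two hot runs, and the faster of the two hot runs, all in ms, and the reported totals are column sums. The Java sketch below assumes exactly that column layout and recomputes both Round 1 totals from the cold and hot-run columns; the class name `BenchTotals` is hypothetical and not part of the Doris benchmark tools.

```java
import java.util.Arrays;

// Sketch: recomputes the totals printed by the benchmark bot, assuming each row is
// {cold, hot run 1, hot run 2} in ms and that the hot total sums the faster hot run.
public class BenchTotals {

    // TPC-H sf100 Round 1 rows (q1..q22), copied from the table above.
    public static final long[][] ROUND1 = {
        {17597, 4975, 4884}, {2010, 329, 199}, {10260, 1258, 704}, {10427, 896, 356},
        {7614, 2535, 2327}, {194, 179, 141}, {943, 783, 634}, {9360, 1421, 1094},
        {6803, 4984, 5076}, {6840, 2237, 1771}, {516, 312, 281}, {339, 364, 231},
        {17799, 3669, 3036}, {228, 234, 206}, {578, 508, 511}, {1019, 999, 935},
        {562, 870, 359}, {7321, 7432, 7768}, {1513, 1006, 553}, {340, 326, 241},
        {4132, 3545, 2490}, {1145, 1146, 1026}
    };

    // Sum of the cold-run column.
    public static long coldTotal(long[][] rows) {
        return Arrays.stream(rows).mapToLong(r -> r[0]).sum();
    }

    // Sum of the best (minimum) of the two hot runs for each query.
    public static long hotTotal(long[][] rows) {
        return Arrays.stream(rows).mapToLong(r -> Math.min(r[1], r[2])).sum();
    }

    public static void main(String[] args) {
        System.out.println("Total cold run time: " + coldTotal(ROUND1) + " ms"); // 107540 ms
        System.out.println("Total hot run time: " + hotTotal(ROUND1) + " ms");   // 34392 ms
    }
}
```

Both results match the "Total cold run time" and "Total hot run time" lines reported for Round 1.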

@doris-robot

TPC-DS: Total hot run time: 188756 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 114729de3f41c3e3dc6f052f29d9b89fb6725f6f, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1013	400	415	400
query2	6565	1674	1726	1674
query3	6758	227	224	224
query4	26281	23217	23025	23025
query5	4435	646	514	514
query6	337	249	228	228
query7	4652	508	307	307
query8	293	244	254	244
query9	8672	2961	2934	2934
query10	548	361	313	313
query11	15661	14982	14803	14803
query12	184	130	118	118
query13	1691	610	473	473
query14	11016	9411	9372	9372
query15	215	188	170	170
query16	7547	691	512	512
query17	1264	798	651	651
query18	2030	442	345	345
query19	225	211	185	185
query20	138	123	124	123
query21	220	133	114	114
query22	4009	4076	3964	3964
query23	33608	32984	33003	32984
query24	8186	2425	2465	2425
query25	612	536	456	456
query26	1231	271	170	170
query27	2751	501	357	357
query28	4411	2214	2200	2200
query29	824	628	518	518
query30	304	234	201	201
query31	895	798	730	730
query32	95	86	84	84
query33	590	398	354	354
query34	787	857	552	552
query35	813	837	755	755
query36	949	1025	909	909
query37	127	121	94	94
query38	3462	3578	3453	3453
query39	1457	1458	1430	1430
query40	228	137	127	127
query41	66	59	60	59
query42	133	122	116	116
query43	480	468	450	450
query44	1251	801	797	797
query45	186	184	170	170
query46	886	994	674	674
query47	1759	1782	1725	1725
query48	400	424	332	332
query49	767	498	419	419
query50	656	678	424	424
query51	3969	3898	3882	3882
query52	121	118	109	109
query53	245	274	207	207
query54	348	319	325	319
query55	97	96	90	90
query56	352	337	359	337
query57	1171	1204	1120	1120
query58	307	298	285	285
query59	2480	2643	2449	2449
query60	399	385	352	352
query61	165	158	155	155
query62	803	704	681	681
query63	241	207	201	201
query64	4523	1162	894	894
query65	4035	3956	3960	3956
query66	1154	449	349	349
query67	15152	15086	14828	14828
query68	5984	949	649	649
query69	518	335	298	298
query70	1345	1376	1328	1328
query71	445	353	329	329
query72	5830	4943	4987	4943
query73	658	629	371	371
query74	8866	9110	8980	8980
query75	3268	3252	2802	2802
query76	3359	1130	735	735
query77	533	424	358	358
query78	9492	9909	8909	8909
query79	1309	852	617	617
query80	738	603	530	530
query81	478	260	229	229
query82	250	164	132	132
query83	268	266	258	258
query84	253	110	95	95
query85	883	495	451	451
query86	324	311	314	311
query87	3745	3737	3550	3550
query88	2784	2275	2256	2256
query89	391	333	303	303
query90	1766	226	227	226
query91	182	158	145	145
query92	77	81	72	72
query93	1129	991	686	686
query94	650	449	346	346
query95	442	343	326	326
query96	475	601	283	283
query97	2916	2965	2888	2888
query98	251	224	214	214
query99	1282	1378	1264	1264
Total cold run time: 268121 ms
Total hot run time: 188756 ms

@doris-robot

ClickBench: Total hot run time: 27.61 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 114729de3f41c3e3dc6f052f29d9b89fb6725f6f, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.05	0.05	0.05
query2	0.09	0.05	0.05
query3	0.25	0.08	0.08
query4	1.61	0.11	0.10
query5	0.28	0.25	0.24
query6	1.16	0.64	0.66
query7	0.03	0.03	0.03
query8	0.06	0.04	0.05
query9	0.57	0.53	0.51
query10	0.58	0.57	0.58
query11	0.16	0.11	0.11
query12	0.16	0.12	0.12
query13	0.61	0.61	0.61
query14	1.00	0.99	0.99
query15	0.86	0.83	0.83
query16	0.39	0.39	0.38
query17	0.99	1.03	1.02
query18	0.22	0.20	0.20
query19	1.92	1.86	1.85
query20	0.01	0.01	0.01
query21	15.44	0.20	0.13
query22	5.04	0.06	0.04
query23	15.68	0.26	0.11
query24	2.78	0.53	0.68
query25	0.10	0.06	0.06
query26	0.15	0.12	0.12
query27	0.06	0.05	0.06
query28	4.89	1.13	0.95
query29	12.75	3.92	3.20
query30	0.28	0.13	0.11
query31	2.81	0.59	0.38
query32	3.22	0.55	0.46
query33	2.97	3.12	3.12
query34	15.74	5.15	4.50
query35	4.57	4.58	4.60
query36	0.69	0.50	0.49
query37	0.10	0.07	0.06
query38	0.06	0.04	0.03
query39	0.04	0.03	0.03
query40	0.16	0.15	0.14
query41	0.09	0.03	0.04
query42	0.04	0.03	0.03
query43	0.05	0.03	0.04
Total cold run time: 98.71 s
Total hot run time: 27.61 s

@hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage 0.00% (0/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen (Contributor)

FE Regression Coverage Report

Increment line coverage 0.00% (0/2) 🎉
Increment coverage report
Complete coverage report

@deardeng (Contributor, Author)

run cloud_p0

@deardeng (Contributor, Author)

run external

@deardeng (Contributor, Author)

run p0

gavinchou changed the title from "[fix](cloud) Fixed uneven tablet performance during upgrades from old…" to "[fix](cloud) Fixed uneven tablet performance during upgrades from older versions" on Nov 20, 2025
github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Nov 20, 2025
@github-actions (Contributor)

PR approved by at least one committer and no changes requested.

@github-actions (Contributor)

PR approved by anyone and no changes requested.

@dataroaring (Contributor) left a comment

LGTM

gavinchou merged commit 636657a into apache:master on Nov 21, 2025 (31 of 33 checks passed)
github-actions bot pushed a commit that referenced this pull request Nov 21, 2025
…er versions (#58135)
github-actions bot pushed a commit that referenced this pull request Nov 21, 2025
…er versions (#58135)
morrySnow pushed a commit that referenced this pull request Nov 25, 2025
…des from older versions #58135 (#58247)

Cherry-picked from #58135

Co-authored-by: deardeng
yiguolei pushed a commit that referenced this pull request Nov 26, 2025
…er versions (#58135)
yiguolei pushed a commit that referenced this pull request Nov 27, 2025
…des from older versions #58135 (#58249)

Cherry-picked from #58135

Co-authored-by: deardeng

Labels

approved (Indicates a PR has been approved by one committer.), dev/3.1.4-merged, dev/4.0.2-merged, reviewed

8 participants