
[feat](show): support the BE resource usage status #64030

Open
Baymine wants to merge 1 commit into apache:master from Baymine:feat/be-resource-usage-status

@Baymine (Contributor) commented Jun 2, 2026

Adds BE resource usage reporting (CPU and memory) to FE, with k8s-aware CPU core collection via CpuInfo::num_cores().

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
Currently there is no way to monitor BE node CPU and memory usage from FE. Neither the SHOW BACKENDS command nor the backends()
table function exposes real-time resource utilization metrics, which makes it difficult to assess cluster resource health and plan
capacity.

Release note

Doris now supports reporting BE resource usage (CPU usage and memory usage) to FE. Two new columns, CpuUsedPct and MemUsedPct, are
added to SHOW BACKENDS and the backends() table function. The CPU core count is collected in a k8s-aware manner via
CpuInfo::num_cores().
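The PR text does not show how CpuInfo::num_cores() detects a container CPU limit or how CpuUsedPct and MemUsedPct are derived. A common approach, sketched here only as an assumption, reads the cgroup v2 `cpu.max` file (`<quota_us> <period_us>`, or `max <period_us>` when unlimited; cgroup v1 exposes the same data as `cpu.cfs_quota_us` and `cpu.cfs_period_us`), converts the quota to whole cores, and divides the CPU time consumed over a sampling window (the `usage_usec` counter in `cpu.stat`) by that core count. The functions `effective_cores`, `cpu_used_pct`, and `mem_used_pct` are hypothetical names, not Doris APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper (not a Doris API): effective core count from the text of
// a cgroup v2 "cpu.max" file, which holds "<quota_us> <period_us>" or
// "max <period_us>" when no limit is set. Falls back to host_cores when there
// is no quota or the content cannot be parsed.
int effective_cores(const std::string& cpu_max, int host_cores) {
    assert(host_cores >= 1);
    std::istringstream in(cpu_max);
    std::string quota;
    long long period_us = 0;
    if (!(in >> quota >> period_us) || quota == "max" || period_us <= 0) {
        return host_cores;
    }
    long long quota_us = 0;
    try {
        quota_us = std::stoll(quota);
    } catch (...) {  // std::invalid_argument or std::out_of_range
        return host_cores;
    }
    if (quota_us <= 0) {
        return host_cores;
    }
    // Round a fractional limit (e.g. 1.5 CPUs) up to whole cores, and never
    // report more cores than the host actually has.
    const double cores = std::ceil(static_cast<double>(quota_us) / period_us);
    const double capped = std::min(cores, static_cast<double>(host_cores));
    return std::max(1, static_cast<int>(capped));
}

// Hypothetical: CPU usage over one sampling window, as a percentage of the
// effective cores. usage_delta_us is the growth of the cgroup v2 cpu.stat
// "usage_usec" counter between two samples; elapsed_us is the wall-clock time
// between those samples.
double cpu_used_pct(std::uint64_t usage_delta_us, std::uint64_t elapsed_us, int cores) {
    if (elapsed_us == 0 || cores <= 0) {
        return 0.0;
    }
    const double pct = 100.0 * static_cast<double>(usage_delta_us) /
                       (static_cast<double>(elapsed_us) * cores);
    return std::min(pct, 100.0);  // clamp small overshoots from sampling jitter
}

// Hypothetical: memory usage as a percentage of a limit (e.g. the cgroup
// memory limit, or physical memory when no limit is set).
double mem_used_pct(std::uint64_t used_bytes, std::uint64_t limit_bytes) {
    if (limit_bytes == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(used_bytes) / static_cast<double>(limit_bytes);
}
```

For example, a pod limited to 1.5 CPUs (`cpu.max` = `150000 100000`) on a 32-core host yields 2 effective cores, and 5 s of CPU time consumed over a 5 s window on 2 cores gives 50%. Rounding a fractional quota up keeps the core count an integer, but it means a container saturating a 1.5-CPU limit reports 75% rather than 100%; dividing by the exact quota avoids that at the cost of a non-integer core count.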

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
      • Deploy BE and FE, verify SHOW BACKENDS output includes CpuUsedPct and MemUsedPct columns with valid percentage
        values.
      • Verify SELECT * FROM backends() returns the two new columns with correct values.
      • Verify CPU usage reflects k8s-aware core count when running in a container with CPU limits set.
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
      • SHOW BACKENDS and backends() table function now include two additional columns: CpuUsedPct (CPU usage percentage) and
        MemUsedPct (memory usage percentage).
      • BE now periodically reports resource usage to FE via a new REPORT_RESOURCE_USAGE report worker (default interval: 5
        seconds, configurable via report_resource_usage_interval_seconds).
  • Does this need documentation?

    • No.
    • Yes.
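The checklist states that BE reports resource usage through a new REPORT_RESOURCE_USAGE report worker every 5 seconds by default, configurable via report_resource_usage_interval_seconds. The PR text does not show that worker, so the following is only a generic C++ sketch of a periodic reporter with an interruptible wait. The class `PeriodicReporter` and its `report` callback are hypothetical and are not Doris's report-worker API.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical sketch (not Doris's report-worker API): invokes `report` once
// per `interval` on a background thread until stop() is called.
class PeriodicReporter {
public:
    PeriodicReporter(std::chrono::milliseconds interval, std::function<void()> report)
            : interval_(interval), report_(std::move(report)) {
        assert(interval_.count() > 0);
    }

    ~PeriodicReporter() { stop(); }

    PeriodicReporter(const PeriodicReporter&) = delete;
    PeriodicReporter& operator=(const PeriodicReporter&) = delete;

    void start() {
        worker_ = std::thread([this] {
            std::unique_lock<std::mutex> lock(mu_);
            while (!stopped_) {
                // Wait one interval; returns true early if stop() was requested.
                if (cv_.wait_for(lock, interval_, [this] { return stopped_; })) {
                    break;
                }
                lock.unlock();
                report_();  // collect and send without holding the lock
                lock.lock();
            }
        });
    }

    void stop() {
        {
            std::lock_guard<std::mutex> guard(mu_);
            stopped_ = true;
        }
        cv_.notify_all();
        if (worker_.joinable()) {
            worker_.join();
        }
    }

private:
    std::chrono::milliseconds interval_;
    std::function<void()> report_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool stopped_ = false;
    std::thread worker_;
};
```

With the documented default, a caller might construct `PeriodicReporter reporter(std::chrono::seconds(5), callback)` (seconds convert implicitly to milliseconds), where `callback` would gather the CPU and memory figures and send them to FE. Waiting on a condition variable instead of sleeping lets stop() return promptly rather than waiting out a full interval.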

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen (Contributor)

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message) and how it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Baymine (Contributor, Author) commented Jun 2, 2026

run buildall

@hello-stephen (Contributor)

FE UT Coverage Report

Increment line coverage 25.00% (10/40) 🎉
Increment coverage report
Complete coverage report

@hello-stephen (Contributor)

BE UT Coverage Report

Increment line coverage 0.00% (0/55) 🎉

Increment coverage report
Complete coverage report

Category           Coverage
Function Coverage  54.18% (21147/39031)
Line Coverage      37.74% (201045/532697)
Region Coverage    33.84% (158133/467347)
Branch Coverage    34.80% (69044/198378)

@hello-stephen (Contributor)

BE Regression && UT Coverage Report

Increment line coverage 78.69% (48/61) 🎉

Increment coverage report
Complete coverage report

Category           Coverage
Function Coverage  73.89% (28243/38223)
Line Coverage      57.88% (307534/531329)
Region Coverage    54.68% (257942/471770)
Branch Coverage    56.08% (111677/199124)

@hello-stephen (Contributor)

FE Regression Coverage Report

Increment line coverage 85.00% (34/40) 🎉
Increment coverage report
Complete coverage report

@hello-stephen (Contributor)

TPC-H: Total hot run time: 29170 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4be47057207e4c41b10c1b3189a12c20a40542bd, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17674	3985	4013	3985
q2
q3	10954	1458	827	827
q4	4828	491	357	357
q5	8626	875	584	584
q6	359	175	137	137
q7	934	851	650	650
q8	10993	1649	1704	1649
q9	7019	4502	4487	4487
q10	6795	1827	1513	1513
q11	446	284	251	251
q12	650	428	303	303
q13	18154	3457	2810	2810
q14	267	265	251	251
q15
q16	815	772	704	704
q17	1013	1004	963	963
q18	6695	5782	5524	5524
q19	1206	1254	1101	1101
q20	517	400	267	267
q21	5612	2691	2507	2507
q22	417	361	300	300
Total cold run time: 103974 ms
Total hot run time: 29170 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4388	4340	4286	4286
q2
q3	4503	4928	4319	4319
q4	2084	2225	1409	1409
q5	4469	4290	4347	4290
q6	232	176	129	129
q7	2310	1851	1687	1687
q8	2559	2227	2154	2154
q9	8057	7953	7944	7944
q10	4852	4746	4568	4568
q11	573	437	379	379
q12	745	747	567	567
q13	3318	3672	2958	2958
q14	298	297	284	284
q15
q16	731	767	650	650
q17	1374	1343	1336	1336
q18	8022	7449	6925	6925
q19	1184	1110	1111	1110
q20	2233	2230	1946	1946
q21	5311	4573	4432	4432
q22	525	486	408	408
Total cold run time: 57768 ms
Total hot run time: 51781 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 169410 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4be47057207e4c41b10c1b3189a12c20a40542bd, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4331	647	483	483
query6	446	198	179	179
query7	4811	538	300	300
query8	359	224	201	201
query9	8752	3997	4134	3997
query10	440	320	270	270
query11	5868	2367	2231	2231
query12	163	107	102	102
query13	1250	619	438	438
query14	6417	5419	5044	5044
query14_1	4371	4373	4411	4373
query15	203	201	181	181
query16	1036	475	437	437
query17	1132	696	586	586
query18	2664	480	333	333
query19	201	181	142	142
query20	114	110	103	103
query21	217	139	113	113
query22	13703	13544	13400	13400
query23	17377	16448	16072	16072
query23_1	16183	16299	16339	16299
query24	7469	1783	1326	1326
query24_1	1341	1323	1331	1323
query25	584	476	407	407
query26	1314	310	180	180
query27	2680	603	338	338
query28	4461	2029	1975	1975
query29	1125	627	490	490
query30	302	236	208	208
query31	1114	1092	957	957
query32	112	64	63	63
query33	526	355	292	292
query34	1210	1168	665	665
query35	757	792	689	689
query36	1417	1420	1242	1242
query37	156	110	97	97
query38	3223	3197	3071	3071
query39	945	941	916	916
query39_1	876	887	889	887
query40	228	127	108	108
query41	73	68	72	68
query42	99	103	96	96
query43	322	321	278	278
query44	
query45	198	193	181	181
query46	1078	1176	755	755
query47	2347	2377	2224	2224
query48	418	394	307	307
query49	654	487	389	389
query50	973	362	256	256
query51	4332	4290	4187	4187
query52	91	91	78	78
query53	252	272	212	212
query54	296	234	220	220
query55	87	80	72	72
query56	255	254	233	233
query57	1438	1409	1326	1326
query58	265	221	225	221
query59	1596	1695	1483	1483
query60	290	259	249	249
query61	183	204	149	149
query62	689	657	568	568
query63	241	186	181	181
query64	2550	795	631	631
query65	
query66	1743	461	340	340
query67	29916	29737	29642	29642
query68	
query69	410	311	276	276
query70	1031	941	981	941
query71	299	223	213	213
query72	2976	2705	2416	2416
query73	841	744	432	432
query74	5132	5020	4775	4775
query75	2667	2625	2242	2242
query76	2317	1154	768	768
query77	353	385	296	296
query78	12564	12428	11891	11891
query79	1405	1049	760	760
query80	1312	479	396	396
query81	532	282	241	241
query82	639	156	122	122
query83	332	272	245	245
query84	262	138	109	109
query85	898	518	437	437
query86	424	286	279	279
query87	3435	3317	3197	3197
query88	3620	2748	2728	2728
query89	439	387	330	330
query90	1890	179	183	179
query91	181	171	137	137
query92	60	64	58	58
query93	1619	1538	855	855
query94	735	325	311	311
query95	692	467	354	354
query96	1008	800	388	388
query97	2727	2686	2578	2578
query98	213	215	212	212
query99	1151	1182	1079	1079
Total cold run time: 252717 ms
Total hot run time: 169410 ms

