metricsreader: move cluster-scoped metrics lifecycle into backend clusters#1105

Open
YangKeao wants to merge 1 commit into pingcap:main from YangKeao:pr/04-multi-cluster-metrics
@YangKeao
Member

@YangKeao YangKeao commented Mar 19, 2026

What problem does this PR solve?

Issue Number: close #1099

What is changed and how it works:

Move backend metrics lifecycle to the cluster-scoped runtime introduced by the backend-cluster manager.

This PR changes metrics collection so that:

  • each backend cluster owns its own metrics reader
  • backend metrics election is cluster-scoped
  • metrics queries are dispatched through the manager to the right cluster readers
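
As a rough illustration of the dispatch described above (not code from this PR; `Manager`, `MetricsReader`, and `staticReader` are hypothetical names), a manager can hold one reader per backend cluster and route each query by cluster name:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// MetricsReader is a hypothetical stand-in for a per-cluster metrics reader.
type MetricsReader interface {
	Query(key string) (float64, error)
}

// staticReader returns fixed values; it exists only for this illustration.
type staticReader map[string]float64

func (r staticReader) Query(key string) (float64, error) {
	v, ok := r[key]
	if !ok {
		return 0, errors.New("metric not found: " + key)
	}
	return v, nil
}

// Manager maps each backend cluster name to the reader owned by that cluster.
type Manager struct {
	mu      sync.RWMutex
	readers map[string]MetricsReader
}

func NewManager() *Manager {
	return &Manager{readers: make(map[string]MetricsReader)}
}

// AddCluster registers the reader that belongs to one backend cluster.
func (m *Manager) AddCluster(name string, r MetricsReader) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.readers[name] = r
}

// Query dispatches a metrics query to the reader of the named cluster.
func (m *Manager) Query(cluster, key string) (float64, error) {
	m.mu.RLock()
	r, ok := m.readers[cluster]
	m.mu.RUnlock()
	if !ok {
		return 0, fmt.Errorf("unknown backend cluster %q", cluster)
	}
	return r.Query(key)
}

func main() {
	m := NewManager()
	m.AddCluster("cluster-a", staticReader{"cpu": 0.4})
	m.AddCluster("cluster-b", staticReader{"cpu": 0.9})
	v, err := m.Query("cluster-b", "cpu")
	fmt.Println(v, err)
}
```

The sketch leaves out election and reader lifecycle; it only shows why a query needs a cluster name once readers are no longer global.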

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Notable changes

  • Has configuration change
  • Has HTTP API interfaces change
  • Has tiproxyctl change
  • Other user behavior changes

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None

@ti-chi-bot

ti-chi-bot bot commented Mar 19, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot

ti-chi-bot bot commented Mar 19, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign djshow832 for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/XXL label Mar 19, 2026
@YangKeao YangKeao force-pushed the pr/04-multi-cluster-metrics branch from 7e1eaed to 5d2deeb Compare March 19, 2026 17:33
@YangKeao YangKeao marked this pull request as ready for review March 19, 2026 17:37
@ti-chi-bot ti-chi-bot bot requested a review from bb7133 March 19, 2026 17:37
@codecov-commenter

codecov-commenter commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 77.35849% with 60 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@30d5b3f). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/manager/backendcluster/metrics_querier.go | 65.51% | 27 Missing and 3 partials ⚠️ |
| pkg/manager/backendcluster/cluster.go | 64.51% | 10 Missing and 1 partial ⚠️ |
| pkg/balance/metricsreader/query_result.go | 80.43% | 6 Missing and 3 partials ⚠️ |
| pkg/manager/backendcluster/manager.go | 55.55% | 4 Missing ⚠️ |
| pkg/balance/metricsreader/metrics_reader.go | 82.35% | 3 Missing ⚠️ |
| pkg/server/server.go | 75.00% | 2 Missing ⚠️ |
| pkg/balance/metricsreader/backend_reader.go | 96.55% | 1 Missing ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1105   +/-   ##
=======================================
  Coverage        ?   67.28%           
=======================================
  Files           ?      145           
  Lines           ?    15263           
  Branches        ?        0           
=======================================
  Hits            ?    10269           
  Misses          ?     4297           
  Partials        ?      697           
Flag Coverage Δ
unit 67.28% <77.35%> (?)

@YangKeao
Member Author

/test all

@YangKeao YangKeao force-pushed the pr/04-multi-cluster-metrics branch 2 times, most recently from 7bbf843 to 34cee09 Compare March 20, 2026 06:20
@YangKeao
Member Author

/test all

@YangKeao YangKeao force-pushed the pr/04-multi-cluster-metrics branch 6 times, most recently from 80b8d60 to c81e4dc Compare March 24, 2026 13:13
@YangKeao YangKeao force-pushed the pr/04-multi-cluster-metrics branch from c81e4dc to 89d650a Compare March 24, 2026 13:52
@YangKeao YangKeao marked this pull request as ready for review March 24, 2026 13:57
@YangKeao YangKeao force-pushed the pr/04-multi-cluster-metrics branch 2 times, most recently from a8ab422 to 7add0f2 Compare March 24, 2026 14:21
@djshow832
Collaborator

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42467ff5a9

Comment on lines +110 to +113
	var vipEtcdCli *clientv3.Client
	if cluster := srv.clusterManager.PrimaryCluster(); cluster != nil {
		vipEtcdCli = cluster.EtcdClient()
	}
P1: Rebind VIP election client after cluster reconfiguration

The VIP manager is wired to a single etcd client captured only once at startup, but backend clusters are reloadable and syncClusters can replace and close the old cluster runtime (including its etcd client). In a single-cluster deployment with VIP enabled, updating backend-clusters (for example pd-addrs or ns-servers) can leave VIP election running on a closed client, after which the election loop exits and VIP failover no longer works until restart.

Member Author

For now, we don't support rebinding the VIP election client when the number of clusters drops to 1.

Member Author

Also, for VIP deployments, we don't expect the clusters to be reloaded. However, returning an error when a VIP has been set and the cluster is recreated is a good suggestion!

@YangKeao YangKeao force-pushed the pr/04-multi-cluster-metrics branch 7 times, most recently from 1b299bc to 004d1fd Compare April 2, 2026 14:39
@YangKeao YangKeao force-pushed the pr/04-multi-cluster-metrics branch from 004d1fd to 26f0ec0 Compare April 2, 2026 14:57
func readerOwnerKeyPrefix(clusterName string) string {
	clusterName = strings.TrimSpace(clusterName)
	if clusterName == "" || clusterName == config.DefaultBackendClusterName {
		return "/tiproxy/metric_reader"
Collaborator

Why remove the constant definitions?
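
For illustration only, one way to keep a named constant while still deriving a per-cluster prefix might look like the sketch below. The constant names, the value of `defaultBackendClusterName`, and the non-default branch are all assumptions; the excerpt above shows only the default branch.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	// defaultBackendClusterName stands in for config.DefaultBackendClusterName;
	// the value "default" is an assumption made for this sketch.
	defaultBackendClusterName = "default"
	// readerOwnerKeyPrefixBase is the literal shown in the diff excerpt,
	// kept as a named constant.
	readerOwnerKeyPrefixBase = "/tiproxy/metric_reader"
)

// readerOwnerKeyPrefix returns the election key prefix for a cluster.
func readerOwnerKeyPrefix(clusterName string) string {
	clusterName = strings.TrimSpace(clusterName)
	if clusterName == "" || clusterName == defaultBackendClusterName {
		return readerOwnerKeyPrefixBase
	}
	// Assumption: this per-cluster layout is hypothetical; the excerpt
	// does not show what the PR returns for non-default clusters.
	return readerOwnerKeyPrefixBase + "/" + clusterName
}

func main() {
	fmt.Println(readerOwnerKeyPrefix(""))
	fmt.Println(readerOwnerKeyPrefix("cluster-a"))
}
```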

Development

Successfully merging this pull request may close these issues.

Move the metrics reader into cluster-scoped struct.

3 participants