
HBASE-30161 Add paginated, single-RPC RegionLocator.getRegionLocations(startKey, limit) API for bulk meta-cache warmup #8237

Open
sanjeet006py wants to merge 5 commits into apache:master from sanjeet006py:HBASE-30161-async-master

Conversation

@sanjeet006py
Contributor

JIRA: HBASE-30161

Generated-by: Claude Opus 4.7

Sanjeet Malhotra added 3 commits May 14, 2026 22:21
…s(startKey, limit) API for bulk meta-cache warmup
… and add default-throws

Review comments on apache#8236 by haridsv:
- Adding an abstract method to RegionLocator / AsyncTableRegionLocator breaks
  external implementers. Convert the new method to a default that throws
  UnsupportedOperationException, with javadoc instructing callers to fall
  back to getAllRegionLocations().
- Reusing the getRegionLocations name overloads a method with completely
  different semantics (row -> all replicas of the containing region). Rename to
  getRegionLocationsPage(byte[] startKey, int limit) to make the
  pagination/range intent explicit.
- SnapshotRegionLocator no longer needs an override; it inherits the
  default-throws and callers fall back to getAllRegionLocations().

Tests renamed to call getRegionLocationsPage; semantics and assertions
unchanged.
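For illustration, the default-throws pattern described in this commit can be sketched as below. RegionLocatorSketch and its String-based results are simplified stand-ins invented for this sketch, not the real RegionLocator/AsyncTableRegionLocator interfaces.

```java
import java.util.Collections;
import java.util.List;

// Minimal sketch of adding a new interface method as a default that throws,
// so pre-existing external implementations keep compiling unchanged.
interface RegionLocatorSketch {
  // Pre-existing method that external implementers already provide.
  List<String> getAllRegionLocations();

  // New method added as a default; callers are expected to catch the
  // exception and fall back to the unpaged API.
  default List<String> getRegionLocationsPage(byte[] startKey, int limit) {
    throw new UnsupportedOperationException(
        "getRegionLocationsPage not supported; fall back to getAllRegionLocations()");
  }
}

public class DefaultThrowsDemo {
  public static void main(String[] args) {
    // An "old" implementation that only knows the original method.
    RegionLocatorSketch locator = () -> Collections.singletonList("region-a");
    List<String> locations;
    try {
      locations = locator.getRegionLocationsPage(new byte[0], 10);
    } catch (UnsupportedOperationException e) {
      // Fallback path the javadoc instructs callers to take.
      locations = locator.getAllRegionLocations();
    }
    System.out.println("locations=" + locations); // locations=[region-a]
  }
}
```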
Address point apache#4 of the canvas/code review: the paged meta scan was making
ceil(limit / hbase.meta.scanner.caching) ScannerNext RPCs whenever the
limit exceeded the configured caching, contradicting the
"at most one RPC per invocation" javadoc on
RegionLocator.getRegionLocationsPage / AsyncTableRegionLocator.getRegionLocationsPage.

Plumb a private isPagedScan flag through ClientMetaTableAccessor.scanMeta
and getMetaScan. When true, size both setLimit and setCaching to the
caller-supplied rowUpperLimit so the slice returns in a single ScannerNext
RPC regardless of the configured caching default. The unbounded scanMeta
overload (used by getTableHRegionLocations(metaTable, tableName) for
getAllRegionLocations) keeps its existing behavior.

Add TestRegionLocatorPagedScanRpcCount: starts a mini-cluster with
hbase.meta.scanner.caching=2, wraps the meta AsyncTable to count
AdvancedScanResultConsumer.onNext invocations, and asserts:
- limit <= caching   -> 1 ScannerNext (baseline)
- limit > caching    -> 1 ScannerNext (the regression check)
- unbounded scan     -> ceil(N / caching) ScannerNext (proves the flag
                        does not leak into the existing path)

Verified the test catches the regression: reverting the getMetaScan
change makes testSingleRpcWhenLimitExceedsCaching fail with onNext=3.
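The RPC-count arithmetic behind this fix can be modeled in a few lines. The sketch below covers only the math; effectiveCaching and scannerNextRpcs are illustrative names for this sketch, not actual ClientMetaTableAccessor code.

```java
// Toy model of the isPagedScan sizing rule: each ScannerNext RPC returns at
// most `caching` rows, so resizing caching to the caller's limit makes the
// paged slice come back in a single RPC.
public class PagedScanSizing {
  // When isPagedScan is set, caching is bumped to the caller's row limit.
  static int effectiveCaching(boolean isPagedScan, int rowUpperLimit, int defaultCaching) {
    return isPagedScan ? rowUpperLimit : defaultCaching;
  }

  // Number of ScannerNext RPCs needed to fetch rowsReturned rows.
  static int scannerNextRpcs(int rowsReturned, int caching) {
    return (rowsReturned + caching - 1) / caching; // ceil(rows / caching)
  }

  public static void main(String[] args) {
    int defaultCaching = 2; // hbase.meta.scanner.caching in the test cluster
    // Paged scan with limit 5 > caching 2: caching is resized, one RPC.
    System.out.println(scannerNextRpcs(5, effectiveCaching(true, 5, defaultCaching)));  // 1
    // Unbounded scan over 5 rows keeps the default: ceil(5/2) = 3 RPCs.
    System.out.println(scannerNextRpcs(5, effectiveCaching(false, 5, defaultCaching))); // 3
  }
}
```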
sanjeet006py pushed a commit to sanjeet006py/hbase that referenced this pull request May 15, 2026
…scans

Mirrors the master-PR test (TestRegionLocatorPagedScanRpcCount on
apache#8237). On the sync path we proxy:

  Connection.getTable(META_TABLE_NAME) -> Table.getScanner(Scan) ->
  ResultScanner.close()

so we can flip Scan.setScanMetricsEnabled(true) on the way in and read
ScanMetrics.countOfRPCcalls before the underlying scanner is closed. The
RPC count is the natural sync analogue of the
AdvancedScanResultConsumer.onNext invocation count we used on master.

Cluster runs with hbase.meta.scanner.caching = 2 against a table of
5 user regions, asserting:
- limit <= caching: 1 ScannerNext RPC (baseline)
- limit > caching: 1 ScannerNext RPC (regression check; without the
                   isPagedScan fix this would be ceil(5/2) = 3)
- unbounded scan:  ceil(NUM_REGIONS / caching) = 3 ScannerNext RPCs
                   (proves the isPagedScan flag does not leak into the
                   non-paged callers)

Verified the test catches the regression: reverting the isPagedScan
branch in MetaTableAccessor.getMetaScan makes
testSingleRpcWhenLimitExceedsCaching fail with countOfRPCcalls = 3.
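The wrap-and-count idea on the sync path can be sketched with a dynamic proxy. ScannerSource and RowScanner below are hypothetical stand-ins for Connection/Table/ResultScanner, and the counter is only a loose model of ScanMetrics.countOfRPCcalls; this is not the actual test code.

```java
import java.lang.reflect.Proxy;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-ins for the proxied sync-client interfaces.
interface RowScanner {
  String next();
  void close();
}

interface ScannerSource {
  RowScanner getScanner(boolean metricsEnabled);
}

public class ScannerProxyDemo {
  // Wrap a ScannerSource so metrics are always enabled on the way in,
  // mirroring the "flip setScanMetricsEnabled(true)" step of the test.
  static ScannerSource forceMetrics(ScannerSource real) {
    return (ScannerSource) Proxy.newProxyInstance(
        ScannerSource.class.getClassLoader(),
        new Class<?>[] { ScannerSource.class },
        (proxy, method, methodArgs) -> real.getScanner(true));
  }

  public static void main(String[] args) {
    AtomicLong nextCalls = new AtomicLong();
    ScannerSource real = metricsEnabled -> new RowScanner() {
      @Override
      public String next() {
        nextCalls.incrementAndGet(); // stand-in for one ScannerNext RPC
        return "row";
      }

      @Override
      public void close() {
        // Read the counter before the underlying scanner goes away.
        System.out.println("rpcCalls=" + nextCalls.get());
      }
    };
    RowScanner scanner = forceMetrics(real).getScanner(false);
    scanner.next();
    scanner.next();
    scanner.next();
    scanner.close(); // prints rpcCalls=3
  }
}
```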
* Ordering: regions are returned in ascending region start-key order (the natural order of
* {@code hbase:meta} rows for a single table). Within each region, replicas are returned in
* ascending replica-id order (replica 0, then 1, then 2, ...). Split parents are filtered out,
* which may cause a page to contain fewer than {@code limit} regions but never disturbs ordering
Contributor


What about merged regions?

Contributor Author


Thanks for flagging — let me share my understanding and please correct me if I've got this wrong.

I think merge parents don't need filtering here because of how splits vs merges record cleanup state in hbase:meta:

  • Split: the parent row stays around with setOffline(true).setSplit(true) (see RegionStateStore#splitRegion) and carries info:splitA/info:splitB pointers. The catalog janitor seems to use that row to track whether daughters still hold HFile references, and only deletes the parent row + HDFS dir once references are gone. So during that window, meta scans do see the parent — which is why excludeOfflinedSplitParents exists.

  • Merge: from what I can see in RegionStateStore#mergeRegions, the same multi-mutate that commits the merge deletes every parent row and writes a single child row with info:merge_0000, info:merge_0001, … pointing back at the parents. So the "waiting on HFile cleanup" bookkeeping lives on the child row, not on the parents.

If that's right, a meta scan should never see merge parents, and the merged child row is itself the live region for that key range — which is what we'd want to return. This also seems to match getAllRegionLocations()'s existing behavior; both share CollectRegionLocationsVisitor(excludeOfflinedSplitParents=true).

Does this line up with your understanding? Happy to add a comment near the visitor call site noting this if it'd help future readers, or revisit the filter if I'm missing a case.
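For future readers, the split-vs-merge asymmetry above can be illustrated with a toy filter. MetaRowFilterDemo, MetaRow, and visibleRegions are names invented for this sketch; they do not correspond to HBase's actual CollectRegionLocationsVisitor.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of meta-row filtering: split parents linger in meta flagged
// offline+split until the catalog janitor removes them, while merge parents
// are deleted in the same mutation that writes the merged child, so a scan
// never sees them in the first place.
public class MetaRowFilterDemo {
  record MetaRow(String region, boolean offline, boolean split) {}

  static List<String> visibleRegions(List<MetaRow> metaRows, boolean excludeOfflinedSplitParents) {
    List<String> out = new ArrayList<>();
    for (MetaRow row : metaRows) {
      if (excludeOfflinedSplitParents && row.offline() && row.split()) {
        continue; // split parent awaiting catalog-janitor cleanup
      }
      out.add(row.region());
    }
    return out;
  }

  public static void main(String[] args) {
    List<MetaRow> meta = List.of(
        new MetaRow("parentAB", true, true),    // split parent still present
        new MetaRow("daughterA", false, false),
        new MetaRow("daughterB", false, false),
        new MetaRow("mergedCD", false, false)); // merge child; its parents are already gone
    System.out.println(visibleRegions(meta, true)); // [daughterA, daughterB, mergedCD]
  }
}
```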

addListener(future, (locs, error) -> {
  if (error != null || locs == null) {
    return;
  }
Contributor


What if there is an error?

Contributor Author


Good catch — let me explain the intent and please push back if it's not the right call.

The listener at AsyncTableRegionLocatorImpl.java:104 is a side-effect-only hook for cache warm-up. The future returned to the caller is the one from ClientMetaTableAccessor.getTableHRegionLocations, which already completes exceptionally on any meta-scan error (see getTableRegionsAndLocations's future.completeExceptionally(error) paths). So the error propagates to the caller through the returned future regardless of what the listener does.

On the listener side, if error != null there's simply nothing valid to cache, and if locs == null there's nothing to iterate. Either way, returning early is the safe behavior — we don't want to mask the error or double-fail the future, just skip cache population for this call. The next call to getRegionLocationsPage (or any locator path) will populate normally.

That said, I can add a LOG.debug (or LOG.warn) in the error branch so the silent skip is observable, e.g.:

if (error != null) {
  LOG.debug("Skipping cache warm-up for {}: {}", tableName, error.toString());
  return;
}
if (locs == null) {
  return;
}

Happy to add that if you think it's worth it.

Sanjeet Malhotra added 2 commits May 15, 2026 23:03
…nc-master

Resolves conflict in AbstractTestRegionLocator.java: takes upstream
JUnit 5 import direction (HBASE-30082) and adds assertNotNull/
assertThrows/assertTrue for the paged-locator tests; flips message
arg position to JUnit 5 ordering at four call sites.