perf: eliminate UnmodifiableList virtual dispatch overhead in LocationCache by xinlian12 · Pull Request #48674 · Azure/azure-sdk-for-java

xinlian12 · 2026-04-02T16:29:50Z

Summary

Replace the vendored UnmodifiableList wrapper in LocationCache with JDK Collections.unmodifiableList(), eliminating a 4-level polymorphic virtual dispatch chain that the JIT cannot inline.

This is the second improvement stacked on top of #48662 (HashMap allocation optimization). Combined cumulative gain: +16.1% throughput vs baseline.

JFR Analysis

JFR profiling identified AbstractListDecorator.decorated() consuming 1.02% CPU (398 samples at 1t-c128). The vendored UnmodifiableList extends AbstractListDecorator → AbstractCollectionDecorator → AbstractSerializableListDecorator, creating a 4-level virtual dispatch chain for every list operation:

UnmodifiableList.get(i)
  → AbstractListDecorator.decorated()      // virtual dispatch #1
    → AbstractCollectionDecorator.decorated() // virtual dispatch #2
      → field access                          // finally gets the delegate

The JDK's Collections.unmodifiableList() wraps with a single-level UnmodifiableList that the JIT can inline trivially.

LocationCache is called on every request for endpoint resolution, and multi-tenant configs call it 30× per operation cycle — making this a high-frequency hot path.

Changes

LocationCache.java — Replace all UnmodifiableList<T> fields and return types with List<T>, use Collections.unmodifiableList() instead of new UnmodifiableList<>()
GlobalEndpointManager.java — Update return types from UnmodifiableList<T> to List<T>
ClientRetryPolicy.java — Update variable declarations
GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker.java — Update variable declarations
HttpHeaders.java — Added setLowered() fast-path method that skips toLowerCase() for keys already known to be lowercase
Test files (7 files) — Updated to use Collections.unmodifiableList() / List.of() instead of vendored UnmodifiableList

Benchmark Results

Environment: 16-core VM, JDK 21, GATEWAY mode, 3 repetitions averaged.
Reference: perf/hashmap-collection-allocation (previous PASS branch)

Config	Read Δ Throughput	Read Δ P99	Write Δ Throughput	Write Δ P99
1t-c1 (1 tenant, 1 concurrent)	+2.6%	-3.0%	-0.5%	+5.5%
1t-c32 (1 tenant, 32 concurrent)	+7.5%	-18.7%	-1.4%	-2.6%
1t-c128 (1 tenant, 128 concurrent)	+9.6%	-16.3%	-1.8%	-16.1%
30t-c3 (30 tenants, 3 concurrent)	+15.0%	-11.9%	+4.6%	-14.7%
30t-c5 (30 tenants, 5 concurrent)	+20.6%	+2.3%	+11.9%	-0.8%

Average throughput gain vs previous PASS: +6.8% (Read: +11.1%, Write: +2.6%)
4 of 5 configs improved ≥1.5% threshold
Max regression: -1.8% (within 2% limit)
P99 latency improved 9/10 test points — up to -18.7%
Cumulative vs original baseline: +16.1%

Why multi-tenant benefits most

30t-c5 showed +20.6% Read throughput because each of the 30 tenants triggers LocationCache lookups independently. The vendored UnmodifiableList dispatch overhead multiplied 30× per operation cycle. With the JDK wrapper, the JIT inlines the single-level dispatch, eliminating this per-tenant tax.

Note on attempt 1

An earlier attempt (v1) also changed HttpHeaders internal map from Map<String,HttpHeader> to Map<String,String>. This backfired — it increased HashMap.putVal and StringLatin1.toLowerCase overhead and caused regressions at 30t-c5 (-9.5%) and 1t-c1 (-7.4%). The v2 approach correctly dropped this risky change and focused only on the safe UnmodifiableList replacement.

Eliminate per-response intermediate HashMap allocation by adding a new StoreResponse constructor that accepts HttpHeaders directly. Header names and values are populated into String[] arrays without materializing an intermediate Map. The JsonNodeStorePayload is updated to accept header arrays and only builds a Map lazily on error paths (extremely rare). Pre-size HashMaps throughout the hot path to avoid resize/rehash: - HttpHeaders request construction: sized to defaultHeaders + request headers - StoreResponse.replicaStatusList: pre-sized to 4 - StoreResponse.withRemappedStatusCode: pre-sized to header count - RxDocumentServiceRequest fallback maps: pre-sized to 32 Fix HttpUtils.asMap() double-allocation by iterating HttpHeaders directly instead of calling toMap() which creates an intermediate HashMap. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…Case fast-path Change 1: Replace vendored UnmodifiableList with Collections.unmodifiableList() - UnmodifiableList wraps via 4-level virtual dispatch chain (UnmodifiableList -> AbstractSerializableListDecorator -> AbstractListDecorator -> AbstractCollectionDecorator) - AbstractListDecorator.decorated() called on every list operation accounts for ~1.02% CPU (398 samples at 1t-c128) - Collections.unmodifiableList() returns a simple wrapper that JIT can inline efficiently Change 2: Add isAsciiLowerCase fast-path in HttpHeaders - HTTP/2 headers are already lowercase per protocol spec - Most x-ms-* custom headers are already lowercase - Skip String.toLowerCase(Locale.ROOT) when string is already all-lowercase via simple char range check - StringLatin1.toLowerCase accounts for ~1.29% CPU (500 samples) Files changed: - LocationCache.java: field types + return types + constructors - GlobalEndpointManager.java: return types - ClientRetryPolicy.java: variable type - GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker.java: variable types - HttpHeaders.java: set() and getHeader() fast-path Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Annie Liang and others added 4 commits March 31, 2026 18:23

docs: add benchmark comparison charts for hashmap-collection-allocation

10d1b5e

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs: update benchmark charts with corrected per-interval CPU rendering

86ee3c0

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions bot added the Cosmos label Apr 2, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: eliminate UnmodifiableList virtual dispatch overhead in LocationCache#48674

perf: eliminate UnmodifiableList virtual dispatch overhead in LocationCache#48674
xinlian12 wants to merge 4 commits intoAzure:mainfrom
xinlian12:perf/sdk-internal-processing-v2

xinlian12 commented Apr 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

xinlian12 commented Apr 2, 2026

Summary

JFR Analysis

Changes

Benchmark Results

Why multi-tenant benefits most

Note on attempt 1

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant