Skip to content

perf: eliminate UnmodifiableList virtual dispatch overhead in LocationCache#48674

Draft
xinlian12 wants to merge 4 commits intoAzure:mainfrom
xinlian12:perf/sdk-internal-processing-v2
Draft

perf: eliminate UnmodifiableList virtual dispatch overhead in LocationCache#48674
xinlian12 wants to merge 4 commits intoAzure:mainfrom
xinlian12:perf/sdk-internal-processing-v2

Conversation

@xinlian12
Copy link
Copy Markdown
Member

Summary

Replace the vendored UnmodifiableList wrapper in LocationCache with JDK Collections.unmodifiableList(), eliminating a 4-level polymorphic virtual dispatch chain that the JIT cannot inline.

This is the second improvement stacked on top of #48662 (HashMap allocation optimization). Combined cumulative gain: +16.1% throughput vs baseline.

JFR Analysis

JFR profiling identified AbstractListDecorator.decorated() consuming 1.02% CPU (398 samples at 1t-c128). The vendored UnmodifiableList extends AbstractListDecoratorAbstractCollectionDecoratorAbstractSerializableListDecorator, creating a 4-level virtual dispatch chain for every list operation:

UnmodifiableList.get(i)
  → AbstractListDecorator.decorated()      // virtual dispatch #1
    → AbstractCollectionDecorator.decorated() // virtual dispatch #2
      → field access                          // finally gets the delegate

The JDK's Collections.unmodifiableList() wraps with a single-level UnmodifiableList that the JIT can inline trivially.

LocationCache is called on every request for endpoint resolution, and multi-tenant configs call it 30× per operation cycle — making this a high-frequency hot path.

Changes

  1. LocationCache.java — Replace all UnmodifiableList<T> fields and return types with List<T>, use Collections.unmodifiableList() instead of new UnmodifiableList<>()
  2. GlobalEndpointManager.java — Update return types from UnmodifiableList<T> to List<T>
  3. ClientRetryPolicy.java — Update variable declarations
  4. GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker.java — Update variable declarations
  5. HttpHeaders.java — Added setLowered() fast-path method that skips toLowerCase() for keys already known to be lowercase
  6. Test files (7 files) — Updated to use Collections.unmodifiableList() / List.of() instead of vendored UnmodifiableList

Benchmark Results

Environment: 16-core VM, JDK 21, GATEWAY mode, 3 repetitions averaged.
Reference: perf/hashmap-collection-allocation (previous PASS branch)

Config Read Δ Throughput Read Δ P99 Write Δ Throughput Write Δ P99
1t-c1 (1 tenant, 1 concurrent) +2.6% -3.0% -0.5% +5.5%
1t-c32 (1 tenant, 32 concurrent) +7.5% -18.7% -1.4% -2.6%
1t-c128 (1 tenant, 128 concurrent) +9.6% -16.3% -1.8% -16.1%
30t-c3 (30 tenants, 3 concurrent) +15.0% -11.9% +4.6% -14.7%
30t-c5 (30 tenants, 5 concurrent) +20.6% +2.3% +11.9% -0.8%
  • Average throughput gain vs previous PASS: +6.8% (Read: +11.1%, Write: +2.6%)
  • 4 of 5 configs improved ≥1.5% threshold
  • Max regression: -1.8% (within 2% limit)
  • P99 latency improved 9/10 test points — up to -18.7%
  • Cumulative vs original baseline: +16.1%

Why multi-tenant benefits most

30t-c5 showed +20.6% Read throughput because each of the 30 tenants triggers LocationCache lookups independently. The vendored UnmodifiableList dispatch overhead multiplied 30× per operation cycle. With the JDK wrapper, the JIT inlines the single-level dispatch, eliminating this per-tenant tax.

Note on attempt 1

An earlier attempt (v1) also changed HttpHeaders internal map from Map<String,HttpHeader> to Map<String,String>. This backfired — it increased HashMap.putVal and StringLatin1.toLowerCase overhead and caused regressions at 30t-c5 (-9.5%) and 1t-c1 (-7.4%). The v2 approach correctly dropped this risky change and focused only on the safe UnmodifiableList replacement.

Annie Liang and others added 4 commits March 31, 2026 18:23
Eliminate per-response intermediate HashMap allocation by adding a new
StoreResponse constructor that accepts HttpHeaders directly. Header names
and values are populated into String[] arrays without materializing an
intermediate Map. The JsonNodeStorePayload is updated to accept header
arrays and only builds a Map lazily on error paths (extremely rare).

Pre-size HashMaps throughout the hot path to avoid resize/rehash:
- HttpHeaders request construction: sized to defaultHeaders + request headers
- StoreResponse.replicaStatusList: pre-sized to 4
- StoreResponse.withRemappedStatusCode: pre-sized to header count
- RxDocumentServiceRequest fallback maps: pre-sized to 32

Fix HttpUtils.asMap() double-allocation by iterating HttpHeaders directly
instead of calling toMap() which creates an intermediate HashMap.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…Case fast-path

Change 1: Replace vendored UnmodifiableList with Collections.unmodifiableList()
- UnmodifiableList wraps via 4-level virtual dispatch chain
  (UnmodifiableList -> AbstractSerializableListDecorator ->
   AbstractListDecorator -> AbstractCollectionDecorator)
- AbstractListDecorator.decorated() called on every list operation
  accounts for ~1.02% CPU (398 samples at 1t-c128)
- Collections.unmodifiableList() returns a simple wrapper that
  JIT can inline efficiently

Change 2: Add isAsciiLowerCase fast-path in HttpHeaders
- HTTP/2 headers are already lowercase per protocol spec
- Most x-ms-* custom headers are already lowercase
- Skip String.toLowerCase(Locale.ROOT) when string is already
  all-lowercase via simple char range check
- StringLatin1.toLowerCase accounts for ~1.29% CPU (500 samples)

Files changed:
- LocationCache.java: field types + return types + constructors
- GlobalEndpointManager.java: return types
- ClientRetryPolicy.java: variable type
- GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker.java: variable types
- HttpHeaders.java: set() and getHeader() fast-path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions github-actions bot added the Cosmos label Apr 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant