Skip to content

perf: cache lexicographic chunk coords in sharding codec#4012

Open
d-v-b wants to merge 1 commit into
zarr-developers:mainfrom
d-v-b:perf-sharding-coord-cache
Open

perf: cache lexicographic chunk coords in sharding codec#4012
d-v-b wants to merge 1 commit into
zarr-developers:mainfrom
d-v-b:perf-sharding-coord-cache

Conversation

@d-v-b
Copy link
Copy Markdown
Contributor

@d-v-b d-v-b commented May 26, 2026

Fixes a performance regression introduced in #3826 that made morton-order-indexing slower.

for details, see claude's summary below.

The subchunk_write_order feature (#3826) regressed sharded write performance: _encode_partial_single rebuilt the full per-shard chunk coordinate grid on every write via
np.array(list(_subchunk_order_iter(..., "lexicographic"))), and to_dict_vectorized rebuilt a tuple key per row with tuple(coords.ravel()). For a single-chunk write into a shard with tens of thousands of chunks this roughly doubled write time (~22ms -> ~40ms on test_sharded_morton_write_single_chunk, matching the -44% CodSpeed regression).

Add cached _lexicographic_order (array) and _lexicographic_order_keys (tuples) helpers in indexing.py, mirroring _morton_order/_morton_order_keys, and pass the cached keys into to_dict_vectorized instead of deriving them row-by-row. This restores write throughput to the pre-#3826 baseline while preserving identical chunk ordering (verified equal to np.ndindex across shapes including 0-d and empty).

[Description of PR]

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

The subchunk_write_order feature (zarr-developers#3826) regressed sharded write
performance: _encode_partial_single rebuilt the full per-shard chunk
coordinate grid on every write via
`np.array(list(_subchunk_order_iter(..., "lexicographic")))`, and
`to_dict_vectorized` rebuilt a tuple key per row with `tuple(coords.ravel())`.
For a single-chunk write into a shard with tens of thousands of chunks this
roughly doubled write time (~22ms -> ~40ms on test_sharded_morton_write_single_chunk,
matching the -44% CodSpeed regression).

Add cached `_lexicographic_order` (array) and `_lexicographic_order_keys`
(tuples) helpers in indexing.py, mirroring `_morton_order`/`_morton_order_keys`,
and pass the cached keys into `to_dict_vectorized` instead of deriving them
row-by-row. This restores write throughput to the pre-zarr-developers#3826 baseline while
preserving identical chunk ordering (verified equal to np.ndindex across
shapes including 0-d and empty).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@d-v-b d-v-b added the benchmark Code will be benchmarked in a CI job. label May 26, 2026
@d-v-b d-v-b requested a review from ilan-gold May 26, 2026 21:04
@codecov
Copy link
Copy Markdown

codecov Bot commented May 26, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.56%. Comparing base (1cda981) to head (3372b05).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4012   +/-   ##
=======================================
  Coverage   93.55%   93.56%           
=======================================
  Files          88       88           
  Lines       11873    11886   +13     
=======================================
+ Hits        11108    11121   +13     
  Misses        765      765           
Files with missing lines Coverage Δ
src/zarr/codecs/sharding.py 92.13% <100.00%> (ø)
src/zarr/core/indexing.py 96.30% <100.00%> (+0.05%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 26, 2026

Merging this PR will improve performance by 76.71%

⚡ 3 improved benchmarks
✅ 63 untouched benchmarks
⏩ 6 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
WallTime test_sharded_morton_write_single_chunk[(32, 32, 32)-memory] 348.6 ms 198.7 ms +75.45%
WallTime test_sharded_morton_write_single_chunk[(33, 33, 33)-memory] 386.8 ms 217.2 ms +78.06%
WallTime test_sharded_morton_write_single_chunk[(30, 30, 30)-memory] 292.6 ms 165.7 ms +76.62%

Tip

Curious why this is faster? Comment @codspeedbot explain why this is faster on this PR, or directly use the CodSpeed MCP with your agent.


Comparing d-v-b:perf-sharding-coord-cache (3372b05) with main (c0e2afa)2

Open in CodSpeed

Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

  2. No successful run was found on main (1cda981) during the generation of this report, so c0e2afa was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

benchmark Code will be benchmarked in a CI job.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant