perf: cache lexicographic chunk coords in sharding codec by d-v-b · Pull Request #4012 · zarr-developers/zarr-python

d-v-b · 2026-05-26T21:02:15Z

Fixes a performance regression introduced in #3826 that made morton-order-indexing slower.

For details, see Claude's summary below.

The subchunk_write_order feature (#3826) regressed sharded write performance: _encode_partial_single rebuilt the full per-shard chunk coordinate grid on every write via `np.array(list(_subchunk_order_iter(..., "lexicographic")))`, and `to_dict_vectorized` rebuilt a tuple key per row with `tuple(coords.ravel())`. For a single-chunk write into a shard with tens of thousands of chunks this roughly doubled write time (~22ms -> ~40ms on test_sharded_morton_write_single_chunk, matching the -44% CodSpeed regression).

Add cached _lexicographic_order (array) and _lexicographic_order_keys (tuples) helpers in indexing.py, mirroring _morton_order/_morton_order_keys, and pass the cached keys into to_dict_vectorized instead of deriving them row-by-row. This restores write throughput to the pre-#3826 baseline while preserving identical chunk ordering (verified equal to np.ndindex across shapes including 0-d and empty).
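The caching idea described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: the function names `lexicographic_order_keys` and `lexicographic_order`, the use of `functools.lru_cache`, the cache size, and the `uint64` dtype are all assumptions made for the example. It only shows the general pattern of computing the C-order coordinate grid once per shape and reusing both the tuple keys and the array form.

```python
from functools import lru_cache

import numpy as np


# Hypothetical helper: not zarr's actual implementation.
@lru_cache(maxsize=16)
def lexicographic_order_keys(chunk_shape: tuple[int, ...]) -> tuple[tuple[int, ...], ...]:
    """Return all chunk coordinates in C (lexicographic) order as tuples.

    The result is cached per shape, so repeated writes into shards of the
    same shape reuse the tuples instead of rebuilding them row by row.
    """
    return tuple(np.ndindex(*chunk_shape))


# Hypothetical helper: not zarr's actual implementation.
@lru_cache(maxsize=16)
def lexicographic_order(chunk_shape: tuple[int, ...]) -> np.ndarray:
    """Return the same coordinates as a read-only (n_chunks, ndim) array."""
    keys = lexicographic_order_keys(chunk_shape)
    coords = np.array(keys, dtype=np.uint64).reshape(len(keys), len(chunk_shape))
    # The array is shared between callers through the cache, so make it
    # read-only to avoid accidental mutation of the cached value.
    coords.flags.writeable = False
    return coords


if __name__ == "__main__":
    shape = (32, 32, 32)
    keys = lexicographic_order_keys(shape)
    coords = lexicographic_order(shape)
    # The second call is served from the cache and returns the same object.
    print(len(keys), coords.shape, lexicographic_order_keys(shape) is keys)
```

Because the cached values are shared, callers in a sketch like this should treat them as immutable; that is why the keys are tuples and the array is marked read-only.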


TODO:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/user-guide/*.md
Changes documented as a new file in changes/
GitHub Actions have all passed
Test coverage is 100% (Codecov passes)

The subchunk_write_order feature (zarr-developers#3826) regressed sharded write performance: _encode_partial_single rebuilt the full per-shard chunk coordinate grid on every write via `np.array(list(_subchunk_order_iter(..., "lexicographic")))`, and `to_dict_vectorized` rebuilt a tuple key per row with `tuple(coords.ravel())`. For a single-chunk write into a shard with tens of thousands of chunks this roughly doubled write time (~22ms -> ~40ms on test_sharded_morton_write_single_chunk, matching the -44% CodSpeed regression).

Add cached `_lexicographic_order` (array) and `_lexicographic_order_keys` (tuples) helpers in indexing.py, mirroring `_morton_order`/`_morton_order_keys`, and pass the cached keys into `to_dict_vectorized` instead of deriving them row-by-row. This restores write throughput to the pre-zarr-developers#3826 baseline while preserving identical chunk ordering (verified equal to np.ndindex across shapes including 0-d and empty).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-05-26T21:11:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.56%. Comparing base (1cda981) to head (3372b05).

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #4012   +/-   ##
=======================================
  Coverage   93.55%   93.56%           
=======================================
  Files          88       88           
  Lines       11873    11886   +13     
=======================================
+ Hits        11108    11121   +13     
  Misses        765      765

Files with missing lines	Coverage Δ
src/zarr/codecs/sharding.py	`92.13% <100.00%> (ø)`
src/zarr/core/indexing.py	`96.30% <100.00%> (+0.05%)`	⬆️


codspeed-hq · 2026-05-26T21:28:36Z

Merging this PR will improve performance by 76.71%

⚡ 3 improved benchmarks
✅ 63 untouched benchmarks
⏩ 6 skipped benchmarks¹

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	WallTime	`test_sharded_morton_write_single_chunk[(32, 32, 32)-memory]`	348.6 ms	198.7 ms	+75.45%
⚡	WallTime	`test_sharded_morton_write_single_chunk[(33, 33, 33)-memory]`	386.8 ms	217.2 ms	+78.06%
⚡	WallTime	`test_sharded_morton_write_single_chunk[(30, 30, 30)-memory]`	292.6 ms	165.7 ms	+76.62%
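As a rough cross-check of the table above, the Efficiency column looks consistent with the ratio BASE / HEAD minus one, expressed as a percentage. That reading is inferred from the numbers shown here and is an assumption, not CodSpeed's documented formula; the helper name `efficiency_percent` is hypothetical. Because the timings in the table are rounded, the recomputed values may differ from the reported ones in the last digits.

```python
def efficiency_percent(base_ms: float, head_ms: float) -> float:
    """Relative speedup of HEAD over BASE, as a percentage (assumed formula)."""
    return (base_ms / head_ms - 1.0) * 100.0


# (BASE, HEAD) wall times in milliseconds, copied from the table above.
timings = {
    "(32, 32, 32)": (348.6, 198.7),
    "(33, 33, 33)": (386.8, 217.2),
    "(30, 30, 30)": (292.6, 165.7),
}

if __name__ == "__main__":
    for shape, (base_ms, head_ms) in timings.items():
        print(f"{shape}: {efficiency_percent(base_ms, head_ms):+.2f}%")
```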


Comparing d-v-b:perf-sharding-coord-cache (3372b05) with main (c0e2afa)²

¹ 6 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, they can be archived to remove them from the performance reports.
² No successful run was found on main (1cda981) during the generation of this report, so c0e2afa was used instead as the comparison base. There might be some changes unrelated to this pull request in this report.

d-v-b added the "benchmark" label (Code will be benchmarked in a CI job.) May 26, 2026

d-v-b requested a review from ilan-gold May 26, 2026 21:04

d-v-b wants to merge 1 commit into zarr-developers:main from d-v-b:perf-sharding-coord-cache
