perf: cache lexicographic chunk coords in sharding codec #4012
The subchunk_write_order feature (zarr-developers#3826) regressed sharded write performance: `_encode_partial_single` rebuilt the full per-shard chunk coordinate grid on every write via `np.array(list(_subchunk_order_iter(..., "lexicographic")))`, and `to_dict_vectorized` rebuilt a tuple key per row with `tuple(coords.ravel())`. For a single-chunk write into a shard with tens of thousands of chunks this roughly doubled write time (~22ms -> ~40ms on `test_sharded_morton_write_single_chunk`, matching the -44% CodSpeed regression).

Add cached `_lexicographic_order` (array) and `_lexicographic_order_keys` (tuples) helpers in indexing.py, mirroring `_morton_order`/`_morton_order_keys`, and pass the cached keys into `to_dict_vectorized` instead of deriving them row-by-row. This restores write throughput to the pre-zarr-developers#3826 baseline while preserving identical chunk ordering (verified equal to `np.ndindex` across shapes including 0-d and empty).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@ Coverage Diff @@
## main #4012 +/- ##
=======================================
Coverage 93.55% 93.56%
=======================================
Files 88 88
Lines 11873 11886 +13
=======================================
+ Hits 11108 11121 +13
Misses 765 765
Merging this PR will improve performance by 76.71%
Fixes a performance regression introduced in #3826 that made Morton-order indexing slower.
For details, see Claude's summary below.
The subchunk_write_order feature (#3826) regressed sharded write performance: `_encode_partial_single` rebuilt the full per-shard chunk coordinate grid on every write via `np.array(list(_subchunk_order_iter(..., "lexicographic")))`, and `to_dict_vectorized` rebuilt a tuple key per row with `tuple(coords.ravel())`. For a single-chunk write into a shard with tens of thousands of chunks this roughly doubled write time (~22ms -> ~40ms on `test_sharded_morton_write_single_chunk`, matching the -44% CodSpeed regression).

Add cached `_lexicographic_order` (array) and `_lexicographic_order_keys` (tuples) helpers in indexing.py, mirroring `_morton_order`/`_morton_order_keys`, and pass the cached keys into `to_dict_vectorized` instead of deriving them row-by-row. This restores write throughput to the pre-#3826 baseline while preserving identical chunk ordering (verified equal to `np.ndindex` across shapes including 0-d and empty).
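For readers who want the caching idea without opening the diff, here is a minimal standalone sketch. It is not the code from this PR: the function names `lexicographic_order_keys` and `lexicographic_order`, the `lru_cache` size, and the `uint64` dtype are all assumptions chosen for illustration. It only shows the general pattern of computing the row-major coordinate list once per shape, reusing it across writes, and checking it against `np.ndindex`.

```python
from functools import lru_cache
import itertools

import numpy as np


# Hypothetical helper (not zarr's API): every chunk coordinate of a grid
# in row-major (lexicographic) order, as hashable tuples usable as dict keys.
@lru_cache(maxsize=16)  # cache size is an arbitrary choice for this sketch
def lexicographic_order_keys(chunks_per_shard: tuple[int, ...]) -> tuple[tuple[int, ...], ...]:
    return tuple(itertools.product(*(range(n) for n in chunks_per_shard)))


# Hypothetical helper (not zarr's API): the same coordinates as an
# (n_chunks, ndim) array, also cached per shape.
@lru_cache(maxsize=16)
def lexicographic_order(chunks_per_shard: tuple[int, ...]) -> np.ndarray:
    keys = lexicographic_order_keys(chunks_per_shard)
    arr = np.array(keys, dtype=np.uint64).reshape(len(keys), len(chunks_per_shard))
    arr.setflags(write=False)  # the cached array is shared, so make it read-only
    return arr


# The ordering should agree with np.ndindex, including 0-d and empty shapes.
for shape in [(), (0, 3), (4,), (2, 3), (2, 3, 4)]:
    assert list(lexicographic_order_keys(shape)) == list(np.ndindex(shape))
    assert lexicographic_order(shape).shape == (len(lexicographic_order_keys(shape)), len(shape))

# Repeated calls with the same shape return the same cached object,
# so a per-write lookup does not rebuild the grid or the tuple keys.
assert lexicographic_order_keys((2, 3)) is lexicographic_order_keys((2, 3))
```

Returning tuples from the keys helper means callers can use them as dictionary keys directly, without a per-row `tuple(...)` conversion on each write. Marking the cached array read-only guards against one caller mutating a value that others share.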