refactor/consolidated JSON IO routines#3998
Conversation
Add four free functions for moving JSON documents in and out of stores, plus a thin StorePath.get_json wrapper: - buffer_to_json / json_to_buffer: convert between a Buffer and a parsed JSON value. The buffer prototype and the serialization policy (indent from config, allow_nan=True) live here, in one place. - get_json / set_json: compose those with Store.get / Store.set. get_json returns None for a missing key, matching what most callers want; callers needing presence check for None themselves. These are free functions, not Store ABC methods, so stores cannot (and need not) override them and the Store contract gains no dependency on the buffer prototype or global config. Subsequent commits sweep the hand-rolled JSON I/O sites onto these and delete the old private _get_bytes/_get_json store methods. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sweep the hand-rolled `json.dumps`/`json.loads` + buffer construction in the metadata write paths and the metadata/node read paths onto the free functions added in the previous commit. Write sites (`to_buffer_dict` in group/metadata) now call `json_to_buffer`, which centralizes the `json_indent`/`allow_nan=True` serialization policy. Read sites parse buffers through `buffer_to_json_object`, a new helper that narrows the parsed `JSON` to `dict[str, JSON]` once (every metadata document is an object) rather than relying on `json.loads` returning `Any`. The compact (no-indent) consolidated-metadata blob in `GroupMetadata.to_buffer_dict` is intentionally left as a raw `json.dumps` to preserve its byte layout, and `_read_metadata_v2` keeps its `asyncio.gather` of three `store.get` calls. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`Store._get_bytes`, `_get_bytes_sync`, `_get_json`, and `_get_json_sync` had no production callers — the metadata read/write paths now go through the free functions in `zarr.core._json`. Delete the four ABC methods, the eight per-store overrides on `LocalStore`/`MemoryStore` (which existed only to make the `prototype` argument optional), and the tests that exercised them (the shared `StoreTests` methods plus the per-store prototype=None tests). Drop the now-unused `json`/`sync`/`Any` imports they pulled in. These methods were always private; removing them is not a public API change. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`json_to_buffer` (and `set_json`) now take the JSON encoding parameters
`indent` and `allow_nan` as explicit keyword arguments instead of reading
`config.get("json_indent")` internally. The functions in `zarr.core._json`
are now pure: they depend only on their arguments, not the global config.
Each `to_buffer_dict` caller (group, metadata v2/v3) reads
`config.get("json_indent")` and passes it as `indent=`. This re-localizes
the config read to the metadata layer that owns the policy, and keeps the
I/O helpers free of any config dependency.
With `indent` now explicit, the compact consolidated-metadata blob in
`GroupMetadata.to_buffer_dict` also routes through `json_to_buffer`
(indent defaults to None == compact), removing the last raw `json.dumps`
in that method and the `json` import from group.py. Output bytes are
unchanged (the consolidated/metadata regression tests pass).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3998 +/- ##
==========================================
+ Coverage 93.49% 93.51% +0.01%
==========================================
Files 88 89 +1
Lines 11873 11825 -48
==========================================
- Hits 11101 11058 -43
+ Misses 772 767 -5
🚀 New features to boost your workflow:
|
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The except clauses in `contains_array`, `contains_group`, and `_contains_node_v3` (widened to catch `TypeError` in the JSON I/O refactor) had no test exercising them — a stored `zarr.json` that is malformed bytes, valid-but-non-object JSON, or an object missing `node_type` now has a test asserting each function reports the node as absent. One test per failure mode; the two `contains_*` functions are parametrized over `func`. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
chuckwondo
left a comment
There was a problem hiding this comment.
LGTM! I made some trivial/non-blocking comments.
|
|
||
| Every metadata document zarr reads is a JSON object, so this narrows the | ||
| `JSON` union to `dict[str, JSON]` once, here, instead of at each call site. | ||
|
|
There was a problem hiding this comment.
Seems trivial, but do you want to add a Parameters section?
|
|
||
| Parameters | ||
| ---------- | ||
| obj : JSON |
There was a problem hiding this comment.
In all new code, do you want to stop including types in docstrings?
Co-authored-by: Chuck Daniels <cjdaniels4@gmail.com>
|
It looks like this will break Icechunk's tests (https://github.com/earth-mover/icechunk/blob/6c504cf377db2e9559703467adfa435a845ce14d/icechunk-python/tests/test_zarr/test_store/test_icechunk_store.py#L573-L583), but likely not its functionality. I'd like to delegate my review request to an Icechunk dev if possible since there's an an impact on downstream maintenance and I hold no opinion about this PR. @dcherian would you mind reviewing this one? |
Add a Parameters section to buffer_to_json_object and drop the redundant type annotations from the numpydoc Parameters sections, per review feedback on zarr-developers#3998. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ouch, we really shouldn't have downstream projects relying on private methods like this |
This is an internal refactor that does a few things related to JSON IO:
none of these things touch public APIs. the goal is removing dead code and boilerplate.