refactor: define codec and data type classes upstream in a subpackage #3875

Draft
d-v-b wants to merge 2 commits into zarr-developers:main from d-v-b:refactor/upstream-apis


Contributor

@d-v-b d-v-b commented Apr 6, 2026

Projects that want to implement their own codecs or data types currently have to import the base classes from zarr-python. As a result, zarr-python can practically never depend on an externally defined codec or data type, because doing so would create a circular dependency, which is unacceptable. See #3867.

To remedy this situation, this PR defines our codec and data type ABCs in a separate package called zarr-interfaces. zarr-interfaces is a sub-package in this repo. The interfaces in zarr-interfaces are in versioned namespaces, which makes evolution of these APIs straightforward. Projects that want to implement a zarr-compatible codec or data type should depend on zarr-interfaces instead of depending on zarr-python itself. This will allow zarr-python to optionally depend on externally-defined codecs and data types.
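A minimal sketch of the dependency direction described above, using only the standard library. Everything here is hypothetical: `BytesCodec` stands in for an ABC that an interfaces package might export, `ZlibCodec` for a third-party implementation, and `roundtrip` for consumer code. None of these names or signatures are taken from this PR.

```python
import zlib
from abc import ABC, abstractmethod


# Stand-in for an ABC exported by a lightweight interfaces package.
# (Hypothetical name and methods, not the real zarr-interfaces API.)
class BytesCodec(ABC):
    @abstractmethod
    def encode(self, data: bytes) -> bytes:
        """Return the encoded form of ``data``."""

    @abstractmethod
    def decode(self, data: bytes) -> bytes:
        """Invert ``encode``."""


# Stand-in for a third-party codec. It subclasses the interface only,
# so it never needs to import the array library itself.
class ZlibCodec(BytesCodec):
    def __init__(self, level: int = 6) -> None:
        self.level = level

    def encode(self, data: bytes) -> bytes:
        return zlib.compress(data, self.level)

    def decode(self, data: bytes) -> bytes:
        return zlib.decompress(data)


# Stand-in for consumer code. It is written against the interface, so it
# can accept codecs defined in other packages without a circular import.
def roundtrip(codec: BytesCodec, payload: bytes) -> bytes:
    return codec.decode(codec.encode(payload))


payload = b"zarr" * 100
restored = roundtrip(ZlibCodec(), payload)
```

Because both the implementation and the consumer import only the interface, either side can optionally depend on the other without forming a cycle.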

I'm opening this as a draft because I'm not sure about quite a few things, and I would appreciate feedback on the basic direction.

TODO:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

github-actions bot added the "needs release notes" label (automatically applied to PRs which haven't added release notes) on Apr 6, 2026
Contributor Author

d-v-b commented Apr 10, 2026

@zarr-developers/python-core-devs does anyone object to the basic proposal here: to upstream our basic codec + data type APIs? I think the current situation is untenable so I'd like to see it fixed. This PR is one approach, but I'm open to alternatives.

```
from zarr_interfaces.data_type.v1 import ZDType
```

Interfaces are versioned under a `v1` namespace to support future evolution
Member

I wonder if the versioning will create confusion, because it is another version apart from the zarr package and the zarr data format versions.

Contributor Author

I hope it's not confusing! The goal here is to allow zarr-python to gracefully evolve things like the codec API. Since different codec APIs would not interact, we could define the current ABC-based API under v1, and a newer protocol-based API under v2. I think only codec and data type developers would need to know about this, and I would count on that crowd being able to know what the versions mean.
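To make the v1/v2 idea concrete, here is a rough sketch of an ABC-based interface and a protocol-based interface coexisting. The names `CodecV1`, `CodecV2`, `ReverseV1`, `ReverseV2`, and `api_version` are all made up for illustration, and the method signatures are assumptions rather than anything defined in this PR.

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


# Hypothetical "v1" interface: nominal, implementers must subclass it.
class CodecV1(ABC):
    @abstractmethod
    def encode(self, data: bytes) -> bytes: ...

    @abstractmethod
    def decode(self, data: bytes) -> bytes: ...


# Hypothetical "v2" interface: structural, no inheritance required.
@runtime_checkable
class CodecV2(Protocol):
    def encode(self, data: bytes) -> bytes: ...

    def decode(self, data: bytes) -> bytes: ...


class ReverseV1(CodecV1):
    """Toy codec written against the ABC-based interface."""

    def encode(self, data: bytes) -> bytes:
        return data[::-1]

    def decode(self, data: bytes) -> bytes:
        return data[::-1]


class ReverseV2:
    """Toy codec that satisfies the protocol without subclassing anything."""

    def encode(self, data: bytes) -> bytes:
        return data[::-1]

    def decode(self, data: bytes) -> bytes:
        return data[::-1]


def api_version(codec: object) -> str:
    # Check v1 first: in this sketch a v1 subclass also happens to match
    # the v2 protocol structurally, because the method names are the same.
    if isinstance(codec, CodecV1):
        return "v1"
    if isinstance(codec, CodecV2):
        return "v2"
    raise TypeError(f"{type(codec).__name__} implements no known codec API")


versions = [api_version(ReverseV1()), api_version(ReverseV2())]
```

Only code that dispatches on codecs, like the `api_version` helper here, needs to know which interface version an object targets; array users would never see the distinction.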
