Refactor CodecPipeline for flexibility #3051

Open
TomAugspurger opened this issue May 9, 2025 · 1 comment
Labels
enhancement New features or improvements

Comments

@TomAugspurger
Contributor

Zarr version

v3

Numcodecs version

na

Python Version

na

Operating System

na

Installation

na

Description

Currently, the CodecPipeline interface passes around Iterable[tuple[...]] arguments, with a different tuple shape for each method. For example, decode takes:

chunks_and_specs: Iterable[tuple[CodecOutput | None, ArraySpec]],

  • decode: Iterable[tuple[CodecOutput | None, ArraySpec]]
  • encode: Iterable[tuple[CodecInput | None, ArraySpec]]
  • read: Iterable[tuple[ByteGetter, ArraySpec, SelectorTuple, SelectorTuple, bool]]
  • write: Iterable[tuple[ByteSetter, ArraySpec, SelectorTuple, SelectorTuple, bool]]

At the moment, we have no way to evolve the interface in a backwards compatible way. #2845 noted an accidental API break.
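To illustrate why the tuple-based signature is hard to evolve, here is a minimal sketch. The `decode` function and the string stand-ins for `ArraySpec` are hypothetical, not zarr-python code; the point is only that any consumer that unpacks by position breaks as soon as a tuple grows an element.

```python
def decode(chunks_and_specs):
    # A pipeline implementation written against the current two-element tuples.
    return [(chunk, spec) for chunk, spec in chunks_and_specs]


old_style = [(b"abc", "spec-0")]
# Hypothetical: the caller starts passing a third element.
new_style = [(b"abc", "spec-0", "extra")]

decoded = decode(old_style)

try:
    decode(new_style)
    error = None
except ValueError as exc:
    # Positional unpacking fails once the tuple has more than two elements.
    error = str(exc)
```

With plain tuples there is no way to add information without either breaking consumers like this or adding a new method.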

One option for gracefully evolving the spec here, which I might need for #2904, is to replace the tuples with dataclasses. We can safely add new optional fields to the dataclass without breaking backwards compatibility.

We can define __len__ and __iter__ on the dataclasses and freeze their return values to the current API.

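from dataclasses import dataclass

# CodecOutput and ArraySpec are the existing zarr-python types.
@dataclass(frozen=True, eq=True)
class DecodeChunksAndSpecs:
    codec_output: CodecOutput | None
    array_spec: ArraySpec

    # Frozen to the current two-element tuple API; new fields are not exposed here.
    def __len__(self) -> int:
        return 2

    def __iter__(self):
        yield self.codec_output
        yield self.array_spec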
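A self-contained sketch of how this could stay backwards compatible, assuming `Any` as a stand-in for the real `CodecOutput` and `ArraySpec` types. The `chunk_key` field and the `legacy_decode` function are hypothetical, added only to show that an optional field does not disturb code that still unpacks two values.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass(frozen=True, eq=True)
class DecodeChunksAndSpecs:
    codec_output: Optional[Any]  # stand-in for CodecOutput | None
    array_spec: Any              # stand-in for ArraySpec
    # Hypothetical new optional field, added after the initial release.
    chunk_key: Optional[str] = None

    def __len__(self) -> int:
        # Frozen to the original tuple length.
        return 2

    def __iter__(self) -> Iterator[Any]:
        # Frozen to the original tuple contents; chunk_key is not yielded.
        yield self.codec_output
        yield self.array_spec


def legacy_decode(chunks_and_specs):
    # An existing pipeline that still unpacks by position.
    return [(chunk, spec) for chunk, spec in chunks_and_specs]


items = [
    DecodeChunksAndSpecs(b"abc", "spec-0"),
    DecodeChunksAndSpecs(None, "spec-1", chunk_key="c/0/1"),
]
legacy_result = legacy_decode(items)
```

Existing pipelines keep seeing two-element items, while newer ones can read `item.chunk_key` by name.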

We could also warn when the fields are accessed through iteration or by position, to encourage pipeline implementations to migrate to attribute access.
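One way the warning could look, as a sketch only: the choice of `DeprecationWarning`, the message text, and the `__getitem__` method are assumptions, and `Any` again stands in for the real types.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Iterator, Optional


@dataclass(frozen=True, eq=True)
class DecodeChunksAndSpecs:
    codec_output: Optional[Any]  # stand-in for CodecOutput | None
    array_spec: Any              # stand-in for ArraySpec

    def _warn(self) -> None:
        # stacklevel=3 attributes the warning to the code doing the unpacking/indexing.
        warnings.warn(
            "Tuple-style access to DecodeChunksAndSpecs is deprecated; "
            "use the named fields instead.",
            DeprecationWarning,
            stacklevel=3,
        )

    def __len__(self) -> int:
        return 2

    def __iter__(self) -> Iterator[Any]:
        # Runs when iteration starts, because this method is a generator.
        self._warn()
        yield self.codec_output
        yield self.array_spec

    def __getitem__(self, index: int) -> Any:
        self._warn()
        return (self.codec_output, self.array_spec)[index]


item = DecodeChunksAndSpecs(b"abc", "spec-0")
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    chunk, spec = item          # warns: iteration
    first = item[0]             # warns: positional access
    named = item.codec_output   # no warning: attribute access
```

Attribute access stays silent, so migrated pipelines see no warnings.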

Steps to reproduce

na

Additional output

No response

TomAugspurger added the bug label May 9, 2025
@LDeakin
Member

LDeakin commented May 31, 2025

Another issue is that CodecPipeline.evolve_from_array_spec is currently never called. We need the ArrayMetadata and ArrayConfig in zarrs-python to properly support a broader range of Zarr V2 arrays and configurations. Also, it would be very helpful if the array store could be passed to the CodecPipeline constructor.

Right now it looks like zarrs-python is the only public user of CodecPipeline. IMHO you should just break this API for zarr-python 3.1.

cc: @ilan-gold

dstansby added the enhancement label and removed the bug label May 31, 2025