
Compression plugin #18314


Open · wants to merge 22 commits into base: develop2

Conversation

@perseoGI (Contributor) commented May 16, 2025

Changelog: Feature: Allow externalization of compression/decompression tasks to a user defined plugin
Docs: conan-io/docs#4107

Closes #18259

Related issues:

Related PRs:

For the sake of simplicity, this PR keeps the current behaviour in some compression/decompression tasks, such as:

  • Cache save/restore
  • File uploads

It adds an extra layer in charge of determining whether the compression.py plugin is defined, falling back to the existing behaviour if it isn't.
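As a rough illustration of that dispatch layer (the names compress and _default_compress are hypothetical; this is not the PR's actual code):

import tarfile

def _default_compress(archive_path, files, recursive):
    # Existing builtin behaviour: a plain gzip-compressed tarball (.tgz)
    with tarfile.open(archive_path, "w:gz") as tar:
        for filename, abs_path in sorted(files.items()):
            tar.add(abs_path, filename, recursive=recursive)

def compress(archive_path, files, recursive, conf=None, ref=None, plugin=None):
    # Delegate to the loaded compression.py module if one exists,
    # otherwise fall back to the builtin gzip implementation
    if plugin is not None:
        plugin.tar_compress(archive_path, files, recursive, conf=conf, ref=ref)
    else:
        _default_compress(archive_path, files, recursive)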

The compression.py plugin interface follows this structure:

def tar_extract(archive_path, dest_dir, conf=None, *args, **kwargs) -> None:
    # Extract the archive at archive_path into dest_dir
    pass

def tar_compress(archive_path, files, recursive, conf=None, ref=None, *args, **kwargs) -> None:
    # Create the archive at archive_path from files, a dict mapping
    # archive-relative names to absolute paths on disk
    pass
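For illustration only (Conan invokes the plugin internally; the paths below are hypothetical), a direct call shows the expected shape of files, a dict mapping archive-relative names to absolute paths:

tar_compress(
    "conan_cache_save.tgz",
    files={"pkgs/pkglist.json": "/tmp/cache/pkgs/pkglist.json"},  # hypothetical paths
    recursive=False,
)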

Plugin example with zstd:

# File: compression.py

import os
import tarfile

import zstandard

from conan.api.output import ConanOutput

# zstd compression
# Original author https://github.com/conan-io/conan/pull/14706
def tar_extract(archive_path, dest_dir, conf=None, *args, **kwargs):
    dctx = zstandard.ZstdDecompressor()
    ConanOutput().info(f"Decompressing {os.path.basename(archive_path)} with compression plugin (ZSTD)")
    with open(archive_path, "rb") as tarfile_obj:
        with dctx.stream_reader(tarfile_obj) as stream_reader:
            # The choice of bufsize=32768 comes from profiling decompression at various
            # values and finding that bufsize value consistently performs well.
            with tarfile.open(
                fileobj=stream_reader, bufsize=32768, mode="r|"
            ) as the_tar:
                # Fully permissive extraction filter: keep every member as-is
                the_tar.extractall(
                    path=dest_dir, filter=lambda tarinfo, _: tarinfo
                )


def tar_compress(archive_path, files, recursive, conf=None, ref=None, *args, **kwargs):
    ConanOutput(scope=str(ref or "")).info(
        f"Compressing {os.path.basename(archive_path)} with compression plugin (ZSTD)"
    )
    compresslevel = conf.get("user.zstd:compresslevel", check_type=int) if conf else None
    with open(archive_path, "wb") as tarfile_obj:
        # Only provide level if it was overridden by config.
        zstd_kwargs = {}
        if compresslevel is not None:
            zstd_kwargs["level"] = compresslevel

        cctx = zstandard.ZstdCompressor(write_checksum=True, threads=-1, **zstd_kwargs)

        # Create a zstd stream writer so tarfile writes uncompressed data to
        # the zstd stream writer, which in turn writes compressed data to the
        # output tar.zst file.
        with cctx.stream_writer(tarfile_obj) as stream_writer:
            # The choice of bufsize=32768 comes from profiling compression at various
            # values and finding that bufsize value consistently performs well.
            # The variance in compression times at bufsize<=64KB is small. It is only
            # when bufsize>=128KB that compression times start increasing.
            with tarfile.open(
                mode="w|",
                fileobj=stream_writer,
                bufsize=32768,
                format=tarfile.PAX_FORMAT,
            ) as tar:
                current_frame_bytes = 0
                for filename, abs_path in sorted(files.items()):
                    tar.add(abs_path, filename, recursive=recursive)

                    # Flush the current frame if it has reached a large enough size.
                    # There is no required size, but 128MB is a good starting point
                    # because it allows for faster random access to the file.
                    current_frame_bytes += os.path.getsize(abs_path)
                    if current_frame_bytes >= 134217728:
                        stream_writer.flush(zstandard.FLUSH_FRAME)
                        current_frame_bytes = 0
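A note on the parameters used above: in python-zstandard, threads=-1 enables multithreaded compression using all detected logical CPUs, and write_checksum=True stores a checksum with each frame so corruption can be detected at decompression time. Flushing with zstandard.FLUSH_FRAME roughly every 128 MB closes the current frame, which bounds how much data must be decompressed sequentially to reach any given member and is what enables the faster random access mentioned in the comment.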

memsharded added a commit that referenced this pull request May 20, 2025
…xisting function (#18320)

Changelog: omit
Docs: omit

As suggested in
#18314 (comment)

This PR will simplify #18314 by
modifying the compression functionality of the cache save command.
Instead of creating a tar object and adding files one by one (an
interface tightly coupled to the tarfile module), it creates a list of
files which is passed to an already existing method in conan, `compress_files`.

This method may be moved to another module in a future PR.
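A sketch of the described refactor (an assumption for illustration; the exact compress_files signature may differ):

import os

def gather_cache_files(cache_folder, paths_to_save):
    # Map archive-relative names to absolute paths, decoupled from tarfile
    return {os.path.relpath(p, cache_folder): p for p in paths_to_save}

# The mapping is then handed to the pre-existing helper, schematically:
# compress_files(gather_cache_files(folder, paths), "conan_cache_save.tgz")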
@memsharded (Member) left a comment


Looking much better; it is easier to assess that the risk is low.
Just a small concern about the pkglist.json handling in conan cache restore.

@perseoGI marked this pull request as ready for review May 23, 2025 09:45
@memsharded (Member) left a comment


It is looking good.

Before proceeding, I'd like to see a PoC using zstd, and how the possible fallbacks and error handling are done when mixing conan_package.tgz packages compressed with gz and with zstd.
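One way such a fallback could work (a sketch, not code from this PR; it assumes format detection lives in the plugin) is to inspect the archive's magic bytes before choosing a decompressor:

import tarfile
import zstandard

ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # standard zstd frame magic number

def tar_extract(archive_path, dest_dir, conf=None, *args, **kwargs):
    with open(archive_path, "rb") as f:
        magic = f.read(4)
    if magic == ZSTD_MAGIC:
        dctx = zstandard.ZstdDecompressor()
        with open(archive_path, "rb") as f, dctx.stream_reader(f) as reader:
            with tarfile.open(fileobj=reader, mode="r|") as tar:
                tar.extractall(path=dest_dir)
    else:
        # Not zstd: let tarfile autodetect the compression (covers legacy gzip .tgz)
        with tarfile.open(archive_path, mode="r:*") as tar:
            tar.extractall(path=dest_dir)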

@memsharded self-assigned this May 23, 2025
@AbrilRBS (Member) left a comment


A comment about not re-checking the file; otherwise this looks good :)

@memsharded added this to the 2.18.0 milestone May 26, 2025
@czoido requested a review from Copilot May 26, 2025 09:18
@Copilot (Copilot AI) left a comment


Pull Request Overview

Introduces a plugin hook for compression/decompression by loading a user-supplied compression.py and delegating to it when present, with fallback to the builtin behavior.

  • Loaded compression.py via a new config.compression_plugin property and propagated it through all compress/uncompress calls.
  • Updated compress_files and uncompress_file to invoke the plugin if available.
  • Added integration tests for plugin loading, fallback behavior, and default output.

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 2 comments.

Summary per file:

  • test/integration/extensions/test_compression_plugin.py: added tests for plugin validity and end-to-end usage
  • test/integration/command/cache/test_cache_save_restore.py: asserted default compress/decompress outputs when no plugin is defined
  • conan/internal/rest/rest_client_local_recipe_index.py: passed config_api into LocalRecipesIndexApp
  • conan/internal/rest/remote_manager.py: propagated compression_plugin into uncompress_file
  • conan/internal/cache/home_paths.py: added compression_plugin_path property
  • conan/internal/api/uploader.py: forwarded compression_plugin to compress_files
  • conan/internal/conan_app.py: injected config_api into RemoteManager and LocalRecipesIndexApp
  • conan/api/subapi/config.py: implemented compression_plugin loader property
  • conan/api/subapi/cache.py: integrated plugin hook into cache save/restore
Comments suppressed due to low confidence (1)

conan/api/subapi/config.py:248

  • The code references os.path but there’s no import os at the top of this module. Add import os to avoid a NameError.
if not os.path.exists(compression_plugin_path):
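For context, a rough sketch of what such a loader could look like (an assumption based on the file summary and the comment above, not Conan's actual internals):

import os
import importlib.util

def load_compression_plugin(compression_plugin_path):
    if not os.path.exists(compression_plugin_path):
        return None  # no plugin defined: callers keep the builtin .tgz behaviour
    spec = importlib.util.spec_from_file_location("compression", compression_plugin_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module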

@@ -90,3 +90,7 @@ def settings_path_user(self):
     @property
     def config_version_path(self):
         return os.path.join(self._home, "config_version.json")
+
+    @property
+    def compression_plugin_path(self):
Copilot AI commented May 26, 2025


[nitpick] Add a docstring explaining that this property returns the path to the compression.py plugin under the extensions directory.

Suggested change:
-    def compression_plugin_path(self):
+    def compression_plugin_path(self):
+        """Returns the path to the `compression.py` plugin under the extensions directory."""


@@ -158,9 +158,13 @@ def test_cache_save_excluded_folders():
 
     # exclude source
    c.run("cache save * --no-source")
+    # Check default compression function is being used and not compression.py plugin one
+    assert "Compressing conan_cache_save.tgz\n" in c.out
Copilot AI commented May 26, 2025


[nitpick] Instead of matching the newline, assert on "Compressing conan_cache_save.tgz" in c.out to make the test more robust across platforms and output formats.

Suggested change:
-    assert "Compressing conan_cache_save.tgz\n" in c.out
+    assert "Compressing conan_cache_save.tgz" in c.out


@perseoGI changed the title from "[POC] Compression plugin" to "Compression plugin" May 27, 2025
Successfully merging this pull request may close these issues:
[feature] Use subdirectory for conan cache save content
4 participants