Skip to content

Add Sink that implements git's diffing improvement heuristics #2011

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

blinxen
Copy link
Contributor

@blinxen blinxen commented May 14, 2025

This PR was initially targeted at image-diff but the maintainer seems to not have enough time at the moment so I moved it here.

What does this PR do?

This PR introduces a new Sink that can be used to optimize diffs created with imara-diff. I tried to implement the same heuristics that git uses. The code is also structured very similar.

While trying to add it to gix-diff, I was unsure how it should be included. Is it just a normal Sink that can be used with imara_diff::diffor do we force users to use it by not re-exporting the imara_diff::diff method but creating a custom diff method in the gix-diff crate? What do the maintainers think?

Things that still need to be done

  • write some more tests
  • optimize the code
  • (maybe find better names for variables, structs and the Sink itself)
  • check the licensing since it was heavily inspired by the git implementation

@Byron
Copy link
Member

Byron commented May 15, 2025

Thanks so much for contributing, this will be such a great addition to gitoxide!

While trying to add it to gix-diff, I was unsure how it should be included. Is it just a normal Sink that can be used with imara_diff::diffor do we force users to use it by not re-exporting the imara_diff::diff method but creating a custom diff method in the gix-diff crate? What do the maintainers think?

As this is plumbing, and as gix-diff openly exposes imara-diff, it will be best to keep it as Sink that people can opt-in to.

As a matter of fact, it would be very nice if the algorithm could be used and I propose adding it as an option (probably even the default) to gix diff file in a follow-up.

Meanwhile I wonder if this PR should be ready for review, or if it should rather be a draft as you plan to make some more changes?
Every answer is a good answer, and I'd be happy to review it now and make it mergeable.

@blinxen
Copy link
Contributor Author

blinxen commented May 15, 2025

write some more tests
optimize the code
(maybe find better names for variables, structs and the Sink itself)

I think these three should be OK.

check the licensing since it was heavily inspired by the git implementation

For this one I am not sure how to proceed. I found contradicting information online regarding the licensing. The original code was written in C. The Rust code in this PR was inspired by it but I used a different way of doing the optimizations. The weight factors and the algorithm (I don't mean code here but the theory behind the algorithm) are 100% from the xdiff library used by git but the rest is my own work. How does LGPL-2.1 apply here 🤔 ? Maybe you know more than me on this topic :D.

@Byron
Copy link
Member

Byron commented May 16, 2025

Great, then I will take a look!

For this one I am not sure how to proceed. I found contradicting information online regarding the licensing. The original code was written in C. The Rust code in this PR was inspired by it but I used a different way of doing the optimizations. The weight factors and the algorithm (I don't mean code here but the theory behind the algorithm) are 100% from the xdiff library used by git but the rest is my own work. How does LGPL-2.1 apply here 🤔 ? Maybe you know more than me on this topic :D.

I certainly don't, but I wouldn't fret it. My intuition here is that the work itself is copyrighted, i.e. the source code as is, but not underlying knowledge. Besides, this wouldn't be the first reimplementation of semantics of the xdiff library, as I am sure more mature Git libraries already have something just like it.
And finally, if someone would be after this, I am sure they'd come after gitoxide as a means of distribution, and not the original author, so I guess I am feeling lucky :).

@EliahKagan
Copy link
Member

I think it may be preferable for the added 2c31d30 commit message to be conventional with fix:. As described in #2013 (comment), this is fixing a bug that breaks installation, so various crates that depend on gix-archive (of which there are few that do so directly, but I think more that do so through gix) can be able to build again. It will therefore be of equal or greater than usual interest to users reading changelogs.

@EliahKagan EliahKagan linked an issue May 16, 2025 that may be closed by this pull request
@Byron
Copy link
Member

Byron commented May 16, 2025

That is true! I missed that opportunity but fixed up the gix-archive changelog to prominently feature that information anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants