Skip to content

refactor suggestion: modular index, spi tags, softdep management by spi #72

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
fkiraly opened this issue Jun 21, 2024 · 6 comments
Open

Comments

@fkiraly
Copy link

fkiraly commented Jun 21, 2024

Thanks for the great presentation today, @benfulcher!

Inspired, I looked in greater detail into the repository, to my shame perhaps for the first time at that level of detail.

What I understood is that your SPI are actually not all manually implemented, but there is a wealth of them, some using external dependencies in turn. As such, pyspi is, morally, very much similar to sktime, being a mix of de-novo implementations, direct interfaces to external algorithms, and implementations that use components with soft dependencies.

I also noticed that you have tags for the different SPI, which again is very similar to sktime.

Further, when trying to interface SPI individually, I noticed that this is currently not intended to be possible - only batch feature sets can be obtained? Which seems to be a shame, you have collected so many useful pairwise transformations! Unless of course you use the yaml, and the process of discovery if you want that is tedious, and currently cannot be automated, so composability with other frameworks is severely limited.

Based on this, I had a number of ideas if you would like to hear me out:

What do you think? I'd be happy to devote some time to shift the code base gradually towards this schema. As a side effect, it would also easily allow to interface all SPI as time seires distances in sktime, and would make it easier to add SPI for multivariate or unequal length time series.

FYI @jmoo2880

@fkiraly
Copy link
Author

fkiraly commented Apr 21, 2025

Here are concrete suggestions for the base class API design.

For context, the current strategy pattern is internally defined in base.py (in package root) and the classes therein.

I would simplify the design into a flat class (no inheritance hierarchy) with tags, as follows:

  • BaseSPI inheriting from skbase BaseObject only
  • two pairs of public/private method spi / _spi, and spi_mat / _spi_mat. This covers bivariate and multivariate, but slightly different as current - as in the next bullet point.
  • the extender contract is restricted to _spi, _spi_mat, and object tags
  • the signature is spi(data: Data) -> float, and spi_mat(data, data2=None, i=None, j=None). The calculation in spi_mat is always "bivariate", i.e., pairwise. spi_mat(data, data2) returns a 2D np.ndarray of float type, with (m,n)-th entriy corresponding to pairwise interaction of data[m] and data[n]. If i or j are provided it subsets the calculation to that row and/or column, resulting in an 1D or 0D np object. If data2 is not provided, it is identical to data.
  • the decorators starting with parse move into the boilerplate, e.g., parse_multivariate happens in spi before it calls the internal _spi
  • attributes such as name, identifier, labels move into skbase tags
  • "property"-like tags such as issigned also move into skbase tags
  • the logic for the default multivariate from Directed and Undirected moves into the default implementation of _spi
  • optionally, we can provide methods bivariate and multivariate for downwards compatibility, if yes, it should raise a deprecation warning. Probably this is not needed, because the interface is not public.

@fkiraly
Copy link
Author

fkiraly commented Apr 21, 2025

FYI @Spinachboul, @benfulcher

@fkiraly
Copy link
Author

fkiraly commented Apr 21, 2025

oh, hi, @joshuabmoore!

So someone has started maintaining the package again?

See some suggestions above for API improvement.

@joshuabmoore
Copy link
Collaborator

Hi @fkiraly,

Thank you for the API improvement suggestions! We appreciate the feedback and we'll look into incorporating some of these ideas in a future release.

Yes, I'm currently maintaining the package, but only passively. There aren't any major overhauls planned at the moment, just minor updates and fixes as needed to keep things running smoothly.

@joshuabmoore
Copy link
Collaborator

Regarding integration with sktime:

We agree that making pyspi more accessible through its integration with sktime would be a great outcome. That said, I won't be playing an active role in the integration process myself, but fully support you or @Spinachboul taking the lead on this if interested.

If you'd like to proceed, you're welcome to create a fork of the repository and work from there. I'm happy to provide occasional pointers along the way if that would be helpful, but just to set expectations, I'm not currently in a position to offer active mentorship.

@Spinachboul
Copy link

@joshuabmoore @fkiraly @benfulcher
Yes! I would be glad to get started on this
Franz has already proposed an initial idea to get started with the integration , so I will be following that.
If in any doubt, I would be then asking the doubts here on Github (or maybe in the draft PR) where anyone could provide their insights which would be much appreciated.
Thanks everyone for getting involved in this.., really means a lot!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants