X'[1,:] much slower than X[:, 1], despite theoretically same access pattern #628

Open · RomeoV opened this issue May 19, 2025 · 4 comments

RomeoV commented May 19, 2025

Expected behaviour

When I allocate a SparseMatrixCSC, indexing columns is fast and indexing rows is slow, e.g.

X = sprand(100_000, 100_000, 0.01);
@elapsed X[:, 1]  # fast
@elapsed X[1, :]  # slow

Further, when I create the lazy adjoint, the behaviour should switch:

Xt = X'  # lazy adjoint
@elapsed Xt[:, 1]  # slow
@elapsed Xt[1, :]  # fast, in theory!

and in particular, I'd expect the corresponding "fast" and "slow" timings to approximately match.

Actual behaviour

Unfortunately, although Adjoint{SparseMatrixCSC} "simulates" a SparseMatrixCSR, the indexing is quite slow for Xt[1, :].
In particular:

julia> X = sprand(100_000, 100_000, 0.01);
julia> @elapsed X[:, 1]
9.018e-6
julia> @elapsed X[1, :]
0.001110558
# so far okay, but now...

julia> Xt = X';
julia> @elapsed Xt[:, 1]
0.009835612
julia> @elapsed Xt[1, :]   # <- why is this slow?
0.001390563

Notably, Xt[1,:] is about 100 times slower than X[:, 1] although the same elements are indexed, and the data structure should be the same.
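
For what it's worth, the same comparison could be run with BenchmarkTools to rule out compilation noise; a minimal sketch:

using SparseArrays, LinearAlgebra, BenchmarkTools
X = sprand(100_000, 100_000, 0.01);
Xt = X';
@btime $X[:, 1];   # fast: a single CSC column lookup
@btime $Xt[1, :];  # indexes the same stored elements, yet is much slower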

I have already tried to understand what's happening by using Cthulhu.@descend to walk down the indexing call stack of the lazy adjoint, but the generated functions are too difficult for me to see through.

Why do I need this

I have an application where sparse matrix indexing performance is critical. For parts of the algorithm I need to retrieve columns quickly (no problem), and for other parts I need to retrieve rows quickly. It would therefore be nice to explicitly compute the transpose once and then index into that, i.e. X = copy(transpose(X))', as sketched below. Then I don't need to change any of the existing code, but row lookup should be fast. However, as you can see above, it is not.
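
Concretely, the pattern would look something like this (just a sketch; Xrow is an illustrative name):

using SparseArrays, LinearAlgebra

X = sprand(100_000, 100_000, 0.01)
Xrow = copy(transpose(X))'  # materialize the transpose once, then wrap it lazily again
Xrow[1, :]                  # conceptually a single column lookup in the stored CSC, but currently slow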

RomeoV commented May 19, 2025

Fortunately, because it's Julia, I can just define something like

function Base.getindex(M::LinearAlgebra.Adjoint{T, <:SparseMatrixCSC{T}}, i::Int, ::Colon) where {T}
    getindex(M.parent, :, i)
end

and then it is fast.

However, I think the current behaviour is quite unexpected, and perhaps something similar could be merged upstream? Happy to prepare something if there is interest.

@SobhanMP (Member)

This is technically wrong; you need to apply adjoint recursively to the elements of the parent as well, but you are welcome to open a pull request.

@dkarrasch (Member)

That's right, it should be something like

adjoint.(getindex(M.parent, :, i))

This is suboptimal, because it allocates two vectors. Unfortunately, something like

adjoint.(view(M.parent, :, i))

seems to create a dense vector. Looks like we're missing some broadcasting optimization for single columns of CSC matrices.

SobhanMP commented May 21, 2025

Given JuliaLang/julia#40632 and JuliaLang/julia#31677, should map!(adjoint, getindex(M.parent, :, i)) be valid?

p.s.

map(adjoint, view(M.parent, :, i)) works.
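
Putting the suggestions in this thread together, a corrected method could look roughly like this (just a sketch, relying on the map-over-view call above preserving sparsity):

using SparseArrays, LinearAlgebra

function Base.getindex(M::LinearAlgebra.Adjoint{T, <:SparseMatrixCSC{T}}, i::Int, ::Colon) where {T}
    # take column i of the parent CSC matrix as a view (no intermediate copy),
    # then adjoint each stored element (conj for complex scalars, a no-op for reals)
    return map(adjoint, view(parent(M), :, i))
end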
