Allow user to change the channel axis for BatchNorm function and the likes #1664
Comments
Thanks! We already support multiple dims in the norm layers for different data types. The real question is whether it is worth letting the channel axis differ in one set of layers compared to all the others in Flux; it might cause confusion and increase the complexity of the package more generally.
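For reference, a quick illustration of that existing behaviour (nothing new here, just how BatchNorm currently picks dimension N-1 of an N-dimensional input as the channel dimension):

using Flux

bn = BatchNorm(8)
bn(randn(Float32, 8, 16))          # 2-d input: channels along dim 1
bn(randn(Float32, 20, 8, 16))      # 3-d input: channels along dim 2 (= N-1)
bn(randn(Float32, 12, 12, 8, 16))  # 4-d input: channels along dim 3 (= N-1)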
Sounds like a reasonable (if somewhat uncommon) proposal to me if it is extended to GroupNorm and InstanceNorm as well. The main consideration I can think of would be how GPU-friendly this is. Edit: do you have publications or other concrete examples of where this flexibility would be helpful? I had a look through other libraries and none of them appear to support this.
Thanks for your answers. I agree, this option should also be implemented for GroupNorm and InstanceNorm.
If you have code, a PR is always better for discussing a possible change. This is not meant as pushback, but is using
Well, something like this would need to be consistent through Conv*, Dense* etc. too, not just *Norm, for it to be used consistently in a model. I agree a
Well, this might require two
Yeah if we did do this, I would prefer to do it across the board rather than have some layers with fixed
Yeah this is what I was thinking w.r.t. "cumbersome." It might make sense to have a layer that reshapes/permutes, applies a wrapped layer, then reshapes/permutes back. Just an alternate idea.
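For concreteness, a minimal sketch of such a wrapper, assuming a permutation-based implementation (the PermuteWrap name and its fields are illustrative, not an existing Flux layer):

using Flux

struct PermuteWrap{L}
    layer::L
    perm::Vector{Int}   # permutation that moves the desired channel dim into the N-1 slot
end

Flux.@functor PermuteWrap (layer,)   # only the wrapped layer carries parameters

(w::PermuteWrap)(x) = permutedims(w.layer(permutedims(x, w.perm)), invperm(w.perm))

# Usage: a 3-d array with channels along dim 1; [2, 1, 3] moves them to dim N-1 = 2.
bn = PermuteWrap(BatchNorm(8), [2, 1, 3])
bn(randn(Float32, 8, 50, 16))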
I have proposed a PR (#1666).
Correct me if I am wrong, but using
That's true.
Here's a proposal, what if we did something like:
function f(...; dims)
    Chain(PermutedDimsArray(...; dims),
          BN,
          PermutedDimsArray(...; inv(dims)))
end
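Reading inv(dims) as the inverse permutation, the two permutes do cancel out; a tiny sanity check of that identity (assuming invperm is what is meant here):

perm = [3, 1, 2]
x = rand(4, 5, 6)
permutedims(permutedims(x, perm), invperm(perm)) == x   # true: the round trip restores x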
How GPU-friendly is this double permute? Would the second one restore the original dense array for downstream operations? |
Some layers may copy, so no, it wouldn't restore the original (hence it shouldn't be in Flux, because it can't be generic). Note that the dimensionality of the output would match that of the input (the channels dim would be N - 1). It should be GPU-friendly, as PermutedDimsArray is handled in CUDA (via Adapt), which I think is fine.
@DhairyaLGandhi your proposal is appealing. Nonetheless, I am not sure how to interpret your piece of code. I still gave a double permute a try with a GPU array, and it throws a scalar indexing error:
using CUDA; CUDA.allowscalar(false)
import Base.PermutedDimsArray
x = CUDA.randn(10, 5, 2);
x = PermutedDimsArray(x, [3, 2, 1]);
x = PermutedDimsArray(x, invperm([3, 2, 1]));
x * 2
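For comparison, the eager permutedims (which copies into a plain CuArray rather than returning a lazy wrapper) should not hit this fallback, at the cost of materialising each permutation; a minimal check, assuming a functional CUDA setup:

using CUDA; CUDA.allowscalar(false)

x = CUDA.randn(10, 5, 2)
y = permutedims(x, [3, 2, 1])            # eager copy: y is a plain CuArray, not a wrapper
y = permutedims(y, invperm([3, 2, 1]))   # copy back to the original layout
y * 2                                    # broadcasts on the device, no scalar indexing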
Could you check whether PermutedDimsArray has an appropriate rule in Adapt.jl?
@DhairyaLGandhi Yes it looks like there is something |
You are not using Adapt.jl here, because
|
Sure, this is what it throws
|
Okay, that's because broadcasting with a
Cc @maleadt |
Broadcast generally only works with plain CuArrays.
Any updates here? Channel dimension is naturally |
Hi all,
Currently, the BatchNorm function selects by default the N-1 dimension of an N-dimensional array as the channel dimension. I think it would be useful to let the user decide which dimension corresponds to the channel dimension in his/her particular setting. I already have a proposal, which I am happy to release as a pull request.
Example
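A hypothetical sketch of the requested interface (the dims keyword shown below does not exist in Flux; it is only meant to illustrate the idea):

using Flux

x = randn(Float32, 8, 50, 16)   # suppose channels live along dim 1, not the default N-1 = 2

bn = BatchNorm(8)
# bn(x) would treat dim 2 (size 50) as the channel dimension, which is not what we want here.
# The request: something like BatchNorm(8; dims = 1), where dims (hypothetical) selects the channel axis.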