Improved Scan #855
base: master
Changes from 12 commits
Yeah, if each invocation holds consecutive input and output elements, this shift becomes a mess (see that loop you have at the end).
Also, there was never a need to shuffle the entire vector, because you only ever used the last component.
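Here is a minimal CPU emulation of the blocked case ("each invocation holds consecutive elements"). It shows that only the scalar last component has to cross lanes; everything else is a local register shift. The sizes `SubgroupSize = 4` and `ItemsPerInvocation = 3`, the `identity` fill value and the function name `shiftRightBlocked` are illustrative assumptions, not code from this PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustrative sizes (assumptions, not taken from the PR).
constexpr std::uint32_t SubgroupSize = 4;
constexpr std::uint32_t ItemsPerInvocation = 3;

using Vec = std::array<std::uint32_t, ItemsPerInvocation>;
using Subgroup = std::array<Vec, SubgroupSize>; // one Vec per invocation (lane)

// Blocked layout: lane L holds flat elements [L*ItemsPerInvocation, (L+1)*ItemsPerInvocation).
// Shifts the whole flat sequence right by one element, filling element 0 with `identity`.
Subgroup shiftRightBlocked(const Subgroup& in, std::uint32_t identity)
{
    // Cross-lane step: only the scalar last component moves, "shuffled up" by one lane.
    std::array<std::uint32_t, SubgroupSize> fromPrevLane{};
    for (std::uint32_t lane = 0; lane < SubgroupSize; ++lane)
        fromPrevLane[lane] = (lane == 0) ? identity : in[lane - 1][ItemsPerInvocation - 1];

    // Local step: each lane shifts its own components, no communication needed.
    Subgroup out{};
    for (std::uint32_t lane = 0; lane < SubgroupSize; ++lane)
    {
        out[lane][0] = fromPrevLane[lane];
        for (std::uint32_t j = 1; j < ItemsPerInvocation; ++j)
            out[lane][j] = in[lane][j - 1];
    }
    return out;
}
```

On a GPU, the `fromPrevLane` step would be a single scalar subgroup shuffle per invocation rather than a shuffle of the whole vector.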
If you use a coalesced layout, then a plain subgroup shuffle on the whole vector followed by a conditional set of the first element (a literal vectorized version of the old code) will achieve what you want.
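Here is a rough CPU emulation of the coalesced suggestion, based on our reading of it. It assumes component `j` of lane `L` is flat element `j*SubgroupSize + L`, and that each component is shifted as its own `SubgroupSize`-long sequence. One vector shuffle from lane `L-1` does all the cross-lane work. Lane 0 then overwrites its components with the identity, exactly as the scalar code did. `shuffleVector`, `shiftUpPerComponent` and the sizes are hypothetical names and values.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t SubgroupSize = 4; // illustrative
constexpr std::uint32_t Components = 3;   // illustrative items per invocation

using Vec = std::array<std::uint32_t, Components>;
using Subgroup = std::array<Vec, SubgroupSize>;

// Emulates subgroupShuffle on a whole vector: lane L receives the vector held by lane srcLane(L).
template <typename SrcLaneFn>
Subgroup shuffleVector(const Subgroup& in, SrcLaneFn srcLane)
{
    Subgroup out{};
    for (std::uint32_t lane = 0; lane < SubgroupSize; ++lane)
        out[lane] = in[srcLane(lane)];
    return out;
}

// Coalesced layout: the scalar "shuffle up by one, lane 0 takes identity", applied to every component at once.
Subgroup shiftUpPerComponent(const Subgroup& in, std::uint32_t identity)
{
    // One vector shuffle; (lane - 1) wraps modulo SubgroupSize so lane 0 still reads a valid lane.
    Subgroup out = shuffleVector(in, [](std::uint32_t lane) { return (lane + SubgroupSize - 1) % SubgroupSize; });
    // Conditional set: on the GPU this is the per-invocation test SubgroupInvocationID == 0.
    out[0].fill(identity);
    return out;
}
```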
P.S. Also use `mix(T,T,bool)` instead of `?:`, because HLSL short-circuits ternaries and turns them into branches.
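The difference can be illustrated in plain C++, which shares the relevant semantics. A `mix`-style select receives both operands as already-evaluated arguments. A short-circuiting ternary evaluates only the chosen side, so when that side could be expensive or have effects, the compiler has to emit control flow. `mixSelect` below is a hypothetical stand-in that follows GLSL's `mix(x, y, a)` convention (returns `y` when `a` is true); it is not Nabla's actual `mix`.

```cpp
#include <cassert>

// Counts how many operands actually get evaluated.
int evaluations = 0;
int traced(int v) { ++evaluations; return v; }

// Stand-in for mix(T,T,bool): x and y are plain arguments, so both are evaluated before
// the call and this is a pure value select (in shader code it can lower to OpSelect).
template <typename T>
T mixSelect(T x, T y, bool a)
{
    return a ? y : x; // both operands already computed, nothing left to short-circuit
}
```

With `mixSelect(traced(1), traced(2), cond)` both calls run. With `cond ? traced(2) : traced(1)` only one of them runs, and this short-circuiting is what forces a branch.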
Btw, the `subgroupShuffle` with a modulo `SubgroupSize` can be replaced with the new intrinsic from `SPV_KHR_subgroup_rotate`, if you extend the `device_limits.json` and so on (so that `device_capability_traits` gets it).
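Here is a small emulation showing that, as we read the rotate extension, the two forms agree. Without a cluster size, invocation `id` receives the value from invocation `(id + delta) & (SubgroupSize - 1)`. This matches the wrapped shuffle index `(id + delta) % SubgroupSize` for power-of-two subgroup sizes. A "shift up by one" is therefore a rotate by `SubgroupSize - 1`. The size 8 and both function names are illustrative assumptions.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

constexpr std::uint32_t SubgroupSize = 8; // illustrative; must be a power of two for the mask below
static_assert((SubgroupSize & (SubgroupSize - 1)) == 0, "subgroup sizes are powers of two");

using Lanes = std::array<std::uint32_t, SubgroupSize>;

// Current approach: subgroupShuffle with an explicitly wrapped source index.
Lanes shuffleModuloEmulated(const Lanes& v, std::uint32_t delta)
{
    Lanes out{};
    for (std::uint32_t id = 0; id < SubgroupSize; ++id)
        out[id] = v[(id + delta) % SubgroupSize];
    return out;
}

// Emulated subgroup rotate (no ClusterSize): invocation id reads from (id + delta) & (SubgroupSize - 1).
Lanes rotateEmulated(const Lanes& v, std::uint32_t delta)
{
    Lanes out{};
    for (std::uint32_t id = 0; id < SubgroupSize; ++id)
        out[id] = v[(id + delta) & (SubgroupSize - 1)];
    return out;
}
```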