Add best practice for warpSize handling #3790

Open: neon60 wants to merge 8 commits into docs/develop

Conversation

neon60 (Contributor) commented May 14, 2025

No description provided.

neon60 force-pushed the warpSize_runtime branch from b1297aa to fa98256 on May 14, 2025 13:36

randyh62 left a comment

looks good

The discussion of warpSize at line 415 refers to "hardware features", but would it be useful to mention here that warpSize and the wavefront size are the same?

Contributor Author

Made some changes.

adeljo-amd

LGTM

neon60 requested a review from randyh62 on May 19, 2025 11:25

randyh62 left a comment

Looks great. Thanks.

- Retrieves the warp size of the GPU (warpSizeHost) to determine the optimal
kernel configuration.

- Allocates device memory (`d_data`` for input, `d_results`` for block-wise

The backticks around d_data and d_results are not properly matched.

accordingly, as shown in the following block reduce example.

The ``block_reduce`` kernel has a template parameter for warpSize and reduction
operation in two main phases:

The second part of the sentence seems to lack a verb. Maybe something like "...and performs a reduction operation in two main phases:"?
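
For readers following the thread without the diff open, the pattern under review (query the warp size at runtime on the host, then select a kernel instantiation that takes the warp size as a template parameter and reduces in two phases) might look roughly like the host-only C++ sketch below. It is a CPU simulation under stated assumptions, not the code from this PR: `query_warp_size`, `block_reduce_sim`, and `block_reduce_dispatch` are hypothetical names, the device query is replaced by a stand-in, and the supported sizes 32 and 64 are assumed here because those are the wavefront sizes found on AMD GPUs.

```cpp
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a runtime device query. A real HIP program would
// read the warp size from the device properties instead of taking an argument.
int query_warp_size(int simulated_device_warp_size) {
    return simulated_device_warp_size;
}

// CPU simulation of a two-phase block reduction. The warp size is a template
// parameter, so the per-warp loop bound is a compile-time constant.
template <std::size_t WarpSize>
int block_reduce_sim(const std::vector<int>& block) {
    // Phase 1: each "warp" (a chunk of WarpSize elements) sums its own values.
    std::vector<int> warp_sums;
    for (std::size_t start = 0; start < block.size(); start += WarpSize) {
        int sum = 0;
        for (std::size_t lane = 0; lane < WarpSize && start + lane < block.size(); ++lane) {
            sum += block[start + lane];
        }
        warp_sums.push_back(sum);
    }
    // Phase 2: the per-warp partial sums are combined into one block result.
    return std::accumulate(warp_sums.begin(), warp_sums.end(), 0);
}

// Host-side dispatch: map the runtime warp size onto a compile-time instantiation.
int block_reduce_dispatch(const std::vector<int>& block, int warp_size_host) {
    switch (warp_size_host) {
        case 32: return block_reduce_sim<32>(block);
        case 64: return block_reduce_sim<64>(block);
        default: throw std::invalid_argument("unsupported warp size");
    }
}
```

The switch is the point of the sketch: the warp size is only known at runtime, yet each instantiation still sees it as a constant. Calling `block_reduce_dispatch(block, query_warp_size(64))` on a block of 256 ones gives the same sum as the 32-wide path, which is the property a warp-size-agnostic kernel should preserve.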

Co-authored-by: Fabian Ritter <ritter.x2a@gmail.com>
4 participants