Add ability to build cuda wheels #272

Merged

85 commits merged into pytorch:main from the cuda8 branch on Oct 23, 2024

Conversation

@ahmadsharif1 (Contributor) commented Oct 16, 2024

Add a workflow that builds a wheel with CUDA and tests it by pip installing it.

Also add the ability to fail a CUDA test if CUDA is missing -- that makes sure the test doesn't get accidentally skipped.
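For context, the mechanism is just an environment variable set in the workflow's test step; a minimal sketch (matching the step quoted later in this review, not a new API):

- name: Run Python tests
  run: |
    # FAIL_WITHOUT_CUDA=1 turns "CUDA unavailable" from a skip into a hard
    # failure, so a misconfigured GPU runner can't silently pass the job.
    ${CONDA_RUN} FAIL_WITHOUT_CUDA=1 pytest test -vvv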

@facebook-github-bot added the CLA Signed label (managed by the Meta Open Source bot) on Oct 16, 2024
@ahmadsharif1 changed the title from "Add cuda wheels to torchcodec" to "Add ability to build cuda wheels" on Oct 16, 2024
…
@ahmadsharif1 marked this pull request as ready for review October 23, 2024 13:29
@scotts mentioned this pull request Oct 23, 2024
@NicolasHug (Member) left a comment

Thanks a lot @ahmadsharif1, I left a few comments and suggestions but nothing is really blocking. I'll let you be the judge of what is best addressed here or later.

Comment on lines 80 to 90
if(ENABLE_CUDA)
    # TODO: Enable more ffmpeg versions for cuda.
    make_torchcodec_library(libtorchcodec7 ffmpeg7)
    make_torchcodec_library(libtorchcodec6 ffmpeg6)
    make_torchcodec_library(libtorchcodec5 ffmpeg5)
else()
    make_torchcodec_library(libtorchcodec7 ffmpeg7)
    make_torchcodec_library(libtorchcodec6 ffmpeg6)
    make_torchcodec_library(libtorchcodec5 ffmpeg5)
    make_torchcodec_library(libtorchcodec4 ffmpeg4)
endif()
Member

Nit: is this the same as the suggested change below?

BTW, what's currently preventing us from having CUDA + ffmpeg4 support? Will that ever be possible?

Suggested change (replace the block above with):

if(NOT ENABLE_CUDA)
    # TODO: Enable more ffmpeg versions for cuda.
    make_torchcodec_library(libtorchcodec4 ffmpeg4)
endif()
make_torchcodec_library(libtorchcodec7 ffmpeg7)
make_torchcodec_library(libtorchcodec6 ffmpeg6)
make_torchcodec_library(libtorchcodec5 ffmpeg5)

with:
  # We use 3.9 because it's the minimum version we build.
  # We test version 3.9 against all other python versions in the matrix.
  name: pytorch_torchcodec__3.9_cu${{ env.cuda_version_without_periods }}_x86_64
Member

I'm not sure what the comment above means -- can you provide more details? It doesn't seem like we're testing any more Python versions than 3.9 here (which is fine, this is the same as the CPU job, but that seems contradictory with the comment).

Should "3.9" be ${{ matrix.python-version }}, like in our Linux CPU one?

name: pytorch_torchcodec__${{ matrix.python-version }}_cpu_x86_64

Contributor Author

Sorry, the comment was unclear.

We set

build-python-only: "disable"

That should build for all Python versions, but it doesn't unless the ciflow/binaries/all label is present.

If that label is not present, we only build for the lowest version, which happens to be 3.9. Note that this applies to the build step, not the test-and-install step.

We can make test-and-install run for all versions, conditional on that label, but we'd have to make it aware of that label -- right now it's hard-coded.

Maybe I can just control it from the matrix as you suggested.

Member

Yeah I think we need to control it from the matrix. This way, when we push a release, we set the tag as you mentioned which will trigger the build for all Python versions, and then we just need to update the matrix for the tests. That's what we do on the CPU job.
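For illustration, the release-time test matrix could then be widened the same way the CPU job's is; a sketch (the exact Python version list here is an assumption, not what this PR ships):

strategy:
  fail-fast: false
  matrix:
    # Hypothetical release matrix; the PR as merged only tests 3.9.
    python-version: ['3.9', '3.10', '3.11', '3.12']
    cuda-version: ['12.4']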

Contributor Author

Do you think we should, orthogonally, also test installing and running the 3.9 wheel on Python versions other than 3.9 (like 3.10, etc.)?

    name: pytorch_torchcodec__3.9_cu${{ env.cuda_version_without_periods }}_x86_64
    path: pytorch/torchcodec/dist/
- name: Setup miniconda using test-infra
  uses: ahmadsharif1/test-infra/.github/actions/setup-miniconda@14bc3c29f88d13b0237ab4ddf00aa409e45ade40
Member

Is this intended to be your fork? Should we upstream the changes you made locally, if any? The problem with keeping that ref to your fork is that the fork might end up outdated compared to the main official one.

Contributor Author

I am trying to upstream that now. Hopefully it will happen soon:

pytorch/test-infra#5789

fail-fast: false
matrix:
  python-version: ['3.9']
  cuda-version: ['12.4']
Member

We should try to figure out how to release for CUDA 11 (if we need to), but this doesn't need to be done here

Contributor Author

Great point. Actually we do build for 11.8 and 12.1 also:

[screenshot]

So I can add those here too.
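Concretely, that would just widen the matrix entry, e.g. (illustrative sketch, not the merged config):

matrix:
  python-version: ['3.9']
  # 11.8 and 12.1 taken from the existing build matrix mentioned above.
  cuda-version: ['11.8', '12.1', '12.4']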

    ${CONDA_RUN} conda install --yes nvidia::libnpp nvidia::cuda-nvrtc=12.4 nvidia::cuda-toolkit=12.4 nvidia::cuda-cudart=12.4 nvidia::cuda-driver-dev=12.4
- name: Install test dependencies
  run: |
    ${CONDA_RUN} python -m pip install --pre torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
Member

I am a bit surprised that this doesn't force a re-installation of the torch CPU wheel... But looking at the logs, this seems OK. (It might come back and bite us in the future; let's just keep it in mind as a possible source of future problems.)

    default-packages: "conda-forge::ffmpeg=${{ matrix.ffmpeg-version-for-tests }}"
- name: Check env
  run: |
    ${CONDA_RUN} env
Member

I remember this was mentioned last week, but it looks like we don't really need CONDA_RUN in the macOS wheels anymore. Do we still need CONDA_RUN for this job?

Contributor Author

I believe I do, because I do install conda packages like libnpp.

Member

Hm, we do install conda packages in the other jobs as well (ffmpeg), and we don't use CONDA_RUN. But that's OK, we can merge as-is and quickly try to remove them in a follow-up PR

Contributor Author

Maybe for those we are running commands in the base conda env, not the one that test-infra creates for us? WDYT?

Contributor

Mac wheels also switched back to using conda-incubator/setup-miniconda@v3 rather than from test-infra. I think that if we use conda from test-infra, we should do the CONDA_RUN thing everywhere.
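To make the distinction concrete, here is a sketch assuming CONDA_RUN wraps commands so they execute in the environment test-infra creates (that wrapping behaviour is an assumption for illustration, not something this PR documents):

run: |
  # Without the prefix, the package lands in the runner's base conda env.
  conda install --yes conda-forge::ffmpeg
  # With the prefix, it targets the env created by the test-infra action.
  ${CONDA_RUN} conda install --yes conda-forge::ffmpeg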

- name: Run Python tests
  run: |
    ${CONDA_RUN} FAIL_WITHOUT_CUDA=1 pytest test -vvv
    ${CONDA_RUN} python benchmarks/decoders/gpu_benchmark.py --devices=cuda:0,cpu --resize_devices=none
Member

I couldn't figure it out from the logs: how long does it take to run the benchmarks? This CUDA job is fairly long, about 9 minutes, compared to just ~2 minutes for the CPU equivalent.

If the extra time is due to the additional dependency-installation steps then there isn't much we can do about it, but if it's caused by the benchmark, maybe we can make it optional / not run it on all PRs?

Contributor Author

The benchmarks are quite small compared to the rest of the install process; the torch install is probably the slowest step. I added a time command prefix so we know how much time that takes.
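For reference, the change meant here is just a time prefix on the benchmark command, e.g. (sketch based on the step quoted above):

    # time separates the benchmark's own runtime from environment-setup cost.
    ${CONDA_RUN} time python benchmarks/decoders/gpu_benchmark.py --devices=cuda:0,cpu --resize_devices=none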

Contributor Author

FYI, the benchmark takes around 7 seconds:

[screenshot]

The whole job is long mainly because of building ffmpeg, setting up the environment, etc.

…
@ahmadsharif1 merged commit bb29228 into pytorch:main on Oct 23, 2024
72 checks passed
@ahmadsharif1 deleted the cuda8 branch on October 23, 2024 at 18:56
Labels: ciflow/binaries/all, CLA Signed