Add ability to build cuda wheels #272
Conversation
Thanks a lot @ahmadsharif1, I left a few comments and suggestions, but nothing is really blocking. I'll let you be the judge of what is best addressed here or later.
if(ENABLE_CUDA)
    # TODO: Enable more ffmpeg versions for cuda.
    make_torchcodec_library(libtorchcodec7 ffmpeg7)
    make_torchcodec_library(libtorchcodec6 ffmpeg6)
    make_torchcodec_library(libtorchcodec5 ffmpeg5)
else()
    make_torchcodec_library(libtorchcodec7 ffmpeg7)
    make_torchcodec_library(libtorchcodec6 ffmpeg6)
    make_torchcodec_library(libtorchcodec5 ffmpeg5)
    make_torchcodec_library(libtorchcodec4 ffmpeg4)
endif()
Nit: is this the same as the suggestion below?
BTW, what's currently preventing us from having CUDA + ffmpeg4 support? Will that ever be possible?
Suggested change:

if(NOT ENABLE_CUDA)
    # TODO: Enable more ffmpeg versions for cuda.
    make_torchcodec_library(libtorchcodec4 ffmpeg4)
endif()
make_torchcodec_library(libtorchcodec7 ffmpeg7)
make_torchcodec_library(libtorchcodec6 ffmpeg6)
make_torchcodec_library(libtorchcodec5 ffmpeg5)
    with:
      # We use 3.9 because it's the minimum version we build.
      # We test version 3.9 against all other python versions in the matrix.
      name: pytorch_torchcodec__3.9_cu${{ env.cuda_version_without_periods }}_x86_64
I'm not sure what the comment above means, can you provide more details? It doesn't seem like we're testing any more Python versions than 3.9 here (which is fine, it's the same as the CPU job, but that seems contradictory with the comment).
Should "3.9" be ${{ matrix.python-version }}, like in our Linux CPU one?

      name: pytorch_torchcodec__${{ matrix.python-version }}_cpu_x86_64
Sorry, the comment was unclear.
We set build-python-only: "disable", which should build for all Python versions, but it only does so when the ciflow/binaries/all label is present. If that label is not present, we only build for the lowest version, which happens to be 3.9. Note that this applies to the build step, not the test-and-install step.
We could make test-and-install cover all versions, conditional on that label, but we'd have to make it aware of the label; right now it's hard-coded.
Maybe I can just control it from the matrix as you suggested.
Yeah, I think we need to control it from the matrix. This way, when we push a release, we set the tag as you mentioned, which triggers the build for all Python versions, and then we just need to update the matrix for the tests. That's what we do in the CPU job.
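A minimal sketch of that direction, assuming the CUDA test job mirrors the CPU one; the job fragment and the actions/download-artifact step below are illustrative, not lifted from this PR:

strategy:
  fail-fast: false
  matrix:
    # Extend this list (e.g. for releases) to test every wheel we built.
    python-version: ['3.9']
    cuda-version: ['12.4']
steps:
  - name: Download the wheel for this matrix entry
    uses: actions/download-artifact@v4
    with:
      # The artifact name picks up the matrix value, as in the CPU job.
      name: pytorch_torchcodec__${{ matrix.python-version }}_cu${{ env.cuda_version_without_periods }}_x86_64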
Do you think we should also, orthogonally, test installing the 3.9 wheel on Python versions other than 3.9 (3.10, etc.)?
      name: pytorch_torchcodec__3.9_cu${{ env.cuda_version_without_periods }}_x86_64
      path: pytorch/torchcodec/dist/
  - name: Setup miniconda using test-infra
    uses: ahmadsharif1/test-infra/.github/actions/setup-miniconda@14bc3c29f88d13b0237ab4ddf00aa409e45ade40
Is this intended to be your fork? Should we upstream the changes you made locally, if any? The problem with keeping that ref to your fork is that the fork might end up outdated compared to the official one.
I am trying to upstream that now. Hopefully it will happen soon.
  fail-fast: false
  matrix:
    python-version: ['3.9']
    cuda-version: ['12.4']
We should try to figure out how to release for CUDA 11 (if we need to), but this doesn't need to be done here.
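If we do, my guess is that it starts as a matrix change; the version numbers below are illustrative only, and the toolkit install steps would need matching CUDA 11 variants:

  matrix:
    python-version: ['3.9']
    # Hypothetical: add a CUDA 11 entry alongside 12.4.
    cuda-version: ['11.8', '12.4']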
      ${CONDA_RUN} conda install --yes nvidia::libnpp nvidia::cuda-nvrtc=12.4 nvidia::cuda-toolkit=12.4 nvidia::cuda-cudart=12.4 nvidia::cuda-driver-dev=12.4
  - name: Install test dependencies
    run: |
      ${CONDA_RUN} python -m pip install --pre torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
I am a bit surprised that this doesn't force a re-installation of the torch CPU wheel... But looking at the logs, this seems OK. (It might come back and bite us in the future; let's just keep it in mind as a possible source of future problems.)
      default-packages: "conda-forge::ffmpeg=${{ matrix.ffmpeg-version-for-tests }}"
  - name: Check env
    run: |
      ${CONDA_RUN} env
I remember this was mentioned last week, but it looks like we don't really need CONDA_RUN in the macOS wheels anymore. Do we still need CONDA_RUN for this job?
I believe I do, because I install conda packages like libnpp.
Hm, we do install conda packages in the other jobs as well (ffmpeg), and we don't use CONDA_RUN there. But that's OK; we can merge as-is and quickly try to remove them in a follow-up PR.
Maybe for those, we are running commands in the base conda env, not the one that test-infra creates for us? WDYT?
Mac wheels also switched back to using conda-incubator/setup-miniconda@v3 rather than the action from test-infra. I think that if we use conda from test-infra, we should do the CONDA_RUN thing everywhere.
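For what it's worth, my understanding (an assumption, not verified here) is that setup-miniconda from test-infra exposes CONDA_RUN as a prefix targeting the env it created, so the difference would be roughly:

  - name: Check ffmpeg
    run: |
      # Without the prefix, this runs in the base conda env:
      ffmpeg -version
      # With the prefix, this runs in the env created by setup-miniconda,
      # where packages like libnpp were installed:
      ${CONDA_RUN} ffmpeg -version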
  - name: Run Python tests
    run: |
      ${CONDA_RUN} FAIL_WITHOUT_CUDA=1 pytest test -vvv
      ${CONDA_RUN} python benchmarks/decoders/gpu_benchmark.py --devices=cuda:0,cpu --resize_devices=none
I couldn't figure it out from the logs: how long does it take to run the benchmarks? This CUDA job is fairly long, about 9 minutes, compared to just ~2 minutes for the CPU equivalent.
If the extra time is due to the additional dependency installation steps, there isn't much we can do about it, but if it's caused by the benchmark, maybe we can make it optional / not run it on all PRs?
The benchmarks are quite small compared to the rest of the install process; the torch install is probably the slowest step. I added a time command prefix so we know how much time each step takes.
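Roughly like this (the exact placement in the PR may differ); time prints real/user/sys per command in the CI logs:

  - name: Run Python tests
    run: |
      # `time` makes it easy to see whether the tests or the installs
      # dominate the job duration.
      time ${CONDA_RUN} pytest test -vvv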
Add a workflow that builds a wheel with CUDA and tests it by pip installing it. Also add the ability to fail a CUDA test if CUDA is missing; that will make sure the test doesn't get accidentally skipped.
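The workflow side of that guard is the test step shown earlier; the test-side helper (not shown in this conversation) presumably reads the env var and raises instead of skipping when CUDA is unavailable. A sketch of the invocation, with the rationale as comments:

  - name: Run Python tests
    run: |
      # On a CUDA runner, a broken driver or toolkit install would normally
      # make CUDA tests skip silently; FAIL_WITHOUT_CUDA=1 turns that skip
      # into a hard failure so the job catches the breakage.
      ${CONDA_RUN} FAIL_WITHOUT_CUDA=1 pytest test -vvv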