Add ability to build cuda wheels #272
Conversation
Thanks a lot @ahmadsharif1, I left a few comments and suggestions, but nothing is really blocking. I'll let you be the judge of what is best addressed here or later.
if(ENABLE_CUDA)
    # TODO: Enable more ffmpeg versions for cuda.
    make_torchcodec_library(libtorchcodec7 ffmpeg7)
    make_torchcodec_library(libtorchcodec6 ffmpeg6)
    make_torchcodec_library(libtorchcodec5 ffmpeg5)
else()
    make_torchcodec_library(libtorchcodec7 ffmpeg7)
    make_torchcodec_library(libtorchcodec6 ffmpeg6)
    make_torchcodec_library(libtorchcodec5 ffmpeg5)
    make_torchcodec_library(libtorchcodec4 ffmpeg4)
endif()
Nit: is this the same as the suggestion below?
BTW, what's currently preventing us from having CUDA + ffmpeg4 support? Will that ever be possible?
Suggested change:

if(NOT ENABLE_CUDA)
    # TODO: Enable more ffmpeg versions for cuda.
    make_torchcodec_library(libtorchcodec4 ffmpeg4)
endif()
make_torchcodec_library(libtorchcodec7 ffmpeg7)
make_torchcodec_library(libtorchcodec6 ffmpeg6)
make_torchcodec_library(libtorchcodec5 ffmpeg5)
    with:
      # We use 3.9 because it's the minimum version we build.
      # We test version 3.9 against all other python versions in the matrix.
      name: pytorch_torchcodec__3.9_cu${{ env.cuda_version_without_periods }}_x86_64
I'm not sure what the comment above means, can you provide more details? It doesn't seem like we're testing any more Python versions than 3.9 here (which is fine, it's the same as the CPU job, but that seems contradictory with the comment).
Should "3.9" be ${{ matrix.python-version }}, like in our Linux CPU one?

      name: pytorch_torchcodec__${{ matrix.python-version }}_cpu_x86_64
Sorry, the comment was unclear.
We set build-python-only: "disable", which should build for all Python versions, but it only does so when the ciflow/binaries/all label is present. If that label is not present, we only build for the lowest version, which happens to be 3.9. Note that this applies to the build step, not the test-and-install step.
We could make test-and-install cover all versions, conditional on that label, but we'd have to make it aware of the label; right now it's hard-coded.
Maybe I can just control it from the matrix as you suggested.
Yeah, I think we need to control it from the matrix. This way, when we push a release, we set the tag as you mentioned, which triggers the build for all Python versions, and then we just need to update the matrix for the tests. That's what we do in the CPU job.
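A minimal sketch of that direction, assuming the CUDA test job mirrors the CPU one; the job fragment and the actions/download-artifact step below are illustrative, not lifted from this PR:

strategy:
  fail-fast: false
  matrix:
    # Extend this list (e.g. for releases) to test every wheel we built.
    python-version: ['3.9']
    cuda-version: ['12.4']
steps:
  - name: Download the wheel for this matrix entry
    uses: actions/download-artifact@v4
    with:
      # The artifact name picks up the matrix value, as in the CPU job.
      name: pytorch_torchcodec__${{ matrix.python-version }}_cu${{ env.cuda_version_without_periods }}_x86_64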
Do you think we should also, orthogonally, test installing the 3.9 wheel on Python versions other than 3.9 (3.10, etc.)?
      name: pytorch_torchcodec__3.9_cu${{ env.cuda_version_without_periods }}_x86_64
      path: pytorch/torchcodec/dist/
  - name: Setup miniconda using test-infra
    uses: ahmadsharif1/test-infra/.github/actions/setup-miniconda@14bc3c29f88d13b0237ab4ddf00aa409e45ade40
Is this intended to be your fork? Should we upstream the changes you made locally, if any? The problem with keeping that ref to your fork is that the fork might end up outdated compared to the official one.
I am trying to upstream that now. Hopefully it will happen soon.
  fail-fast: false
  matrix:
    python-version: ['3.9']
    cuda-version: ['12.4']
We should try to figure out how to release for CUDA 11 (if we need to), but this doesn't need to be done here.
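If we do, my guess is that it starts as a matrix change; the version numbers below are illustrative only, and the toolkit install steps would need matching CUDA 11 variants:

  matrix:
    python-version: ['3.9']
    # Hypothetical: add a CUDA 11 entry alongside 12.4.
    cuda-version: ['11.8', '12.4']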
      ${CONDA_RUN} conda install --yes nvidia::libnpp nvidia::cuda-nvrtc=12.4 nvidia::cuda-toolkit=12.4 nvidia::cuda-cudart=12.4 nvidia::cuda-driver-dev=12.4
  - name: Install test dependencies
    run: |
      ${CONDA_RUN} python -m pip install --pre torchvision --index-url https://download.pytorch.org/whl/nightly/cpu
I am a bit surprised that this doesn't force a re-installation of the torch CPU wheel... But looking at the logs, this seems OK. (It might come back and bite us in the future; let's just keep it in mind as a possible source of future problems.)
      default-packages: "conda-forge::ffmpeg=${{ matrix.ffmpeg-version-for-tests }}"
  - name: Check env
    run: |
      ${CONDA_RUN} env
I remember this was mentioned last week, but it looks like we don't really need CONDA_RUN in the macOS wheels anymore. Do we still need CONDA_RUN for this job?
I believe I do, because I install conda packages like libnpp.
Hm, we do install conda packages in the other jobs as well (ffmpeg), and we don't use CONDA_RUN there. But that's OK; we can merge as-is and quickly try to remove them in a follow-up PR.
Maybe for those, we are running commands in the base conda env, not the one that test-infra creates for us? WDYT?
Mac wheels also switched back to using conda-incubator/setup-miniconda@v3 rather than the action from test-infra. I think that if we use conda from test-infra, we should do the CONDA_RUN thing everywhere.
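For what it's worth, my understanding (an assumption, not verified here) is that setup-miniconda from test-infra exposes CONDA_RUN as a prefix targeting the env it created, so the difference would be roughly:

  - name: Check ffmpeg
    run: |
      # Without the prefix, this runs in the base conda env:
      ffmpeg -version
      # With the prefix, this runs in the env created by setup-miniconda,
      # where packages like libnpp were installed:
      ${CONDA_RUN} ffmpeg -version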
  - name: Run Python tests
    run: |
      ${CONDA_RUN} FAIL_WITHOUT_CUDA=1 pytest test -vvv
      ${CONDA_RUN} python benchmarks/decoders/gpu_benchmark.py --devices=cuda:0,cpu --resize_devices=none
I couldn't figure it out from the logs: how long does it take to run the benchmarks? This CUDA job is fairly long, about 9 minutes, compared to just ~2 minutes for the CPU equivalent.
If the extra time is due to the additional dependency installation steps, there isn't much we can do about it, but if it's caused by the benchmark, maybe we can make it optional / not run it on all PRs?
The benchmarks are quite small compared to the rest of the install process; the torch install is probably the slowest step. I added a time command prefix so we know how much time each step takes.
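Roughly like this (the exact placement in the PR may differ); time prints real/user/sys per command in the CI logs:

  - name: Run Python tests
    run: |
      # `time` makes it easy to see whether the tests or the installs
      # dominate the job duration.
      time ${CONDA_RUN} pytest test -vvv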
Add a workflow that builds a wheel with CUDA and tests it by pip installing it. Also add the ability to fail a CUDA test if CUDA is missing; that will make sure the test doesn't get accidentally skipped.
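The workflow side of that guard is the test step shown earlier; the test-side helper (not shown in this conversation) presumably reads the env var and raises instead of skipping when CUDA is unavailable. A sketch of the invocation, with the rationale as comments:

  - name: Run Python tests
    run: |
      # On a CUDA runner, a broken driver or toolkit install would normally
      # make CUDA tests skip silently; FAIL_WITHOUT_CUDA=1 turns that skip
      # into a hard failure so the job catches the breakage.
      ${CONDA_RUN} FAIL_WITHOUT_CUDA=1 pytest test -vvv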