Feature request for transformers
use-cases
#673
Comments
Thanks a lot for the great feedback!
Thanks a lot @NicolasHug! Looking forward to future releases 🤗
Quick update @zucchini-nlp, I'm working on mono/stereo conversion in #678. Regarding point 4 and the problematic video: I can confirm that the video itself isn't correctly encoded: all frames and packets are specified with a pts of I'll try to see what TorchCodec can do to smoothly handle such videos. In the meantime, I'm curious how blocking this is for you right now? Do you have a lot of such poorly encoded videos? Do you absolutely need them to be decoded as-is, or is re-encoding an option? Thanks!
Great thanks!
This sounds good, will be nice to get an informative error or probably set duration to
There's no real need if decoding is not possible and the video is corrupted. An error we can catch, or something similar, will be enough.
I've got some good news @zucchini-nlp, we found a way to properly decode the video you linked to. The PTS info is missing, but we can fall back to DTS values (which, in that video, were correctly set). I hope it will address the other videos you had issues with (if not, let us know!) BTW, about this:
We used to have some notice indicating that some APIs could change, but we consider the public APIs to be very stable now. It's extremely unlikely that we'll be changing public stuff (other than for major bug-fixes), so please feel free to rely on the public APIs, they're stable. I'll be pushing a new release (TorchCodec 0.4) in the coming days with:
We're very excited that you consider TorchCodec for
Great news @NicolasHug ! The I will try to integrate
I just published 0.4 with the improvements mentioned above: https://github.com/pytorch/torchcodec/releases I'll close the issue, thank you for your feedback and keep us updated on the
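For readers following the thread, the PTS-to-DTS fallback described in the comments above can be illustrated with a minimal sketch. This is not TorchCodec's implementation: the `Packet` record and the `timestamp_seconds` helper are hypothetical names introduced here, and a missing timestamp is modeled as `None` (FFmpeg itself uses the `AV_NOPTS_VALUE` sentinel).

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional


@dataclass
class Packet:
    """Hypothetical packet record; None models a missing timestamp."""
    pts: Optional[int]  # presentation timestamp, in time_base ticks
    dts: Optional[int]  # decoding timestamp, in time_base ticks


def timestamp_seconds(packet: Packet, time_base: Fraction) -> Optional[Fraction]:
    """Return the packet time in seconds, preferring PTS and falling back to DTS."""
    ticks = packet.pts if packet.pts is not None else packet.dts
    if ticks is None:
        # Neither timestamp is usable; the caller decides how to report this.
        return None
    return ticks * time_base


# A packet with no PTS but a valid DTS, in a stream with a 1/1000 s time base.
print(timestamp_seconds(Packet(pts=None, dts=3000), Fraction(1, 1000)))
```

Note that DTS follows decoding order, which can differ from presentation order when a stream reorders frames, so a fallback like this is only a reasonable approximation when the DTS values are set correctly, as they reportedly were in the linked video.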
🚀 The feature
Hi 👋
First of all, huge thanks to you and the team, the latest `torchcodec` release with audio support is fantastic! It's a long-awaited feature.

I'm the maintainer of multimodal models in `transformers` and I'm thinking of using `torchcodec` to load multimodal data for MLLMs. Looking forward to a stable version being released. For now, I’ve been testing the latest release and noticed a few points that might be useful to consider for future support.

1. Mono channel audio support: Some audio models (like Whisper from Hugging Face) only support mono-channel input. It would be helpful if audio loading allowed channel selection or optionally converted stereo to mono.
2. Fallback for video files with no audio: When loading audio from a video file that has no audio stream, an error is currently raised. A more flexible behavior would be to return `None`, similar to how `moviepy` handles it, which can be checked with `if clip.audio is not None`.
3. Loading from URL: Loading audio/video from URLs seems to work for the URLs I have tested, though I couldn’t find in the docs whether URL input is officially supported. I hope it will be officially supported in the stable release.
4. Video decoder issues with the `avi` format: When trying to load `avi` files, the decoder fails to infer duration and related metadata, which prevents sampling frames by seconds. Loading the same video saved as `mp4` resolves the issue. You can try this video as an example.

Let me know if you'd like me to file any of these separately or provide reproducible examples. Thanks again for the awesome work!
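To make the first two requests concrete, here is a minimal caller-side sketch in plain Python. It does not use the `torchcodec` API: `downmix_to_mono`, `load_audio_or_none`, and `NoAudioStreamError` are hypothetical names introduced for illustration, and the loader is passed in as a callable because the exact error the decoder raises is not specified in this issue.

```python
from typing import Callable, List, Optional, Sequence


def downmix_to_mono(channels: Sequence[Sequence[float]]) -> List[float]:
    """Average equally long channels, sample by sample, into one channel."""
    if not channels:
        raise ValueError("expected at least one channel")
    length = len(channels[0])
    if any(len(ch) != length for ch in channels):
        raise ValueError("all channels must have the same number of samples")
    return [sum(frame) / len(channels) for frame in zip(*channels)]


class NoAudioStreamError(Exception):
    """Hypothetical stand-in for whatever error the decoder actually raises."""


def load_audio_or_none(
    loader: Callable[[str], List[List[float]]], source: str
) -> Optional[List[List[float]]]:
    """Call loader(source); return None when the file has no audio stream."""
    try:
        return loader(source)
    except NoAudioStreamError:
        return None


# Two channels of three samples each, averaged into a single channel.
stereo = [[0.5, 1.0, -1.0], [0.25, 0.0, 1.0]]
print(downmix_to_mono(stereo))
```

With tensor-shaped audio of layout (channels, samples), the same downmix would typically be a mean over the channel dimension. Averaging is only one possible convention; selecting a single channel is the other option mentioned in point 1.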
Motivation, pitch
No response