Refactor and add benchmarks #330
Conversation
print(
    f"video={video_file_path}, decoder={decoder_name}, pts_list={pts_list}"
)

random_pts_list = (torch.rand(num_samples) * duration).tolist()
What's the random seed here? Is this the source of the benchmark run time variation?
Is it the same for all decoders at least?
Oh, good point. We're generating a different random list for each decoder. We should fix that. I can also set the seed to be deterministic. Maybe in the future that should be a benchmark parameter.
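For illustration, a minimal sketch of that fix (not the PR's actual code; the seed, sample count, duration, and decoder names below are placeholders): seed once and build a single pts list up front, so every decoder is benchmarked against the same seek points.

```python
import torch

# Hypothetical sketch: fix the seed and compute one shared random pts list,
# instead of re-sampling a new list inside each decoder's loop.
torch.manual_seed(0)                   # placeholder seed; could later become a benchmark parameter
num_samples, duration = 10, 13.013     # placeholder values, not the benchmark's real inputs
random_pts_list = (torch.rand(num_samples) * duration).tolist()

for decoder_name in ["decoder_a", "decoder_b"]:  # placeholder decoder names
    # Every decoder now sees exactly the same seek points.
    print(f"decoder={decoder_name}, pts_list={random_pts_list}")
```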
df_item["fps_p25"] = 1.0 * num_samples / results[-1]._p25 | ||
df_data.append(df_item) | ||
|
||
for num_consecutive_nexts in [100]: |
I think we do need 1 here, or at least the ability to pass in 1. (Maybe we don't need 1 for the README, but this lib should be able to accept 1 here somehow.)
Time to first frame is a useful metric for devs to track.
Good point. I think we'll want to make that configurable, at least as input to run_benchmark(). I'll do that.
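A hypothetical sketch of what that could look like (the real run_benchmark() signature in the PR may differ, and the helper name below is made up); passing 1 then measures time to first frame:

```python
import time

def benchmark_consecutive_nexts(make_iterator, num_consecutive_nexts_values=(1, 100)):
    # Time num_consecutive_nexts calls to next() on a fresh iterator for each configured value.
    results = {}
    for num_consecutive_nexts in num_consecutive_nexts_values:
        it = make_iterator()
        start = time.perf_counter()
        for _ in range(num_consecutive_nexts):
            next(it)
        results[num_consecutive_nexts] = time.perf_counter() - start
    return results

# Stand-in iterator factory; a real call would build a decoder iterator over a video.
print(benchmark_consecutive_nexts(lambda: iter(range(1000))))
```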
elif decoder == "torchvision":
-    decoder_dict["TVNewAPIDecoderWithBackendVideoReader"] = (
+    decoder_dict["TorchVision[backend=video_reader]"] = (
@NicolasHug can tell you about torchvision's APIs and what the name should be for this. This is a specific API that I have heard him refer to as the "new" API.
The naming in torchvision is awful. Basically, the VideoReader class and the "video_reader" backend are 2 orthogonal concepts with the same name.
There are 2 main Python decoding APIs:
1. read_video() in io/video.py.
2. VideoReader(). It is considered more "fine-grained" and the "new" API (although not that new at this point, it's just more recent).
Both APIs support the pyav and "video_reader" backends, where the "video_reader" backend just means the "torchvision CPU" backend. VideoReader() also supposedly supports the "cuda" backend.
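For reference, a minimal sketch of the two APIs and the backend switch (assuming a local video.mp4; error handling omitted):

```python
import torchvision
from torchvision.io import read_video, VideoReader

# "video_reader" here selects the torchvision CPU backend; "pyav" is the default.
torchvision.set_video_backend("video_reader")

# 1. read_video(): the older, coarse-grained API; decodes the whole clip at once.
frames, audio, info = read_video("video.mp4", pts_unit="sec")

# 2. VideoReader(): the "new", fine-grained API; seek and iterate frame by frame.
reader = VideoReader("video.mp4", "video")
reader.seek(2.0)                # jump to 2 seconds
frame = next(reader)["data"]    # each item is a dict with "data" and "pts"
```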
In the benchmark, we're calling the VideoReader class, providing it with the "video_reader" backend. I think we should name the TorchVision decoder after the config option that makes the most difference for performance, and I think the backend would be more important here?
Yes, I agree.
Can you add the CLI you used to generate this data too? Also, it's weird to see the core being 2x slower than the public API for the random seek. Maybe it's because the random pts list is different for every decoder? Or we aren't running the benchmarks for long enough?
Benchmarks were run with:
Note that I changed the defaults of which benchmarks run when no options are provided. Let me fix the call to random, and then we can look at the numbers again. I did increase the runtime for the README chart, but not for benchmark_decoders.
Can you put this in the PR description just before the table so git log shows it?
I updated the description with how I ran the benchmarks, and the top result from this batch. This is from calling the benchmarks four consecutive times:
The only major anomaly I see is how variable TorchCodecCore and TorchCodecPublic are across runs for 100 nexts. Curiously, they both behave about the same within a run.
You can double-check whether we are seeking or not by dumping decoderStats in C++.
I don't think we should be seeking, especially after scanning the file.
Without scanning, we could accidentally seek sometimes.
This PR adds new benchmarks, changes existing benchmarks, and refactors the benchmark code itself. More specifically:
Running the benchmarks with the command:
On my dev machine yields:
I think we have an opportunity to improve performance here.
The current implementation of iterators for VideoDecoder uses the indexing API (__getitem__()). That is, we don't actually implement our own iterators; the Python language just uses __getitem__() and __len__() for us. That means we are seeking. If we implement our own iterators, we could do the same thing that TorchCodecCore is doing and get better performance with the public API.

Also note that the README data and charts now use the public API, not the core. I think a principle we should stick to is that when we talk externally about performance, we should always talk about the performance of the public API.
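To illustrate the mechanism (hypothetical classes, not TorchCodec's actual implementation): when a class only defines __getitem__ and __len__, Python's fallback iteration protocol does an independent indexed access per frame, whereas a dedicated __iter__ can walk the stream sequentially.

```python
class SequenceDecoder:
    """Iteration falls back to __getitem__(0), __getitem__(1), ...: one random access per frame."""
    def __init__(self, frames):
        self._frames = frames
        self.seeks = 0

    def __len__(self):
        return len(self._frames)

    def __getitem__(self, index):
        frame = self._frames[index]  # raises IndexError past the end, which stops iteration
        self.seeks += 1              # stand-in for a seek + decode on every access
        return frame

class IteratorDecoder:
    """A dedicated __iter__ yields frames in stream order with no per-frame seek."""
    def __init__(self, frames):
        self._frames = frames
        self.seeks = 0

    def __iter__(self):
        yield from self._frames      # sequential decode, no seeking

frames = list(range(5))
seq, it = SequenceDecoder(frames), IteratorDecoder(frames)
assert list(seq) == list(it) == frames
print(seq.seeks, it.seeks)  # 5 0
```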
Finally, #329 should merge before this PR. Then this PR should be rebased on main, and I'll need to rerun the script to generate the graph.