Refactor and add benchmarks #330


Merged: 3 commits into pytorch:main on Nov 4, 2024

Conversation

@scotts scotts commented Nov 2, 2024

This PR adds new benchmarks, changes existing benchmarks, and refactors the benchmark code itself. More specifically:

  1. Refactors the benchmark code and, most obviously, renames many of the decoders. The names in the code now better match what we call them in conversation, which I think makes both the code and the output more readable.
  2. Adds an experiment using random timestamps. The three experiments are now: random timestamps, uniform timestamps, and decode the first 100 frames.
  3. Adds a new decoder kind in the experiments: TorchCodecPublic. All of the pre-existing TorchCodec decoder kinds now start with TorchCodecCore. The naming is, I hope, obvious: one kind uses the public API, the other directly uses the core API.

Running the benchmarks with the command:

python benchmarks/decoders/benchmark_decoders.py

On my dev machine yields:

[--------- video=/home/scottas/github/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps ---------]
                                         |  uniform 10 seek()+next()  |  random 10 seek()+next()  |  1 next()  |  10 next()  |  100 next()  |  create()+next()
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------
      DecordNonBatchDecoderAccurateSeek  |            55.1            |           119.1           |    12.7    |     18.4    |     62.0     |                 
      TorchVision[backend=video_reader]  |           349.0            |           327.3           |    12.7    |     16.0    |     44.2     |                 
      TorchAudioDecoder                  |           467.3            |           513.7           |     8.0    |     15.3    |     73.1     |                 
      TorchCodecCore:                    |            48.9            |           113.7           |     9.6    |     12.8    |     44.8     |        9.6      
      TorchCodecCore:num_threads=1       |           108.9            |           259.7           |     6.9    |     11.9    |     70.0     |                 
      TorchCodecCoreCompiled             |            47.8            |           110.1           |     9.7    |     13.3    |     49.3     |                 
      TorchCodecCoreBatch                |            49.3            |            43.3           |    11.5    |     14.5    |     61.8     |                 
      TorchCodecPublic                   |            50.7            |            45.2           |    11.8    |     15.0    |     48.1     |                 

I think we have an opportunity to improve performance here.

The current implementation of iteration for VideoDecoder uses the indexing API (__getitem__()). That is, we don't implement our own iterator; Python's legacy iteration protocol just calls __getitem__() with increasing indices for us, which means every frame access is a seek. If we implement our own iterator, we could do the same thing TorchCodecCore does and get better performance from the public API.
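To make the fallback concrete, here's a minimal sketch (the class names are hypothetical, not TorchCodec's) showing that iterating an object with only __getitem__ goes through one indexed access per frame, while defining __iter__ bypasses that path entirely:

```python
class GetitemDecoder:
    """Relies on Python's legacy iteration protocol: iter() repeatedly
    calls __getitem__(0), __getitem__(1), ... until IndexError.
    In the video case, each indexed access is an independent seek."""

    def __init__(self, frames):
        self._frames = frames
        self.getitem_calls = 0

    def __getitem__(self, i):
        self.getitem_calls += 1
        return self._frames[i]  # raises IndexError past the end


class IterDecoder(GetitemDecoder):
    """Defines __iter__ explicitly, so iteration walks sequentially and
    never touches __getitem__ (no per-frame seek)."""

    def __iter__(self):
        return iter(self._frames)


seeking = GetitemDecoder([1, 2, 3])
list(seeking)
print(seeking.getitem_calls)  # 4: three frames plus the IndexError probe

sequential = IterDecoder([1, 2, 3])
list(sequential)
print(sequential.getitem_calls)  # 0: __getitem__ was never consulted
```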

Also note that the README data and charts now use the public API, not the core. I think a principle we should stick to is that when we talk externally about performance, we should always talk about the performance of the public API.

Finally, #329 should merge before this PR. Then this PR should rebase on main and I need to rerun the script to generate the graph.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 2, 2024
@scotts scotts marked this pull request as ready for review November 2, 2024 03:53
@scotts scotts requested a review from ahmadsharif1 November 2, 2024 03:53
print(
    f"video={video_file_path}, decoder={decoder_name}, pts_list={pts_list}"
)

random_pts_list = (torch.rand(num_samples) * duration).tolist()
@ahmadsharif1 ahmadsharif1 (Contributor) Nov 4, 2024:

What's the random seed here? Is this the source of the benchmark run time variation?

Is it the same for all decoders at least?

scotts (Contributor Author) replied:

Oh, good point. We're generating a different random list for each decoder. We should fix that. I can also set the seed to make it deterministic. Maybe in the future that should be a benchmark parameter.
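The fix being discussed could look like the sketch below (variable names follow the snippet above; the commented-out run_benchmark call is a placeholder, not the actual benchmark code): seed once, build the pts list once, and reuse it for every decoder so they are all measured on identical timestamps.

```python
import torch

num_samples = 10
duration = 13.013  # seconds, matching the benchmark video

# Seed once so runs are reproducible, and build the list once so every
# decoder sees the same timestamps.
torch.manual_seed(0)
random_pts_list = (torch.rand(num_samples) * duration).tolist()

# for name, decoder in decoders.items():
#     run_benchmark(decoder, pts_list=random_pts_list)  # hypothetical
```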

df_item["fps_p25"] = 1.0 * num_samples / results[-1]._p25
df_data.append(df_item)

for num_consecutive_nexts in [100]:
Contributor:

I think we do need 1 here, or at least the ability to pass in 1. (Maybe we don't need 1 for the README, but this lib should be able to accept 1 here somehow.)

Time to first frame is a useful metric for devs to track

Contributor Author:

Good point. I think we'll want to make that configurable, at least as input to run_benchmark(). I'll do that.
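One way to make this configurable (the function and parameter names below are assumptions, not the actual benchmark code) is to accept the list of consecutive-next counts as an argument, with 1 included by default so time-to-first-frame is always available:

```python
import itertools
import time


def time_consecutive_nexts(decoder_factory, n):
    """Time decoding the first n frames via repeated next() calls."""
    frames = decoder_factory()
    start = time.perf_counter()
    for _ in range(n):
        next(frames)
    return time.perf_counter() - start


def run_benchmark(decoder_factories, consecutive_next_counts=(1, 10, 100)):
    """Keeping 1 in the defaults makes time-to-first-frame measurable."""
    return {
        (name, n): time_consecutive_nexts(factory, n)
        for name, factory in decoder_factories.items()
        for n in consecutive_next_counts
    }


# Usage with a stand-in "decoder" (an infinite iterator of fake frames):
results = run_benchmark({"fake": lambda: itertools.count()})
```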

elif decoder == "torchvision":
-    decoder_dict["TVNewAPIDecoderWithBackendVideoReader"] = (
+    decoder_dict["TorchVision[backend=video_reader]"] = (
Contributor:

@NicolasHug can tell you about torchvision's APIs and what the name should be for this. This is a specific API -- that I have heard him say it's the new API

Member:

The naming in torchvision is awful. Basically, the VideoReader class and the "video_reader" backend are two orthogonal concepts with the same name.

There are 2 main Python decoding APIs.

  • read_video() in io/video.py.
  • VideoReader(). It is considered the more "fine-grained" and "new" API (although it's not that new at this point, just more recent).

Both APIs support the pyav and "video_reader" backends, where the "video_reader" backend just means the torchvision CPU backend. VideoReader() also supposedly supports the "cuda" backend.

Contributor Author:

In the benchmark, we're calling the VideoReader class and providing it with the "video_reader" backend. I think we should name the TorchVision decoder after the config option that makes the most difference for performance, and I think the backend matters more here.

Member:

yes, I agree.

@ahmadsharif1 ahmadsharif1 (Contributor):

Can you add the CLI you used to generate this data too?

Also, it's weird to see the core being 2x slower than the public API for the random seek -- maybe it's because the random pts list is different for every decoder? Or we aren't running the benchmarks for long enough?


scotts commented Nov 4, 2024

Benchmarks were run with:

python benchmarks/decoders/benchmark_decoders.py

Note that I changed the defaults of what benchmarks run when there are no options provided. Let me fix the call to random, and then we can look at the numbers again. I did increase the runtime for the README chart, but not for benchmark_decoders.

@ahmadsharif1 ahmadsharif1 (Contributor):

> Benchmarks were run with:
>
> python benchmarks/decoders/benchmark_decoders.py

Can you put this in the PR description just before the table so git log shows it?


scotts commented Nov 4, 2024

I updated the description with how I ran the benchmarks and the top result from this batch. These are the results of calling the benchmarks four consecutive times:

[--------- video=/home/scottas/github/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps ---------]
                                         |  uniform 10 seek()+next()  |  random 10 seek()+next()  |  1 next()  |  10 next()  |  100 next()  |  create()+next()
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------
      DecordNonBatchDecoderAccurateSeek  |            55.1            |           119.1           |    12.7    |     18.4    |     62.0     |                 
      TorchVision[backend=video_reader]  |           349.0            |           327.3           |    12.7    |     16.0    |     44.2     |                 
      TorchAudioDecoder                  |           467.3            |           513.7           |     8.0    |     15.3    |     73.1     |                 
      TorchCodecCore:                    |            48.9            |           113.7           |     9.6    |     12.8    |     44.8     |        9.6      
      TorchCodecCore:num_threads=1       |           108.9            |           259.7           |     6.9    |     11.9    |     70.0     |                 
      TorchCodecCoreCompiled             |            47.8            |           110.1           |     9.7    |     13.3    |     49.3     |                 
      TorchCodecCoreBatch                |            49.3            |            43.3           |    11.5    |     14.5    |     61.8     |                 
      TorchCodecPublic                   |            50.7            |            45.2           |    11.8    |     15.0    |     48.1     |                 


[--------- video=/home/scottas/github/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps ---------]
                                         |  uniform 10 seek()+next()  |  random 10 seek()+next()  |  1 next()  |  10 next()  |  100 next()  |  create()+next()
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------
      DecordNonBatchDecoderAccurateSeek  |            56.0            |           122.6           |    12.9    |     18.6    |     62.0     |                 
      TorchVision[backend=video_reader]  |           343.7            |           345.0           |    12.6    |     15.7    |     50.2     |                 
      TorchAudioDecoder                  |           498.7            |           560.0           |     8.0    |     14.0    |     83.9     |                 
      TorchCodecCore:                    |            48.9            |           111.9           |     9.5    |     12.8    |     59.1     |        9.6      
      TorchCodecCore:num_threads=1       |           111.2            |           263.4           |     6.9    |     12.0    |     72.6     |                 
      TorchCodecCoreCompiled             |            48.9            |           114.0           |     9.7    |     13.5    |     69.5     |                 
      TorchCodecCoreBatch                |            50.1            |            45.0           |    11.5    |     14.4    |     62.7     |                 
      TorchCodecPublic                   |            50.4            |            46.1           |    11.6    |     15.1    |     60.9     |                 

[--------- video=/home/scottas/github/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps ---------]
                                         |  uniform 10 seek()+next()  |  random 10 seek()+next()  |  1 next()  |  10 next()  |  100 next()  |  create()+next()
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------
      DecordNonBatchDecoderAccurateSeek  |            52.8            |           117.8           |    12.9    |     16.3    |     68.8     |                 
      TorchVision[backend=video_reader]  |           361.6            |           337.1           |    12.8    |     16.0    |     51.5     |                 
      TorchAudioDecoder                  |           472.7            |           518.9           |     8.1    |     15.6    |     84.5     |                 
      TorchCodecCore:                    |            48.1            |           109.3           |     9.6    |     12.8    |     59.8     |        9.6      
      TorchCodecCore:num_threads=1       |           113.6            |           260.9           |     6.9    |     11.9    |     70.7     |                 
      TorchCodecCoreCompiled             |            48.4            |           109.2           |     9.7    |     13.4    |     64.8     |                 
      TorchCodecCoreBatch                |            49.1            |            43.7           |    11.5    |     14.6    |     61.9     |                 
      TorchCodecPublic                   |            50.1            |            45.0           |    11.6    |     14.9    |     59.4     |                 

[--------- video=/home/scottas/github/torchcodec/benchmarks/decoders/../../test/resources/nasa_13013.mp4 h264 480x270, 13.013s 29.97002997002997fps ---------]
                                         |  uniform 10 seek()+next()  |  random 10 seek()+next()  |  1 next()  |  10 next()  |  100 next()  |  create()+next()
1 threads: ---------------------------------------------------------------------------------------------------------------------------------------------------
      DecordNonBatchDecoderAccurateSeek  |            54.9            |           121.5           |    12.8    |     18.1    |     69.5     |                 
      TorchVision[backend=video_reader]  |           351.3            |           325.7           |    12.7    |     15.8    |     46.6     |                 
      TorchAudioDecoder                  |           463.7            |           521.3           |     8.0    |     14.0    |     84.8     |                 
      TorchCodecCore:                    |            47.4            |           110.4           |     9.6    |     12.7    |     58.5     |        9.5      
      TorchCodecCore:num_threads=1       |           112.1            |           271.0           |     6.9    |     11.9    |     69.0     |                 
      TorchCodecCoreCompiled             |            48.2            |           112.2           |     9.8    |     13.3    |     63.7     |                 
      TorchCodecCoreBatch                |            49.2            |            44.3           |    11.6    |     14.7    |     63.4     |                 
      TorchCodecPublic                   |            49.4            |            43.8           |    11.6    |     14.8    |     59.6     |                 

The only major anomaly I see is how variable across runs TorchCodecCore and TorchCodecPublic are for 100 next. Curiously, they're both behaving about the same within a run.

@ahmadsharif1 ahmadsharif1 (Contributor) left a comment:

You can double-check whether we are seeking or not by dumping decoderStats in C++.

I don't think we should be seeking, especially after scanning the file.

Without scanning, we could accidentally seek sometimes.

@scotts scotts merged commit 6028420 into pytorch:main Nov 4, 2024
37 of 40 checks passed
@scotts scotts deleted the exp_refactor branch November 4, 2024 20:54