📝 Note: As an exception, I include one and only one image dataset, owing to its size (700K scenes) and the remarkable improvement in depth estimation achieved by the Depth Anything V2 ViT-B model fine-tuned on MegaSynth and evaluated on Hypersim. See the results in Table 6.
Nr | Dataset | Venue | Resolution
---|---|---|---
1 | MegaSynth | | 512×512
Nr | Dataset | Venue | Resolution | BoT | C3R | D2U | DP | GC | MoG | POM | RD | UD2 | VDA
---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | Spring | (to do) | 1920×1080 | - | T | T | E | T | T | - | - | - | - |
2 | HorizonGS | | 1920×1080 | - | - | - | - | - | - | - | - | - | - |
3 | MVS-Synth | (to do) | 1920×1080 | - | T | - | T | T | T | - | - | - | - |
4 | Mid-Air | (to do) | 1024×1024 | - | - | - | - | T | T | - | - | - | - |
5 | MatrixCity | (to do) | 1000×1000 | - | - | - | - | T | T | - | - | T | - |
6 | SAIL-VOS 3D | (to do) | 1280×800 | - | - | - | T | - | - | - | - | - | - |
7 | BEDLAM | (to do) | 1280×720 | - | T | - | T | - | - | - | - | T | - |
8 | Dynamic Replica | (to do) | 1280×720 | - | T | - | T | T | - | T | - | T | - |
9 | BlinkVision | | 960×540 | - | - | T | - | - | - | - | - | - | - |
10 | PointOdyssey | (to do) | 960×540 | - | T | T | - | - | - | T | E | T | T |
11 | DyDToF | (to do) | 960×540 | - | - | - | - | - | - | - | E | - | - |
12 | IRS | (to do) | 960×540 | - | T | - | T | T | T | - | - | - | T |
13 | Scene Flow | (to do) | 960×540 | - | - | - | - | E | - | - | - | - | - |
14 | 3D Ken Burns | (to do) | 512×512 | - | T | - | T | T | T | - | - | - | - |
15 | TartanAir | (to do) | 640×480 | - | T | T | T | T | T | T | T | T | T |
16 | ParallelDomain-4D | | 640×480 | - | - | - | - | - | - | T | - | - | - |
17 | GTA-SfM | (to do) | 640×480 | - | - | - | - | T | T | - | - | - | - |
18 | MPI Sintel | (to do) | 1024×436 | E | E | E | E | E | E | E | - | E | E |
19 | Virtual KITTI 2 | (to do) | 1242×375 | - | T | - | T | T | - | - | - | - | T |
20 | TartanAir Shibuya | (to do) | 640×360 | E | - | - | - | - | - | - | - | - | - |
Total: T (training) | | | | 0 | 9 | 4 | 8 | 10 | 8 | 4 | 1 | 5 | 4
Total: E (testing) | | | | 2 | 1 | 1 | 2 | 2 | 1 | 1 | 2 | 1 | 1
- Stereo4D (400 video clips with 16 frames each at 5 fps): LPIPS ≤ 0.242
- Qualitative comparison of four 2D-to-3D video conversion methods: Rank (human perceptual judgment)
- Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel ≤ 0.079
- NYU-Depth V2: AbsRel ≤ 0.0424 (relative depth)
- NYU-Depth V2: AbsRel ≤ 0.051 (metric depth)
- Appendix 1: Rules for qualifying models for the rankings (to do)
- Appendix 2: Metrics selection for the rankings (to do)
- Appendix 3: List of all research papers from the above rankings
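Several of the qualification thresholds above use AbsRel, the mean absolute relative depth error. As a point of reference, here is a minimal pure-Python sketch of that metric; the function name and flat-list inputs are my own illustration, not code from any of the ranked repositories:

```python
def abs_rel(pred, gt):
    """Mean absolute relative error: mean(|pred - gt| / gt) over pixels with gt > 0.

    pred, gt: flat sequences of predicted and ground-truth depths (e.g. metres).
    Pixels with non-positive ground truth are treated as invalid and skipped.
    """
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]
    return sum(abs(p - g) / g for p, g in valid) / len(valid)

# Toy 4-pixel depth map: only the first pixel is off (by 1 m at 2 m depth).
print(abs_rel([1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 3.0, 4.0]))  # → 0.125
```

Real evaluations compute this over aligned depth maps (after scale/shift alignment for relative depth), but the per-pixel formula is the same.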
RK | Model Links: Venue Repository | LPIPS ↓ {Input fr.} Table 1 M2SVid
---|---|---
1 | M2SVid | 0.180 {MF}
2 | SVG | 0.217 {MF}
3 | StereoCrafter | 0.242 {MF}
📝 Note: This ranking is based on my own perceptual judgement of the qualitative comparison results shown in Figure 7. One output frame (right view) is compared with one input frame (left view) from the video clip 22_dogskateboarder, and one output frame (right view) is compared with one input frame (left view) from the video clip scooter-black.
RK | Model Links: Venue Repository | Rank ↓ (human perceptual judgment)
---|---|---
1 | StereoCrafter | 1
2-3 | Immersity AI | 2-3
2-3 | Owl3D | 2-3
4 | Deep3D | 4
📝 Note: 1) See Figure 4. 2) The ranking order is determined in the first instance by a direct comparison of the scores of two models in the same paper. If no paper contains such a direct comparison, or different papers disagree, the ranking order is determined by the best score of the two compared models across all papers shown as data sources in the columns. The DepthCrafter rank is based on the latest version, 1.0.1.
📝 Note: The ranking order is determined in the first instance by a direct comparison of the scores of two models in the same paper. If no paper contains such a direct comparison, or different papers disagree, the ranking order is determined by the best score of the two compared models across all papers shown as data sources in the columns. The Metric3D v2 ViT-Large rank is not based on the 0.134 score, which is probably just an anomaly.
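The tie-breaking rule described in the notes above can be sketched in code. This is a rough illustration under my own assumptions — lower scores are better, each paper is modelled as a dict mapping model name to score, and the function name is hypothetical, not anything used by the repository:

```python
def rank_pair(model_a, model_b, papers):
    """Order two models per the notes' rule (lower score = better).

    papers: list of dicts, each mapping model name -> score from one paper.
    1) If at least one paper scores both models and all such papers agree,
       use that direct comparison.
    2) Otherwise (no direct comparison, or papers disagree), fall back to
       each model's best score across all papers.
    """
    # Papers containing a direct comparison of both models.
    direct = [(p[model_a], p[model_b]) for p in papers
              if model_a in p and model_b in p]
    signs = {(a > b) - (a < b) for a, b in direct if a != b}
    if len(signs) == 1:  # all direct comparisons agree on a winner
        return model_a if signs.pop() == -1 else model_b
    # Fallback: best (lowest) score anywhere decides.
    best_a = min(p[model_a] for p in papers if model_a in p)
    best_b = min(p[model_b] for p in papers if model_b in p)
    return model_a if best_a <= best_b else model_b
```

For example, with `papers = [{"A": 0.20, "B": 0.18}]` the direct comparison decides in favour of `"B"`; with `papers = [{"A": 0.10}, {"B": 0.30}]` there is no direct comparison, so the best scores decide in favour of `"A"`.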