
Rankings include: Align3R BetterDepth ChronoDepth CUT3R Deep3D Depth Any Video Depth Anything Depth Pro DepthCrafter Geo4D GRIN L4P M2SVid MASt3R Metric3D Metric-Solver MoGe MonST3R NVDS RollingDepth StereoCrafter SVG UniDepth UniK3D Video Depth Anything


AIVFI/Monocular-Depth-Estimation-Rankings-and-2D-to-3D-Video-Conversion-Rankings


Monocular Depth Estimation Rankings
and 2D to 3D Video Conversion Rankings

Awesome Synthetic RGB-D Image Datasets for Training HD Video Depth Estimation Models

📝 Note: As an exception, I include exactly one image dataset, MegaSynth, for two reasons: its size (700K scenes) and the large improvement in depth estimation results when the Depth Anything V2 ViT-B model is fine-tuned on MegaSynth and evaluated on Hypersim. See the results in Table 6.

Dataset      Venue      Resolution
1 MegaSynth CVPR 512×512

Awesome Synthetic RGB-D Video Datasets for Training HD Video Depth Estimation Models

Dataset Venue Resolution BoT C3R D2U DP GCM MoG POM RD UD2 VDA
1 Spring (to do) 1920×1080 - T T E T T - - - -
2 HorizonGS CVPR 1920×1080 - - - - - - - - - -
3 MVS-Synth (to do) 1920×1080 - T - T T T - - - -
4 Mid-Air (to do) 1024×1024 - - - - T T - - - -
5 MatrixCity (to do) 1000×1000 - - - - T T - - T -
6 SAIL-VOS 3D (to do) 1280×800 - - - T - - - - - -
7 BEDLAM (to do) 1280×720 - T - T - - - - T -
8 Dynamic Replica (to do) 1280×720 - T - T T - T - T -
9 BlinkVision ECCV 960×540 - - T - - - - - - -
10 PointOdyssey (to do) 960×540 - T T - - - T E T T
11 DyDToF (to do) 960×540 - - - - - - - E - -
12 IRS (to do) 960×540 - T - T T T - - - T
13 Scene Flow (to do) 960×540 - - - - E - - - - -
14 3D Ken Burns (to do) 512×512 - T - T T T - - - -
15 TartanAir (to do) 640×480 - T T T T T T T T T
16 ParallelDomain-4D ECCV 640×480 - - - - - - T - - -
17 GTA-SfM (to do) 640×480 - - - - T T - - - -
18 MPI Sintel (to do) 1024×436 E E E E E E E - E E
19 Virtual KITTI 2 (to do) 1242×375 - T - T T - - - - T
20 TartanAir Shibuya (to do) 640×360 E - - - - - - - - -
Total: T (training)
Total: E (testing)

List of Rankings

2D to 3D Video Conversion Rankings

  1. Stereo4D (400 video clips with 16 frames each at 5 fps): LPIPS<=0.242
  2. Qualitative comparison of four 2D to 3D video conversion methods: Rank (human perceptual judgment)

Monocular Depth Estimation Rankings

I. Rankings based on temporal consistency metrics

  1. ScanNet (170 frames): TAE<=2.2

II. Rankings based on 3D metrics

  1. iBims-1: F-score>=0.303

III. Rankings based on 2D metrics

  1. Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.079
  2. NYU-Depth V2: AbsRel<=0.0424 (relative depth)
  3. NYU-Depth V2: AbsRel<=0.051 (metric depth)

Appendices


Stereo4D (400 video clips with 16 frames each at 5 fps): LPIPS<=0.242

| RK | Model | Venue | Repository | LPIPS ↓ {Input fr.} (arXiv, Table 1, M2SVid) |
|---|---|---|---|---|
| 1 | M2SVid | arXiv | - | 0.180 {MF} |
| 2 | SVG | ICLR | GitHub Stars | 0.217 {MF} |
| 3 | StereoCrafter | arXiv | GitHub Stars | 0.242 {MF} |

Back to Top Back to the List of Rankings

Qualitative comparison of four 2D to 3D video conversion methods: Rank (human perceptual judgment)

📝 Note: This ranking is based on my own perceptual judgement of the qualitative comparison results shown in Figure 7. Two frame pairs are compared: one output frame (right view) against one input frame (left view) from the video clip 22_dogskateboarder, and one output frame (right view) against one input frame (left view) from the video clip scooter-black.

| RK | Model | Venue | Repository | Rank ↓ (human perceptual judgment) |
|---|---|---|---|---|
| 1 | StereoCrafter | arXiv | GitHub Stars | 1 |
| 2-3 | Immersity AI | - | - | 2-3 |
| 2-3 | Owl3D | - | - | 2-3 |
| 4 | Deep3D | ECCV | GitHub Stars | 4 |


ScanNet (170 frames): TAE<=2.2

| RK | Model | Venue | Repository | TAE ↓ {Input fr.} (CVPR, VDA) |
|---|---|---|---|---|
| 1 | VDA-L | CVPR | GitHub Stars | 0.570 {MF} |
| 2 | DepthCrafter | CVPR | GitHub Stars | 0.639 {MF} |
| 3 | Depth Any Video | ICLR | GitHub Stars | 0.967 {MF} |
| 4 | ChronoDepth | CVPR | GitHub Stars | 1.022 {MF} |
| 5 | Depth Anything V2 Large | NeurIPS | GitHub Stars | 1.140 {1} |
| 6 | NVDS | ICCV | GitHub Stars | 2.176 {4} |


iBims-1: F-score>=0.303

| RK | Model | Venue | Repository | F-score ↑ {Input fr.} (arXiv, TABLE I, UD2) | F-score ↑ {Input fr.} (CVPR, Table 20, UniK3D) |
|---|---|---|---|---|---|
| 1 | UniDepthV2-Large | arXiv | GitHub Stars | 0.709 {1} | - |
| 2 | UniK3D-Large | CVPR | GitHub Stars | - | 0.698 {1} |
| 3 | Depth Pro | ICLR | GitHub Stars | 0.628 {1} | 0.628 {1} |
| 4 | MASt3R | ECCV | GitHub Stars | 0.557 {2} | 0.557 {2} |
| 5 | UniDepth | CVPR | GitHub Stars | 0.303 {1} | 0.303 {1} |


Bonn RGB-D Dynamic (5 video clips with 110 frames each): AbsRel<=0.079

📝 Note:
1) See Figure 4.
2) The ranking order is determined first by a direct comparison of the two models' scores within the same paper. If no paper compares them directly, or if different papers disagree, the order is determined by each model's best score across all papers shown in the columns as data sources. The DepthCrafter rank is based on the latest version, 1.0.1.

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (CVPR, VDA) | AbsRel ↓ {Input fr.} (arXiv, L4P) | AbsRel ↓ {Input fr.} (arXiv, Geo4D) | AbsRel ↓ {Input fr.} (CVPR, Align3R) | AbsRel ↓ {Input fr.} (ICLR, MonST3R) | AbsRel ↓ {Input fr.} (CVPR, DC) | AbsRel ↓ {Input fr.} (CVPR, CUT3R) | AbsRel ↓ {Input fr.} (CVPR, RD) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Depth Any Video | ICLR | GitHub Stars | 0.051 {MF} | - | - | - | - | - | - | - |
| 2 | VDA-L | CVPR | GitHub Stars | 0.053 {MF} | - | - | - | - | - | - | - |
| 3 | L4P | arXiv | - | - | 0.056 {MF} | - | - | - | - | - | - |
| 4 | Geo4D | arXiv | GitHub Stars | - | - | 0.059 {MF} | - | - | - | - | - |
| 5 | Depth Pro | ICLR | GitHub Stars | - | - | - | 0.067 {1} | - | - | - | - |
| 6 | Align3R (Depth Pro) | CVPR | GitHub Stars | - | - | - | 0.068 {2} | - | - | - | - |
| 7 | MonST3R | ICLR | GitHub Stars | - | - | 0.063 {2} | 0.082 {2} | 0.063 {2} | - | 0.066 {2} | - |
| 8 | DepthCrafter v1.0.1 | CVPR | GitHub Stars | 0.066 {MF} (DC v1.0.0) | 0.071 {MF} | 0.071 {MF} | 0.075 {MF} (DC v1.0.0) | 0.075 {MF} (DC v1.0.0) | 0.071 {MF} | 0.075 {MF} (DC v1.0.0) | 0.066 {MF} (DC v1.0.0) |
| 9 | CUT3R | CVPR | GitHub Stars | - | - | - | - | - | - | 0.074 {MF} | - |
| 10 | RollingDepth | CVPR | GitHub Stars | - | - | - | - | - | - | - | 0.079 {MF} |
| 11 | Depth Anything | CVPR | GitHub Stars | - | 0.078 {1} | - | - | - | 0.078 {1} | - | 0.099 {1} |
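
The pairwise ordering rule described in the note for this ranking can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used to build the table: the function name `better_model`, the `scores` layout, and the treatment of equal scores as "no verdict" are all hypothetical. The sample values are AbsRel scores (lower is better) copied from the table above.

```python
from typing import Dict, Optional

# Hypothetical layout: model -> {source paper -> AbsRel score (lower is better)}.
Scores = Dict[str, Dict[str, float]]

scores: Scores = {
    "Depth Any Video": {"VDA": 0.051},
    "VDA-L": {"VDA": 0.053},
    "L4P": {"L4P": 0.056},
    "Geo4D": {"Geo4D": 0.059},
    "Depth Pro": {"Align3R": 0.067},
    "MonST3R": {"Geo4D": 0.063, "Align3R": 0.082, "MonST3R": 0.063, "CUT3R": 0.066},
}


def better_model(a: str, b: str, scores: Scores) -> Optional[str]:
    """Return the model that should rank higher, or None if they tie."""
    # Step 1: collect verdicts from papers that report a score for both models.
    shared_papers = scores[a].keys() & scores[b].keys()
    verdicts = set()
    for paper in shared_papers:
        score_a, score_b = scores[a][paper], scores[b][paper]
        if score_a != score_b:  # assumption: equal scores give no verdict
            verdicts.add(a if score_a < score_b else b)

    # A single consistent verdict decides the order.
    if len(verdicts) == 1:
        return verdicts.pop()

    # Step 2: no direct comparison, or papers disagree: use each model's best score.
    best_a, best_b = min(scores[a].values()), min(scores[b].values())
    if best_a == best_b:
        return None
    return a if best_a < best_b else b


print(better_model("Depth Pro", "MonST3R", scores))  # Depth Pro
print(better_model("L4P", "Geo4D", scores))          # L4P
```

In the first call, the Align3R paper compares the two models directly (0.067 vs 0.082), so Depth Pro ranks above MonST3R even though MonST3R's best score elsewhere is 0.063. In the second call, no paper reports both models, so the best scores (0.056 vs 0.059) decide.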


NYU-Depth V2: AbsRel<=0.0424 (relative depth)

📝 Note: The ranking order is determined first by a direct comparison of the two models' scores within the same paper. If no paper compares them directly, or if different papers disagree, the order is determined by each model's best score across all papers shown in the columns as data sources. The Metric3D v2 ViT-Large rank is not based on the score of 0.134, which is probably an anomaly.

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (CVPR, MoGe) | AbsRel ↓ {Input fr.} (NeurIPS, BD) | AbsRel ↓ {Input fr.} (arXiv, M3D v2) | AbsRel ↓ {Input fr.} (CVPR, DA) | AbsRel ↓ {Input fr.} (NeurIPS, DA V2) |
|---|---|---|---|---|---|---|---|---|
| 1 | MoGe | CVPR | GitHub Stars | 0.0341 {1} | - | - | - | - |
| 2 | UniDepth | CVPR | GitHub Stars | 0.0380 {1} | - | - | - | - |
| 3-5 | BetterDepth | NeurIPS | - | - | 0.042 {1} | - | - | - |
| 3-5 | Depth Anything V2 Large | NeurIPS | GitHub Stars | 0.0420 {1} | - | - | - | 0.045 {1} |
| 3-5 | Metric3D v2 ViT-Large | TPAMI | GitHub Stars | 0.134 {1} | - | 0.042 {1} | - | - |
| 6 | Depth Anything Large | CVPR | GitHub Stars | 0.0424 {1} | 0.043 {1} | 0.043 {1} | 0.043 {1} | 0.043 {1} |


NYU-Depth V2: AbsRel<=0.051 (metric depth)

| RK | Model | Venue | Repository | AbsRel ↓ {Input fr.} (CVPR, Table 16, UniK3D) | AbsRel ↓ {Input fr.} (arXiv, UD2) | AbsRel ↓ {Input fr.} (arXiv, M3D v2) | AbsRel ↓ {Input fr.} (arXiv, Table 2, MS) | AbsRel ↓ {Input fr.} (arXiv, GRIN) |
|---|---|---|---|---|---|---|---|---|
| 1 | UniK3D | CVPR | GitHub Stars | 0.0443 {1} | - | - | - | - |
| 2 | UniDepthV2 | arXiv | GitHub Stars | - | 0.0468 {1} | - | - | - |
| 3 | Metric3D v2 ViT-L FT | TPAMI | GitHub Stars | 0.0470 {1} | 0.0470 {1} | 0.047 {1} | - | - |
| 4 | Metric-Solver | arXiv | GitHub Stars | - | - | - | 0.049 {1} | - |
| 5 | GRIN_FT_NI | arXiv | - | - | - | - | - | 0.051 {1} |


Appendix 3: List of all research papers from the above rankings

Method Abbr. Paper Venue (Alt link) Official repository
Align3R - Align3R: Aligned Monocular Depth Estimation for Dynamic Videos CVPR GitHub Stars
BetterDepth BD BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation NeurIPS -
ChronoDepth - Learning Temporally Consistent Video Depth from Video Diffusion Priors CVPR GitHub Stars
CUT3R C3R Continuous 3D Perception Model with Persistent State CVPR GitHub Stars
Deep3D - Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks ECCV GitHub Stars
Depth Any Video DAV Depth Any Video with Scalable Synthetic Data ICLR GitHub Stars
Depth Anything DA Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data CVPR GitHub Stars
Depth Anything V2 DA V2 Depth Anything V2 NeurIPS GitHub Stars
Depth Pro DP Depth Pro: Sharp Monocular Metric Depth in Less Than a Second ICLR GitHub Stars
DepthCrafter DC DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos CVPR GitHub Stars
Geo4D - Geo4D: Leveraging Video Generators for Geometric 4D Scene Reconstruction arXiv GitHub Stars
GRIN - GRIN: Zero-Shot Metric Depth with Pixel-Level Diffusion arXiv -
L4P - L4P: Low-Level 4D Vision Perception Unified arXiv -
M2SVid - M2SVid: End-to-End Inpainting and Refinement for Monocular-to-Stereo Video Conversion arXiv -
MASt3R - Grounding Image Matching in 3D with MASt3R ECCV GitHub Stars
Metric3D v2 M3D v2 Metric3D v2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation TPAMI (arXiv) GitHub Stars
Metric-Solver MS Metric-Solver: Sliding Anchored Metric Depth Estimation from a Single Image arXiv GitHub Stars
MoGe MoG MoGe: Unlocking Accurate Monocular Geometry Estimation for Open-Domain Images with Optimal Training Supervision CVPR GitHub Stars
MonST3R - MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion ICLR GitHub Stars
NVDS - Neural Video Depth Stabilizer ICCV GitHub Stars
RollingDepth RD Video Depth without Video Models CVPR GitHub Stars
StereoCrafter - StereoCrafter: Diffusion-based Generation of Long and High-fidelity Stereoscopic 3D from Monocular Videos arXiv GitHub Stars
SVG - SVG: 3D Stereoscopic Video Generation via Denoising Frame Matrix ICLR GitHub Stars
UniDepth UD UniDepth: Universal Monocular Metric Depth Estimation CVPR GitHub Stars
UniDepthV2 UD2 UniDepthV2: Universal Monocular Metric Depth Estimation Made Simpler arXiv GitHub Stars
UniK3D - UniK3D: Universal Camera Monocular 3D Estimation CVPR GitHub Stars
Video Depth Anything VDA Video Depth Anything: Consistent Depth Estimation for Super-Long Videos CVPR GitHub Stars

