This repo collects benchmark datasets for evaluating multi-view, multi-class machine learning algorithms.
📢 More information about the datasets can be found in [Google Sheets | Tencent Docs].
No. | Datasets | #Samples | #Classes | #Views | Tag | Reference |
---|---|---|---|---|---|---|
1 | 100Leaves | 1,600 | 100 | 3 | | Plant leaf classification using probabilistic integration of shape, texture and margin features |
2 | Caltech101-7 | 1,474 | 7 | 6 | imbalance | Large-scale multi-view spectral clustering via bipartite graph |
3 | Caltech101-20 | 2,386 | 20 | 6 | imbalance | Deep Incomplete Multi-View Learning Network with Insufficient Label Information |
4 | Caltech101 | 9,144 | 102 | 6 | imbalance | Binary Multi-View Clustering |
5 | Deep Caltech101 | 8,677 | 101 | 2 | imbalance | Trusted Multi-View Classification |
6 | Caltech256 | 30,607 | 257 | 3 | imbalance | Auto-weighted Multi-view Clustering for Large-scale Data |
7 | Deep AWA_2views | 10,158 | 50 | 2 | imbalance | Deep Partial Multi-View Learning |
8 | Reuters_2views | 18,758 | 6 | 2 | imbalance | Multi-view Spectral Clustering Network |
9 | NoisyMNIST | 70,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
10 | NoisyMNIST | 30,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
11 | MNIST-USPS | 5,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
12 | Scene15 | 4,485 | 15 | 3 | | Ensemble projection for semi-supervised image classification |
13 | Out-Scene | 2,688 | 8 | 4 | | Deep Incomplete Multi-View Learning Network with Insufficient Label Information |
14 | NUS-WIDE | 30,000 | 31 | 5 | imbalance | Fast Multi-view Clustering via Ensembles: Towards Scalability, Superiority, and Simplicity |
✨ We have collated some publicly available datasets and you can download them from Baidu Netdisk. The data format is as follows:
xxx.mat
├── gnd: matrix, double, start from 1, (sample_number, 1).
├── X: cell, (1, view_num)
└── └── X{i}: matrix, double, (sample_number, feature_dimension).
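The layout above can be read from Python roughly as follows. This is a minimal sketch, not code shipped with this repo: it assumes SciPy and NumPy are installed and that the file was saved in a MATLAB format that `scipy.io.loadmat` can read. The helper name `load_multiview` and the toy file are hypothetical and only illustrate the `X` / `gnd` structure.

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat


def load_multiview(path):
    """Load a multi-view .mat file with the X / gnd layout described above.

    Returns (views, labels): a list with one (sample_number, feature_dimension)
    array per view, and a 0-based integer label vector.
    """
    mat = loadmat(path)
    # A MATLAB cell of shape (1, view_num) arrives as a NumPy object array.
    cell = mat["X"]
    views = [np.asarray(cell[0, i], dtype=np.float64) for i in range(cell.shape[1])]
    # gnd is (sample_number, 1) and starts from 1; flatten it and shift to 0-based.
    labels = mat["gnd"].ravel().astype(np.int64) - 1
    for i, view in enumerate(views):
        if view.shape[0] != labels.shape[0]:
            raise ValueError(f"view {i} has {view.shape[0]} rows, expected {labels.shape[0]}")
    return views, labels


# Round trip on a toy file (hypothetical data, not one of the listed datasets).
rng = np.random.default_rng(0)
toy_X = np.empty((1, 2), dtype=object)  # object arrays are written as cells
toy_X[0, 0] = rng.normal(size=(6, 4))
toy_X[0, 1] = rng.normal(size=(6, 3))
toy_gnd = np.array([[1.0], [1.0], [2.0], [2.0], [3.0], [3.0]])

with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy.mat")
    savemat(toy_path, {"X": toy_X, "gnd": toy_gnd})
    views, labels = load_multiview(toy_path)

print([v.shape for v in views], labels)
```

Shifting the labels to start from 0 is a convenience for Python tooling; drop the `- 1` if you prefer to keep the labels as stored.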
Abbreviation | Full Name |
---|---|
SIFT | Scale Invariant Feature Transform |
SD | Shape Descriptor |
FSM | Fine Scale Margin |
FOU | FOUrier coefficients of the character shapes |
FAC | profile correlations |
PIX | PIXel averages in 2 × 3 windows |
ZER | ZERnike moment |
MOR | MORphological features |
Gabor | Gabor feature |
WM | Wavelet Moments |
CENTRIST/CENT | CENTRIST feature |
LBP | Local Binary Patterns feature |
CH | Color Histogram |
TH | Texture Histogram |
CM | Color Moments |
CS | Color Similarity |
CORR | color CORRelation |
EDH | Edge Distribution Histogram |
WT | Wavelet Texture |
[Note] Modified from this repo.
- [2024/12/30] Update the visual feature list of the image datasets!
- [2024/12/29] The script to evaluate the modality quality with the K-Means clustering algorithm is uploaded: `modality_evaluation/modality_eval.py`!
- [2024/08/12] The script for the label distribution plot is uploaded: `label_distribution/plot_label_distribution.ipynb`!
- [2024/08/08] Create a share link to datasets we have collected from the Internet for public research. [Baidu Netdisk]
We adopt the K-Means clustering algorithm as a simple baseline to evaluate the contribution of each modality under the NMI and Silhouette metrics. The result figures can be found in the `modality_evaluation` folder.
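A per-modality K-Means baseline of this kind can be sketched as below. This is an illustration only and not the contents of `modality_evaluation/modality_eval.py`: it assumes scikit-learn is installed, runs K-Means on raw features without any preprocessing, and uses a hypothetical helper `evaluate_views` on synthetic data. NMI compares the predicted clusters with the ground-truth labels, while the Silhouette score only looks at how well separated the predicted clusters are.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, silhouette_score


def evaluate_views(views, labels, seed=0):
    """Cluster each view on its own with K-Means and score the result."""
    n_clusters = len(np.unique(labels))
    results = []
    for view in views:
        model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        pred = model.fit_predict(view)
        results.append({
            # Agreement between predicted clusters and ground-truth classes.
            "nmi": normalized_mutual_info_score(labels, pred),
            # Cohesion and separation of the predicted clusters (no labels used).
            "silhouette": silhouette_score(view, pred),
        })
    return results


# Toy data: view 0 separates three classes, view 1 is pure noise.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 50)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
informative = centers[labels] + rng.normal(scale=0.5, size=(150, 2))
noise = rng.normal(size=(150, 5))

scores = evaluate_views([informative, noise], labels)
for i, s in enumerate(scores):
    print(f"view {i}: NMI={s['nmi']:.3f}, silhouette={s['silhouette']:.3f}")
```

On the toy data the informative view should score clearly higher than the noise view on both metrics, which is the kind of contrast the per-modality comparison is meant to expose.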
📢 More figures can be found in the `label_distribution` folder!
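For a quick numeric check without plotting, the label distribution of a `gnd` vector can be summarized along these lines. This is a small sketch using only the standard library; the helper names and the max/min ratio are assumptions for illustration, not the criterion used to assign the `imbalance` tag in the table.

```python
from collections import Counter


def label_distribution(labels):
    """Return per-class sample counts, sorted by class id."""
    counts = Counter(int(y) for y in labels)
    return dict(sorted(counts.items()))


def imbalance_ratio(labels):
    """Ratio of the largest class size to the smallest class size."""
    counts = label_distribution(labels).values()
    return max(counts) / min(counts)


# Toy labels starting from 1, as in gnd: 8, 4 and 2 samples per class.
toy = [1] * 8 + [2] * 4 + [3] * 2
dist = label_distribution(toy)
ratio = imbalance_ratio(toy)
print(dist, ratio)  # {1: 8, 2: 4, 3: 2} 4.0
```

A ratio of 1.0 means perfectly balanced classes; larger values indicate a heavier skew toward the largest class.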
Some datasets were downloaded from these sites, for which we are very grateful:
[1] https://github.com/liujiyuan13/mvdata
[2] https://github.com/wangsiwei2010/large_scale_multi-view_clustering_datasets