This repo contains some benchmarks for evaluating Multi-view Multi-class machine learning algorithms.

ZhangqiJiang07/Multi-view_Multi-class_Datasets

Multi-view Multi-class Datasets

This repo contains some benchmarks for evaluating Multi-view Multi-class machine learning algorithms.

📄 Statistics of Datasets

📢 More information about the datasets can be found in [Google Sheets | Tencent Docs].

| No. | Datasets | #Samples | #Classes | #Views | Tag | Reference |
|---|---|---|---|---|---|---|
| 1 | 100Leaves | 1,600 | 100 | 3 | | Plant leaf classification using probabilistic integration of shape, texture and margin features |
| 2 | Caltech101-7 | 1,474 | 7 | 6 | imbalance | Large-scale multi-view spectral clustering via bipartite graph |
| 3 | Caltech101-20 | 2,386 | 20 | 6 | imbalance | Deep Incomplete Multi-View Learning Network with Insufficient Label Information |
| 4 | Caltech101 | 9,144 | 102 | 6 | imbalance | Binary Multi-View Clustering |
| 5 | Deep Caltech101 | 8,677 | 101 | 2 | imbalance | Trusted Multi-View Classification |
| 6 | Caltech256 | 30,607 | 257 | 3 | imbalance | Auto-weighted Multi-view Clustering for Large-scale Data |
| 7 | Deep AWA_2views | 10,158 | 50 | 2 | imbalance | Deep Partial Multi-View Learning |
| 8 | Reuters_2views | 18,758 | 6 | 2 | imbalance | Multi-view Spectral Clustering Network |
| 9 | NoisyMNIST | 70,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
| 10 | NoisyMNIST | 30,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
| 11 | MNIST-USPS | 5,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
| 12 | Scene15 | 4,485 | 15 | 3 | | Ensemble projection for semi-supervised image classification |
| 13 | Out-Scene | 2,688 | 8 | 4 | | Deep Incomplete Multi-View Learning Network with Insufficient Label Information |
| 14 | NUS-WIDE | 30,000 | 31 | 5 | imbalance | Fast Multi-view Clustering via Ensembles: Towards Scalability, Superiority, and Simplicity |

✨ We have collated some publicly available datasets and you can download them from Baidu Netdisk. The data format is as follows:

xxx.mat
├── gnd: matrix, double, labels start from 1, (sample_number, 1).
└── X: cell, (1, view_num)
    └── X{i}: matrix, double, (sample_number, feature_dimension).
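
As a minimal sketch of reading this layout from Python, the snippet below uses `scipy.io.loadmat` and returns one array per view plus 0-based labels. The function name `load_multiview_mat` is hypothetical and not part of this repo. It also assumes the file was saved in a pre-v7.3 MATLAB format; v7.3 files are HDF5-based and `scipy.io.loadmat` does not read them.

```python
import numpy as np
from scipy.io import loadmat


def load_multiview_mat(path):
    """Return (views, labels) from a .mat file with fields `X` and `gnd`.

    Hypothetical helper: the name and return convention are assumptions,
    only the field layout comes from the format description above.
    """
    mat = loadmat(path)

    # X is a (1, view_num) cell array; loadmat exposes it as an object ndarray
    # whose entries are (sample_number, feature_dimension) matrices.
    views = [np.asarray(v, dtype=np.float64) for v in mat["X"].ravel()]

    # gnd is (sample_number, 1) with labels starting from 1; shift to 0-based.
    labels = mat["gnd"].ravel().astype(np.int64) - 1

    # Every view must describe the same samples, in the same order.
    for i, view in enumerate(views):
        if view.shape[0] != labels.shape[0]:
            raise ValueError(
                f"view {i} has {view.shape[0]} samples, expected {labels.shape[0]}"
            )
    return views, labels
```

Calling `views, labels = load_multiview_mat("xxx.mat")` then gives `len(views)` equal to the number of views and `views[i].shape[1]` equal to the feature dimension of view `i`.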

📌 Visual features for image datasets

| Abbreviation | Full Name |
|---|---|
| SIFT | Scale Invariant Feature Transform |
| SD | Shape Descriptor |
| FSM | Fine Scale Margin |
| FOU | FOUrier coefficients of the character shapes |
| FAC | profile correlations |
| PIX | PIXel averages in 2 × 3 windows |
| ZER | ZERnike moment |
| MOR | MORphological features |
| Gabor | Gabor feature |
| WM | Wavelet Moments |
| CENTRIST/CENT | CENTRIST feature |
| LBP | Local Binary Patterns feature |
| CH | Color Histogram |
| TH | Texture Histogram |
| CM | Color Moments |
| CS | Color Similarity |
| CORR | color CORRelation |
| EDH | Edge Distribution Histogram |
| WT | Wavelet Texture |

[Note] Modified from this repo.

🔥 Update

  • [2024/12/30] Updated the visual feature list of the image datasets!
  • [2024/12/29] Uploaded the script that evaluates modality quality with the K-Means clustering algorithm: modality_evaluation/modality_eval.py!
  • [2024/08/12] Uploaded the script for the label distribution plot: label_distribution/plot_label_distribution.ipynb!
  • [2024/08/08] Created a share link to the datasets we have collected from the Internet for public research. [Baidu Netdisk]

🌋 Modality Evaluation

We use the K-Means clustering algorithm as a simple baseline to evaluate the contribution of each modality under the NMI (normalized mutual information) and Silhouette metrics. The results are reported as figures; more can be found in the modality_evaluation folder.
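
The per-modality baseline described here can be sketched roughly as follows with scikit-learn. This is an illustration under stated assumptions, not the contents of `modality_evaluation/modality_eval.py`: the function name `evaluate_views`, the choice of `n_init=10`, and computing the Silhouette score on the predicted clusters (rather than on the ground-truth labels) are all assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, silhouette_score


def evaluate_views(views, labels, seed=0):
    """Cluster each view on its own with K-Means and score the result.

    Hypothetical helper; returns one dict per view with `nmi` and `silhouette`.
    """
    # Use the number of ground-truth classes as the number of clusters.
    n_clusters = len(np.unique(labels))
    results = []
    for X in views:
        pred = KMeans(
            n_clusters=n_clusters, n_init=10, random_state=seed
        ).fit_predict(X)
        results.append(
            {
                # Agreement between predicted clusters and true classes.
                "nmi": normalized_mutual_info_score(labels, pred),
                # Cohesion/separation of the predicted clusters in this view
                # (assumption: scored on predictions, not on true labels).
                "silhouette": silhouette_score(X, pred),
            }
        )
    return results


if __name__ == "__main__":
    # Toy data: one informative view and one pure-noise view.
    rng = np.random.default_rng(0)
    labels = np.repeat(np.arange(3), 50)
    centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
    informative = centers[labels] + rng.normal(scale=0.5, size=(150, 2))
    noise = rng.normal(size=(150, 2))
    for i, scores in enumerate(evaluate_views([informative, noise], labels), 1):
        print(f"view {i}: NMI={scores['nmi']:.3f}, "
              f"Silhouette={scores['silhouette']:.3f}")
```

A view whose features separate the classes well should score a higher NMI than a view that carries little class information, which is what makes this a cheap way to compare modalities.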

📊 Label Distribution

📢 More figures can be found in the label_distribution folder!
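
For a quick look at a dataset's label distribution without plotting, a minimal sketch is to count the samples per class in `gnd`. The helper names below are hypothetical, and the max/min ratio is only one possible summary of imbalance; it is not stated to be the criterion behind the "imbalance" tag in the statistics table.

```python
from collections import Counter


def label_distribution(labels):
    """Return {class_label: sample_count}, sorted by class label."""
    counts = Counter(int(y) for y in labels)
    return dict(sorted(counts.items()))


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest class.

    Assumption: an illustrative summary statistic, not the repo's definition.
    """
    counts = label_distribution(labels)
    return max(counts.values()) / min(counts.values())


if __name__ == "__main__":
    gnd = [1, 1, 1, 2, 2, 3]  # toy labels, starting from 1 as in the .mat files
    print(label_distribution(gnd))
    print(imbalance_ratio(gnd))
```

A ratio close to 1 indicates roughly balanced classes; larger values indicate that some classes dominate.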

Acknowledgements

Some datasets were downloaded from these sites, for which we are very grateful:

[1] https://github.com/liujiyuan13/mvdata

[2] https://github.com/wangsiwei2010/large_scale_multi-view_clustering_datasets
