Multi-view Multi-class Datasets

This repo collects benchmark datasets for evaluating multi-view multi-class machine learning algorithms.

📄 Statistics of Datasets

📢 More information about the datasets can be found in [Google Sheets | Tencent Docs].

| No. | Datasets | #Samples | #Classes | #Views | Tag | Reference |
|---|---|---|---|---|---|---|
| 1 | 100Leaves | 1,600 | 100 | 3 | | Plant leaf classification using probabilistic integration of shape, texture and margin features |
| 2 | Caltech101-7 | 1,474 | 7 | 6 | `imbalance` | Large-scale multi-view spectral clustering via bipartite graph |
| 3 | Caltech101-20 | 2,386 | 20 | 6 | `imbalance` | Deep Incomplete Multi-View Learning Network with Insufficient Label Information |
| 4 | Caltech101 | 9,144 | 102 | 6 | `imbalance` | Binary Multi-View Clustering |
| 5 | Deep Caltech101 | 8,677 | 101 | 2 | `imbalance` | Trusted Multi-View Classification |
| 6 | Caltech256 | 30,607 | 257 | 3 | `imbalance` | Auto-weighted Multi-view Clustering for Large-scale Data |
| 7 | Deep AWA_2views | 10,158 | 50 | 2 | `imbalance` | Deep Partial Multi-View Learning |
| 8 | Reuters_2views | 18,758 | 6 | 2 | `imbalance` | Multi-view Spectral Clustering Network |
| 9 | NoisyMNIST | 70,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
| 10 | NoisyMNIST | 30,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
| 11 | MNIST-USPS | 5,000 | 10 | 2 | | Robust Multi-View Clustering With Incomplete Information |
| 12 | Scene15 | 4,485 | 15 | 3 | | Ensemble projection for semi-supervised image classification |
| 13 | Out-Scene | 2,688 | 8 | 4 | | Deep Incomplete Multi-View Learning Network with Insufficient Label Information |
| 14 | NUS-WIDE | 30,000 | 31 | 5 | `imbalance` | Fast Multi-view Clustering via Ensembles: Towards Scalability, Superiority, and Simplicity |

✨ We have collated some publicly available datasets and you can download them from Baidu Netdisk. The data format is as follows:

xxx.mat
├── gnd: matrix, double, labels start from 1, (sample_number, 1).
└── X: cell, (1, view_num)
    └── X{i}: matrix, double, (sample_number, feature_dimension).
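
The layout above can be read in Python roughly as follows. This is a minimal sketch, not code from this repo: the function name `load_multiview_mat` is hypothetical, it assumes the file was saved in a MATLAB format that `scipy.io.loadmat` can read (v7.3 files are HDF5-based and need a different reader), and shifting the labels to start from 0 is a convenience choice rather than part of the data format.

```python
import numpy as np
from scipy.io import loadmat


def load_multiview_mat(path):
    """Load one dataset file laid out as described above (hypothetical helper).

    Returns a list of per-view feature matrices and a 0-based label vector.
    """
    data = loadmat(path)

    # A MATLAB cell of shape (1, view_num) is loaded as an object array.
    cell = data["X"]
    views = [np.asarray(cell[0, i], dtype=np.float64) for i in range(cell.shape[1])]

    # gnd has shape (sample_number, 1) and starts from 1; flatten and shift to 0.
    labels = data["gnd"].ravel().astype(np.int64) - 1

    # Every view should describe the same samples, one row per sample.
    for i, view in enumerate(views):
        if view.shape[0] != labels.shape[0]:
            raise ValueError(f"view {i} has {view.shape[0]} rows, expected {labels.shape[0]}")
    return views, labels
```

A call such as `views, labels = load_multiview_mat("xxx.mat")` would then give `len(views) == view_num` and `views[i].shape == (sample_number, feature_dimension)`.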

📌 Visual features for image datasets

| Abbreviation | Full Name |
|---|---|
| SIFT | Scale Invariant Feature Transform |
| SD | Shape Descriptor |
| FSM | Fine Scale Margin |
| FOU | FOUrier coefficients of the character shapes |
| FAC | profile correlations |
| PIX | PIXel averages in 2 × 3 windows |
| ZER | ZERnike moments |
| MOR | MORphological features |
| Gabor | Gabor feature |
| WM | Wavelet Moments |
| CENTRIST/CENT | CENTRIST feature |
| LBP | Local Binary Patterns feature |
| CH | Color Histogram |
| TH | Texture Histogram |
| CM | Color Moments |
| CS | Color Similarity |
| CORR | color CORRelation |
| EDH | Edge Distribution Histogram |
| WT | Wavelet Texture |

[Note] Modified from this repo.

🔥 Update

[2024/12/30] Updated the visual feature list of the image datasets!
[2024/12/29] Uploaded the script that evaluates modality quality with the K-Means clustering algorithm: modality_evaluation/modality_eval.py!
[2024/08/12] Uploaded the script for the label distribution plots: label_distribution/plot_label_distribution.ipynb!
[2024/08/08] Created a share link to the datasets we have collected from the Internet for public research. [Baidu Netdisk]

🌋 Modality Evaluation

We adopt the K-Means clustering algorithm as a simple baseline to evaluate the contribution of each modality under the NMI (normalized mutual information) and Silhouette metrics. The result figures can be found in the modality_evaluation folder.
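
For readers who want to reproduce this kind of per-view baseline, here is a minimal sketch using scikit-learn. It is not the repo's modality_evaluation/modality_eval.py: the function name `evaluate_views`, the `seed` parameter, the choice of `n_init=10`, and computing Silhouette on the K-Means assignments (rather than on the ground-truth labels) are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import normalized_mutual_info_score, silhouette_score


def evaluate_views(views, labels, seed=0):
    """Cluster each view separately with K-Means and score it (hypothetical helper).

    views:  list of arrays, each of shape (sample_number, feature_dimension)
    labels: ground-truth class labels, shape (sample_number,) or (sample_number, 1)
    """
    labels = np.asarray(labels).ravel()
    n_clusters = len(np.unique(labels))  # one cluster per ground-truth class

    results = []
    for view in views:
        features = np.asarray(view, dtype=np.float64)
        kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
        pred = kmeans.fit_predict(features)
        results.append({
            # Agreement between the clusters and the ground-truth classes.
            "nmi": normalized_mutual_info_score(labels, pred),
            # Cluster compactness and separation within this view's feature space.
            "silhouette": silhouette_score(features, pred),
        })
    return results
```

Under this sketch, a view whose features separate the classes well should score a higher NMI than a view that carries little class information. Features are used as stored; whether to normalize them first is left open here.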

📊 Label Distribution

📢 More figures can be found in the label_distribution folder!
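
The per-class counts behind such plots can be computed directly from `gnd`. The sketch below is an illustration only, not the repo's plot_label_distribution.ipynb: the helper name `label_distribution` is hypothetical, and the largest-to-smallest class ratio is just one common way to summarize imbalance. The criterion behind the `imbalance` tag in the statistics table is not stated here, so no threshold is assumed.

```python
import numpy as np


def label_distribution(gnd):
    """Count samples per class and summarize the imbalance (hypothetical helper).

    gnd: label array of shape (sample_number, 1) or (sample_number,)
    Returns ({class_label: count}, largest_count / smallest_count).
    """
    labels = np.asarray(gnd).ravel().astype(np.int64)
    classes, counts = np.unique(labels, return_counts=True)
    distribution = {int(c): int(n) for c, n in zip(classes, counts)}
    # A ratio of 1.0 means perfectly balanced classes; larger means more skew.
    ratio = float(counts.max() / counts.min())
    return distribution, ratio
```

The returned dictionary can be passed to any plotting library to draw a bar chart of class sizes.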

Acknowledgements

Some datasets were downloaded from these sites, for which we are very grateful:

[1] https://github.com/liujiyuan13/mvdata

[2] https://github.com/wangsiwei2010/large_scale_multi-view_clustering_datasets


ZhangqiJiang07/Multi-view_Multi-class_Datasets
