GitHub - AISoltani/Clustering-in-Python: Clustering methods in Machine Learning, with both the theory and the Python code for each algorithm. Algorithms include K-Means, K-Modes, Hierarchical, DBSCAN and Gaussian Mixture Model (GMM). Interview questions on clustering are added at the end.

Welcome to Clustering (Theory & Code)

01 Unsupervised Learning (Theory)

What is Unsupervised Learning & Goals of Unsupervised Learning
Types of Unsupervised Learning: 1. Clustering, 2. Association Rules & 3. Dimensionality Reduction

02 Clustering (Theory)

Definition and Application of Clustering
4 methods: 1.K Means 2.Hierarchical 3.DBScan & 4.Gaussian Mixture

03 Euclidean & Manhattan Distance (Theory)

If two points are near each other, chances are they are similar
Distance Measure between two points
1. Euclidean Distance: Square root of the sum of squared coordinate differences between two points
2. Manhattan Distance: Sum of the absolute coordinate differences between two points
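
A minimal sketch of the two distance measures in plain Python. The function names `euclidean` and `manhattan` and the sample points are illustrative assumptions, not taken from the repository notebook.

```python
import math

def euclidean(p, q):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Sum of the absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (1, 2), (4, 6)    # coordinate differences are 3 and 4
print(euclidean(p, q))   # 5.0
print(manhattan(p, q))   # 7
```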

04 K-Means Clustering (Theory)

How the Algorithm works (Step-wise Calculation)
Pre-processing required for K Means
Determining optimal number of K: 1.Profiling Approach & 2.Elbow Method
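
The step-wise calculation can be sketched as below, assuming the standard assign-then-update loop (Lloyd's algorithm) with Euclidean distance. The function `kmeans`, the hand-picked starting centroids and the toy data are assumptions made for illustration; the repository's own material may use different steps or data.

```python
import math

def kmeans(points, centroids, max_iter=100):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # leave an empty cluster where it is
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return labels, centroids

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels, centroids = kmeans(points, centroids=[(1, 1), (1, 2)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Starting from two centroids inside the same group, the loop still separates the two groups after a couple of iterations. Features on very different scales would dominate the distance, which is why scaling is the usual pre-processing step.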

05 Elbow Method (Theory)

Working of Elbow Method with Example
3 concepts: 1.Total Error, 2.Variance/Total Squared Error & 3.Within Cluster Sum of Square (WCSS)
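
A small sketch of how WCSS behaves as K grows, assuming one-dimensional data and partitions written out by hand for each K (in practice they would come from running K-Means). The helper `wcss` and the numbers are illustrative assumptions.

```python
def wcss(clusters):
    """Within Cluster Sum of Squares: squared distance of every point
    to the mean of its own cluster, summed over all clusters."""
    total = 0.0
    for cluster in clusters:
        centroid = sum(cluster) / len(cluster)
        total += sum((x - centroid) ** 2 for x in cluster)
    return total

# Hand-written partitions of the same six values for K = 1, 2, 3
partitions = {
    1: [[1, 2, 3, 10, 11, 12]],
    2: [[1, 2, 3], [10, 11, 12]],
    3: [[1, 2], [3], [10, 11, 12]],
}
for k, clusters in partitions.items():
    print(k, wcss(clusters))
# 1 125.5
# 2 4.0
# 3 2.5
```

WCSS falls sharply from K=1 to K=2 and only slightly afterwards, so the "elbow" of the curve sits at K=2 for this data.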

06 K Means Clustering (Python Code)

Define number of clusters, take centroids and measure distance
Euclidean Distance : Measure distance between points
Number of Clusters defined by Elbow Method
Elbow Method : WCSS vs Number of Clusters
Silhouette Score : Goodness of Clustering

07 Hierarchical Clustering (Theory)

Two Approaches: 1. Agglomerative (Bottom-Up) & 2. Divisive (Top-Down)
Types of Linkages:
1. Single Linkage - Nearest Neighbour (Minimal intercluster dissimilarity)
2. Complete Linkage - Farthest Neighbour (Maximal intercluster dissimilarity)
3. Average Linkage - Average Distance (Mean intercluster dissimilarity)
Steps in Agglomerative Hierarchical Clustering with Single Linkage
Determining optimal number of Clusters: Dendrogram
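
The agglomerative steps and the three linkages can be sketched in plain Python as below. This is a naive version written for readability, not efficiency; the names `LINKAGES`, `cluster_distance` and `agglomerative`, and the sample points, are assumptions for illustration.

```python
import math
from itertools import combinations

LINKAGES = {
    "single": min,                            # nearest pair of points
    "complete": max,                          # farthest pair of points
    "average": lambda ds: sum(ds) / len(ds),  # mean over all pairs
}

def cluster_distance(a, b, linkage):
    """Distance between clusters a and b under the chosen linkage."""
    pairwise = [math.dist(p, q) for p in a for q in b]
    return LINKAGES[linkage](pairwise)

def agglomerative(points, n_clusters, linkage="single"):
    """Start with one cluster per point and repeatedly merge the two
    closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]
    heights = []  # distance at which each merge happened
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], linkage),
        )
        heights.append(cluster_distance(clusters[i], clusters[j], linkage))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters, heights

points = [(0, 0), (0, 1), (4, 0), (4, 2), (10, 0)]
clusters, heights = agglomerative(points, n_clusters=2, linkage="single")
print(clusters)  # [[(0, 0), (0, 1), (4, 0), (4, 2)], [(10, 0)]]
print(heights)   # [1.0, 2.0, 4.0]
```

The recorded merge heights are the values a dendrogram draws on its vertical axis.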

08 Dendrogram (Theory)

Hierarchical relationship between objects
Optimal number of Clusters for Hierarchical Clustering
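
One common way to read the number of clusters off a dendrogram is to cut it across the tallest vertical gap between successive merges. The sketch below assumes that heuristic and a hypothetical list of merge heights; the repository's PDF may describe a different rule.

```python
def clusters_from_largest_gap(merge_heights):
    """Pick the number of clusters by cutting between the two successive
    merge heights that are furthest apart (needs at least two merges)."""
    heights = sorted(merge_heights)
    n_points = len(heights) + 1  # n points produce n - 1 merges
    gaps = [b - a for a, b in zip(heights, heights[1:])]
    merges_below_cut = gaps.index(max(gaps)) + 1
    return n_points - merges_below_cut

# Hypothetical merge heights for six points: the last merge is far above the rest
print(clusters_from_largest_gap([1.0, 1.2, 1.5, 2.0, 9.0]))  # 2
```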

09 Hierarchical Clustering (Python Code)

Type of HC
1. Agglomerative : Bottom Up approach
2. Divisive : Top Down approach
Number of Clusters defined by Dendrogram
Dendrogram : Joining datapoints based on distance & creating clusters
Linkage : How the distance between two clusters is calculated from their points
1. Single linkage : Minimum Distance between two clusters
2. Complete linkage : Maximum Distance between two clusters
3. Average linkage : Average Distance between two clusters

10 DBScan Clustering (Theory)

Density Based Clustering
K-Means & Hierarchical work well for compact & well-separated data
Both are sensitive to Outliers & Noise
DBScan overcomes these issues & works well with Outliers
2 important parameters -
1. eps: If the distance between 2 points is lower than or equal to eps, they are neighbours
2. MinPts: Minimum number of neighbours/data points within the eps radius
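
A compact sketch of how eps and MinPts drive density-based clustering, assuming Euclidean distance and the convention that a point counts as its own neighbour. The function `dbscan` and the sample data are illustrative assumptions, written for clarity rather than speed.

```python
import math

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    def neighbours(i):
        # Indices of all points within eps of point i (including i itself)
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # noise for now; may be claimed as a border point later
            continue
        cluster += 1  # i is a core point, so it starts a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(points, eps=1.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The isolated point at (50, 50) has no neighbours within eps, so it is labelled as noise instead of being forced into a cluster.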

11 DBScan Clustering (Python Code)

No need to pre-define the number of clusters
Distance metric is Euclidean Distance
Need to give 2 parameters
1. eps : Radius of the neighbourhood around each point
2. min_samples : Minimum number of data points within eps to form a cluster

12 GMM Clustering (Theory)

Weakness of K Means
Expectation Maximization(EM) method

13 Gaussian Mixture Model Clustering (Python Code)

Probabilistic Model
Uses Expectation-Maximization (EM) steps:
1. E Step : Probability that each datapoint belongs to each cluster
2. M Step : For each cluster, revise its parameters based on those probabilities
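
The E and M steps can be sketched for a one-dimensional mixture as below. This is a simplified illustration under stated assumptions: one feature, hand-picked starting parameters, a fixed number of iterations and a small variance floor. The names `normal_pdf` and `em_gmm_1d` are hypothetical.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def em_gmm_1d(data, means, variances, weights, n_iter=20):
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    resp = []
    for _ in range(n_iter):
        # E step: probability of each cluster for each datapoint
        resp = []
        for x in data:
            dens = [weights[c] * normal_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M step: re-estimate each cluster's parameters from those probabilities
        for c in range(k):
            n_c = sum(r[c] for r in resp)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / n_c
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / n_c
            variances[c] = max(var, 1e-6)  # floor to avoid a collapsed component
            weights[c] = n_c / len(data)
    return means, variances, weights, resp

data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
means, variances, weights, resp = em_gmm_1d(
    data, means=[0.0, 6.0], variances=[1.0, 1.0], weights=[0.5, 0.5]
)
print([round(m, 2) for m in means])  # means end up close to 1.0 and 5.0
```

Unlike K-Means, every point keeps a probability for every cluster (`resp`), so the assignment is soft rather than hard.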

14 Cluster Adjustment (Theory)

2 Steps we normally do for Cluster Adjustment
1. Quality of Clustering (Cardinality & Magnitude)
2. Performance of Similarity Measure (Euclidean Distance)
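
A rough sketch of the two quality checks, assuming the usual definitions: cardinality is the number of points in a cluster, and magnitude is the sum of distances from those points to the cluster centroid. These definitions, the helper name and the data are assumptions, since the outline above does not spell them out.

```python
import math
from collections import defaultdict

def cardinality_and_magnitude(points, labels):
    """Map each cluster label to (cardinality, magnitude)."""
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    summary = {}
    for lab, members in groups.items():
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        magnitude = sum(math.dist(p, centroid) for p in members)
        summary[lab] = (len(members), magnitude)
    return summary

points = [(0, 0), (2, 0), (10, 0), (10, 4), (10, 8)]
labels = [0, 0, 1, 1, 1]
print(cardinality_and_magnitude(points, labels))
# {0: (2, 2.0), 1: (3, 8.0)}
```

A cluster whose magnitude is unusually large for its cardinality is a candidate for closer inspection.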

15 Silhouette Coefficient - Cluster Validation (Theory)

The closer the silhouette score is to 1, the better separated the clusters are
It is a metric used to calculate the goodness of a clustering technique
Its value ranges from -1 to 1.
1. 1: Means clusters are well apart from each other and clearly distinguished
2. 0: Means clusters overlap, or the distance between clusters is not significant
3. -1: Means clusters are assigned in the wrong way
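
The score can be sketched from its usual definition: for each point, a is the mean distance to the other points in its own cluster, b is the lowest mean distance to the points of any other cluster, and the point's score is (b - a) / max(a, b). The function below is an illustrative assumption that needs at least two clusters and scores singleton clusters as 0.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [
            math.dist(p, q)
            for j, (q, other) in enumerate(zip(points, labels))
            if other == lab and j != i
        ]
        if not own:
            scores.append(0.0)  # convention for a cluster with a single point
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels)
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

points = [(0,), (1,), (10,), (11,)]
good = silhouette_score(points, [0, 0, 1, 1])  # about 0.90: well separated
bad = silhouette_score(points, [0, 1, 0, 1])   # about -0.45: wrongly assigned
print(round(good, 2), round(bad, 2))
```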

16 Disadvantage & Choosing Right Clustering Method (Theory)

Disadvantages of each clustering technique
Based on the data, which is the right clustering method

17 Clustering Revision (Theory)

Short Description of Each Clustering Algorithm
Advantage, Disadvantage
When to use what

18 Interview Questions on Clustering (Theory)

Commonly asked questions on Clustering

19 K Modes (Theory)

For Categorical variable clustering, use K Modes
It uses the dissimilarities (total mismatches) between data points
The fewer the dissimilarities, the closer our data points are
It uses the mode (the most frequent value in each column) as the cluster centre
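
A small sketch of the two building blocks of K Modes: the mismatch count between two categorical records, and the column-wise mode of a group of records. The helper names `mismatch` and `column_modes` and the sample records are illustrative assumptions.

```python
from collections import Counter

def mismatch(a, b):
    """Number of columns in which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def column_modes(rows):
    """Most frequent value in each column of a group of records."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))

rows = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]
centre = column_modes(rows)
print(centre)                     # ('red', 'small', 'round')
print(mismatch(rows[1], centre))  # 1
```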

20 K Modes (Python Code)

K Modes code in Python

Latest commit: AISoltani, Update README.md, 9b4318c, Jan 8, 2024 (81 commits)

Name	Last commit message	Last commit date
01 Unsupervised Learning.pdf	Add files via upload	Aug 7, 2020
02 Clustering.pdf	Add files via upload	Aug 9, 2020
03_Distance_Metrics_in_ML.ipynb	Created using Colaboratory	Oct 29, 2021
04 K Means Clustering.pdf	Add files via upload	Aug 11, 2020
05 Elbow Method.pdf	Add files via upload	Aug 11, 2020
06_K_Means_Clustering.ipynb	Created using Colaboratory	Oct 29, 2021
07 Hierarchical Clustering.pdf	Add files via upload	Aug 12, 2020
08 Dendogram.pdf	Add files via upload	Aug 12, 2020
09_Hierarchical_Clustering.ipynb	Created using Colaboratory	Oct 30, 2021
10 DBScan Clustering.pdf	Add files via upload	Aug 11, 2020
11_DBScan_Clustering.ipynb	Created using Colaboratory	Nov 4, 2021
12 GMM Clustering.pdf	Add files via upload	Aug 14, 2020
13_Gaussian_Mixture_Model.ipynb	Created using Colaboratory	Nov 6, 2021
14 Cluster Adjustment .pdf	Add files via upload	Aug 14, 2020
15 Silhouette Coefficient - Cluster Validation.pdf	Add files via upload	Aug 14, 2020
16 Disadvantage & Choosing Right Clustering .pdf	Add files via upload	Aug 14, 2020
17 Clustering Revision.pdf	Add files via upload	Nov 6, 2021
18 Clustering Interview Questions .pdf	Add files via upload	Nov 6, 2021
19 K Modes.pdf	Add files via upload	Dec 9, 2021
20_K_Modes.ipynb	Created using Colaboratory	Dec 9, 2021
README.md	Update README.md	Jan 8, 2024
