K-means clustering belongs to the class of unsupervised learning algorithms. The main idea is to partition a group of objects in a pre-defined number of clusters in a way that objects in the same cluster are more similar to each other than to other objects from other clusters.

The algorithm to detemine the final set of clusters can be divided in the following steps:

1. choose k – the number of clusters

2. select k random points as the initial centroids

3. assign each data point to the nearest cluster based on the distance of the data point to the centroid (use Euclidean distance)

4. calculate the new coordinates of all clusters centroids by averaging coordinates over all clusters members

5. repeat steps 3. and 4. until the clusters members do not change anymore (no transitions between clusters)

Data

Let us first generate a set of data points from multivariate normal distribution:

We need some functions to perform main steps of the algorithm:

Iterating through the algorithm:

and plotting the clusters with:

gives the following evolution to final cluster composition:

Get in Touch

Do you need consultation or have a project in mind? We would love to hear from you!

Get in Touch