Clustering in Machine Learning

Clustering is an unsupervised learning technique that uncovers hidden relationships and groupings in data, commonly known as "clusters".

What is a cluster? It's a collection or group of objects that share some, but not all, characteristics. They're similar but not identical. The concept of a cluster is consistent across fields like machine learning, statistics, and marketing.
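One common way to make "similar but not identical" concrete is to measure the distance between the objects' feature vectors. The sketch below is only an illustration: the choice of Euclidean distance and the three sample objects are assumptions, not something taken from a specific dataset.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical objects, each described by two numeric characteristics.
p = (1.0, 2.0)
q = (1.5, 2.5)  # close to p: similar but not identical
r = (8.0, 9.0)  # far from p: dissimilar

d_pq = euclidean_distance(p, q)
d_pr = euclidean_distance(p, r)

# Objects that belong in the same cluster should be closer to each other
# than to objects from a different cluster.
print(d_pq < d_pr)
```

Other distance or similarity measures may suit other kinds of data better; Euclidean distance is simply the easiest to picture on a Cartesian graph.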

In machine learning, clustering is also referred to as unsupervised classification.

A Practical Example

In this dataset, there are three features: x₁, x₂, x₃.

[Table: the example dataset, with one column for each feature x₁, x₂, x₃]

The machine has no labels and no target function to learn from. There is no supervision.

Despite this, the table reveals a significant relationship between x₁, x₂, and x₃.

To illustrate this, let's temporarily ignore x₃ and plot x₁ and x₂ on a Cartesian graph.

Even in this two-dimensional graph, a pattern and regularity in the data begin to emerge.

[Figure: two-dimensional scatter plot of x₁ against x₂]

Next, I color each point (x₁, x₂) according to the third feature, x₃, which acts as the third dimension: blue for x₃=1 and red for x₃=2.

[Figure: the same scatter plot with points colored by x₃, showing clusters A and B]

Now, the clustering is immediately apparent even to the naked eye.

In clusters A and B, similar data points are grouped together.
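The grouping can also be checked numerically rather than by eye. The following is a minimal sketch under stated assumptions: the rows are hypothetical values invented for illustration (the original table is not reproduced here), and the names `rows`, `groups`, and `centroid` are introduced only for this example. It splits the points by x₃ and computes the center of each group in the (x₁, x₂) plane.

```python
from collections import defaultdict

# Hypothetical rows (x1, x2, x3); these values are assumptions for illustration.
rows = [
    (1.0, 1.2, 1), (1.5, 0.8, 1), (0.9, 1.1, 1),
    (5.0, 5.2, 2), (5.5, 4.8, 2), (4.9, 5.1, 2),
]

# Group the (x1, x2) coordinates by the value of the third feature x3.
groups = defaultdict(list)
for x1, x2, x3 in rows:
    groups[x3].append((x1, x2))

def centroid(points):
    """Mean position of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

centroids = {label: centroid(pts) for label, pts in groups.items()}

for label, (cx, cy) in sorted(centroids.items()):
    print(f"x3={label}: {len(groups[label])} points, centroid=({cx:.2f}, {cy:.2f})")
```

If the two centers lie far apart relative to the spread of points around each one, the groups form well-separated clusters like A and B.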

This way, the machine learns significant information from the data, without any guidance from a supervisor.

Note. This is a simple example that can be drawn in two dimensions, but it illustrates the concept. In practice, clustering is particularly useful on high-dimensional datasets, where the human eye can't discern patterns.

In machine learning, clustering algorithms are used to identify relationships among data points through a mathematical and statistical learning process.
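As an illustration of such a process, here is a minimal sketch of k-means (Lloyd's algorithm), one widely used clustering algorithm among many. It is not presented as the method behind the example above. The data points, the function name `kmeans`, and the simplistic choice of the first k points as starting centroids are all assumptions made for this sketch.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means (Lloyd's algorithm) on a list of equal-length tuples."""
    # Simplistic initialisation (an assumption): the first k points are the
    # starting centroids. Real implementations use better strategies.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid,
        # using squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical unlabeled points (x1, x2); no x3 or other label is given.
data = [
    (1.0, 1.2), (1.5, 0.8), (0.9, 1.1),
    (5.0, 5.2), (5.5, 4.8), (4.9, 5.1),
]

labels, centroids = kmeans(data, k=2)
print(labels)
print(centroids)
```

The algorithm receives no labels: it alternates between assigning points to the nearest center and recomputing the centers until the grouping stabilizes. The number of clusters k has to be chosen in advance, which is one of the design trade-offs of this particular method.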