Cluster Analysis

Cluster analysis is a key activity in exploratory data analysis. This Demonstration lets you experiment with various distance functions and clustering methods to partition randomly generated sets of 2D points into separate clusters.

The clustering methods "Agglomerate" and "Optimize" determine how to cluster the data for a particular number of clusters k. "Agglomerate" uses an agglomerative hierarchical method, starting with each member of the set in a cluster of its own and fusing the nearest clusters until there are k remaining. "Optimize" starts by building a set of k representative objects and clustering around those, iterating until a (locally) optimal clustering is found.
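
The two methods described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Demonstration's own implementation: the function names agglomerate, optimize, and assign are hypothetical, the distance between two clusters is taken to be single linkage (the text does not say which linkage is used), and the initial representative objects are simply the first k points rather than the result of a more careful build step. Both functions assume 1 <= k <= number of points.

```python
import math
from itertools import combinations


def agglomerate(points, k, distance=math.dist):
    """Fuse the two nearest clusters repeatedly until k clusters remain."""
    clusters = [[p] for p in points]  # every point starts in its own cluster
    while len(clusters) > k:
        # Assumption: cluster-to-cluster distance is single linkage,
        # i.e. the smallest distance between any two of their members.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: min(
                distance(p, q)
                for p in clusters[ij[0]]
                for q in clusters[ij[1]]
            ),
        )
        clusters[i].extend(clusters[j])  # i < j, so deleting j keeps i valid
        del clusters[j]
    return clusters


def assign(points, medoids, distance):
    """Put each point in the cluster of its nearest representative object."""
    clusters = [[] for _ in medoids]
    for p in points:
        nearest = min(range(len(medoids)), key=lambda m: distance(p, medoids[m]))
        clusters[nearest].append(p)
    return clusters


def optimize(points, k, distance=math.dist, max_iter=100):
    """Alternate between assigning points and re-choosing representatives."""
    medoids = list(points[:k])  # Assumption: naive initial representatives
    for _ in range(max_iter):
        clusters = assign(points, medoids, distance)
        # New representative: the member with the smallest total distance
        # to the other members; an empty cluster keeps its old one.
        new_medoids = [
            min(members, key=lambda c: sum(distance(c, q) for q in members))
            if members else medoids[m]
            for m, members in enumerate(clusters)
        ]
        if new_medoids == medoids:  # no change: a local optimum was reached
            break
        medoids = new_medoids
    return medoids, assign(points, medoids, distance)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerate(points, 2))
print(optimize(points, 2))
```

Any function of two points can be passed as the distance argument, which mirrors the Demonstration's choice of distance functions. Because optimize only reaches a local optimum, a different choice of initial representatives can give a different clustering.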