Returns the result (in a single array) of applying a k-means
clustering analysis, as per Press et al.
In k-means clustering, we have n m-dimensional
datapoints (where n = NumberOfDataPointsPerSeries, m = NumberOfSeries)
and we ascribe them to clusters depending on how near they are (in a spherical
Euclidean sense) to potential cluster centres. We need to specify the number of
clusters (NumberOfClusters) and some initial starting means (i.e. centres) for
each cluster. The algorithm then uses a variant of the EM algorithm to find
which datapoints belong to which clusters and where the cluster centres need to
be to minimise the sum of the squared distances of the datapoints from their
assigned cluster centres.
InputData is a 2d array of size NumberOfSeries
x NumberOfDataPointsPerSeries and StartMeans is a 2d array of
size NumberOfClusters x NumberOfSeries.
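The procedure described above can be illustrated with a minimal sketch in Python. This is a plain Lloyd-style k-means written for illustration only, not the implementation behind this function. The function name k_means, the max_iter parameter and the rule that an empty cluster keeps its previous centre are assumptions. It follows the array shapes given above, so each datapoint is one column of InputData.

```python
def k_means(input_data, start_means, max_iter=100):
    """Illustrative Lloyd-style k-means (hypothetical helper, not the library routine).

    input_data:  NumberOfSeries rows, each holding NumberOfDataPointsPerSeries values
    start_means: NumberOfClusters rows, each holding NumberOfSeries values
    Returns (means, assignments), with clusters counted from 0.
    """
    n_series = len(input_data)
    n_points = len(input_data[0])
    # Datapoint j is column j of input_data.
    points = [[input_data[i][j] for i in range(n_series)] for j in range(n_points)]
    means = [list(m) for m in start_means]
    assignments = None

    for _ in range(max_iter):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_assignments = [
            min(range(len(means)),
                key=lambda k: sum((x - c) ** 2 for x, c in zip(p, means[k])))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no datapoint changed cluster, so stop iterating
        assignments = new_assignments
        # Update step: move each centre to the mean of its members.
        for k in range(len(means)):
            members = [p for p, a in zip(points, assignments) if a == k]
            if members:  # assumption: an empty cluster keeps its previous centre
                means[k] = [sum(col) / len(members) for col in zip(*members)]
    return means, assignments


# Two series, six datapoints, two clusters.
input_data = [
    [1.0, 1.2, 0.8, 5.0, 5.2, 4.8],  # series 0
    [1.0, 0.9, 1.1, 5.0, 5.1, 4.9],  # series 1
]
start_means = [[0.0, 0.0], [6.0, 6.0]]
means, assignments = k_means(input_data, start_means)
print(assignments)
```

With these well-separated toy values the first three datapoints end up in cluster 0 and the last three in cluster 1. Different starting means can lead to different final clusters, which is why the starting means are an input.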
The output is an array of length 3 + NumberOfClusters x NumberOfSeries
+ NumberOfDataPointsPerSeries, laid out as follows:
First 3 values = NumberOfSeries, NumberOfDataPointsPerSeries
Following NumberOfClusters x NumberOfSeries values = locations of means (centres) of each cluster
Following NumberOfDataPointsPerSeries values = cluster to which each datapoint has been assigned (counting from 0).
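As a rough illustration of this layout, the sketch below splits such a flat array back into its parts. The helper name unpack_kmeans_output is hypothetical. The sketch also assumes that the means are stored cluster by cluster (all coordinates of cluster 0, then cluster 1, and so on), which the description above does not state. It takes the number of clusters as an argument and does not interpret the third header value.

```python
def unpack_kmeans_output(flat, number_of_clusters):
    """Split a flat result array into header values, means and assignments.

    Hypothetical helper. Assumes the means are stored cluster by cluster.
    """
    n_series = int(flat[0])   # first header value: NumberOfSeries
    n_points = int(flat[1])   # second header value: NumberOfDataPointsPerSeries
    # flat[2] is the third header value; it is left uninterpreted here.
    n_means = number_of_clusters * n_series
    expected = 3 + n_means + n_points
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")

    means_flat = flat[3:3 + n_means]
    means = [means_flat[k * n_series:(k + 1) * n_series]
             for k in range(number_of_clusters)]
    assignments = [int(a) for a in flat[3 + n_means:]]
    return n_series, n_points, means, assignments


# Made-up example: 2 series, 6 datapoints, 2 clusters.
flat = [2, 6, 2,                 # header (the third value is a placeholder)
        1.0, 1.0, 5.0, 5.0,      # means: cluster 0, then cluster 1 (assumed order)
        0, 0, 0, 1, 1, 1]        # cluster index per datapoint, counting from 0
n_series, n_points, means, assignments = unpack_kmeans_output(flat, 2)
print(means, assignments)
```

The length check is a cheap guard against passing the wrong number of clusters, since the expected length depends on it.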
The probability that a datapoint selected randomly is in a
particular cluster can be found if needed by counting the proportion of the
datapoints assigned to different clusters.
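The counting described above might be done as in the following sketch. The function name cluster_probabilities is an assumption, and the assignments list is made-up example data in the same format as the final block of the output (cluster indices counting from 0).

```python
from collections import Counter


def cluster_probabilities(assignments, number_of_clusters):
    """Proportion of datapoints assigned to each cluster (hypothetical helper)."""
    counts = Counter(assignments)
    total = len(assignments)
    # A cluster with no members gets probability 0.
    return [counts[k] / total for k in range(number_of_clusters)]


assignments = [0, 0, 0, 1, 1, 1, 1, 2]
probabilities = cluster_probabilities(assignments, 3)
print(probabilities)  # [0.375, 0.5, 0.125]
```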
See also Gaussian mixture modelling (which can be thought of as akin to a generalisation of k-means
clustering) and example data series for Gaussian mixture modelling and k-means clustering.
Note: If you use any Nematrian web service either programmatically or interactively then you will be deemed to have agreed to the Nematrian website License Agreement