GaussianMixtureModellingSA
Function Description
Returns the result (as a single array) of applying a
Gaussian mixture modelling analysis, as per Press et
al. (2007).
In Gaussian mixture modelling, we have n m-dimensional
datapoints (where n = NumberOfDataPointsPerSeries, m = NumberOfSeries)
and we assume that each datapoint may have come from one of a number (NumberOfMixtureComponents)
of different multivariate Normal (i.e. Gaussian) probability distributions. We
use the EM algorithm to identify the Gaussian mixture model with that number of
Gaussian distributions that best fits the data in a maximum likelihood sense.
To do so we provide initial starting means (i.e. centres) for each cluster, an
initial value to ascribe to the covariance matrix terms and a termination
criterion which stops the algorithm when an iteration appears to have added
very little to the (log) likelihood. Also included is a backstop Itermax,
which stops the algorithm carrying out a very large number of iterations.
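The EM iteration described above can be sketched as follows. This is a minimal illustration in plain Python, not the Nematrian implementation: the function names (`cholesky`, `log_gauss`, `fit_gmm`), the argument names (`init_var`, `tol`, `iter_max`) and the choice to start every covariance matrix at `init_var` times the identity are all assumptions made for the example. It has no safeguards against a component collapsing onto a single point or losing all of its weight.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L L^T = a (a symmetric positive definite)."""
    m = len(a)
    low = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def log_gauss(x, mean, low):
    """Log density at x of a Normal with the given mean and covariance low low^T."""
    m = len(x)
    z = []  # solves low z = x - mean by forward substitution
    for i in range(m):
        s = x[i] - mean[i] - sum(low[i][k] * z[k] for k in range(i))
        z.append(s / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(m))
    return -0.5 * (sum(v * v for v in z) + m * math.log(2.0 * math.pi) + log_det)


def fit_gmm(points, start_means, init_var=1.0, tol=1e-8, iter_max=500):
    """EM for a Gaussian mixture. points: list of m-vectors; start_means: k m-vectors."""
    n, m, k = len(points), len(points[0]), len(start_means)
    means = [list(mu) for mu in start_means]
    # Assumption: initial covariances are init_var times the identity matrix.
    covs = [[[init_var if i == j else 0.0 for j in range(m)] for i in range(m)]
            for _ in range(k)]
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(iter_max):
        # E-step: responsibilities and log likelihood under the current parameters.
        lows = [cholesky(c) for c in covs]
        resp, ll = [], 0.0
        for x in points:
            logs = [math.log(weights[c]) + log_gauss(x, means[c], lows[c])
                    for c in range(k)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(v - top) for v in logs))
            ll += lse
            resp.append([math.exp(v - lse) for v in logs])
        # Termination criterion: stop once the log likelihood gain is tiny.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # M-step: re-estimate weights, means and covariances.
        for c in range(k):
            total = sum(r[c] for r in resp)
            weights[c] = total / n
            means[c] = [sum(r[c] * x[d] for r, x in zip(resp, points)) / total
                        for d in range(m)]
            covs[c] = [[sum(r[c] * (x[i] - means[c][i]) * (x[j] - means[c][j])
                            for r, x in zip(resp, points)) / total
                        for j in range(m)] for i in range(m)]
    return weights, means, covs, resp, ll
```

Here `points` holds one datapoint per entry, so it is the transpose of the NumberOfSeries x NumberOfDataPointsPerSeries layout used for InputData. If the loop ends because `iter_max` is reached rather than because the tolerance is met, the returned responsibilities and log likelihood refer to the parameters before the final M-step.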
InputData is a 2d array of size NumberOfSeries
x NumberOfDataPointsPerSeries and StartMeans is a 2d array of
size NumberOfClusters x NumberOfSeries, where NumberOfClusters is the same
quantity as NumberOfMixtureComponents.
The output is an array with 3 + NumberOfClusters x NumberOfSeries
+ 1 + NumberOfClusters x NumberOfSeries x NumberOfSeries +
NumberOfClusters + NumberOfDataPointsPerSeries x NumberOfClusters
elements, laid out as follows:
First 3 values = NumberOfSeries, NumberOfDataPointsPerSeries
and NumberOfMixtureComponents
Following NumberOfClusters x NumberOfSeries
values = means (centres) of each mixture component
Next value = Log likelihood of the data given the selected Gaussian
mixture
Following NumberOfClusters x NumberOfSeries x NumberOfSeries
values = covariances of each mixture component
Following NumberOfClusters values = probability of a random
datapoint being drawn from each mixture component
Following NumberOfDataPointsPerSeries x NumberOfClusters
values = probability of each individual datapoint being in a particular mixture
component.
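Because everything is returned in one flat array, it can help to split it back into named pieces. The sketch below does this under two assumptions the description above does not confirm: that each block is stored in row-major order (for example, all NumberOfSeries mean values for the first component, then the second), and that NumberOfClusters equals the NumberOfMixtureComponents value held in the third element. The function name `unpack_gmm_output` and the dictionary keys are hypothetical.

```python
def unpack_gmm_output(out):
    """Split the flat output array into named pieces (row-major order assumed)."""
    n_series, n_points, n_comp = (int(round(v)) for v in out[:3])
    expected = (3 + n_comp * n_series + 1 + n_comp * n_series * n_series
                + n_comp + n_points * n_comp)
    if len(out) != expected:
        raise ValueError(f"expected {expected} values, got {len(out)}")

    pos = 3  # read position, just past the three size values

    def take(count):
        """Return the next `count` values and advance the read position."""
        nonlocal pos
        block = list(out[pos:pos + count])
        pos += count
        return block

    def rows(flat, width):
        """Cut a flat list into consecutive rows of the given width."""
        return [flat[i:i + width] for i in range(0, len(flat), width)]

    means = rows(take(n_comp * n_series), n_series)
    log_likelihood = take(1)[0]
    cov_flat = take(n_comp * n_series * n_series)
    covariances = [rows(block, n_series)
                   for block in rows(cov_flat, n_series * n_series)]
    weights = take(n_comp)
    memberships = rows(take(n_points * n_comp), n_comp)
    return {
        "sizes": (n_series, n_points, n_comp),
        "means": means,                    # one row per mixture component
        "log_likelihood": log_likelihood,
        "covariances": covariances,        # one square matrix per component
        "weights": weights,                # probability of each component
        "memberships": memberships,        # one row per datapoint
    }
```

If the actual ordering within a block differs (for example, datapoint-major versus component-major), only the `rows` calls need to change; the length check holds either way.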
See also k-means
clustering (which can be thought of as akin to a simplified version of
Gaussian mixture modelling) and the example
data series for Gaussian mixture modelling and k-means clustering.
The algorithm used is derived from Press et
al. (2007).
WARNING
For some data series used to test the algorithm, the log
likelihood function appears to have many local maxima under the implementation
of the EM algorithm used by the Nematrian website. This means that the precise
mixture selected is sensitive to the starting values inserted into the
algorithm, in particular here the initial starting means. This reduces the
usefulness of the algorithm for finding robust estimates of the different
mixture components.
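A common general-purpose way to lessen this sensitivity, which is not something this page itself prescribes, is to run the fit from several different starting means and keep the run with the highest log likelihood. The sketch below shows only that restart pattern. The `fit` argument is a placeholder for whatever routine performs a single fit and returns a `(result, log_likelihood)` pair; it, `best_of_restarts`, and the choice to draw starting means from the datapoints themselves are assumptions made for illustration.

```python
import random


def best_of_restarts(fit, points, n_components, n_starts=10, seed=0):
    """Run `fit` from several random starts and keep the highest log likelihood.

    fit(points, start_means) is assumed to return (result, log_likelihood).
    Starting means are drawn, without replacement, from the datapoints.
    """
    rng = random.Random(seed)  # seeded so that the restarts are reproducible
    best = None
    for _ in range(n_starts):
        start_means = rng.sample(points, n_components)
        result, log_lik = fit(points, start_means)
        if best is None or log_lik > best[1]:
            best = (result, log_lik, start_means)
    return best
```

This does not guarantee finding the global maximum; it only makes it less likely that a single unlucky set of starting means determines the answer. Comparing the mixtures found across restarts also gives a rough indication of how stable the estimated components are.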