GaussianMixtureModellingSA
Function Description
Returns the result (as a single array) of applying a
Gaussian mixture modelling analysis, as per Press et
al. (2007).
In Gaussian mixture modelling, we have n m-dimensional
datapoints (where n = NumberOfDataPointsPerSeries and m = NumberOfSeries)
and we assume that each datapoint may have come from one of a number (NumberOfMixtureComponents,
also referred to below as NumberOfClusters) of different multivariate Normal (i.e. Gaussian)
probability distributions. We use the EM algorithm to identify the Gaussian mixture model with
that number of Gaussian distributions that best fits the data in a maximum likelihood sense.
To do so we provide initial starting means (i.e. centres) for each cluster, an
initial value to ascribe to the covariance matrix terms and a termination
criterion that stops the algorithm when an iteration appears to have added
very little to the (log) likelihood. Also included is a backstop, Itermax,
which stops the algorithm from carrying out a very large number of iterations.
InputData is a 2d array of size NumberOfSeries
x NumberOfDataPointsPerSeries and StartMeans is a 2d array of
size NumberOfClusters x NumberOfSeries.
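The EM procedure described above can be sketched roughly as follows. This is a minimal illustration using NumPy, not the Nematrian implementation; the function name fit_gmm and its parameter defaults are assumptions for illustration only.

```python
import numpy as np

def fit_gmm(x, start_means, init_var=1.0, tol=1e-6, iter_max=100):
    """x: (n_points, n_dims) data; start_means: (n_clusters, n_dims)."""
    n, d = x.shape
    k = start_means.shape[0]
    means = start_means.astype(float).copy()
    # Initial covariance: the supplied value on the diagonal of each component.
    covs = np.array([np.eye(d) * init_var for _ in range(k)])
    weights = np.full(k, 1.0 / k)          # mixing probabilities
    loglik_prev = -np.inf
    for _ in range(iter_max):              # Itermax-style backstop
        # E-step: responsibilities r[i, j] = P(component j | datapoint i)
        dens = np.empty((n, k))
        for j in range(k):
            diff = x - means[j]
            inv = np.linalg.inv(covs[j])
            det = np.linalg.det(covs[j])
            expo = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
            dens[:, j] = weights[j] * np.exp(expo) / np.sqrt((2 * np.pi) ** d * det)
        total = dens.sum(axis=1)
        loglik = np.log(total).sum()
        r = dens / total[:, None]
        # M-step: re-estimate mixing weights, means and covariances.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ x) / nk[:, None]
        for j in range(k):
            diff = x - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j]
        # Terminate when an iteration adds very little to the log likelihood.
        if loglik - loglik_prev < tol:
            break
        loglik_prev = loglik
    return means, covs, weights, loglik, r
```

The returned quantities correspond to the components of the output array described below: component means, covariances, mixing probabilities, the log likelihood and per-datapoint membership probabilities.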
The output is an array with 3 + NumberOfClusters x NumberOfSeries
+ 1 + NumberOfClusters x NumberOfSeries x NumberOfSeries +
NumberOfClusters + NumberOfDataPointsPerSeries x NumberOfClusters
elements, laid out as follows:
First 3 values = NumberOfSeries, NumberOfDataPointsPerSeries
and NumberOfMixtureComponents
Following NumberOfClusters x NumberOfSeries
values = means (centres) of each mixture component
Next value = Log likelihood of the data given the selected Gaussian
mixture
Following NumberOfClusters x NumberOfSeries x NumberOfSeries
values = covariances of each mixture component
Following NumberOfClusters values = probability of a random
datapoint being drawn from a given mixture component
Following NumberOfDataPointsPerSeries x NumberOfClusters
values = probability of each individual datapoint being in a particular mixture
component.
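Given that layout, the flat result array can be unpacked back into its components as sketched below. The function name unpack_result and its return structure are illustrative assumptions, not part of the Nematrian API; only the element order follows the description above.

```python
def unpack_result(flat):
    """Split the flat output array into its labelled components."""
    n_series = int(flat[0])
    n_points = int(flat[1])
    n_comp = int(flat[2])
    pos = 3

    def take(count):
        # Consume the next `count` elements of the flat array.
        nonlocal pos
        chunk = flat[pos:pos + count]
        pos += count
        return chunk

    means = [take(n_series) for _ in range(n_comp)]            # component centres
    loglik = take(1)[0]                                        # log likelihood
    covs = [[take(n_series) for _ in range(n_series)]          # covariance matrices
            for _ in range(n_comp)]
    comp_probs = take(n_comp)                                  # mixing probabilities
    memberships = [take(n_comp) for _ in range(n_points)]      # per-datapoint probabilities
    assert pos == len(flat)  # sanity check against the stated total length
    return means, loglik, covs, comp_probs, memberships
```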
See also k-means clustering (which can be thought of as akin to a simplified
version of Gaussian mixture modelling) and example data series for Gaussian
mixture modelling and k-means clustering.
The algorithm used is derived from Press et
al. (2007).
WARNING
For some data series used to test the algorithm, the log
likelihood function appears to have many local maxima under the Nematrian
website's implementation of the EM algorithm. This means that the precise
mixture selected is sensitive to the starting values supplied to the
algorithm, in particular the initial starting means. This reduces the
usefulness of the algorithm for finding robust estimates of the different
mixture components.
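A common mitigation for this sensitivity (an assumption here, not a feature of the Nematrian service) is to run the fit from several randomly perturbed starting means and keep the result with the highest log likelihood:

```python
import random

def best_of_restarts(fit, base_means, n_restarts=10, jitter=0.5, seed=0):
    """fit(means) -> (result, loglik); returns the best fit found.

    `fit` stands in for whatever routine performs the EM fit; the names
    here are illustrative assumptions, not the Nematrian API.
    """
    rng = random.Random(seed)
    best_result, best_ll = None, float('-inf')
    for _ in range(n_restarts):
        # Perturb every coordinate of the starting means with Gaussian noise.
        perturbed = [[m + rng.gauss(0.0, jitter) for m in row]
                     for row in base_means]
        result, ll = fit(perturbed)
        if ll > best_ll:
            best_result, best_ll = result, ll
    return best_result, best_ll
```

Comparing the log likelihoods of the restarts gives some indication of how multimodal the likelihood surface is for a given data series.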