GaussianMixtureModellingSA
Function Description
Returns the result (as a single array) of applying a
Gaussian mixture modelling analysis, as per Press et
al. (2007).
In Gaussian mixture modelling, we have n m-dimensional
datapoints (where n = NumberOfDataPointsPerSeries and m = NumberOfSeries)
and we assume that each datapoint may have come from one of a number (NumberOfMixtureComponents,
also referred to below as NumberOfClusters) of different multivariate Normal (i.e. Gaussian)
probability distributions. We use the EM algorithm to identify the Gaussian mixture model with
that number of Gaussian distributions that best fits the data in a maximum likelihood sense.
To do so we provide initial starting means (i.e. centres) for each cluster, an
initial value to ascribe to the covariance matrix terms and a termination
criterion that stops the algorithm when an iteration appears to have added
very little to the (log) likelihood. Also included is a backstop, Itermax,
which stops the algorithm from carrying out a very large number of iterations.
InputData is a 2d array of size NumberOfSeries
x NumberOfDataPointsPerSeries and StartMeans is a 2d array of
size NumberOfClusters x NumberOfSeries.
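The EM procedure described above can be sketched roughly as follows. This is a minimal illustration using NumPy, not the Nematrian implementation; the function name fit_gmm and its parameter defaults are assumptions for illustration only.

```python
import numpy as np

def fit_gmm(x, start_means, init_var=1.0, tol=1e-6, iter_max=100):
    """x: (n_points, n_dims) data; start_means: (n_clusters, n_dims)."""
    n, d = x.shape
    k = start_means.shape[0]
    means = start_means.astype(float).copy()
    # Initial covariance: the supplied value on the diagonal of each component.
    covs = np.array([np.eye(d) * init_var for _ in range(k)])
    weights = np.full(k, 1.0 / k)          # mixing probabilities
    loglik_prev = -np.inf
    for _ in range(iter_max):              # Itermax-style backstop
        # E-step: responsibilities r[i, j] = P(component j | datapoint i)
        dens = np.empty((n, k))
        for j in range(k):
            diff = x - means[j]
            inv = np.linalg.inv(covs[j])
            det = np.linalg.det(covs[j])
            expo = -0.5 * np.einsum('ij,jk,ik->i', diff, inv, diff)
            dens[:, j] = weights[j] * np.exp(expo) / np.sqrt((2 * np.pi) ** d * det)
        total = dens.sum(axis=1)
        loglik = np.log(total).sum()
        r = dens / total[:, None]
        # M-step: re-estimate mixing weights, means and covariances.
        nk = r.sum(axis=0)
        weights = nk / n
        means = (r.T @ x) / nk[:, None]
        for j in range(k):
            diff = x - means[j]
            covs[j] = (r[:, j, None] * diff).T @ diff / nk[j]
        # Terminate when an iteration adds very little to the log likelihood.
        if loglik - loglik_prev < tol:
            break
        loglik_prev = loglik
    return means, covs, weights, loglik, r
```

The returned quantities correspond to the components of the output array described below: component means, covariances, mixing probabilities, the log likelihood and per-datapoint membership probabilities.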
The output is an array with 3 + NumberOfClusters x NumberOfSeries
+ 1 + NumberOfClusters x NumberOfSeries x NumberOfSeries +
NumberOfClusters + NumberOfDataPointsPerSeries x NumberOfClusters
elements, laid out as follows:
First 3 values = NumberOfSeries, NumberOfDataPointsPerSeries
and NumberOfMixtureComponents
Following NumberOfClusters x NumberOfSeries
values = means (centres) of each mixture component
Next value = Log likelihood of the data given the selected Gaussian
mixture
Following NumberOfClusters x NumberOfSeries x NumberOfSeries
values = covariances of each mixture component
Following NumberOfClusters values = probability of a random
datapoint being drawn from a given mixture component
Following NumberOfDataPointsPerSeries x NumberOfClusters
values = probability of each individual datapoint being in a particular mixture
component.
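Given that layout, the flat result array can be unpacked back into its components as sketched below. The function name unpack_result and its return structure are illustrative assumptions, not part of the Nematrian API; only the element order follows the description above.

```python
def unpack_result(flat):
    """Split the flat output array into its labelled components."""
    n_series = int(flat[0])
    n_points = int(flat[1])
    n_comp = int(flat[2])
    pos = 3

    def take(count):
        # Consume the next `count` elements of the flat array.
        nonlocal pos
        chunk = flat[pos:pos + count]
        pos += count
        return chunk

    means = [take(n_series) for _ in range(n_comp)]            # component centres
    loglik = take(1)[0]                                        # log likelihood
    covs = [[take(n_series) for _ in range(n_series)]          # covariance matrices
            for _ in range(n_comp)]
    comp_probs = take(n_comp)                                  # mixing probabilities
    memberships = [take(n_comp) for _ in range(n_points)]      # per-datapoint probabilities
    assert pos == len(flat)  # sanity check against the stated total length
    return means, loglik, covs, comp_probs, memberships
```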
See also k-means clustering (which can be thought of as akin to a simplified
version of Gaussian mixture modelling) and example data series for Gaussian
mixture modelling and k-means clustering.
The algorithm used is derived from Press et
al. (2007).
WARNING
For some data series used to test the algorithm, the log
likelihood function appears to have many local maxima under the Nematrian
website's implementation of the EM algorithm. This means that the precise
mixture selected is sensitive to the starting values supplied to the
algorithm, in particular the initial starting means. This reduces the
usefulness of the algorithm for finding robust estimates of the different
mixture components.
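A common mitigation for this sensitivity (an assumption here, not a feature of the Nematrian service) is to run the fit from several randomly perturbed starting means and keep the result with the highest log likelihood:

```python
import random

def best_of_restarts(fit, base_means, n_restarts=10, jitter=0.5, seed=0):
    """fit(means) -> (result, loglik); returns the best fit found.

    `fit` stands in for whatever routine performs the EM fit; the names
    here are illustrative assumptions, not the Nematrian API.
    """
    rng = random.Random(seed)
    best_result, best_ll = None, float('-inf')
    for _ in range(n_restarts):
        # Perturb every coordinate of the starting means with Gaussian noise.
        perturbed = [[m + rng.gauss(0.0, jitter) for m in row]
                     for row in base_means]
        result, ll = fit(perturbed)
        if ll > best_ll:
            best_result, best_ll = result, ll
    return best_result, best_ll
```

Comparing the log likelihoods of the restarts gives some indication of how multimodal the likelihood surface is for a given data series.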