
Function Description

Returns the result (as a single array) of applying a Gaussian mixture modelling analysis, as per Press et al. (2007).


In Gaussian mixture modelling, we have n m-dimensional datapoints (where n = NumberOfDataPointsPerSeries and m = NumberOfSeries, consistent with the array sizes given below) and we assume that each datapoint may have come from one of a number (NumberOfMixtureComponents) of different multivariate Normal (i.e. Gaussian) probability distributions. We use the EM (expectation-maximisation) algorithm to identify the Gaussian mixture model with that number of Gaussian distributions that best fits the data in a maximum likelihood sense. To do so we provide initial starting means (i.e. centres) for each cluster, an initial value to ascribe to the covariance matrix terms, and a termination criterion which stops the algorithm when an iteration appears to have added very little to the (log) likelihood. Also included is a backstop, Itermax, which caps the number of iterations the algorithm carries out.
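The EM iteration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Nematrian implementation and not the Press et al. (2007) code: the names `em_gmm`, `log_gauss`, `cholesky`, `init_var`, `tol` and `iter_max` are hypothetical, the initial covariance matrices are assumed to be `init_var` times the identity, and the initial mixing weights are assumed equal. Here `points` holds one m-vector per datapoint, i.e. the transpose of the InputData layout. The sketch has no safeguard against a covariance matrix becoming singular.

```python
import math

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    m = len(a)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_gauss(x, mean, cov):
    """Log density of a multivariate Normal(mean, cov) evaluated at x."""
    m = len(x)
    L = cholesky(cov)
    # Forward substitution solves L z = (x - mean); z.z is the Mahalanobis term.
    z = []
    for i in range(m):
        s = x[i] - mean[i] - sum(L[i][k] * z[k] for k in range(i))
        z.append(s / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(m))
    return -0.5 * (m * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def em_gmm(points, start_means, init_var=1.0, tol=1e-8, iter_max=200):
    """Fit a Gaussian mixture by EM; returns (means, covs, weights, resp, loglik)."""
    n, m, k = len(points), len(points[0]), len(start_means)
    means = [list(mu) for mu in start_means]
    # Assumption: each covariance matrix starts as init_var * identity.
    covs = [[[init_var if i == j else 0.0 for j in range(m)] for i in range(m)]
            for _ in range(k)]
    weights = [1.0 / k] * k  # assumption: equal initial mixing weights
    prev_ll, ll, resp = -math.inf, -math.inf, []
    for _ in range(iter_max):  # iter_max plays the role of the Itermax backstop
        # E-step: membership probabilities and the log likelihood.
        resp, ll = [], 0.0
        for x in points:
            logs = [math.log(weights[c]) + log_gauss(x, means[c], covs[c])
                    for c in range(k)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(v - top) for v in logs))
            ll += lse
            resp.append([math.exp(v - lse) for v in logs])
        # Terminate when the log likelihood has changed very little.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
        # M-step: re-estimate weights, means and covariances.
        for c in range(k):
            nk = sum(r[c] for r in resp)
            weights[c] = nk / n
            means[c] = [sum(r[c] * x[i] for r, x in zip(resp, points)) / nk
                        for i in range(m)]
            covs[c] = [[sum(r[c] * (x[i] - means[c][i]) * (x[j] - means[c][j])
                            for r, x in zip(resp, points)) / nk
                        for j in range(m)] for i in range(m)]
    return means, covs, weights, resp, ll
```

The five returned items correspond to the means, covariances, component probabilities, per-datapoint membership probabilities and log likelihood listed in the output description. Working with log densities and a log-sum-exp in the E-step is a design choice made here to avoid numerical underflow for datapoints far from a component.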


InputData is a 2d array of size NumberOfSeries x NumberOfDataPointsPerSeries and StartMeans is a 2d array of size NumberOfClusters x NumberOfSeries. Here NumberOfClusters denotes the number of mixture components (NumberOfMixtureComponents above).


The output is a single array containing 3 + NumberOfClusters x NumberOfSeries + 1 + NumberOfClusters x NumberOfSeries x NumberOfSeries + NumberOfClusters + NumberOfDataPointsPerSeries x NumberOfClusters values, laid out in the following order:


First 3 values = NumberOfSeries, NumberOfDataPointsPerSeries and NumberOfMixtureComponents


Following NumberOfClusters x NumberOfSeries values = means (centres) of each mixture component


Next value = Log likelihood of the data given the selected Gaussian mixture


Following NumberOfClusters x NumberOfSeries x NumberOfSeries values = covariances of each mixture component


Following NumberOfClusters values = probability of a random datapoint being drawn from a given mixture component


Following NumberOfDataPointsPerSeries x NumberOfClusters values = probability of each individual datapoint being in a particular mixture component.
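The layout above can be split back into its blocks with a short helper. The following is a sketch only: `expected_length` and `unpack_gmm_output` are hypothetical names, not part of any Nematrian interface, and `out` is assumed to be a plain Python list holding the returned array. Because the ordering of values within each block (for example, component-major or series-major) is not stated above, each block is left flat rather than reshaped.

```python
def expected_length(n_series, n_points, n_components):
    """Total number of values in the output array, per the layout described above."""
    return (3
            + n_components * n_series              # means
            + 1                                    # log likelihood
            + n_components * n_series * n_series   # covariances
            + n_components                         # component probabilities
            + n_points * n_components)             # membership probabilities

def unpack_gmm_output(out):
    """Split the flat output array into named blocks (each block left flat)."""
    n_series, n_points, n_components = (int(v) for v in out[:3])
    if len(out) != expected_length(n_series, n_points, n_components):
        raise ValueError("output length does not match the sizes in its header")
    sizes = [
        ("means", n_components * n_series),
        ("log_likelihood", 1),
        ("covariances", n_components * n_series * n_series),
        ("component_probabilities", n_components),
        ("membership_probabilities", n_points * n_components),
    ]
    result = {"n_series": n_series, "n_points": n_points,
              "n_components": n_components}
    pos = 3  # skip the three header values
    for name, count in sizes:
        result[name] = list(out[pos:pos + count])
        pos += count
    result["log_likelihood"] = result["log_likelihood"][0]  # single value
    return result
```

For instance, with NumberOfSeries = 2, NumberOfDataPointsPerSeries = 3 and two mixture components, the array holds 3 + 4 + 1 + 8 + 2 + 6 = 24 values.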


See also k-means clustering (which can be thought of as akin to a simplified version of Gaussian mixture modelling) and example data series for Gaussian mixture modelling and k-means clustering.


The algorithm used is derived from Press et al. (2007).




For some data series used to test the algorithm, the log likelihood function appears to have many local maxima under the Nematrian website's implementation of the EM algorithm. The precise mixture selected is therefore sensitive to the starting values supplied to the algorithm, in particular the initial starting means. This reduces the usefulness of the algorithm for finding robust estimates of the different mixture components.
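One way to probe this sensitivity is to run the fit from several sets of starting means and compare the resulting log likelihoods. The sketch below does this for one-dimensional data only, to keep it short. It is an illustration under stated assumptions rather than the Nematrian implementation: `em_1d` and `compare_starts` are hypothetical names, initial weights are assumed equal, and the variance floor `var_floor` is an added safeguard that the text above does not mention.

```python
import math

def em_1d(data, start_means, init_var=1.0, tol=1e-9, iter_max=500, var_floor=1e-6):
    """EM for a 1-D Gaussian mixture; returns (log likelihood, fitted means)."""
    k = len(start_means)
    means = list(start_means)
    variances = [init_var] * k
    weights = [1.0 / k] * k  # assumption: equal initial mixing weights
    prev, ll = -math.inf, -math.inf
    for _ in range(iter_max):
        # E-step: membership probabilities via log-sum-exp.
        resp, ll = [], 0.0
        for x in data:
            logs = [math.log(w) - 0.5 * (math.log(2.0 * math.pi * v) + (x - mu) ** 2 / v)
                    for w, mu, v in zip(weights, means, variances)]
            top = max(logs)
            lse = top + math.log(sum(math.exp(l - top) for l in logs))
            ll += lse
            resp.append([math.exp(l - lse) for l in logs])
        if abs(ll - prev) < tol:
            break
        prev = ll
        # M-step, with guards so an emptied component cannot cause a crash.
        for c in range(k):
            nk = max(sum(r[c] for r in resp), 1e-300)
            weights[c] = nk / len(data)
            means[c] = sum(r[c] * x for r, x in zip(resp, data)) / nk
            var = sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, data)) / nk
            variances[c] = max(var, var_floor)
    return ll, means

def compare_starts(data, list_of_starts):
    """Fit from each set of starting means; return results sorted by log likelihood."""
    results = [em_1d(data, starts) + (list(starts),) for starts in list_of_starts]
    return sorted(results, key=lambda r: r[0], reverse=True)
```

Each entry returned by `compare_starts` is a tuple of (log likelihood, fitted means, starting means). If different starting means lead to noticeably different log likelihoods or fitted means, the data set exhibits the local-maxima behaviour described above.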



