
### Function Description

Returns the result (as a single array) of applying a Gaussian mixture modelling analysis, as per Press et al. (2007).

In Gaussian mixture modelling, we have m n-dimensional datapoints (where n = NumberOfSeries, m = NumberOfDataPointsPerSeries) and we assume that each datapoint may have come from one of a number (NumberOfMixtureComponents) of different multivariate Normal (i.e. Gaussian) probability distributions. We use the EM (expectation-maximisation) algorithm to identify the Gaussian mixture model with that number of components that best fits the data in a maximum likelihood sense. To do so, we provide initial starting means (i.e. centres) for each cluster, an initial value to ascribe to the covariance matrix terms and a termination criterion, which stops the algorithm when an iteration appears to have added very little to the (log) likelihood. Also included is a backstop, Itermax, which caps the number of iterations the algorithm carries out.
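As a rough illustration of the EM iteration described above, the following pure-Python sketch alternates an expectation step (each datapoint's probability of belonging to each component) with a maximisation step (re-estimating weights, means and covariances). It is not the Nematrian implementation. The names `fit_gmm`, `init_var`, `tol` and `iter_max` are hypothetical, the initial covariance is assumed to be `init_var` times the identity matrix, and the input is assumed to be a list of datapoints, each a list of coordinates (i.e. the transpose of InputData).

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    d = len(a)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def log_density(x, mean, low):
    """Log multivariate Normal density, given the Cholesky factor of the covariance."""
    d = len(x)
    z = []  # solves low * z = (x - mean) by forward substitution
    for i in range(d):
        s = x[i] - mean[i] - sum(low[i][k] * z[k] for k in range(i))
        z.append(s / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(d))
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + sum(v * v for v in z))

def fit_gmm(points, start_means, init_var=1.0, tol=1e-8, iter_max=200):
    """Hypothetical EM fit; init_var * identity is an assumed starting covariance."""
    n, d, k = len(points), len(points[0]), len(start_means)
    means = [list(m) for m in start_means]
    covs = [[[init_var if i == j else 0.0 for j in range(d)] for i in range(d)]
            for _ in range(k)]
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    resp, ll = [], prev_ll
    for _ in range(iter_max):  # iter_max plays the role of the Itermax backstop
        # E-step: membership probabilities and log likelihood at current parameters.
        chols = [cholesky(c) for c in covs]
        resp, ll = [], 0.0
        for x in points:
            logs = [math.log(weights[c]) + log_density(x, means[c], chols[c])
                    for c in range(k)]
            top = max(logs)  # log-sum-exp trick for numerical stability
            log_sum = top + math.log(sum(math.exp(v - top) for v in logs))
            ll += log_sum
            resp.append([math.exp(v - log_sum) for v in logs])
        # M-step: re-estimate weights, means and covariances from the memberships.
        for c in range(k):
            total = sum(r[c] for r in resp)
            weights[c] = total / n
            means[c] = [sum(r[c] * x[i] for r, x in zip(resp, points)) / total
                        for i in range(d)]
            covs[c] = [[sum(r[c] * (x[i] - means[c][i]) * (x[j] - means[c][j])
                            for r, x in zip(resp, points)) / total
                        for j in range(d)] for i in range(d)]
        # Stop once an iteration adds very little to the log likelihood.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    # resp and ll are as evaluated in the final E-step.
    return means, covs, weights, resp, ll
```

The sketch makes no attempt to guard against degenerate cases (for example a component collapsing onto a single datapoint, which makes its covariance matrix singular), which a production implementation would need to handle.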

InputData is a 2d array of size NumberOfSeries x NumberOfDataPointsPerSeries and StartMeans is a 2d array of size NumberOfClusters x NumberOfSeries, where NumberOfClusters is the same quantity as NumberOfMixtureComponents.

The output is an array with 3 + NumberOfClusters x NumberOfSeries + 1 + NumberOfClusters x NumberOfSeries x NumberOfSeries + NumberOfClusters + NumberOfDataPointsPerSeries x NumberOfClusters values, laid out as follows:

First 3 values = NumberOfSeries, NumberOfDataPointsPerSeries and NumberOfMixtureComponents

Following NumberOfClusters x NumberOfSeries values = means (centres) of each mixture component

Next value = Log likelihood of the data given the selected Gaussian mixture

Following NumberOfClusters x NumberOfSeries x NumberOfSeries values = covariances of each mixture component

Following NumberOfClusters values = probability of a random datapoint being drawn from each mixture component

Following NumberOfDataPointsPerSeries x NumberOfClusters values = probability of each individual datapoint being in a particular mixture component.
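The layout above can be unpacked mechanically. The sketch below is one way to do this in Python. The function name `unpack_gmm_output` and the dictionary keys are hypothetical. The ordering of elements within each block is not stated above, so the code assumes that means are stored component by component, covariances component by component and row by row, and membership probabilities datapoint by datapoint.

```python
def unpack_gmm_output(flat):
    """Split the flat result array into named pieces (ordering within blocks is assumed)."""
    n_series, n_points, n_comp = (int(v) for v in flat[:3])
    expected = (3 + n_comp * n_series + 1 + n_comp * n_series * n_series
                + n_comp + n_points * n_comp)
    if len(flat) != expected:
        raise ValueError(f"expected {expected} values, got {len(flat)}")
    pos = 3  # skip the three leading size values

    def take(count):
        """Return the next `count` values and advance the read position."""
        nonlocal pos
        chunk = list(flat[pos:pos + count])
        pos += count
        return chunk

    def rows(values, width):
        """Cut a flat list into consecutive rows of the given width."""
        return [values[i:i + width] for i in range(0, len(values), width)]

    means = rows(take(n_comp * n_series), n_series)
    log_likelihood = take(1)[0]
    cov_blocks = rows(take(n_comp * n_series * n_series), n_series * n_series)
    covariances = [rows(block, n_series) for block in cov_blocks]
    weights = take(n_comp)
    memberships = rows(take(n_points * n_comp), n_comp)
    return {
        "means": means,                    # one row per mixture component
        "log_likelihood": log_likelihood,
        "covariances": covariances,        # one square matrix per component
        "weights": weights,                # prior probability of each component
        "memberships": memberships,        # one row per datapoint
    }
```

For example, with 2 series, 3 datapoints per series and 2 mixture components, the function expects 3 + 4 + 1 + 8 + 2 + 6 = 24 values and raises a `ValueError` for any other length.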

See also k-means clustering (which can be thought of as akin to a simplified version of Gaussian mixture modelling) and example data series for Gaussian mixture modelling and k-means clustering.

The algorithm used is derived from Press et al. (2007).

WARNING

For some data series used to test the algorithm, the log likelihood function appears to have many local maxima when fitted using the Nematrian website's implementation of the EM algorithm. This means that the precise mixture selected is sensitive to the starting values supplied to the algorithm, in particular the initial starting means. This reduces the usefulness of the algorithm for finding robust estimates of the different mixture components.
