Enterprise Risk Management Formula Book
5. Statistical Methods
5.1          Sample moments
 
A random sample of $n$ observations $x_1, x_2, \ldots, x_n$ has (equally weighted) sample moments as follows:

$$\text{mean: } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad \text{variance: } s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

$$\text{skewness: } \hat{\gamma}_1 = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3$$

$$\text{excess kurtosis: } \hat{\gamma}_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$

'Population' moments (e.g. population variance, population skewness, population excess kurtosis) are calculated as if the distribution from which the data were drawn were discrete, with probabilities of occurrence exactly matching the observed frequencies of occurrence, e.g. the population variance is $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
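As a concrete illustration, the sketch below (Python, standard library only; the function names are ours, not from any particular package) computes both the small-sample-adjusted sample moments and the corresponding 'population' moments:

```python
import math

def sample_moments(xs):
    """Equally weighted sample moments with small-sample adjustments
    (the kurtosis term requires at least 4 observations)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)   # divisor n - 1
    s = math.sqrt(var)
    skew = n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
            * sum(((x - mean) / s) ** 4 for x in xs)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return mean, var, skew, kurt

def population_moments(xs):
    """'Population' moments: treat the observed data as the whole
    (discrete) distribution, so every divisor is n."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n         # divisor n
    sd = math.sqrt(var)
    skew = sum(((x - mean) / sd) ** 3 for x in xs) / n
    kurt = sum(((x - mean) / sd) ** 4 for x in xs) / n - 3   # excess kurtosis
    return mean, var, skew, kurt

mean, var, skew, kurt = sample_moments([1.2, 0.7, 2.1, 1.6, 0.9, 1.4])
```
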
 
The least squares estimators for parameters of a distribution are the values of the parameters that minimise the sum of the squared residuals, so the least squares estimator for the mean, $\hat{\mu}$, is the value that minimises $\sum_{i=1}^{n} (x_i - \hat{\mu})^2$ (which is $\hat{\mu} = \bar{x}$).
Non-equally weighted moments give different weights to different observations (the weights not dependent on the ordering of the observations), e.g. the sample non-equally weighted mean (using weights $w_1, \ldots, w_n$) is:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
 
5.2          Parametric inference (with the underlying data following the normal distribution)
 
One sample:
 
For a single (equally weighted) sample of size $n$, $x_1, \ldots, x_n$, where $X_i \sim N(\mu, \sigma^2)$ (independently), then the following statistics are distributed according to the Student's t distribution and the chi-squared distribution:

$$\frac{\bar{X} - \mu}{s / \sqrt{n}} \sim t_{n-1} \qquad \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$
 
Two samples:
 
For two independent samples of sizes $n_x$ and $n_y$, $x_1, \ldots, x_{n_x}$ and $y_1, \ldots, y_{n_y}$, where $X_i \sim N(\mu_x, \sigma_x^2)$ and $Y_j \sim N(\mu_y, \sigma_y^2)$, then the following statistic is distributed according to the F distribution:

$$\frac{s_x^2 / \sigma_x^2}{s_y^2 / \sigma_y^2} \sim F_{n_x - 1,\, n_y - 1}$$

If $\sigma_x^2 = \sigma_y^2$ then:

$$\frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}}} \sim t_{n_x + n_y - 2}$$

where $s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$ is the pooled sample variance.
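A minimal sketch of these two-sample statistics (Python, standard library only; the function names are illustrative):

```python
import math
from statistics import mean, variance

def f_statistic(xs, ys):
    """s_x^2 / s_y^2; distributed F(n_x - 1, n_y - 1) when the true variances are equal."""
    return variance(xs) / variance(ys)

def pooled_t(xs, ys):
    """Two-sample t statistic under H0: mu_x = mu_y, assuming equal variances.
    Returns the statistic and its degrees of freedom, n_x + n_y - 2."""
    nx, ny = len(xs), len(ys)
    sp2 = ((nx - 1) * variance(xs) + (ny - 1) * variance(ys)) / (nx + ny - 2)
    t = (mean(xs) - mean(ys)) / math.sqrt(sp2 * (1 / nx + 1 / ny))
    return t, nx + ny - 2

t, df = pooled_t([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
```
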
 
5.3          Maximum likelihood estimators
 
If $\hat{\theta}$ is the maximum likelihood estimator of a parameter $\theta$ based on a sample $x_1, \ldots, x_n$ then

$$\hat{\theta} = \underset{\theta}{\arg\max}\; L(\theta)$$

where $L(\theta)$ is the likelihood for the sample, i.e. $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$, and hence $\frac{\partial}{\partial \theta} \log L(\theta) = 0$ at $\theta = \hat{\theta}$.

$\hat{\theta}$ is asymptotically normally distributed with mean $\theta$ and variance equal to the Cramér-Rao lower bound

$$\mathrm{Var}(\hat{\theta}) \to \frac{1}{I(\theta)} \qquad \text{where } I(\theta) = E\left[-\frac{\partial^2}{\partial \theta^2} \log L(\theta)\right]$$
 
Likelihood ratio test:

$$2(\ell_1 - \ell_0) \sim \chi^2_{k_1 - k_0} \quad \text{(asymptotically, under } H_0 \text{)}$$

where $\ell_0$ is the maximum log-likelihood for the model under $H_0$ (with $k_0$ free parameters) and $\ell_1$ is the maximum log-likelihood for the model under $H_1$ (with $k_1$ free parameters). Non-equally weighted estimators can be identified by weighting the $\log f(x_i; \theta)$ terms appropriately.
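For instance, fitting a normal distribution by maximum likelihood and applying the likelihood ratio test to $H_0: \mu = 0$ can be sketched as follows (Python, standard library only; the data values are made up for illustration):

```python
import math

def norm_loglik(xs, mu, sigma2):
    """Log-likelihood of an i.i.d. N(mu, sigma2) sample."""
    n = len(xs)
    return (-n / 2 * math.log(2 * math.pi * sigma2)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma2))

def mle_normal(xs):
    """MLEs for N(mu, sigma^2): mu_hat = sample mean, sigma2_hat with divisor n."""
    n = len(xs)
    mu = sum(xs) / n
    sigma2 = sum((x - mu) ** 2 for x in xs) / n
    return mu, sigma2

xs = [0.3, -0.1, 0.8, 0.4, 0.2, 0.6]
mu1, s2_1 = mle_normal(xs)                    # H1: mu and sigma2 free (2 parameters)
s2_0 = sum(x ** 2 for x in xs) / len(xs)      # H0: mu = 0, only sigma2 free (1 parameter)
lr = 2 * (norm_loglik(xs, mu1, s2_1) - norm_loglik(xs, 0.0, s2_0))
# lr is asymptotically chi-squared with 2 - 1 = 1 degree of freedom under H0
```
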
 
5.4          Method-of-moments estimators
 
Method of moments estimators are the parameter values (for the $k$ parameters specifying a given distributional family) that result in replication of the first $k$ moments of the observed data. For the normal distribution these involve $\bar{x}$ and either $s^2$ (the sample variance, if a small sample size adjustment is included) or $\hat{\sigma}^2$ (the 'population' variance, if the small sample size adjustment is ignored and we select the estimators to fit $E(X)$ and $E(X^2)$). In the generalised method of moments approach we select parameters that 'best' fit the selected moments (given some criterion for 'best'), rather than selecting parameters that perfectly fit the selected moments.
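A sketch for the normal family, matching the first two moments (Python; the `small_sample_adjustment` flag is our own naming):

```python
def mom_normal(xs, small_sample_adjustment=True):
    """Method-of-moments estimates (mu, sigma^2) for the normal family,
    replicating the first two moments of the observed data."""
    n = len(xs)
    mu = sum(xs) / n
    ss = sum((x - mu) ** 2 for x in xs)
    sigma2 = ss / (n - 1) if small_sample_adjustment else ss / n
    return mu, sigma2

mu, s2 = mom_normal([1, 2, 3, 4, 5])            # sample variance version
mu, v = mom_normal([1, 2, 3, 4, 5], False)      # 'population' variance version
```
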
 
5.5          Goodness of fit
 
Goodness of fit describes how well a statistical model fits a set of observations. Examples include the following, where $x_{(i)}$ is the $i$'th order statistic, $\sup_x$ is the supremum (i.e. largest value) over the set of possible $x$, $F$ is the cumulative distribution function of the distribution we are fitting and $F_n$ is the empirical distribution function:
 
(a)          Kolmogorov-Smirnov test: $D_n = \sup_x |F_n(x) - F(x)|$. Under the null hypothesis (that the sample comes from the hypothesized distribution), as $n \to \infty$ then $\sqrt{n}\, D_n$ tends to a limiting distribution (the Kolmogorov distribution).
(b)          Cramér-von-Mises test: $W^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left(\frac{2i-1}{2n} - F(x_{(i)})\right)^2$
(c)          Anderson-Darling test: $A^2 = -n - S$ where $S = \sum_{i=1}^{n} \frac{2i-1}{n} \left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right]$
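Because $F_n$ is a step function, the Kolmogorov-Smirnov supremum can be evaluated exactly at the order statistics. A sketch (Python; the uniform example data are hypothetical):

```python
def ks_statistic(xs, cdf):
    """One-sample Kolmogorov-Smirnov statistic D_n = sup_x |F_n(x) - F(x)|.
    F_n jumps at each order statistic, so we check both sides of every step."""
    n = len(xs)
    d = 0.0
    for i, x in enumerate(sorted(xs), start=1):
        f = cdf(x)
        d = max(d, abs(i / n - f), abs((i - 1) / n - f))
    return d

# Testing four points against a uniform(0, 1) fit:
d = ks_statistic([0.1, 0.2, 0.3, 0.4], lambda x: min(max(x, 0.0), 1.0))
```
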
 
If data is bucketed into ranges then we may also use (Pearson's) chi-squared goodness of fit test using the following test statistic, where $n$ is the sample size, $O_i$ is the observed count, $E_i = n\left(F(u_i) - F(l_i)\right)$ is the expected count and $l_i$ and $u_i$ are the lower and upper limits for the $i$'th bin. The test statistic follows approximately a chi-squared distribution with $k - c$ degrees of freedom, i.e. $\chi^2_{k-c}$, where $k$ is the number of non-empty cells and $c$ is the number of estimated parameters plus 1:

$$X^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
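A sketch of the binned test statistic (Python; the counts and the uniform fit are hypothetical):

```python
def expected_counts(n, edges, cdf):
    """E_i = n * (F(u_i) - F(l_i)) for bins whose limits are consecutive edges."""
    return [n * (cdf(u) - cdf(l)) for l, u in zip(edges, edges[1:])]

def chi_squared_stat(observed, expected):
    """Pearson chi-squared statistic: sum of (O_i - E_i)^2 / E_i over the bins."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# 40 observations in 4 equal bins, tested against a uniform(0, 1) fit:
exp_counts = expected_counts(40, [0.0, 0.25, 0.5, 0.75, 1.0], lambda x: x)
stat = chi_squared_stat([8, 12, 9, 11], exp_counts)
```
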
 
We may also test whether the skew or kurtosis or the two combined (the Jarque-Bera test) appear materially different from what would be implied by the relevant distributional family. If the null hypothesis is that the data comes from a normal distribution then, for large $n$, $\hat{\gamma}_1 \sim N\left(0, \frac{6}{n}\right)$, $\hat{\gamma}_2 \sim N\left(0, \frac{24}{n}\right)$ and $JB = \frac{n}{6}\left(\hat{\gamma}_1^2 + \frac{\hat{\gamma}_2^2}{4}\right) \sim \chi^2_2$.
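The Jarque-Bera statistic can be sketched directly from the 'population' moments (Python; function name is illustrative):

```python
import math

def jarque_bera(xs):
    """JB = n/6 * (skew^2 + kurt^2 / 4), using the 'population' skewness and
    excess kurtosis; ~ chi-squared(2) for large n under the normality null."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    skew = sum(((x - mean) / sd) ** 3 for x in xs) / n
    kurt = sum(((x - mean) / sd) ** 4 for x in xs) / n - 3
    return n / 6 * (skew ** 2 + kurt ** 2 / 4)

jb = jarque_bera([1, 2, 3, 4, 5])
```
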
 
The Akaike Information Criterion (AIC) (and other similar ways of choosing between different types of model that trade off goodness of fit against model complexity, such as the Bayes Information Criterion, BIC) involves selecting the model with the highest information criterion of the form $\ell - qk$, where $\ell$ is the maximised log-likelihood, there are $k$ unknown parameters and we are using a data series of length $n$ for fitting purposes. For the AIC $q = 1$ and for the BIC $q = \frac{1}{2}\ln n$.
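The text above states the criteria in 'higher is better' form $\ell - qk$; the sketch below uses the equivalent and more common 'lower is better' convention, which ranks models identically (Python; the log-likelihood values are hypothetical):

```python
import math

def aic(loglik, k):
    """AIC = 2k - 2*loglik; equivalent to maximising loglik - k."""
    return 2 * k - 2 * loglik

def bic(loglik, k, n):
    """BIC = k*ln(n) - 2*loglik; equivalent to maximising loglik - (k/2)*ln(n)."""
    return k * math.log(n) - 2 * loglik

# Model A: 3 parameters, loglik -210.0; model B: 5 parameters, loglik -208.5.
best = min([("A", aic(-210.0, 3)), ("B", aic(-208.5, 5))], key=lambda p: p[1])
```
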
 
5.6          Linear regression
 
In the univariate case suppose $y_i = \alpha + \beta x_i + \varepsilon_i$ where $\varepsilon_i \sim N(0, \sigma^2)$, $i = 1, \ldots, n$, then (equally weighted) estimates of $\alpha$ and $\beta$ are:

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}} \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

where

$$s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Also

$$\hat{\sigma}^2 = \frac{1}{n-2}\left(s_{yy} - \frac{s_{xy}^2}{s_{xx}}\right)$$

The individual expected responses are $\hat{y}_i = \hat{\alpha} + \hat{\beta}x_i$ and satisfy the following 'sum of squares' relationship:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

The variance of the predicted mean response at $x_0$ is:

$$\mathrm{Var}\left(\hat{\alpha} + \hat{\beta}x_0\right) = \sigma^2 \left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)$$

The variance of a predicted individual response is the variance of the predicted mean response plus an additional $\sigma^2$.
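The univariate least squares estimates and the variance of the predicted mean response can be sketched as (Python; the function names are illustrative):

```python
def ols_fit(xs, ys):
    """Univariate least squares: beta = s_xy / s_xx, alpha = ybar - beta * xbar.
    Also returns the residual variance estimate (divisor n - 2)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    sigma2 = (syy - sxy ** 2 / sxx) / (n - 2)
    return alpha, beta, sigma2

def mean_response_var(x0, xs, sigma2):
    """Variance of the predicted mean response alpha + beta * x0."""
    n = len(xs)
    xbar = sum(xs) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    return sigma2 * (1 / n + (x0 - xbar) ** 2 / sxx)

alpha, beta, sigma2 = ols_fit([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
```
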
 
For generalised least squares, if we have $m$ different series each with $n$ observations and we are fitting $y_i = \sum_{j=1}^{m} \beta_j x_{ij} + \varepsilon_i$ then the vector of least squares estimators, $\hat{\boldsymbol{\beta}}$, is given by $\hat{\boldsymbol{\beta}} = (X^T X)^{-1} X^T \mathbf{y}$ where $X$ is an $n \times m$ matrix with elements $x_{ij}$ and $\mathbf{y}$ is an $n$ dimensional vector with elements $y_i$.
 
5.7          Correlations
 
The observed (sample) correlation coefficient (i.e. Pearson correlation coefficient) between two series of equal lengths indexed in the same manner, $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$, is (where $s_{xx}$, $s_{yy}$ and $s_{xy}$ are as given in the section on linear regression):

$$r = \frac{s_{xy}}{\sqrt{s_{xx} s_{yy}}}$$

If the underlying correlation coefficient, $\rho$, is zero and the data comes from a bivariate normal distribution then:

$$\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$$
 
For arbitrary $\rho$ ($-1 < \rho < 1$) the Fisher z transform is $z = z(r)$ where:

$$z(r) = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right) = \operatorname{artanh}(r)$$

If the data comes from a bivariate normal distribution then $z$ is distributed approximately as follows:

$$z \sim N\left(z(\rho), \frac{1}{n-3}\right)$$
 
Two non-parametric measures of correlation are:
 
-          Spearman's rank correlation coefficient, where $u_i$ and $v_i$ are the ranks within $x_1, \ldots, x_n$ and $y_1, \ldots, y_n$ of $x_i$ and $y_i$ respectively:

$$\rho_S = 1 - \frac{6 \sum_{i=1}^{n} (u_i - v_i)^2}{n(n^2 - 1)}$$
-          Kendall's tau, where computation is taken over all pairs $(i, j)$ with $i < j$ and (for the moment ignoring ties) a concordant pair is a case where $(x_i - x_j)(y_i - y_j) > 0$ and a discordant pair is a case where $(x_i - x_j)(y_i - y_j) < 0$:

$$\tau = \frac{n_c - n_d}{\frac{1}{2}n(n-1)}$$

where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs.
 
There are various possible ways of handling ties in these
two non-parametric measures of correlation (ties should not in practice arise
if the random variables really are continuous).
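All three correlation measures can be sketched in a few lines (Python; assumes no ties, as discussed above, and the helper names are our own):

```python
import math

def pearson_r(xs, ys):
    """Sample (Pearson) correlation coefficient r = s_xy / sqrt(s_xx * s_yy)."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def _ranks(v):
    """Rank of each value within its own series (assumes no ties)."""
    order = {x: i for i, x in enumerate(sorted(v), start=1)}
    return [order[x] for x in v]

def spearman_rho(xs, ys):
    """1 - 6 * sum(d_i^2) / (n * (n^2 - 1)), computed on the ranks."""
    n = len(xs)
    d2 = sum((u - v) ** 2 for u, v in zip(_ranks(xs), _ranks(ys)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def kendall_tau(xs, ys):
    """(concordant - discordant) / (n * (n - 1) / 2) over all pairs i < j."""
    n = len(xs)
    nc = nd = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                nc += 1
            elif s < 0:
                nd += 1
    return (nc - nd) / (n * (n - 1) / 2)
```
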
 
5.8          Analysis of variance
 
Given a single factor normal model

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij} \qquad i = 1, \ldots, k \quad j = 1, \ldots, n_i \quad n = \sum_{i=1}^{k} n_i$$

where $\varepsilon_{ij} \sim N(0, \sigma^2)$ (independently) with $\sum_{i=1}^{k} n_i \tau_i = 0$, and null hypothesis $H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$.

Variance estimate:

$$\hat{\sigma}^2 = \frac{SS_R}{n - k}$$

Under the null hypothesis given above

$$\frac{SS_B / (k - 1)}{SS_R / (n - k)} \sim F_{k-1,\, n-k}$$

where:

$$SS_B = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 \qquad SS_R = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2$$

with $\bar{y}_i$ the mean of the $i$'th treatment, $\bar{y}$ the overall mean, and total sum of squares $SS_T = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 = SS_B + SS_R$.
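A sketch of the single-factor F statistic (Python; the group data are hypothetical):

```python
def one_way_anova(groups):
    """Single-factor ANOVA: returns (F, df_between, df_residual) where
    F = (SS_B / (k - 1)) / (SS_R / (n - k))."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(y for g in groups for y in g) / n
    ss_b = 0.0   # between-treatments sum of squares
    ss_r = 0.0   # residual sum of squares
    for g in groups:
        gm = sum(g) / len(g)
        ss_b += len(g) * (gm - grand) ** 2
        ss_r += sum((y - gm) ** 2 for y in g)
    f = (ss_b / (k - 1)) / (ss_r / (n - k))
    return f, k - 1, n - k

f, df1, df2 = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```
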
 
5.9          Bayesian priors and posteriors
 
Posterior and prior distributions are related as follows:

$$f(\theta \mid \mathbf{x}) = \frac{f(\mathbf{x} \mid \theta)\, f(\theta)}{f(\mathbf{x})}$$

i.e.

$$\text{posterior} \propto \text{likelihood} \times \text{prior}$$

For example, if $x_1, \ldots, x_n$ is a random sample of size $n$ from a $N(\mu, \sigma^2)$ where $\sigma^2$ is known and the prior distribution for $\mu$ is $N(\mu_0, \sigma_0^2)$ then the posterior distribution for $\mu$ is:

$$\mu \mid \mathbf{x} \sim N(\mu_*, \sigma_*^2)$$

where $\mu_*$ is 'credibility weighted' as follows:

$$\mu_* = Z\bar{x} + (1 - Z)\mu_0 \qquad Z = \frac{n / \sigma^2}{n / \sigma^2 + 1 / \sigma_0^2}$$

and

$$\frac{1}{\sigma_*^2} = \frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}$$
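The normal-normal credibility weighting above can be sketched as (Python; the input values are hypothetical):

```python
def normal_posterior(xs, sigma2, mu0, sigma0_2):
    """Posterior N(mu_star, sigma_star^2) for mu, given a N(mu, sigma2) sample
    with sigma2 known and a N(mu0, sigma0_2) prior for mu."""
    n = len(xs)
    xbar = sum(xs) / n
    z = (n / sigma2) / (n / sigma2 + 1 / sigma0_2)   # credibility weight on the data
    mu_star = z * xbar + (1 - z) * mu0
    sigma_star2 = 1 / (n / sigma2 + 1 / sigma0_2)
    return mu_star, sigma_star2

mu_star, s_star2 = normal_posterior([1.0, 2.0, 2.0, 3.0], 4.0, 0.0, 1.0)
```
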
 
 