### Enterprise Risk Management Formula Book 5. Statistical Methods

5.1          Sample moments

A random sample of $n$ observations $x_1, \dots, x_n$ has (equally weighted) sample moments as follows:

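Sample mean:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Sample variance:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$$

Sample skewness:

$$\hat{\gamma}_1 = \frac{n}{(n-1)(n-2)}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3$$

Sample (excess) kurtosis:

$$\hat{\gamma}_2 = \frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)}$$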

‘Population’ moments (e.g. population variance, population skewness, population excess kurtosis) are calculated as if the distribution from which the data was being drawn was discrete and the probabilities of occurrence exactly matched the observed frequency of occurrence.
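As an illustration of the distinction between sample and ‘population’ moments, the following is a minimal sketch that computes the ‘population’ moments by treating each observation as having probability $1/n$. The function name `population_moments` and the example data are assumptions made for illustration only.

```python
import statistics


def population_moments(xs):
    """Moments of the discrete distribution that puts probability 1/n on each observation."""
    n = len(xs)
    mean = sum(xs) / n
    # Central moments with divisor n, i.e. no small sample size adjustment.
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return {
        "mean": mean,
        "variance": m2,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,
    }


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
pop = population_moments(data)
sample_variance = statistics.variance(data)  # divisor n - 1
print(pop["variance"], sample_variance)
```

The sample variance exceeds the ‘population’ variance by the factor $n/(n-1)$, which is the small sample size adjustment.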

The least squares estimators for the parameters of a distribution are the parameter values that minimise the sum of the squared residuals, so the least squares estimator for the mean, $\hat{\mu}$, is the value that minimises

$$\sum_{i=1}^{n} (x_i - \hat{\mu})^2$$

Non-equally weighted moments give different weights to different observations (the weights not dependent on the ordering of the observations), e.g. the sample non-equally weighted mean (using weights $w_i$) is:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
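A non-equally weighted mean can be sketched as follows, assuming the weights are non-negative and normalised by their sum. The function name `weighted_mean` is hypothetical.

```python
def weighted_mean(xs, ws):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(xs) != len(ws):
        raise ValueError("xs and ws must have the same length")
    total_weight = sum(ws)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(ws, xs)) / total_weight


print(weighted_mean([1.0, 2.0, 3.0], [1.0, 1.0, 2.0]))
```

With equal weights the result reduces to the ordinary sample mean.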

5.2          Parametric inference (with underlying data following the normal distribution)

One sample:

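For a single (equally weighted) sample of size $n$, $x_1, \dots, x_n$, where $x_i \sim N(\mu, \sigma^2)$, with sample mean $\bar{x}$ and sample variance $s^2$, the following statistics are distributed according to the Student’s t distribution and the chi-squared distribution:

$$\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1} \qquad \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$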
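The one-sample statistics can be computed as in the sketch below. It assumes hypothesised values `mu0` and `sigma0_sq` for the mean and variance (both names are illustrative) and returns the statistics only; tail probabilities would need t and chi-squared distribution functions, which the Python standard library does not provide.

```python
import math
import statistics


def one_sample_statistics(xs, mu0, sigma0_sq):
    """Return (t statistic, chi-squared statistic, degrees of freedom)."""
    n = len(xs)
    xbar = statistics.fmean(xs)
    s_sq = statistics.variance(xs)           # sample variance, divisor n - 1
    t_stat = (xbar - mu0) / math.sqrt(s_sq / n)
    chi2_stat = (n - 1) * s_sq / sigma0_sq
    return t_stat, chi2_stat, n - 1


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(one_sample_statistics(data, mu0=4.0, sigma0_sq=4.0))
```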

Two samples:

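For two independent samples of sizes $n_1$ and $n_2$, $x_{1,1}, \dots, x_{1,n_1}$ and $x_{2,1}, \dots, x_{2,n_2}$, where $x_{1,i} \sim N(\mu_1, \sigma_1^2)$ and $x_{2,i} \sim N(\mu_2, \sigma_2^2)$, with sample variances $s_1^2$ and $s_2^2$, the following statistic is distributed according to the F distribution:

$$\frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2} \sim F_{n_1-1,\,n_2-1}$$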

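If $\sigma_1^2 = \sigma_2^2$ then:

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1+n_2-2}$$

where $s_p^2 = \dfrac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}$ is the pooled sample variance.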
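A minimal sketch of the two-sample statistics follows. It assumes the hypotheses being examined are equal variances (so the F statistic reduces to the ratio of sample variances) and equal means (so the t statistic uses $\mu_1 - \mu_2 = 0$). The function name `two_sample_statistics` is illustrative.

```python
import math
import statistics


def two_sample_statistics(xs, ys):
    """Return (F statistic, pooled variance, pooled two-sample t statistic)."""
    n1, n2 = len(xs), len(ys)
    s1_sq = statistics.variance(xs)
    s2_sq = statistics.variance(ys)
    # Under sigma_1^2 = sigma_2^2 the F statistic is the ratio of sample variances.
    f_stat = s1_sq / s2_sq
    pooled_var = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    # Under mu_1 = mu_2 this follows a t distribution with n1 + n2 - 2 degrees of freedom.
    t_stat = (statistics.fmean(xs) - statistics.fmean(ys)) / math.sqrt(
        pooled_var * (1.0 / n1 + 1.0 / n2)
    )
    return f_stat, pooled_var, t_stat


print(two_sample_statistics([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```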

5.3          Maximum likelihood estimators

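If $\hat{\theta}$ is the maximum likelihood estimator of a parameter $\theta$ based on a sample $x_1, \dots, x_n$ then

$$\hat{\theta} = \arg\max_{\theta} L(\theta)$$

where $L(\theta)$ is the likelihood for the sample, i.e. $L(\theta) = \prod_{i=1}^{n} f(x_i; \theta)$ and hence

$$\log L(\theta) = \sum_{i=1}^{n} \log f(x_i; \theta)$$

$\hat{\theta}$ is asymptotically normally distributed with mean $\theta$ and variance equal to the Cramér-Rao lower bound

$$CRLB(\theta) = -\frac{1}{E\left[\dfrac{\partial^2}{\partial\theta^2} \log L(\theta)\right]}$$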

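Likelihood ratio test:

$$-2(\ell_p - \ell_{p+q}) \sim \chi^2_q \quad \text{(approximately)}$$

where $\ell_p$ is the maximum log-likelihood for the model under $H_0$ (with $p$ free parameters) and $\ell_{p+q}$ is the maximum log-likelihood for the model under $H_0 \cup H_1$ (with $p+q$ free parameters). Non-equally weighted estimators can be identified by weighting the $\log f(x_i; \theta)$ terms appropriately.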
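The likelihood ratio test can be illustrated with a sketch that compares a normal model with mean fixed at zero (the null) against a normal model with a free mean (one extra free parameter). The choice of models and the function names are assumptions for illustration. With one extra parameter the chi-squared tail probability has a closed form in terms of the complementary error function.

```python
import math


def normal_max_loglik(xs, mean=None):
    """Maximised normal log-likelihood; the mean is fitted unless it is fixed."""
    n = len(xs)
    m = sum(xs) / n if mean is None else mean
    var_hat = sum((x - m) ** 2 for x in xs) / n   # ML variance estimate
    return -0.5 * n * (math.log(2.0 * math.pi * var_hat) + 1.0)


def lr_test_one_extra_parameter(loglik_h0, loglik_h1):
    """Return (statistic, approximate p-value) for a test with one extra free parameter."""
    stat = -2.0 * (loglik_h0 - loglik_h1)
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


xs = [1.0, 2.0, 3.0]
print(lr_test_one_extra_parameter(normal_max_loglik(xs, mean=0.0), normal_max_loglik(xs)))
```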

5.4          Method-of-moments estimators

Method of moments estimators are the parameter values (for the $k$ parameters specifying a given distributional family) that result in replication of the first $k$ moments of the observed data. For the normal distribution these involve $\hat{\mu} = \bar{x}$ and either $\hat{\sigma}^2 = s^2$ (the sample variance, if a small sample size adjustment is included) or $\hat{\sigma}^2 = \frac{n-1}{n}s^2$ (the ‘population’ variance, if the small sample size adjustment is ignored and we select the estimators to fit $E(X)$ and $E(X^2)$). In the generalised method of moments approach we select parameters that ‘best’ fit the selected moments (given some criterion for ‘best’), rather than selecting parameters that perfectly fit the selected moments.

5.5          Goodness of fit

Goodness of fit describes how well a statistical model fits a set of observations. Examples include the following, where $x_{(i)}$ is the $i$’th order statistic, $\sup S$ is the supremum (i.e. least upper bound) of the set $S$, $F(x)$ is the cumulative distribution function of the distribution we are fitting and $F_n(x)$ is the empirical distribution function:

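(a) Kolmogorov-Smirnov test: $D_n = \sup_x \left|F_n(x) - F(x)\right|$. Under the null hypothesis (that the sample comes from the hypothesized distribution), as $n \to \infty$ then $\sqrt{n}D_n$ tends to a limiting distribution (the Kolmogorov distribution).

(b) Cramér-von-Mises test: $T = \dfrac{1}{12n} + \sum_{i=1}^{n} \left(\dfrac{2i-1}{2n} - F(x_{(i)})\right)^2$

(c) Anderson-Darling test: $A^2 = -n - S$ where $S = \sum_{i=1}^{n} \dfrac{2i-1}{n}\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right]$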
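The three statistics based on the empirical distribution function can be sketched together, since each only needs the fitted cumulative distribution function evaluated at the order statistics. The function name `edf_statistics` is hypothetical, and the Anderson-Darling term assumes $0 < F(x_{(i)}) < 1$ for every observation.

```python
import math


def edf_statistics(xs, cdf):
    """Return (Kolmogorov-Smirnov D, Cramer-von-Mises T, Anderson-Darling A^2)."""
    x = sorted(xs)
    n = len(x)
    u = [cdf(v) for v in x]          # fitted CDF at each order statistic
    # The supremum of |F_n - F| is attained just before or at an order statistic.
    ks = max(max((i + 1) / n - u[i], u[i] - i / n) for i in range(n))
    cvm = 1.0 / (12 * n) + sum(((2 * i + 1) / (2 * n) - u[i]) ** 2 for i in range(n))
    s = sum(
        (2 * i + 1) / n * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
        for i in range(n)
    )
    ad = -n - s
    return ks, cvm, ad


# Example: testing three observations against the uniform distribution on [0, 1].
print(edf_statistics([0.1, 0.4, 0.7], lambda v: v))
```

Any callable can be supplied as `cdf`, for example `statistics.NormalDist(mu, sigma).cdf` for a fitted normal distribution.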

If data is bucketed into ranges then we may also use (Pearson’s) chi-squared goodness of fit test using the following test statistic, where $n$ is the sample size, $O_i$ is the observed count, $E_i = n\left(F(u_i) - F(l_i)\right)$ is the expected count and $l_i$ and $u_i$ are the lower and upper limits for the $i$’th bin. The test statistic follows approximately a chi-squared distribution with $k - c$ degrees of freedom, i.e. $\chi^2_{k-c}$, where $k$ is the number of non-empty cells and $c$ is the number of estimated parameters plus 1:

$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
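A minimal sketch of Pearson’s chi-squared statistic for bucketed data is given below. It assumes contiguous bins described by a list of edges, with the expected count in each bin taken as the sample size times the fitted probability of the bin. The function and parameter names are illustrative.

```python
def pearson_chi_squared(observed, edges, cdf, n_estimated_params=0):
    """Return (test statistic, degrees of freedom) for contiguous bins."""
    if len(edges) != len(observed) + 1:
        raise ValueError("need one more edge than there are bins")
    n = sum(observed)
    stat = 0.0
    for o, lower, upper in zip(observed, edges[:-1], edges[1:]):
        expected = n * (cdf(upper) - cdf(lower))
        stat += (o - expected) ** 2 / expected
    non_empty = sum(1 for o in observed if o > 0)
    dof = non_empty - (n_estimated_params + 1)
    return stat, dof


# Example: 10 observations in two equal-width bins, tested against uniform on [0, 1].
print(pearson_chi_squared([3, 7], [0.0, 0.5, 1.0], lambda v: v))
```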

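We may also test whether the skew or kurtosis or the two combined (the Jarque-Bera test) appear materially different from what would be implied by the relevant distributional family. If the null hypothesis is that the data comes from a normal distribution then, for large $n$, the skewness $\gamma_1$ and excess kurtosis $\gamma_2$ of the data approximately satisfy $\gamma_1 \sim N(0, 6/n)$, $\gamma_2 \sim N(0, 24/n)$ and $JB = n\left(\dfrac{\gamma_1^2}{6} + \dfrac{\gamma_2^2}{24}\right) \sim \chi^2_2$.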
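The Jarque-Bera statistic can be sketched as follows, assuming the usual convention that the skewness and excess kurtosis entering the statistic are computed with divisor $n$. For two degrees of freedom the chi-squared tail probability is simply $e^{-JB/2}$. The function name `jarque_bera` is illustrative.

```python
import math


def jarque_bera(xs):
    """Return (JB statistic, asymptotic p-value under normality)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n * (skew ** 2 / 6.0 + excess_kurtosis ** 2 / 24.0)
    # Survival function of the chi-squared distribution with 2 degrees of freedom.
    p_value = math.exp(-jb / 2.0)
    return jb, p_value


print(jarque_bera([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))
```

The asymptotic approximation is poor for a sample as small as the one used here, which serves only to show the arithmetic.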

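The Akaike Information Criterion (AIC) (and other similar ways of choosing between different types of model that trade off goodness of fit against model complexity, such as the Bayesian Information Criterion, BIC) involves selecting the model with the highest information criterion of the form $2\ln\hat{L} - \lambda k$, where $\hat{L}$ is the maximised likelihood, there are $k$ unknown parameters and we are using a data series of length $n$ for fitting purposes. For the AIC $\lambda = 2$ and for the BIC $\lambda = \ln n$. This is equivalent to selecting the model with the lowest value of the more commonly quoted forms $2k - 2\ln\hat{L}$ and $k\ln n - 2\ln\hat{L}$.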

5.6          Linear regression

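In the univariate case suppose $y_i = \alpha + \beta x_i + \epsilon_i$ where $\epsilon_i \sim N(0, \sigma^2)$, $i = 1, \dots, n$, then (equally weighted) estimates of $\alpha$ and $\beta$ are:

$$\hat{\beta} = \frac{s_{xy}}{s_{xx}} \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

where

$$s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2 \qquad s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$

Also

$$\hat{\sigma}^2 = \frac{1}{n-2}\left(s_{yy} - \frac{s_{xy}^2}{s_{xx}}\right) \qquad \frac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2/s_{xx}}} \sim t_{n-2}$$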

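The individual expected responses are $\hat{y}_i = \hat{\alpha} + \hat{\beta}x_i$ and satisfy the following ‘sum of squares’ relationship:

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$$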

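The variance of the predicted mean response at $x = x_0$ is:

$$\operatorname{var}(\hat{\alpha} + \hat{\beta}x_0) = \left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)\sigma^2$$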

The variance of a predicted individual response is the variance of the predicted mean response plus an additional $\sigma^2$.
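The univariate regression quantities can be sketched in a few lines. The function names and the returned dictionary layout are assumptions made for illustration. The sketch plugs the estimate $\hat{\sigma}^2$ (residual sum of squares divided by $n-2$) into the variance of the predicted mean response in place of the unknown $\sigma^2$.

```python
def simple_linear_regression(xs, ys):
    """Equally weighted least squares fit of y = alpha + beta * x."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    fitted = [alpha + beta * x for x in xs]
    ss_res = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    ss_reg = sum((f - ybar) ** 2 for f in fitted)
    return {
        "n": n, "xbar": xbar, "sxx": sxx, "syy": syy, "sxy": sxy,
        "alpha": alpha, "beta": beta,
        "ss_res": ss_res, "ss_reg": ss_reg,
        "sigma2_hat": ss_res / (n - 2),
    }


def mean_response_variance(fit, x0):
    """Estimated variance of the predicted mean response at x0."""
    return (1.0 / fit["n"] + (x0 - fit["xbar"]) ** 2 / fit["sxx"]) * fit["sigma2_hat"]


fit = simple_linear_regression([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 5.0, 8.0])
print(fit["alpha"], fit["beta"], fit["sigma2_hat"], mean_response_variance(fit, 2.5))
```

Adding `fit["sigma2_hat"]` to the result of `mean_response_variance` gives the estimated variance of a predicted individual response.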

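For generalised least squares, if we have $m$ different series each with $n$ observations and we are fitting $y_t = \sum_{j=1}^{m} \beta_j x_{j,t} + \epsilon_t$ then the vector of least squares estimators, $\hat{\beta}$, is given by $\hat{\beta} = (X^T X)^{-1} X^T y$ where $X$ is an $n \times m$ matrix with elements $x_{j,t}$ and $y$ is an $n$ dimensional vector with elements $y_t$.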

5.7          Correlations

The observed (sample) correlation coefficient (i.e. Pearson correlation coefficient) between two series of equal lengths indexed in the same manner, $x_1, \dots, x_n$ and $y_1, \dots, y_n$, is (where $s_{xx}$, $s_{yy}$ and $s_{xy}$ are as given in the section on linear regression):

$$r = \frac{s_{xy}}{\sqrt{s_{xx}s_{yy}}}$$

If the underlying correlation coefficient, $\rho$, is zero and the data comes from a bivariate normal distribution then:

$$\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}$$

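For arbitrary $\rho$ ($-1 < \rho < 1$) the Fisher z transform is $z_r$ where:

$$z_r = \tanh^{-1} r = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$$

If the data comes from a bivariate normal distribution then $z_r$ is distributed approximately as follows:

$$z_r \sim N\left(z_\rho, \frac{1}{n-3}\right) \quad \text{where } z_\rho = \tanh^{-1}\rho$$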
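One common use of the Fisher z transform is to build an approximate confidence interval for the underlying correlation: form a normal interval for the transformed value using standard deviation $1/\sqrt{n-3}$, then map the end points back with $\tanh$. The sketch below assumes bivariate normal data, and the function name `fisher_z_interval` is illustrative.

```python
import math
from statistics import NormalDist


def fisher_z_interval(r, n, level=0.95):
    """Return (z_r, (lower, upper)), an approximate interval for the correlation."""
    z = math.atanh(r)                     # 0.5 * ln((1 + r) / (1 - r))
    se = 1.0 / math.sqrt(n - 3)
    q = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z, (math.tanh(z - q * se), math.tanh(z + q * se))


print(fisher_z_interval(0.5, 28))
```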

Two non-parametric measures of correlation are:

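-          Spearman’s rank correlation coefficient, where $R_i$ and $S_i$ are the ranks within $x$ and $y$ of $x_i$ and $y_i$ respectively:

$$r_s = 1 - \frac{6\sum_{i=1}^{n} (R_i - S_i)^2}{n(n^2-1)}$$

-          Kendall’s tau, where computation is taken over all $i$ and $j$ with $i < j$ and (for the moment ignoring ties) a concordant pair is a case where $(x_i - x_j)(y_i - y_j) > 0$ and a discordant pair is a case where $(x_i - x_j)(y_i - y_j) < 0$:

$$\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{n(n-1)/2}$$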
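Both rank-based measures can be sketched directly from their definitions when there are no ties. The helper names below are hypothetical, and the code deliberately does not attempt any tie handling: tied values are simply ranked in order of appearance by `ranks` and counted as neither concordant nor discordant by `kendall_tau`.

```python
def ranks(values):
    """Ranks 1..n of each value within its own series (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(xs, ys):
    """Spearman's rank correlation via squared rank differences (no ties)."""
    n = len(xs)
    d_sq = sum((r - s) ** 2 for r, s in zip(ranks(xs), ranks(ys)))
    return 1.0 - 6.0 * d_sq / (n * (n * n - 1))


def kendall_tau(xs, ys):
    """Kendall's tau: (concordant - discordant) / (number of pairs)."""
    n = len(xs)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            product = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
print(spearman(x, y), kendall_tau(x, y))
```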

There are various possible ways of handling ties in these two non-parametric measures of correlation (ties should not in practice arise if the random variables really are continuous).

5.8          Analysis of variance

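Given a single factor normal model

$$Y_{ij} \sim N(\mu + \tau_i, \sigma^2)$$

where $i = 1, \dots, k$, $j = 1, \dots, n_i$ and $n = \sum_{i=1}^{k} n_i$, with $\sum_{i=1}^{k} n_i\tau_i = 0$.

Variance estimate:

$$\hat{\sigma}^2 = \frac{SS_R}{n-k}$$

Under the null hypothesis that $\tau_i = 0$ for all $i$:

$$\frac{SS_B/(k-1)}{SS_R/(n-k)} \sim F_{k-1,\,n-k}$$

where:

$$SS_B = \sum_{i=1}^{k} n_i(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2 \qquad SS_R = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y}_{i\cdot})^2$$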

5.9          Bayesian priors and posteriors

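Posterior and prior distributions are related as follows:

$$f(\theta \mid x) = \frac{f(x \mid \theta)f(\theta)}{\int f(x \mid \theta)f(\theta)\,d\theta}$$

i.e.

$$\text{posterior} \propto \text{likelihood} \times \text{prior}$$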

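For example, if $x = (x_1, \dots, x_n)$ is a random sample of size $n$ from a $N(\mu, \sigma^2)$ where $\sigma^2$ is known and the prior distribution for $\mu$ is $N(\mu_0, \sigma_0^2)$ then the posterior distribution for $\mu$ is:

$$\mu \mid x \sim N(\mu_*, \sigma_*^2)$$

where $\mu_*$ is ‘credibility weighted’ as follows:

$$\mu_* = Z\bar{x} + (1-Z)\mu_0 \qquad Z = \frac{n/\sigma^2}{n/\sigma^2 + 1/\sigma_0^2}$$

and

$$\sigma_*^2 = \frac{1}{n/\sigma^2 + 1/\sigma_0^2}$$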
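The normal example with known variance can be sketched as follows: the posterior mean is a credibility-weighted average of the sample mean and the prior mean, with the weight on the data equal to the data precision as a share of the total precision. The function name `normal_posterior` and the example figures are assumptions for illustration.

```python
def normal_posterior(xs, sigma_sq, prior_mean, prior_var):
    """Posterior for a normal mean with known variance and a normal prior.

    Returns (posterior mean, posterior variance, credibility factor Z).
    """
    n = len(xs)
    xbar = sum(xs) / n
    data_precision = n / sigma_sq
    prior_precision = 1.0 / prior_var
    z = data_precision / (data_precision + prior_precision)
    post_mean = z * xbar + (1.0 - z) * prior_mean
    post_var = 1.0 / (data_precision + prior_precision)
    return post_mean, post_var, z


print(normal_posterior([4.0, 5.0, 5.0, 6.0], sigma_sq=4.0, prior_mean=0.0, prior_var=1.0))
```

As the sample size grows, the credibility factor tends to 1 and the posterior mean tends to the sample mean.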