Enterprise Risk Management Formula Book
5. Statistical Methods
5.1 Sample moments
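A random sample of $n$ observations $x_1, \ldots, x_n$ has (equally weighted) sample moments as follows:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \quad \text{(mean)}$$

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 \quad \text{(variance)}$$

$$\frac{n}{(n-1)(n-2)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^3 \quad \text{(skewness)}$$

$$\frac{n(n+1)}{(n-1)(n-2)(n-3)}\sum_{i=1}^{n}\left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n-1)^2}{(n-2)(n-3)} \quad \text{(excess kurtosis)}$$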
‘Population’ moments (e.g. population variance, population skewness, population excess kurtosis) are calculated as if the distribution from which the data were drawn were discrete, with probabilities of occurrence exactly matching the observed frequencies of occurrence. For example, the population variance is $\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$.
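The following minimal Python sketch illustrates the two conventions. It uses only the standard library, and the function names are hypothetical. It computes the equally weighted sample moments, with the usual small-sample adjusted skewness and excess kurtosis, alongside the ‘population’ moments that treat the observed frequencies as exact probabilities.

```python
import math

def sample_moments(x):
    """Equally weighted sample mean, variance, skewness and excess kurtosis
    (small-sample adjusted versions)."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    var = sum(d ** 2 for d in dev) / (n - 1)           # divisor n - 1
    s = math.sqrt(var)
    skew = n / ((n - 1) * (n - 2)) * sum((d / s) ** 3 for d in dev)
    kurt = (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum((d / s) ** 4 for d in dev)
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))
    return mean, var, skew, kurt

def population_moments(x):
    """'Population' moments: observed frequencies treated as exact probabilities."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    var = sum(d ** 2 for d in dev) / n                  # divisor n
    skew = sum(d ** 3 for d in dev) / n / var ** 1.5
    kurt = sum(d ** 4 for d in dev) / n / var ** 2 - 3
    return mean, var, skew, kurt
```

For the sample 1, 2, 3, 4, 5 the sample variance is 2.5, whereas the population variance is 2.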
The least squares estimators for the parameters of a distribution are the parameter values that minimise the sum of the squared residuals. So the least squares estimator for the mean, $\hat{\mu}$, is the value that minimises

$$\sum_{i=1}^{n}(x_i - \hat{\mu})^2$$

which gives $\hat{\mu} = \bar{x}$.
Non-equally weighted moments give different weights to different observations (with the weights not depending on the ordering of the observations). For example, the sample non-equally weighted mean (using weights $w_i$) is:

$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}$$
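The sketch below links the weighted mean back to least squares. The weighted mean is exactly the value that minimises the weighted sum of squared residuals $\sum_i w_i (x_i - m)^2$. The function names are hypothetical and only the standard library is assumed.

```python
def weighted_mean(x, w):
    """Non-equally weighted mean: sum(w_i * x_i) / sum(w_i)."""
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

def weighted_sse(m, x, w):
    """Weighted sum of squared residuals about a candidate mean m."""
    return sum(wi * (xi - m) ** 2 for wi, xi in zip(w, x))
```

With equal weights, `weighted_mean` reduces to the ordinary sample mean $\bar{x}$.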
5.2 Parametric inference (with an underlying random variable following the normal distribution)
One sample:
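For a single (equally weighted) sample $X_1, \ldots, X_n$ of size $n$, with $X_i \sim N(\mu, \sigma^2)$ independently, let $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ and $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$. Then the following statistics are distributed according to the Student’s t distribution and the chi-squared distribution respectively:

$$\frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1} \qquad \frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$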
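A minimal standard-library sketch of the one-sample statistics for a hypothesised mean $\mu_0$ and variance $\sigma_0^2$ follows. The function and parameter names are hypothetical. It computes the test statistics only. Turning them into p-values needs t and chi-squared distribution functions, such as `scipy.stats.t.sf` and `scipy.stats.chi2.sf` if SciPy is available.

```python
import math
import statistics

def one_sample_stats(x, mu0, sigma0_sq):
    """Return (t, chi2) for a hypothesised mean mu0 and variance sigma0_sq.

    t    = (xbar - mu0) / (s / sqrt(n))  ~ t_{n-1} under the null
    chi2 = (n - 1) * s^2 / sigma0_sq     ~ chi^2_{n-1} under the null
    """
    n = len(x)
    xbar = statistics.fmean(x)
    s2 = statistics.variance(x)          # sample variance, divisor n - 1
    t = (xbar - mu0) / math.sqrt(s2 / n)
    chi2 = (n - 1) * s2 / sigma0_sq
    return t, chi2
```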
Two samples:
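Consider two independent samples of sizes $m$ and $n$: $X_1, \ldots, X_m$ with $X_i \sim N(\mu_X, \sigma_X^2)$ and $Y_1, \ldots, Y_n$ with $Y_j \sim N(\mu_Y, \sigma_Y^2)$. Let $S_X^2$ and $S_Y^2$ be the respective sample variances. Then the following statistic is distributed according to the F distribution:

$$\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2} \sim F_{m-1,\,n-1}$$

If $\sigma_X = \sigma_Y$ then:

$$\frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{S_p\sqrt{\frac{1}{m} + \frac{1}{n}}} \sim t_{m+n-2}$$

where $S_p^2 = \frac{(m-1)S_X^2 + (n-1)S_Y^2}{m+n-2}$ is the pooled sample variance.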
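The following standard-library sketch (function name hypothetical) computes the two-sample quantities. It returns the variance ratio (an F statistic when $\sigma_X = \sigma_Y$), the pooled sample variance, and the pooled t statistic for testing $\mu_X = \mu_Y$.

```python
import math
import statistics

def two_sample_stats(x, y):
    """F ratio, pooled variance and pooled t statistic for two independent samples."""
    m, n = len(x), len(y)
    sx2, sy2 = statistics.variance(x), statistics.variance(y)
    f = sx2 / sy2                                           # ~ F_{m-1, n-1} if sigma_X = sigma_Y
    sp2 = ((m - 1) * sx2 + (n - 1) * sy2) / (m + n - 2)     # pooled sample variance
    t = (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(sp2 * (1 / m + 1 / n))
    return f, sp2, t                                        # t ~ t_{m+n-2} if also mu_X = mu_Y
```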
5.3 Maximum likelihood estimators
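If $\hat{\theta}$ is the maximum likelihood estimator of a parameter $\theta$ based on a sample $\mathbf{x} = (x_1, \ldots, x_n)$ then

$$\left.\frac{\partial \log L(\theta; \mathbf{x})}{\partial \theta}\right|_{\theta = \hat{\theta}} = 0$$

where $L(\theta; \mathbf{x})$ is the likelihood for the sample, i.e. $L(\theta; \mathbf{x}) = \prod_{i=1}^{n} f(x_i; \theta)$ for independent observations with density $f$. Under standard regularity conditions, $\hat{\theta}$ is asymptotically normally distributed with mean $\theta$ and variance equal to the Cramér-Rao lower bound

$$\mathrm{CRLB}(\theta) = \left(-E\left[\frac{\partial^2 \log L(\theta; \mathbf{x})}{\partial \theta^2}\right]\right)^{-1}$$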
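As an illustration, here is a sketch for an exponential distribution with rate $\lambda$. This example distribution is chosen for clarity, and the function names are hypothetical. The first-order condition gives $\hat{\lambda} = n / \sum x_i$. The asymptotic variance is approximated by plugging in the inverse of the observed information at the MLE, obtained from a finite-difference second derivative. For this family it equals $\hat{\lambda}^2/n$.

```python
import math

def exp_loglik(lam, x):
    """Log-likelihood of an i.i.d. exponential(rate lam) sample."""
    return len(x) * math.log(lam) - lam * sum(x)

def exp_mle(x, h=1e-4):
    """MLE of the rate and a plug-in approximation to its asymptotic variance."""
    lam_hat = len(x) / sum(x)          # solves d(log L)/d(lam) = n/lam - sum(x) = 0
    # Central finite-difference estimate of d^2(log L)/d(lam)^2 at lam_hat
    d2 = (exp_loglik(lam_hat + h, x) - 2 * exp_loglik(lam_hat, x)
          + exp_loglik(lam_hat - h, x)) / h ** 2
    return lam_hat, -1.0 / d2          # inverse observed information (~ lam_hat^2 / n)
```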
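Likelihood ratio test:

$$-2\left(l_0 - l_1\right) \sim \chi^2_{p_1 - p_0} \quad \text{(approximately, under } H_0 \text{, for nested models)}$$

where $l_0$ is the maximum log-likelihood for the model under $H_0$ (with $p_0$ free parameters) and $l_1$ is the maximum log-likelihood for the model under $H_1$ (with $p_1$ free parameters). Non-equally weighted estimators can be identified by weighting the individual log-likelihood terms $\log f(x_i; \theta)$ appropriately.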
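A minimal sketch of the likelihood ratio statistic for a simple nested pair of normal models follows. The example and function name are illustrative assumptions. $H_0$ fixes $\mu = 0$ with $\sigma$ free ($p_0 = 1$), and $H_1$ leaves both $\mu$ and $\sigma$ free ($p_1 = 2$), so the statistic is compared with $\chi^2_1$. The maximised normal log-likelihood is $-\frac{n}{2}\left(\ln(2\pi\hat{\sigma}^2) + 1\right)$.

```python
import math

def normal_lrt_mean_zero(x):
    """Likelihood ratio statistic for H0: mu = 0 (sigma free) vs H1: mu, sigma free."""
    n = len(x)
    xbar = sum(x) / n
    s0 = sum(v ** 2 for v in x) / n                   # MLE of sigma^2 when mu = 0
    s1 = sum((v - xbar) ** 2 for v in x) / n          # MLE of sigma^2 when mu is free
    l0 = -0.5 * n * (math.log(2 * math.pi * s0) + 1)  # maximised log-likelihoods
    l1 = -0.5 * n * (math.log(2 * math.pi * s1) + 1)
    return 2 * (l1 - l0)                              # equals n * log(s0 / s1)
```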
5.4 Method-of-moments estimators
Method of moments estimators are the parameter values (for the $k$ parameters specifying a given distributional family) that result in replication of the first $k$ moments of the observed data. For the normal distribution these involve $\hat{\mu} = \bar{x}$ and one of two variance estimators:

- $\hat{\sigma}^2 = s^2$ (the sample variance), if a small sample size adjustment is included;
- $\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$ (the ‘population’ variance), if the small sample size adjustment is ignored and we select the estimators to fit $\frac{1}{n}\sum_{i=1}^{n} x_i$ and $\frac{1}{n}\sum_{i=1}^{n} x_i^2$.

In the generalised method of moments approach we select parameters that ‘best’ fit the selected moments (given some criterion for ‘best’), rather than parameters that fit the selected moments exactly.
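To show the idea for a family other than the normal, the sketch below uses a gamma distribution with shape $k$ and scale $\theta$. This example family and the function name are illustrative assumptions. The estimates are chosen so that $k\theta$ and $k\theta^2$ exactly reproduce the first two moments of the data, using the ‘population’ variance.

```python
def gamma_method_of_moments(x):
    """Method-of-moments (shape, scale) for a gamma distribution.

    Solves mean = shape * scale and variance = shape * scale^2, where the
    variance is the 'population' one (first two raw sample moments fitted exactly).
    """
    n = len(x)
    m1 = sum(x) / n                          # first raw moment
    m2 = sum(v ** 2 for v in x) / n          # second raw moment
    var = m2 - m1 ** 2
    return m1 ** 2 / var, var / m1           # (shape, scale)
```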
5.5 Goodness of fit
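Goodness of fit describes how well a statistical model fits a set of observations. Examples include the tests below, which use the following notation:

- $x_{(i)}$ is the $i$’th order statistic of a sample of size $n$;
- $\sup_x S$ is the supremum (i.e. least upper bound, the largest value where this is attained) of the set $S$;
- $F(x)$ is the cumulative distribution function of the distribution we are fitting;
- $F_n(x) = \frac{1}{n}\#\{i : x_i \le x\}$ is the empirical distribution function.

(a) Kolmogorov-Smirnov test: $D_n = \sup_x \left|F_n(x) - F(x)\right|$. Under the null hypothesis (that the sample comes from the hypothesised distribution), as $n \to \infty$, $\sqrt{n}\,D_n$ tends to a limiting distribution (the Kolmogorov distribution).

(b) Cramér-von Mises test:

$$T = n\int_{-\infty}^{\infty}\left(F_n(x) - F(x)\right)^2 dF(x) = \frac{1}{12n} + \sum_{i=1}^{n}\left(\frac{2i-1}{2n} - F(x_{(i)})\right)^2$$

(c) Anderson-Darling test:

$$A^2 = n\int_{-\infty}^{\infty}\frac{\left(F_n(x) - F(x)\right)^2}{F(x)\left(1 - F(x)\right)}\,dF(x) = -n - S$$

where

$$S = \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln F(x_{(i)}) + \ln\left(1 - F(x_{(n+1-i)})\right)\right]$$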
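The following standard-library sketch computes the three statistics for a sample tested against a fully specified continuous CDF $F$. The function name is hypothetical. It uses the order-statistic forms: $D_n = \max(D^+, D^-)$ for Kolmogorov-Smirnov, $\frac{1}{12n} + \sum_i \left(\frac{2i-1}{2n} - F(x_{(i)})\right)^2$ for Cramér-von Mises, and $-n - S$ for Anderson-Darling. It assumes $0 < F(x_i) < 1$ for every observation, otherwise the logarithms fail.

```python
import math

def edf_statistics(x, cdf):
    """Kolmogorov-Smirnov D_n, Cramer-von Mises T and Anderson-Darling A^2
    for sample x against a fully specified continuous CDF."""
    xs = sorted(x)
    n = len(xs)
    u = [cdf(v) for v in xs]                          # F(x_(i)), 0-based i
    d_plus = max((i + 1) / n - ui for i, ui in enumerate(u))   # sup(F_n - F)
    d_minus = max(ui - i / n for i, ui in enumerate(u))        # sup(F - F_n)
    ks = max(d_plus, d_minus)
    # (2i - 1) with 1-based i becomes (2i + 1) with 0-based i
    cvm = 1 / (12 * n) + sum(((2 * i + 1) / (2 * n) - ui) ** 2 for i, ui in enumerate(u))
    s = sum((2 * i + 1) * (math.log(u[i]) + math.log(1 - u[n - 1 - i]))
            for i in range(n)) / n
    ad = -n - s
    return ks, cvm, ad
```

To test against a standard normal, one could pass `NormalDist().cdf` from Python's `statistics` module as the `cdf` argument.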
If data is bucketed into ranges then we may also use (Pearson’s) chi-squared goodness of fit test. The test statistic uses the following notation:

- $n$ is the sample size;
- $O_i$ is the observed count in the $i$’th bin;
- $l_i$ and $u_i$ are the lower and upper limits of the $i$’th bin;
- $E_i = n\left(F(u_i) - F(l_i)\right)$ is the expected count in the $i$’th bin.

The test statistic is:

$$X^2 = \sum_{i}\frac{(O_i - E_i)^2}{E_i}$$

It approximately follows a chi-squared distribution with $k - c$ degrees of freedom, i.e. $X^2 \sim \chi^2_{k-c}$, where $k$ is the number of non-empty cells and $c$ is the number of estimated parameters plus 1.
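A minimal sketch of the bucketed test follows. The function and parameter names are hypothetical. Bin $i$ runs from `edges[i]` to `edges[i+1]`, so `edges` has one more element than `observed`. Every bin is assumed to have a positive expected count. The degrees of freedom follow the rule stated for this test: non-empty cells minus (estimated parameters plus 1).

```python
def pearson_chi_squared(observed, edges, cdf, n_fitted_params=0):
    """Pearson chi-squared statistic and degrees of freedom for bucketed data.

    observed[i] is the count in bin (edges[i], edges[i+1]]; the expected count
    is n * (F(upper) - F(lower)).
    """
    n = sum(observed)
    stat = 0.0
    for i, o in enumerate(observed):
        e = n * (cdf(edges[i + 1]) - cdf(edges[i]))  # expected count E_i
        stat += (o - e) ** 2 / e
    k = sum(1 for o in observed if o > 0)            # number of non-empty cells
    return stat, k - (n_fitted_params + 1)
```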
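We may also test whether the skew or kurtosis, or the two combined (the Jarque-Bera test), appear materially different from what would be implied by the relevant distributional family. Suppose the null hypothesis is that the data comes from a normal distribution. Then, for large $n$, approximately:

$$\sqrt{n/6}\;\hat{S} \sim N(0,1)$$

$$\sqrt{n/24}\;\hat{K} \sim N(0,1)$$

$$JB = \frac{n}{6}\left(\hat{S}^2 + \frac{\hat{K}^2}{4}\right) \sim \chi^2_2$$

where $\hat{S}$ and $\hat{K}$ are the sample skewness and excess kurtosis.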
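The sketch below computes the three large-sample normality statistics. It assumes the ‘population’ (unadjusted) skewness and excess kurtosis, as in the common textbook form of the Jarque-Bera statistic, and the function name is hypothetical. Note that $JB$ is simply the sum of the squares of the two standardised statistics.

```python
import math

def jarque_bera(x):
    """Standardised skewness, standardised excess kurtosis and Jarque-Bera statistic.

    Under normality, for large n: z_skew ~ N(0, 1), z_kurt ~ N(0, 1) and
    jb ~ chi-squared with 2 degrees of freedom, approximately.
    """
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    s = m3 / m2 ** 1.5                       # 'population' skewness
    k = m4 / m2 ** 2 - 3                     # 'population' excess kurtosis
    z_skew = s * math.sqrt(n / 6)
    z_kurt = k * math.sqrt(n / 24)
    jb = n / 6 * (s ** 2 + k ** 2 / 4)
    return z_skew, z_kurt, jb
```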
The Akaike Information Criterion (AIC) involves selecting the model with the highest information criterion of the form $2\ln\hat{L} - k\,f(n)$. The same approach applies to other similar ways of choosing between different types of model that trade off goodness of fit against model complexity, such as the Bayes Information Criterion (BIC). Here $\hat{L}$ is the maximised likelihood, there are $k$ unknown parameters, and we are using a data series of length $n$ for fitting purposes. For the AIC $f(n) = 2$ and for the BIC $f(n) = \ln n$.
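A small sketch of the two criteria in the ‘higher is better’ form $2\ln\hat{L} - k\,f(n)$, with $f(n) = 2$ for the AIC and $f(n) = \ln n$ for the BIC. The function name is hypothetical. Many software packages instead report the negated form $-2\ln\hat{L} + k\,f(n)$ and select the lowest value, which gives the same ranking.

```python
import math

def information_criteria(max_log_lik, k, n):
    """AIC and BIC in the form 2 ln L - k f(n); select the model with the highest value."""
    aic = 2 * max_log_lik - 2 * k            # f(n) = 2
    bic = 2 * max_log_lik - k * math.log(n)  # f(n) = ln(n)
    return aic, bic
```

Because $\ln n > 2$ once $n \ge 8$, the BIC penalises extra parameters more heavily than the AIC for all but very short series.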
5.6 Linear regression
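In the univariate case suppose $y_i = \alpha + \beta x_i + \varepsilon_i$ for $i = 1, \ldots, n$, where the $\varepsilon_i \sim N(0, \sigma^2)$ are independent. Then (equally weighted) estimates of $\alpha$ and $\beta$ are:

$$\hat{\beta} = \frac{S_{xy}}{S_{xx}}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$

where $S_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2$, $S_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2$ and $S_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$. Also

$$\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \frac{1}{n-2}\left(S_{yy} - \frac{S_{xy}^2}{S_{xx}}\right)$$

The individual expected responses are $\hat{y}_i = \hat{\alpha} + \hat{\beta}x_i$ and satisfy the following ‘sum of squares’ relationship:

$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$

The variance of the predicted mean response at $x_0$, $\hat{\mu}_0 = \hat{\alpha} + \hat{\beta}x_0$, is:

$$\mathrm{Var}(\hat{\mu}_0) = \sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right)$$

The variance of a predicted individual response is the variance of the predicted mean response plus an additional $\sigma^2$.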
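The following standard-library sketch fits the univariate regression. The function name is hypothetical. It returns $\hat{\alpha}$, $\hat{\beta}$ and $\hat{\sigma}^2$, together with the variances of the predicted mean and individual responses at a chosen $x_0$. These use the plug-in $\hat{\sigma}^2$ in place of the unknown $\sigma^2$.

```python
def simple_regression(x, y, x0):
    """Least squares fit of y = alpha + beta * x + error, plus prediction variances at x0."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    beta = sxy / sxx
    alpha = ybar - beta * xbar
    sigma2 = (syy - sxy ** 2 / sxx) / (n - 2)        # residual sum of squares / (n - 2)
    var_mean = sigma2 * (1 / n + (x0 - xbar) ** 2 / sxx)
    return alpha, beta, sigma2, var_mean, var_mean + sigma2
```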
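For general linear least squares, suppose we have $m$ different series $x_{j,1}, \ldots, x_{j,n}$ ($j = 1, \ldots, m$), each with $n$ observations, that we are fitting to $y_1, \ldots, y_n$ via $y_i \approx \sum_{j=1}^{m}\beta_j x_{j,i}$. Then the vector of least squares estimators, $\hat{\boldsymbol{\beta}}$, is given by $\hat{\boldsymbol{\beta}} = A^{-1}\mathbf{b}$. Here $A$ is an $m \times m$ matrix with elements $A_{jk} = \sum_{i=1}^{n}x_{j,i}x_{k,i}$ and $\mathbf{b}$ is an $m$ dimensional vector with elements $b_j = \sum_{i=1}^{n}x_{j,i}y_i$.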
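A minimal pure-Python sketch of this calculation follows; the function name is hypothetical. It builds $A$ and $\mathbf{b}$ and solves $A\hat{\boldsymbol{\beta}} = \mathbf{b}$ by Gaussian elimination with partial pivoting. This is only adequate for a small number of well-conditioned series. In practice a numerical library routine such as `numpy.linalg.lstsq` would usually be preferred. Including a series of ones reproduces the intercept of the univariate regression.

```python
def linear_least_squares(series, y):
    """Solve A beta = b with A[j][k] = sum_i x_j,i * x_k,i and b[j] = sum_i x_j,i * y_i.

    series is a list of m regressor series, each of length n; y has length n.
    """
    m = len(series)
    a = [[sum(p * q for p, q in zip(series[j], series[k])) for k in range(m)]
         for j in range(m)]
    b = [sum(p * q for p, q in zip(series[j], y)) for j in range(m)]
    # Forward elimination with partial pivoting
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, m):
            f = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    # Back substitution
    beta = [0.0] * m
    for j in reversed(range(m)):
        beta[j] = (b[j] - sum(a[j][c] * beta[c] for c in range(j + 1, m))) / a[j][j]
    return beta
```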
5.7 Correlations
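Consider two series $x_i$ and $y_i$ of equal length $n$, indexed in the same manner. Their observed (sample) correlation coefficient (i.e. Pearson correlation coefficient) is given below, where $S_{xx}$, $S_{yy}$ and $S_{xy}$ are as given in the section on linear regression:

$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}$$

If the underlying correlation coefficient, $\rho$, is zero and the data comes from a bivariate normal distribution then:

$$\frac{r\sqrt{n-2}}{\sqrt{1 - r^2}} \sim t_{n-2}$$

For arbitrary $\rho$ ($-1 < \rho < 1$) the Fisher z transform is $z_r$ where:

$$z_r = \tanh^{-1} r = \frac{1}{2}\ln\left(\frac{1+r}{1-r}\right)$$

If the data comes from a bivariate normal distribution then $z_r$ is distributed approximately as follows:

$$z_r \sim N\left(\tanh^{-1}\rho,\ \frac{1}{n-3}\right)$$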
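The sketch below brings these results together; the function and parameter names are hypothetical. It computes $r$ and the t statistic for $H_0: \rho = 0$. It also forms an approximate confidence interval for $\rho$ by building a normal interval for $z_r$ with variance $1/(n-3)$ and mapping it back with $\tanh$. The default `z_crit` of 1.959964 corresponds roughly to 95% two-sided coverage. The code assumes $|r| < 1$ and $n > 3$.

```python
import math

def pearson_with_fisher(x, y, z_crit=1.959964):
    """Sample correlation r, t statistic for H0: rho = 0, and an approximate
    confidence interval for rho via the Fisher z transform."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    syy = sum((v - ybar) ** 2 for v in y)
    sxy = sum((u - xbar) * (v - ybar) for u, v in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt((n - 2) / (1 - r ** 2))   # ~ t_{n-2} if rho = 0
    z = math.atanh(r)                            # Fisher z transform
    half = z_crit / math.sqrt(n - 3)             # z ~ N(atanh(rho), 1 / (n - 3))
    return r, t, (math.tanh(z - half), math.tanh(z + half))
```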
Two non-parametric measures of correlation are:
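- Spearman’s rank correlation coefficient, where $r_i$ and $s_i$ are the ranks of $x_i$ within $\{x_1, \ldots, x_n\}$ and of $y_i$ within $\{y_1, \ldots, y_n\}$ respectively:

$$\rho_S = 1 - \frac{6\sum_{i=1}^{n}(r_i - s_i)^2}{n(n^2 - 1)}$$

- Kendall’s tau, where computation is taken over all $i$ and $j$ with $1 \le i < j \le n$. For the moment ignoring ties, a concordant pair is a case where $(x_i - x_j)(y_i - y_j) > 0$ and a discordant pair is a case where $(x_i - x_j)(y_i - y_j) < 0$. With $n_c$ and $n_d$ the numbers of concordant and discordant pairs:

$$\tau = \frac{n_c - n_d}{n(n-1)/2}$$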
There are various possible ways of handling ties in these
two non-parametric measures of correlation (ties should not in practice arise
if the random variables really are continuous).
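A minimal sketch of both rank measures for data without ties follows; the function names are hypothetical. No tie-handling convention is implemented. Kendall’s tau is computed by direct enumeration of all pairs, which is $O(n^2)$ and fine for modest sample sizes.

```python
def ranks(v):
    """Ranks 1..n of the values in v (assumes no ties)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0] * len(v)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rho(x, y):
    """Spearman's rho via 1 - 6 * sum(d^2) / (n (n^2 - 1)); valid only without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

def kendall_tau(x, y):
    """Kendall's tau: (concordant - discordant) / (n (n - 1) / 2) over pairs i < j."""
    n = len(x)
    conc = disc = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                conc += 1
            elif s < 0:
                disc += 1
    return (conc - disc) / (n * (n - 1) / 2)
```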
5.8 Analysis of variance
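Consider a single factor normal model $y_{ij} = \mu + \tau_i + \varepsilon_{ij}$ for $i = 1, \ldots, k$ and $j = 1, \ldots, n_i$. The $\varepsilon_{ij} \sim N(0, \sigma^2)$ are independent, with $\sum_{i=1}^{k} n_i\tau_i = 0$ and $n = \sum_{i=1}^{k} n_i$.

Variance estimate:

$$\hat{\sigma}^2 = \frac{SS_R}{n - k}$$

Under the null hypothesis $H_0: \tau_1 = \tau_2 = \cdots = \tau_k = 0$:

$$F = \frac{SS_B/(k-1)}{SS_R/(n-k)} \sim F_{k-1,\,n-k}$$

where:

$$SS_B = \sum_{i=1}^{k} n_i\left(\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot}\right)^2, \qquad SS_R = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_{i\cdot}\right)^2$$

$$SS_T = SS_B + SS_R = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(y_{ij} - \bar{y}_{\cdot\cdot}\right)^2$$

Here $\bar{y}_{i\cdot}$ is the mean of group $i$ and $\bar{y}_{\cdot\cdot}$ is the overall mean.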
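A minimal standard-library sketch of the single factor analysis follows; the function name is hypothetical. It computes the between-group and residual sums of squares, the variance estimate $SS_R/(n-k)$ and the F statistic with its degrees of freedom, given a list of $k$ groups of observations.

```python
def one_way_anova(groups):
    """Single factor ANOVA; returns (F, df1, df2, sigma2_hat) for a list of k samples."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n                  # overall mean
    means = [sum(g) / len(g) for g in groups]                # group means
    ss_b = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_r = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    sigma2 = ss_r / (n - k)                                  # variance estimate
    return (ss_b / (k - 1)) / sigma2, k - 1, n - k, sigma2   # F ~ F_{k-1, n-k} under H0
```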
5.9 Bayesian priors and posteriors
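Posterior and prior distributions are related as follows:

$$f(\theta \mid \mathbf{x}) = \frac{f(\mathbf{x} \mid \theta)\,f(\theta)}{\int f(\mathbf{x} \mid \theta)\,f(\theta)\,d\theta} \propto f(\mathbf{x} \mid \theta)\,f(\theta)$$

i.e. posterior $\propto$ likelihood $\times$ prior.

For example, suppose $x_1, \ldots, x_n$ is a random sample of size $n$ from a $N(\mu, \sigma^2)$ distribution where $\sigma^2$ is known, and the prior distribution for $\mu$ is $N(\mu_0, \sigma_0^2)$. Then the posterior distribution for $\mu$ is:

$$\mu \mid \mathbf{x} \sim N\left(\mu_*, \sigma_*^2\right)$$

where $\mu_*$ is ‘credibility weighted’ as follows:

$$\mu_* = Z\bar{x} + (1 - Z)\mu_0, \qquad Z = \frac{n/\sigma^2}{n/\sigma^2 + 1/\sigma_0^2}$$

and

$$\sigma_*^2 = \left(\frac{n}{\sigma^2} + \frac{1}{\sigma_0^2}\right)^{-1}$$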
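The following sketch computes the normal-normal update; the function and parameter names are hypothetical. The model is $x_i \sim N(\mu, \sigma^2)$ with $\sigma^2$ known and prior $\mu \sim N(\mu_0, \sigma_0^2)$. The credibility factor $Z$ is the share of the total precision $n/\sigma^2 + 1/\sigma_0^2$ contributed by the data, and the posterior variance is the reciprocal of that total precision.

```python
def normal_posterior(x, sigma2, mu0, sigma0_2):
    """Posterior (mean, variance, credibility factor Z) for mu, given
    x_i ~ N(mu, sigma2) with sigma2 known and prior mu ~ N(mu0, sigma0_2)."""
    n = len(x)
    xbar = sum(x) / n
    precision = n / sigma2 + 1 / sigma0_2      # posterior precision
    z = (n / sigma2) / precision               # weight given to the sample mean
    return z * xbar + (1 - z) * mu0, 1 / precision, z
```

As $n$ grows, $Z \to 1$ and the posterior mean moves from the prior mean $\mu_0$ towards the sample mean $\bar{x}$.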