4a. Testing backtest quality statistically: Aggregate quality
4.1 Any statistic, such as a VaR estimate, that is derived at least in part from analysis of a finite data sample is itself just an uncertain estimate of its ‘true’ underlying (but ultimately unobservable) value. It therefore comes with some error. Moreover, outcomes that arise in the future will also be probabilistic in nature.
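To illustrate the first point, the sketch below resamples a hypothetical data set with replacement (a nonparametric bootstrap, which is just one way of gauging sampling error) and recomputes a historical-simulation VaR on each resample. The data, the function names and the simple order-statistic quantile convention are all illustrative assumptions rather than anything prescribed in the text.

```python
import random
import statistics

def historical_var(losses, level=0.99):
    # Empirical quantile of the losses. This uses one simple order-statistic
    # convention; several others are in common use.
    ordered = sorted(losses)
    idx = min(len(ordered) - 1, int(level * len(ordered)))
    return ordered[idx]

def bootstrap_var(losses, level=0.99, n_boot=1000, seed=1):
    # Resample the losses with replacement, re-estimate VaR on each resample,
    # and summarise the spread of those re-estimates.
    rng = random.Random(seed)
    n = len(losses)
    estimates = sorted(
        historical_var([losses[rng.randrange(n)] for _ in range(n)], level)
        for _ in range(n_boot)
    )
    interval = (estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1])
    return historical_var(losses, level), statistics.stdev(estimates), interval

# Hypothetical sample: 500 standard normal daily losses
rng = random.Random(42)
losses = [rng.gauss(0.0, 1.0) for _ in range(500)]
point, std_err, (lo, hi) = bootstrap_var(losses)
print(f"99% VaR {point:.2f}, bootstrap s.e. {std_err:.2f}, 95% interval [{lo:.2f}, {hi:.2f}]")
```

Even with 500 observations the interval around a 99% VaR estimate is typically far from negligible, because only around five observations lie beyond the quantile being estimated.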
4.2 Suppose we experienced a significantly adverse outcome in the next period, well outside the range of outcomes we might otherwise have predicted. Does this mean that our model is wrong? Not necessarily. It might just mean that we have been unlucky.
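How often ‘bad luck’ alone produces such outcomes can be gauged from the binomial distribution. The sketch below assumes, purely for illustration, a correctly specified 99% one-day VaR model whose exceedances are independent across periods, over a hypothetical 250-day year; `prob_at_least` is a name introduced here.

```python
from math import comb

def prob_at_least(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p): the chance that a *correct* model with
    # exceedance probability p is breached at least k times in n independent periods.
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

n, p = 250, 0.01  # hypothetical: about one year of daily 99% VaR forecasts
for k in (1, 3, 5):
    print(f"P(at least {k} exceedances) = {prob_at_least(k, n, p):.3f}")
```

Under these assumptions a perfectly specified model is breached at least once in the year with probability above 90%, so a single bad outcome says little on its own.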
4.3 Statisticians face this sort of issue with any type of modelling. It is typically tackled by postulating a hypothesis (e.g. that the model is correct) and then assessing how likely the observed outcomes would be if that hypothesis were true, with the model being rejected if they would be too unlikely. But even then, we might have alighted on the right model yet reject it because of a fluke series of outcomes (what statisticians call a Type I error).
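Under the same illustrative assumptions (independent exceedances, each occurring with probability p if the model is correct), the sketch below finds the smallest exceedance count k at which a model would be rejected at significance level alpha, and reports the probability that a correct model is nevertheless rejected, i.e. the Type I error rate the rule actually delivers. The function names are hypothetical.

```python
from math import comb

def binom_tail(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def critical_count(n, p, alpha=0.05):
    # Smallest k such that a correct model produces k or more exceedances with
    # probability no greater than alpha. Rejecting whenever at least k
    # exceedances are seen therefore wrongly rejects a correct model at most
    # alpha of the time.
    k = 0
    while binom_tail(k, n, p) > alpha:
        k += 1
    return k

n, p = 250, 0.01
k = critical_count(n, p)
print(f"reject if exceedances >= {k}; P(rejecting a correct model) = {binom_tail(k, n, p):.3f}")
```

Because the count is discrete, the achieved Type I error rate is generally somewhat below the nominal alpha.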
4.4 Backtesting of risk models therefore typically proceeds in one of two ways:
(a) We tabulate past estimates from our risk model (with suitable out-of-sample adjustments as appropriate) of the specific statistic that we are most interested in ‘estimating correctly’ against past outcomes. For example, the statistic in question might be a given quantile level, i.e. a suitable VaR estimate. We then apply statistical tests suited to that particular statistic (see e.g. … and Tokpavi (2006), Pena, Rivera and Ruiz-Mata (2006) or Zumbach (2006)) to test whether past actuals suitably fit past predictions. For example, we might use a one-sided likelihood ratio test, which provides a confidence interval on the number of exceedances that we would expect to see, rejecting the model if too many actuals exceed the corresponding predictions.
(b) Alternatively, we may seek to test whether the entire distributional form that our model would have predicted, when applied to past data, fits the observed range of actual past outcomes, using appropriate statistical tests (see e.g. Campbell (2006) or Dowd (2006)).
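For approach (a), one widely used test of this kind is Kupiec's proportion-of-failures likelihood ratio test. The sketch below is a minimal version and not necessarily the exact variant used in the papers cited: it builds the 0/1 ‘hit’ sequence of exceedances from tabulated forecasts and outcomes, computes the likelihood ratio statistic, and converts it into a one-sided p-value via its signed square root, which is approximately standard normal in large samples. It assumes VaR forecasts and losses are both expressed as positive loss amounts and that exceedances are independent; the function names are hypothetical.

```python
import math

def hit_sequence(var_forecasts, losses):
    # 1 where the realised loss exceeded that period's VaR forecast, else 0
    return [1 if loss > var else 0 for var, loss in zip(var_forecasts, losses)]

def _xlogy(x, y):
    # x * log(y), using the convention 0 * log(0) = 0
    return 0.0 if x == 0 else x * math.log(y)

def kupiec_lr(hits, p):
    # Proportion-of-failures LR statistic: compares the likelihood of the hit
    # sequence under the model's exceedance probability p with that under the
    # observed exceedance frequency.
    n, x = len(hits), sum(hits)
    phat = x / n
    loglik_model = _xlogy(n - x, 1 - p) + _xlogy(x, p)
    loglik_observed = _xlogy(n - x, 1 - phat) + _xlogy(x, phat)
    lr = -2.0 * (loglik_model - loglik_observed)
    # Signed root: positive when there are more exceedances than expected.
    # It is approximately N(0, 1) for large n, so its upper tail gives a
    # one-sided p-value for "too many exceedances".
    signed_root = math.copysign(math.sqrt(max(lr, 0.0)), phat - p)
    p_one_sided = 0.5 * math.erfc(signed_root / math.sqrt(2.0))
    return lr, p_one_sided

hits = [1] * 8 + [0] * 242  # hypothetical: 8 exceedances in 250 days, 2.5 expected
lr, p_value = kupiec_lr(hits, p=0.01)
print(f"LR = {lr:.2f}, one-sided p-value = {p_value:.4f}")
```

Rejecting whenever the one-sided p-value falls below a chosen level is equivalent to capping the number of exceedances a model may show before it fails the backtest.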
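For approach (b), one common technique, shown here only as an illustrative sketch and not necessarily one of the tests described by Campbell (2006) or Dowd (2006), is the probability integral transform (PIT): if each period's forecast distribution is correct, the forecast CDF evaluated at the realised outcome should be independently and uniformly distributed on (0, 1). The sketch assumes normal forecast distributions and checks uniformity with a Kolmogorov-Smirnov statistic; it does not test independence. The function names and data are hypothetical.

```python
import math
import random

def normal_cdf(z):
    # Standard normal CDF via the complementary error function
    return 0.5 * math.erfc(-z / math.sqrt(2.0))

def pit_values(outcomes, means, sds):
    # Where each outcome fell within the (assumed normal) distribution the
    # model forecast for that period
    return [normal_cdf((x - m) / s) for x, m, s in zip(outcomes, means, sds)]

def ks_uniform(u):
    # Kolmogorov-Smirnov distance between the empirical CDF of u and the
    # Uniform(0, 1) CDF, with an asymptotic p-value from the Kolmogorov
    # distribution (Stephens' finite-sample adjustment applied to the statistic)
    u = sorted(u)
    n = len(u)
    d = max(max((i + 1) / n - v, v - i / n) for i, v in enumerate(u))
    t = (math.sqrt(n) + 0.12 + 0.11 / math.sqrt(n)) * d
    if t < 0.2:
        return d, 1.0  # the series converges slowly here, and p is ~1 anyway
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * (k * t) ** 2) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)

# Hypothetical check: outcomes are twice as volatile as the model assumes
rng = random.Random(7)
outcomes = [rng.gauss(0.0, 2.0) for _ in range(500)]
d, p_value = ks_uniform(pit_values(outcomes, [0.0] * 500, [1.0] * 500))
print(f"KS distance = {d:.3f}, p-value = {p_value:.2g}")
```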
4.5 Statistical techniques are also typically supplemented by graphical comparison of the data. This might, for example, show that the model was a poor fit only during a limited ‘exceptional’ period in the past, which might permit some suitable explanation, or a refinement of the model to cater for this historic period.
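A chart of exceedances through time is the natural companion to the formal tests. As a numerical stand-in for such a chart, the sketch below counts exceedances in a trailing window and flags windows whose count would be implausible for a correct model, which helps locate any ‘exceptional’ period. It assumes a 0/1 hit sequence and independent exceedances; because overlapping windows are not independent, the flags are a diagnostic aid rather than a formal test. The function names, window length and data are hypothetical.

```python
from math import comb

def binom_tail(k, n, p):
    # P(X >= k) for X ~ Binomial(n, p)
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def rolling_exceedances(hits, window):
    # Exceedance count in each trailing window; element i covers
    # periods i .. i + window - 1
    if len(hits) < window:
        raise ValueError("need at least one full window of data")
    running = sum(hits[:window])
    counts = [running]
    for t in range(window, len(hits)):
        running += hits[t] - hits[t - window]
        counts.append(running)
    return counts

def flag_windows(hits, window, p, alpha=0.01):
    # (end period, count) for windows whose count a correct model would
    # reach or exceed with probability below alpha
    return [
        (i + window - 1, count)
        for i, count in enumerate(rolling_exceedances(hits, window))
        if binom_tail(count, window, p) < alpha
    ]

# Hypothetical history: 300 periods with a burst of 6 exceedances in periods 100-105
hits = [0] * 300
for t in range(100, 106):
    hits[t] = 1
flagged = flag_windows(hits, window=50, p=0.01)
print(f"flagged windows end between periods {flagged[0][0]} and {flagged[-1][0]}")
```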