Portfolio Backtesting

4a. Testing backtest quality statistically: Aggregate quality



4.1          Any statistic, such as a VaR estimate, that is derived (at least in part) from analysis of a finite data sample is itself just an uncertain estimate of its ‘true’ underlying (but ultimately unobservable) value. It therefore comes with some estimation error. Moreover, outcomes that arise in the future will also be probabilistic in nature.


4.2          Thus, suppose we experience a significantly adverse outcome in the next period, well outside the range of outcomes we would typically have predicted. Does this mean that our model is wrong? Not necessarily. It might just mean that we have been unlucky.


4.3          Statisticians face this sort of issue with any type of modelling. It is typically tackled by postulating a hypothesis and then estimating the likelihood that the hypothesis is wrong, rejecting the model if that likelihood is too high. But even then, we might have alighted on the right model yet reject it because of a fluke series of outcomes.


4.4          Statistical backtesting of risk models thus typically proceeds in one of two ways:


(a)    We tabulate past estimates from our risk model (with suitable out-of-sample adjustments as appropriate) of the specific statistic that we are most interested in ‘estimating correctly’ against past outcomes. For example, the statistic in question might be a given quantile level, i.e. a suitable VaR estimate. We then apply statistical tests appropriate to that particular statistic, see e.g. Campbell (2006), Hurlin and Tokpavi (2006), Pena, Rivera and Ruiz-Mata (2006) or Zumbach (2006), to check whether past actuals adequately fit past predictions. For example, we might use a one-sided likelihood ratio test, which provides a confidence interval on the number of exceedances we would expect to see, rejecting the model if too many actuals exceed their corresponding predictions.
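A likelihood ratio test of this kind can be sketched with Kupiec's proportion-of-failures test, a standard (here two-sided, chi-squared) test on the exceedance count; the function name and the illustrative 250-day / 99% VaR figures below are hypothetical, not taken from this text:

```python
import math

def _xlogy(x, y):
    # x * log(y), with the convention 0 * log(0) = 0
    return 0.0 if x == 0 else x * math.log(y)

def kupiec_pof(n_obs, n_exc, p):
    """Kupiec proportion-of-failures likelihood ratio test.

    n_obs : number of backtest periods
    n_exc : number of observed VaR exceedances
    p     : exceedance probability the model claims (e.g. 0.01 for 99% VaR)

    Returns (LR statistic, p-value). Under the null hypothesis that the
    model's true exceedance rate equals p, LR is asymptotically
    chi-squared distributed with 1 degree of freedom.
    """
    phat = n_exc / n_obs  # observed exceedance frequency
    log_null = _xlogy(n_obs - n_exc, 1 - p) + _xlogy(n_exc, p)
    log_alt = _xlogy(n_obs - n_exc, 1 - phat) + _xlogy(n_exc, phat)
    lr = -2.0 * (log_null - log_alt)
    # Survival function of chi-squared(1 df): P(X > lr) = erfc(sqrt(lr / 2))
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value

# 250 trading days of 99% VaR: we expect about 2.5 exceedances,
# so observing 8 casts doubt on the model at conventional levels.
lr, pv = kupiec_pof(250, 8, 0.01)
```

A small p-value means the observed number of exceedances is implausible under the model's stated coverage, so the model would be rejected.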


(b)   Alternatively, we may seek to test whether the entire distributional form that our model would have predicted when applied to past data seems to fit the observed range of actual past outcomes, using appropriate statistical tests, see e.g. Campbell (2006) or Dowd (2006).


4.5          Statistical techniques would also typically be supplemented by a corresponding graphical comparison of the data. This might, for example, indicate visually that the model was a poor fit only during a limited ‘exceptional’ period in the past, which might permit a suitable explanation, or a refinement of the model to cater for that historic period.

