3. In-sample versus out-of-sample
3.1 Short-cutting the future by referring merely to
the past introduces look-back bias. Exactly how material this bias is in
practice depends on how the backtesting is carried out.
3.2 One way of carrying out a backtest would be to
take a single model of how the future might evolve and then to apply the same
model to every prior period. This is called in-sample backtesting. The
key issue with such an approach is that the model will typically have been
formulated by reference to past history including the past that we are then
testing the model against. Thus, unless we have been particularly inept at
fitting the past when constructing the risk model in the first place, we should
find that it is a reasonable fit in an in-sample, i.e. ex-post, comparison.
We cannot then conclude much from its apparent goodness of fit.
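To illustrate why an in-sample comparison tells us so little, here is a minimal sketch (not from the original text). It assumes simulated normally distributed daily returns and a simple normal 99% Value-at-Risk model; all names and parameters are illustrative. The model is calibrated on the full history and then "tested" against that same history, so it almost inevitably shows roughly the expected number of exceedances:

```python
import random
import statistics

random.seed(0)
# Hypothetical daily returns, simulated purely for illustration.
returns = [random.gauss(0.0, 0.01) for _ in range(1000)]

# "Model": a normal 99% one-day VaR estimate calibrated on the FULL history...
mu = statistics.mean(returns)
sigma = statistics.stdev(returns)
var_99 = mu - 2.326 * sigma  # 1st percentile of a fitted normal distribution

# ...then tested against that SAME history (in-sample, i.e. ex-post).
exceedances = sum(r < var_99 for r in returns)
print(f"In-sample 99% VaR exceedances: {exceedances} "
      f"(expected ~{0.01 * len(returns):.0f})")
```

Because the data used to fit the model and the data used to judge it coincide, an exceedance count close to the expected level is close to guaranteed by construction, rather than evidence that the model will describe the future well.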
3.3 Backtesters attempt to mitigate this problem by
using so-called out-of-sample testing. What this involves is a
specification of how to construct a model using data only available up to a
particular point in time. We then test the model only against observations that
occurred after the end of the sample period used in its estimation, i.e.
observations that are out of the sample in question. The model
might be estimated once-off using a particular earlier period of time and then
the same model might be applied each time period thereafter. Alternatively, the
model might be re-estimated at the start of each time period using data that
would then have been available, so that the time period just about to occur
is still (just) after the in-sample period.
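The second, re-estimation variant can be sketched as a simple walk-forward loop. This is an illustrative sketch only (not from the original text), again assuming simulated normal returns and a normal 99% VaR model; at each step the model is refitted using only data available up to that point and then judged on the single observation that follows:

```python
import random
import statistics

random.seed(1)
returns = [random.gauss(0.0, 0.01) for _ in range(1500)]

window = 500  # initial estimation (in-sample) period
exceedances = 0
# Re-estimate the model at the start of each period using only data that
# would then have been available, then test it on the observation that
# falls just after the in-sample period.
for t in range(window, len(returns)):
    history = returns[:t]            # expanding window of past data only
    mu = statistics.mean(history)
    sigma = statistics.stdev(history)
    var_99 = mu - 2.326 * sigma      # 99% VaR under a normal assumption
    if returns[t] < var_99:
        exceedances += 1

n_test = len(returns) - window
print(f"Out-of-sample 99% VaR exceedances: {exceedances} "
      f"(expected ~{0.01 * n_test:.0f} of {n_test})")
```

The crucial point is that `returns[t]` never enters the history used to estimate the model that is tested against it, which is what makes the comparison out-of-sample.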
3.4 Whilst out-of-sample modelling does reduce
look-back bias, it does not eliminate it. Risk models ultimately involve lots of
different assumptions about how the future might evolve, not least the format
of the risk model itself. In the background there are lots of competing risk
models that we might have considered suitable for the problem. Not too
surprisingly, the only ones that actually see the light of day, and therefore
get formally assessed in an out-of-sample context, are ones that are likely to
be tolerably good at fitting the past even in an out-of-sample context. Risk
modellers are clever enough to winnow out ones that will obviously fail such a
test before the test is actually carried out. This point is perhaps more
relevant to backtesting of return generating algorithms, given the human
tendency to rationalise explanations for success or failure, perhaps even if
there is no such explanation, see e.g. Taleb (2004).