Backtest overfitting occurs when a quantitative strategy is tuned on historical data (via parameter optimization, rule selection, or model choice) to the point where its performance reflects random in-sample patterns rather than genuine predictive structure. The optimized strategy performs well on the data it was fitted to and poorly on new data.
The core statistical problem is multiple testing: when many strategy variants are evaluated and the best is selected, the winner tends to look strong in-sample whether or not any true signal exists, because taking the maximum of many noisy performance estimates biases it upward. Reported backtested Sharpe ratios are therefore systematically overstated unless they are adjusted for the number of variants tried.
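The selection effect can be illustrated with a small simulation. The sketch below is illustrative only: it assumes daily returns drawn from a zero-mean normal distribution (so no strategy has any real edge), a zero risk-free rate, and 252 trading days per year. The function names and parameter values are hypothetical choices, not part of any library.

```python
import random
import statistics


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-period returns (risk-free rate assumed zero)."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)
    return mean / stdev * periods_per_year ** 0.5


def best_of_n_sharpe(n_strategies, n_days, rng):
    """Highest in-sample Sharpe ratio among strategies whose returns are pure noise."""
    best = float("-inf")
    for _ in range(n_strategies):
        # Zero-mean noise: by construction there is no signal to find.
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        best = max(best, annualized_sharpe(returns))
    return best


rng = random.Random(0)  # fixed seed so the run is reproducible
single = best_of_n_sharpe(1, 504, rng)
best_of_200 = best_of_n_sharpe(200, 504, rng)
print(f"one noise strategy:       Sharpe = {single:.2f}")
print(f"best of 200 noise trials: Sharpe = {best_of_200:.2f}")
```

Because every simulated strategy has zero expected return, any high Sharpe ratio reported for the best of the 200 variants is purely an artifact of selection.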
Diagnostics
- Out-of-sample testing — reserve a portion of data for evaluation only, never touched during development
- Walk-forward analysis — sequential train/test windows that simulate live execution
- Deflated Sharpe Ratio — adjusts the Sharpe ratio for the number of trials tested, the sample length, and the skewness and kurtosis of returns
- Probability of Backtest Overfitting (PBO) — estimates the probability that the strategy ranked best in-sample (IS) performs below the median of the candidates out-of-sample
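As one concrete case from the list above, the Deflated Sharpe Ratio can be sketched in a few lines. This is a minimal sketch following the formulation published by Bailey and López de Prado, not a reference implementation. It assumes the Sharpe ratio is expressed per period (not annualized), that kurtosis is the raw (non-excess) value, so a normal distribution has 3, and that the trials are treated as independent. The function names and the input numbers in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def expected_max_sharpe(n_trials, sr_variance):
    """Approximate expected maximum Sharpe ratio across n_trials when the true Sharpe is zero.

    sr_variance is the variance of the Sharpe ratio estimates across the trials.
    """
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    z = NormalDist()
    a = z.inv_cdf(1.0 - 1.0 / n_trials)
    b = z.inv_cdf(1.0 - 1.0 / (n_trials * math.e))
    return math.sqrt(sr_variance) * ((1.0 - EULER_GAMMA) * a + EULER_GAMMA * b)


def deflated_sharpe_ratio(sr, n_obs, skew, kurt, n_trials, sr_variance):
    """Probability that the true Sharpe ratio exceeds the expected best-of-N noise Sharpe.

    sr       : observed per-period Sharpe ratio of the selected strategy
    n_obs    : number of return observations
    skew     : skewness of the returns
    kurt     : raw kurtosis of the returns (3 for a normal distribution)
    n_trials : number of strategy variants tried
    """
    sr0 = expected_max_sharpe(n_trials, sr_variance)
    # Standard error term that accounts for non-normal returns.
    denom = math.sqrt(1.0 - skew * sr + (kurt - 1.0) / 4.0 * sr ** 2)
    return NormalDist().cdf((sr - sr0) * math.sqrt(n_obs - 1) / denom)


# Hypothetical inputs: daily Sharpe of 0.1 over 1250 days, selected from 100 trials.
dsr = deflated_sharpe_ratio(
    sr=0.1, n_obs=1250, skew=-0.5, kurt=5.0, n_trials=100, sr_variance=1.0 / 1250
)
print(f"Deflated Sharpe Ratio: {dsr:.3f}")
```

With the other inputs held fixed, the result falls as the number of trials grows, which is the adjustment the diagnostic is meant to capture.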