Backtest Overfitting

When in-sample optimization causes historical performance to overstate expected live performance.

Backtest overfitting occurs when a quantitative strategy is tuned on historical data (via parameter optimization, rule selection, or model choice) to the point where the resulting performance reflects random in-sample patterns rather than genuine predictive structure. The optimized strategy performs well on the data it was fitted to and poorly on new data.

The core statistical problem is multiple testing: when many strategy variants are evaluated and the best is selected, the winner tends to look strong in-sample even if no variant has any true signal, because the maximum of many noisy performance estimates is biased upward. Backtested Sharpe ratios reported without adjusting for the number of trials are therefore systematically overstated.
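The selection effect can be illustrated with a small simulation. The sketch below is not a backtest: it assumes every "strategy" is pure noise (i.i.d. normal daily returns with zero mean), and the function name `best_of_n_sharpe` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np

def best_of_n_sharpe(n_strategies, n_days=252, seed=0):
    """Annualized Sharpe ratio of the best of n_strategies noise-only strategies."""
    rng = np.random.default_rng(seed)
    # Assumption: zero-mean daily returns, so no strategy has any true signal.
    returns = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))
    # Per-strategy Sharpe ratio, annualized assuming 252 trading days per year.
    sharpe = returns.mean(axis=1) / returns.std(axis=1, ddof=1) * np.sqrt(252)
    # Selecting the maximum is the step that introduces the upward bias.
    return sharpe.max()

if __name__ == "__main__":
    for n in (1, 10, 100, 1000):
        print(n, round(float(best_of_n_sharpe(n)), 2))
```

Under these assumptions the best in-sample Sharpe ratio grows with the number of variants tried, and with 1000 variants on one year of data it is usually well above 2 even though the true Sharpe ratio of every variant is zero.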

Diagnostics

Out-of-sample testing: reserve a portion of the data for evaluation only and never touch it during development
Walk-forward analysis: sequential train/test windows that simulate live execution
Deflated Sharpe Ratio: adjusts the Sharpe ratio (SR) for the number of trials tested, the sample length, and the skewness and kurtosis of returns
Probability of Backtest Overfitting (PBO): estimates the probability that the strategy ranked best in-sample (IS) would underperform the median of the tested variants out-of-sample
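As an illustration of the Deflated Sharpe Ratio adjustment listed above, the sketch below follows the formula as it is commonly stated from Bailey and López de Prado's work. It assumes the Sharpe ratio is expressed per period (not annualized), that kurtosis is the raw value (3 for normal returns), and that the trials are independent. The function names and the example inputs are hypothetical.

```python
from math import e, sqrt
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def expected_max_sharpe(n_trials, sharpe_variance):
    """Approximate expected best Sharpe ratio among n_trials with no true skill."""
    if n_trials < 2:
        raise ValueError("n_trials must be at least 2")
    norm = NormalDist()
    z = ((1 - EULER_GAMMA) * norm.inv_cdf(1 - 1 / n_trials)
         + EULER_GAMMA * norm.inv_cdf(1 - 1 / (n_trials * e)))
    # sharpe_variance: variance of the Sharpe estimates across the trials.
    return sqrt(sharpe_variance) * z

def deflated_sharpe_ratio(sharpe, n_obs, n_trials, sharpe_variance,
                          skew=0.0, kurtosis=3.0):
    """Probability that the true Sharpe ratio exceeds the selection benchmark."""
    sr0 = expected_max_sharpe(n_trials, sharpe_variance)
    # Standard error term corrected for non-normal returns.
    denom = sqrt(1 - skew * sharpe + (kurtosis - 1) / 4 * sharpe ** 2)
    z = (sharpe - sr0) * sqrt(n_obs - 1) / denom
    return NormalDist().cdf(z)

if __name__ == "__main__":
    # Hypothetical inputs: daily SR of 0.1 over 1250 observations.
    for trials in (10, 100, 1000):
        print(trials, deflated_sharpe_ratio(0.1, 1250, trials, 0.0025))
```

With the observed Sharpe ratio held fixed, the result falls as the number of trials rises, which is the adjustment the diagnostic is meant to capture. In practice the trials are rarely independent, so the effective number of trials has to be estimated separately.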
