
Purged Cross-Validation

A cross-validation scheme for financial time series that removes from the training set any observations whose label windows overlap the test set, reducing leakage caused by overlapping labels and serial correlation.

Standard k-fold cross-validation applied to financial time series suffers from information leakage, a form of look-ahead bias: training observations adjacent to the test set share autocorrelated features and, because their return windows overlap, part of their label information, so the model partially sees the test outcomes during training. Purged Cross-Validation (PCV), described by Marcos López de Prado (2018), addresses this by purging from the training set every observation whose labeling window overlaps the labeling window of a test observation.

Steps

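  1. Define the label horizon T: the forward-return window used as the outcome variable (e.g., T = 5 for a 5-day return on daily data).
  2. For each test observation at time t, remove from the training fold every training observation whose timestamp falls within [t − T, t + T], since its labeling window overlaps that of the test observation.
  3. Optionally add an embargo: also drop a short run of training observations immediately after the test set, so that serial correlation carried forward from the test period does not leak into the training data that follows it.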
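
The steps above can be sketched in Python as follows. This is a minimal illustration rather than López de Prado's reference implementation: it assumes equally spaced observations, a fixed label horizon counted in observations, and contiguous test folds. The function name `purged_kfold_indices` and its parameters are hypothetical, chosen only for this example.

```python
import numpy as np

def purged_kfold_indices(n_samples, n_splits=5, horizon=5, embargo=0):
    """Yield (train_idx, test_idx) pairs with purging and an optional embargo.

    Assumes n_samples >= n_splits and that `horizon` and `embargo` are
    expressed in number of observations (an assumption of this sketch).
    """
    indices = np.arange(n_samples)
    # Contiguous, nearly equal-sized test blocks, as in ordinary k-fold
    for test_idx in np.array_split(indices, n_splits):
        first, last = test_idx[0], test_idx[-1]
        # Step 2: purge everything within `horizon` observations of the test block.
        # Step 3: additionally drop `embargo` observations after the block.
        keep = (indices < first - horizon) | (indices > last + horizon + embargo)
        yield indices[keep], test_idx

splits = list(purged_kfold_indices(100, n_splits=5, horizon=5, embargo=2))
```

With 100 observations and five folds, the second test block covers indices 20 to 39, so the matching training set keeps only indices 0 to 14 and 47 to 99. Because the test blocks are contiguous, purging around the block's first and last index is equivalent to applying the interval in step 2 to every test observation.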

Why standard CV fails for financial data

In financial time series, the labels of adjacent observations overlap: the 5-day forward returns measured from day t and from day t+1 share 4 of their 5 days of information. If one of these observations is in the training set and the other is in the test set, the model can exploit the shared content, and the test score overstates out-of-sample performance. PCV prevents this by separating train and test observations by at least one full labeling window.
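
The overlap can be checked numerically. The sketch below is an illustration on synthetic data, not an empirical result: it assumes i.i.d. standard-normal daily returns and approximates the T-day forward return as the sum of T daily returns. The helper `shared_days` is a hypothetical name introduced for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
horizon = 5
daily = rng.standard_normal(200_000)  # synthetic i.i.d. daily returns (assumption)

# Label for each day: sum of `horizon` consecutive daily returns, via cumulative sums
csum = np.concatenate(([0.0], np.cumsum(daily)))
labels = csum[horizon:] - csum[:-horizon]

def shared_days(lag, horizon):
    """Number of daily returns shared by two label windows `lag` days apart."""
    return max(horizon - lag, 0)

# Correlation between labels one day apart and one full horizon apart
lag1_corr = np.corrcoef(labels[:-1], labels[1:])[0, 1]
lagT_corr = np.corrcoef(labels[:-horizon], labels[horizon:])[0, 1]
```

Even though the daily returns are independent, labels one day apart share 4 of 5 days, so their correlation should be close to 4/5. Labels a full horizon apart share no days and should be close to uncorrelated, which is why a gap of one labeling window is enough to remove this particular source of leakage.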
