Standard k-fold cross-validation leaks information when applied to financial time series: training observations adjacent to the test set have autocorrelated features, and their labels are built from overlapping return windows, so the model partially sees the test outcomes during training. Purged Cross-Validation (PCV), developed by Marcos López de Prado (2018), addresses this by purging from the training set every observation whose labeling window overlaps that of a test observation.
Steps
- Define the label horizon T — the forward-return window used as the outcome variable (e.g., a 5-day return)
- For each test observation at time t, remove from the training fold every observation whose timestamp lies within [t − T, t + T], since its labeling window overlaps that of the test observation
- Optionally add an embargo period immediately after the test set, dropping the training observations that follow it, to limit leakage from serial correlation that persists beyond the labeling window
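The steps above can be sketched in a few lines of Python. This is a minimal illustration, not López de Prado's reference implementation: the function name `purged_kfold_indices` and its parameters are hypothetical, observations are assumed to be evenly spaced and indexed 0..n−1, the horizon T is measured in index units, and test folds are assumed to be contiguous blocks.

```python
import numpy as np

def purged_kfold_indices(n_obs, n_splits, horizon, embargo=0):
    """Yield (train_idx, test_idx) pairs with purging and an optional embargo.

    Hypothetical helper: assumes evenly spaced observations whose label
    at index i depends on data in [i, i + horizon].
    """
    indices = np.arange(n_obs)
    # Contiguous test blocks, as in k-fold without shuffling
    for test_idx in np.array_split(indices, n_splits):
        test_start, test_end = test_idx[0], test_idx[-1]
        # Union of [t - horizon, t + horizon] over all test points t
        purge_lo = test_start - horizon
        # The embargo extends the excluded region after the test block only
        purge_hi = test_end + horizon + embargo
        keep = (indices < purge_lo) | (indices > purge_hi)
        yield indices[keep], test_idx

# Usage: 20 observations, 4 folds, 2-step label horizon, 1-step embargo
for train_idx, test_idx in purged_kfold_indices(20, 4, horizon=2, embargo=1):
    print(test_idx.tolist(), "->", train_idx.tolist())
```

With real data the timestamps are rarely evenly spaced and label windows can vary per observation, so a production version would compare actual label start and end times rather than integer offsets.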
Why standard CV fails for financial data
In financial time series, the labels of adjacent observations overlap: the 5-day forward return from day t and the one from day t+1 share 4 days of information. If one of them is in the training set and the other in the test set, the model can exploit the shared content. PCV prevents this by separating train and test by at least one full labeling window.
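The overlap count can be checked directly. The sketch below assumes a convention in which the T-day forward return from day t is built from the daily returns of days t+1 through t+T; the helper names `forward_window` and `shared_days` are illustrative only.

```python
def forward_window(t, horizon):
    """Days whose daily returns enter the horizon-day forward return from day t."""
    return set(range(t + 1, t + horizon + 1))

def shared_days(t1, t2, horizon):
    """Number of daily returns common to the two forward-return labels."""
    return len(forward_window(t1, horizon) & forward_window(t2, horizon))

print(shared_days(0, 1, 5))  # 4: days 2, 3, 4, 5 are common to both labels
print(shared_days(0, 5, 5))  # 0: the two windows no longer intersect
```

Under this convention the shared content falls linearly with the gap between observations and vanishes once they are a full horizon apart, which is why the purge width is tied to T.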