Fractional differentiation is a feature-transformation technique that makes a financial time series stationary while preserving as much of its memory as possible. It was brought into mainstream quantitative practice by Marcos López de Prado in Advances in Financial Machine Learning (2018), and it resolves a dilemma that sits at the heart of preparing data for predictive models: most statistical and machine-learning methods require stationary inputs, but the usual way of getting them — taking returns — throws away almost all of the memory a model might have exploited.
Prices are non-stationary: they wander like a random walk with no fixed mean, so their statistical properties drift over time and inference built on them is unreliable. The textbook fix is to difference the series once, turning prices into returns, which are approximately stationary. But that integer differencing is a blunt instrument — it strips out the persistent, long-memory structure along with the non-stationarity. Fractional differentiation offers a dial instead of a switch: apply just enough differencing to reach stationarity, and no more. This guide explains the stationarity–memory dilemma, how the fractional differencing operator works, how to choose the differencing order, de Prado's fixed-width-window refinement, and what the technique does and does not do for signal feature engineering.
Key Takeaways
- Most models need stationary inputs, but prices are non-stationary. The standard remedy — first differencing into returns — achieves stationarity by destroying almost all of the series' memory. That is the stationarity–memory dilemma.
- Fractional differentiation generalizes differencing to a non-integer order d between 0 and 1, applying the minimum amount needed to reach stationarity while retaining maximum memory. At d = 0 the series is unchanged; at d = 1 it is the ordinary return; values in between keep a long, slowly decaying weighting of the past.
- The practical recipe is to find the smallest d for which the series passes a stationarity test (the Augmented Dickey-Fuller test). De Prado reports this minimum order is often well below 1, so the differenced series stays highly correlated with the original level — most of its memory survives.
- De Prado's fixed-width-window fractional differentiation (FFD) drops negligible weights so every observation is transformed with the same finite window — yielding a well-defined, driftless, stationary series rather than the non-uniform one an expanding window produces.
- Fractional differentiation is a preprocessing step, not an alpha. It preserves the memory a model can learn from; it does not create signal, it does not prevent overfitting, and the edges a model finds in frac-diff features still decay.
The Stationarity–Memory Dilemma
Nearly every inferential tool a quant uses — from a linear regression to a gradient-boosted tree to a neural network — implicitly assumes that the statistical relationship it is fitting is stable over time. That assumption requires stationarity: a series whose mean, variance, and autocorrelation structure do not drift. Raw prices violate it badly. A price series behaves like a random walk with a unit root: it has no fixed level to revert to, its variance grows with time, and a model fit on price levels will latch onto spurious relationships that fall apart out of sample.
The conventional cure is to difference the series once. Subtracting yesterday's log price from today's gives the log return, and returns are approximately stationary: they fluctuate around a roughly constant mean. The problem is that this integer differencing is far more aggressive than it needs to be. Returns are very nearly memoryless: today's return tells you almost nothing about the level the price reached, the trend it had been on, or how far it sits from its recent range. All of that persistent structure, the memory of the series, is exactly what a predictive model might have used, and first differencing erases it. You are forced into an all-or-nothing trade: keep the full memory and stay non-stationary, or become stationary and lose almost all the memory. Fractional differentiation exists to break that false choice.
What Fractional Differentiation Is
The key realization is that differencing does not have to be a whole-number operation. The differencing operator can be raised to a non-integer power, and doing so applies a partial, tunable amount of differencing. This idea comes from the long-memory time-series literature — Granger and Joyeux (1980) and Hosking (1981) introduced fractional differencing as the basis of fractionally integrated (ARFIMA) models — and de Prado adapted it as a way to build stationary-yet-memory-rich features for machine-learning models of financial signals.
From integer to fractional differencing
Write B for the backshift operator, which shifts a series back one step in time, so that B applied to today's price gives yesterday's. An ordinary first difference is then the operator (1 − B): applied to a log-price series, it gives the log return. Raising (1 − B) to the power 0 gives the identity, which leaves the series untouched. Fractional differentiation generalizes the exponent from a whole number to any real value d. For a non-integer d, the operator (1 − B) raised to the power d does not terminate after a finite number of terms the way the integer cases do; it expands into an infinite weighted sum of past observations, and those weights determine how much memory survives.
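Written out, the expansion described above is the standard binomial series. The symbols X_t (the original series), X̃_t (the transformed series), and w_k (the weight on lag k) are notation assumed here for illustration and do not appear elsewhere in this guide:

```latex
(1 - B)^d \;=\; \sum_{k=0}^{\infty} \binom{d}{k} (-B)^k
\;=\; 1 \;-\; dB \;+\; \frac{d(d-1)}{2!}\,B^2 \;-\; \frac{d(d-1)(d-2)}{3!}\,B^3 \;+\; \cdots

\tilde{X}_t \;=\; (1 - B)^d X_t \;=\; \sum_{k=0}^{\infty} w_k \, X_{t-k},
\qquad w_k = (-1)^k \binom{d}{k}
```

For d = 1 every term after the second contains the factor (d − 1) = 0, which is why the series collapses to 1 − B; for a non-integer d no factor is ever zero, so the sum never terminates.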
The weights
Expanding (1 − B) to the power d as a binomial series produces a sequence of weights on successive lags that can be computed with a simple recursion: the first weight is 1, and each subsequent weight equals the previous one multiplied by −(d − k + 1) / k, where k is the lag index. Three cases make the behavior clear:
- d = 0: only the first weight is nonzero, so the operator returns the original series unchanged — full memory, fully non-stationary.
- d = 1: the first weight is 1 and the second is −1, and all the rest vanish, so the operator is the ordinary first difference — the return, stationary, with its memory wiped out.
- 0 < d < 1: the weights form an infinite, slowly decaying sequence. Each transformed value is a weighted blend of many past observations, so the series keeps a long memory while the partial differencing pulls it toward stationarity.
The table below shows the first few weights at an illustrative order of d = 0.4, rounded to two decimals, to convey how slowly they decay relative to the abrupt cutoff of integer differencing:
| Lag k | Weight at d = 1 (returns) | Weight at d = 0.4 (illustrative) |
|---|---|---|
| 0 | 1.00 | 1.00 |
| 1 | −1.00 | −0.40 |
| 2 | 0.00 | −0.12 |
| 3 | 0.00 | −0.06 |
| 4 | 0.00 | −0.04 |
At d = 1 the weights stop after one lag — only the most recent change matters, which is why returns are memoryless. At a fractional order the weights persist across many lags and fade gradually, so the recent past dominates but the deeper past still contributes. That long, fading tail is the memory the technique is designed to keep.
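The recursion behind the table can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `frac_diff_weights` and its parameters are hypothetical, chosen for this example rather than taken from any library.

```python
def frac_diff_weights(d, n_weights):
    """Return the first n_weights coefficients of (1 - B)^d.

    Recursion: w_0 = 1, and w_k = -w_{k-1} * (d - k + 1) / k for k >= 1.
    """
    if n_weights < 1:
        return []
    weights = [1.0]
    for k in range(1, n_weights):
        weights.append(-weights[-1] * (d - k + 1) / k)
    return weights


if __name__ == "__main__":
    # Compare the abrupt cutoff at d = 1 with the slow decay at d = 0.4.
    for d in (1.0, 0.4):
        print(d, [round(w, 2) for w in frac_diff_weights(d, 5)])
```

At d = 0.4 the rounded values match the right-hand column of the table, and at d = 1 every weight past lag 1 is zero.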
Finding the Minimum Differencing Order
Because d is a continuous dial, the practical question is where to set it. The goal is the smallest d that achieves stationarity: differencing any harder than necessary only discards memory you could have kept. The standard test is the Augmented Dickey-Fuller (ADF) test (Dickey and Fuller, 1979), whose null hypothesis is that the series has a unit root, the signature of non-stationarity. The procedure is to sweep d upward from 0, fractionally difference the series at each step, and run the ADF test until the unit-root null is rejected at a chosen significance level (5% is a common choice). The first d that clears the threshold is the minimum differencing order.
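The sweep can be sketched as a simple grid search. To keep the example dependency-free, the stationarity test is passed in as a caller-supplied function, `is_stationary`, which is a hypothetical hook rather than a real API; the helper names, the grid of `steps` points, and the weight `threshold` are likewise assumptions for illustration. The transform truncates negligible weights, the fixed-width-window idea covered in the FFD section of this guide.

```python
def truncated_weights(d, threshold=1e-3, max_width=5_000):
    """Weights of (1 - B)^d, cut off once |w_k| falls below threshold."""
    weights = [1.0]
    for k in range(1, max_width):
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < threshold:
            break
        weights.append(w)
    return weights


def frac_diff(series, d, threshold=1e-3):
    """Fractionally difference series, skipping points without a full window."""
    weights = truncated_weights(d, threshold)
    start = len(weights) - 1
    return [
        sum(w * series[t - k] for k, w in enumerate(weights))
        for t in range(start, len(series))
    ]


def minimum_d(series, is_stationary, steps=20, threshold=1e-3):
    """Return the smallest d on the grid 0, 1/steps, ..., 1 that passes the test.

    is_stationary (hypothetical, supplied by the caller) takes the transformed
    series and returns True when the unit-root null is rejected.
    Returns None if no d on the grid passes.
    """
    for i in range(steps + 1):
        d = i / steps
        transformed = frac_diff(series, d, threshold)
        if transformed and is_stationary(transformed):
            return d
    return None
```

With statsmodels installed, one plausible choice for the hook is `lambda x: adfuller(x)[1] < 0.05`, using `adfuller` from `statsmodels.tsa.stattools`, whose second return value is the p-value. A finer grid gives a more precise minimum d at the cost of more test runs.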
De Prado's central empirical observation is that this minimum order is frequently well below 1 for financial price series — often a small fraction rather than a full unit. That has a direct consequence: at the minimum d, the fractionally differenced series remains highly correlated with the original price level, because so little differencing was applied. In other words, you can usually reach statistical stationarity while preserving the overwhelming majority of the series' memory. As d rises toward 1, that correlation with the original level collapses toward zero — which is just the formal statement of how much information ordinary returns throw away.
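One way to check how much memory survives is to compute the correlation between the transformed series and the original level. The sketch below is a plain Pearson correlation using only the standard library; the function names are hypothetical, and it assumes the transform dropped only leading observations, so the transformed series lines up with the tail of the original.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def memory_retained(original, transformed):
    """Correlate the transformed series with the matching tail of the original.

    Assumes the transform only dropped leading points (no gaps elsewhere).
    """
    if len(transformed) > len(original):
        raise ValueError("transformed series is longer than the original")
    aligned = original[len(original) - len(transformed):]
    return pearson(aligned, transformed)
```

A value near 1 at the chosen d indicates the transformed series still tracks the price level; a value near 0 indicates most of the level information is gone.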
Fixed-Width Window Fractional Differentiation (FFD)
A naive implementation lets the weights extend over the entire available history, so the effective window grows as the series lengthens and the earliest observations are differenced with fewer weights than the latest. That non-uniformity introduces a subtle negative drift and means different parts of the series carry different amounts of memory — an awkward property for a feature you want to be consistent across time.
De Prado's fix is fixed-width-window fractional differentiation (FFD). Because the weights decay, those past a certain lag are negligibly small; FFD simply drops every weight whose absolute value falls below a chosen threshold. The result is a constant number of weights and therefore a constant-width window, so every point in the series is transformed using the same finite memory. FFD produces a well-defined, driftless, stationary series in which the memory is uniform from start to end — the version of the transform you actually want to feed to a model. In practice the weights for a given d and threshold are computed once and then applied to the series as a fixed convolution.
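A minimal sketch of the fixed-width-window idea follows, using only the standard library. It is an illustration under stated assumptions, not de Prado's reference code: the function names, the default `threshold`, and the `max_width` safety cap are all choices made for this example.

```python
def ffd_weights(d, threshold=1e-4, max_width=10_000):
    """Weights of (1 - B)^d, truncated once |w_k| drops below threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    weights = [1.0]
    k = 1
    while k < max_width:  # cap guards against very long windows
        w = -weights[-1] * (d - k + 1) / k
        if abs(w) < threshold:
            break
        weights.append(w)
        k += 1
    return weights


def frac_diff_ffd(series, d, threshold=1e-4):
    """Apply the same fixed window to every point that has enough history.

    The output is shorter than the input by (window width - 1): the first
    points lack a full window and are skipped rather than padded.
    """
    weights = ffd_weights(d, threshold)  # computed once, reused for every point
    width = len(weights)
    out = []
    for t in range(width - 1, len(series)):
        # weights[0] multiplies the current value, weights[k] the value k steps back.
        out.append(sum(w * series[t - k] for k, w in enumerate(weights)))
    return out
```

At d = 1 the window shrinks to the two weights 1 and −1, so the function reduces to an ordinary first difference. Lowering `threshold` widens the window and keeps more memory, but it also discards more leading observations.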
Why It Matters for Signal Research
For a team building machine-learning models of weak predictive signals, fractional differentiation directly improves the quality of the inputs. Returns satisfy the stationarity requirement but arrive at the model already stripped of memory, so the model can only learn from very short-horizon structure. Fractionally differentiated features satisfy the same stationarity requirement while still carrying the level and trend information that returns discard — giving the model a genuine chance to learn from persistent structure. This is why frac-diff belongs in the feature-engineering toolkit alongside the other transforms that shape raw data into model-ready inputs, and why it appears repeatedly in machine-learning approaches to signal detection. The natural next step — judging which of the resulting features actually carry predictive content — is the job of feature importance analysis.
What it does not do is remove the need for honest validation. A more memory-rich feature set widens the space of patterns a model can fit, which raises rather than lowers the risk of fitting noise. Fractionally differentiated inputs still demand purged cross-validation, vigilance against backtest overfitting and multiple testing, and a clean separation of in-sample and out-of-sample data before any result is believed.
Pitfalls and Practical Guidance
The right d is data-dependent. The minimum differencing order is a property of each series and can shift as a market's regime changes, so it should be estimated per series and re-checked over time rather than fixed once and forgotten. Over-differencing (d higher than necessary) needlessly discards memory; under-differencing leaves the series non-stationary and the model's inferences unreliable.
It is preprocessing, not prediction. Fractional differentiation prepares a series so that a model has the best chance of finding signal; it does not itself produce an edge. A frac-diff feature that retains memory is only valuable if there is genuine predictive structure for the model to learn, and any edge it helps uncover is still subject to alpha decay and the broader pattern of signal decay. The discipline of validating a candidate signal and backtesting it honestly applies exactly as before.
Interpretability has a cost. A fractionally differentiated value is a weighted sum of many past observations, which is less directly interpretable than a simple return. That is a reasonable trade for the memory it preserves, but it is worth remembering when diagnosing a model: the inputs no longer map one-to-one onto an obvious quantity like "yesterday's move."
Fractional Differentiation in Practice
Used well, fractional differentiation is a quiet but powerful upgrade to the data pipeline behind a signal model. The workflow is consistent: sweep the differencing order upward and use an ADF test to find the minimum d that achieves stationarity; apply that order through a fixed-width window so the memory is uniform and driftless; confirm that the transformed series stays well-correlated with the original level, evidence that memory was preserved; then feed those features into the model and validate the result with the same rigor any signal demands. It will not manufacture an edge where none exists — but where there is real, persistent structure in prices, fractional differentiation is how you keep that structure intact long enough for a model to find it, instead of throwing it away at the very first step.