Statistical stationarity
First difference (period-to-period change)
Statistical stationarity: A stationary time series is one whose
statistical properties, such as mean, variance, and autocorrelation,
are all constant
over time. Most statistical forecasting methods are based on the
assumption that the time series can be rendered approximately
stationary (i.e., "stationarized") through the use of
mathematical transformations. A stationarized series is relatively
easy to predict: you simply predict that its statistical properties
will be the same in the future as they have been in the
past!
(Recall our famous forecasting quotes.)
The predictions for the stationarized series can then be "untransformed,"
by reversing whatever mathematical transformations were previously
used, to obtain predictions for the original series. (The details
are normally taken care of by your software.) Thus, finding the
sequence of transformations needed to stationarize a time series
often provides important clues in the search for an appropriate
forecasting model.
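As a rough illustration of the transform, predict, and untransform cycle, here is a minimal Python sketch. The data values are made up, and the choice of a log transform followed by a first difference is an assumption made for the example, not a recommendation for any particular series. The stationarized series is predicted by its historical mean, and the forecasts are then carried back to the original units by reversing the steps in the opposite order.

```python
import math

# Hypothetical series growing by roughly 5% per period (illustrative values only).
y = [100.0, 105.0, 110.3, 115.8, 121.6, 127.6]

# Step 1: stationarize. Take logs, then period-to-period differences.
log_y = [math.log(v) for v in y]
dlog = [log_y[t] - log_y[t - 1] for t in range(1, len(log_y))]

# Step 2: predict the stationarized series by its historical mean.
mean_dlog = sum(dlog) / len(dlog)

# Step 3: untransform. Undo the differencing by accumulating the predicted
# changes onto the last observed log value, then undo the log with exp.
horizon = 3
forecasts = []
last_log = log_y[-1]
for _ in range(horizon):
    last_log += mean_dlog
    forecasts.append(math.exp(last_log))

print(forecasts)
```

In practice your software performs the reversal for you, but writing it out once makes it clear that the forecast for the original series inherits its shape (here, constant percentage growth) from the transformations that were chosen.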
Another reason for trying to stationarize a time series is to be able to obtain meaningful sample statistics such as means, variances, and correlations with other variables. Such statistics are useful as descriptors of future behavior only if the series is stationary. For example, if the series is consistently increasing over time, the sample mean and variance will grow with the size of the sample, and they will always underestimate the mean and variance in future periods. And if the mean and variance of a series are not well-defined, then neither are its correlations with other variables. For this reason you should be cautious about trying to extrapolate regression models fitted to nonstationary data.
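The effect on sample statistics is easy to see with a small experiment. The sketch below uses a made-up series with a steady upward trend and compares the sample mean and variance for growing sample sizes with the mean of the next ten periods. The series and the helper functions `mean` and `variance` are illustrative assumptions.

```python
# Hypothetical series with a steady upward trend: 10, 12, 14, ...
y = [10 + 2 * t for t in range(40)]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    # Sample variance with the usual n - 1 divisor.
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

for n in (10, 20, 30):
    sample = y[:n]           # data observed so far
    future = y[n:n + 10]     # the next ten periods
    print(n, mean(sample), variance(sample), mean(future))
```

Both statistics keep growing as the sample gets longer, and the sample mean is always below the mean of the periods that follow, so neither settles down to a value that describes the future.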
Most business and economic time series are far from stationary
when expressed in their original units of measurement, and even
after deflation or seasonal adjustment they will typically still
exhibit trends, cycles, random-walking, and other non-stationary
behavior. If the series has a stable long-run trend and
tends to revert to the trend line following a disturbance, it may be
possible to stationarize it by de-trending (e.g., by fitting a trend
line and subtracting it out prior to fitting a model, or else by
including the time index as an independent variable in a regression or
ARIMA model), perhaps in conjunction with logging or deflating.
Such a series is said to be trend-stationary.
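De-trending by fitting a trend line and subtracting it out can be sketched in a few lines of Python. This version assumes a straight-line trend fitted by ordinary least squares against the time index 0, 1, 2, ...; the function names `fit_trend_line` and `detrend` and the data are hypothetical.

```python
def fit_trend_line(y):
    """Least-squares fit of y[t] = intercept + slope * t for t = 0, 1, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return intercept, slope

def detrend(y):
    """Subtract the fitted trend line, leaving deviations from trend."""
    intercept, slope = fit_trend_line(y)
    return [v - (intercept + slope * t) for t, v in enumerate(y)]

# Hypothetical data: a linear trend plus small deviations.
deviations = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]
y = [5.0 + 1.5 * t + d for t, d in enumerate(deviations)]

residuals = detrend(y)
print(fit_trend_line(y))
print(residuals)
```

If the series is trend-stationary, the residuals should look stationary. If the original series was logged or deflated first, the same steps apply to the transformed values.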
However, sometimes even de-trending is not sufficient to
make the series stationary, in which case it may be necessary
to transform it into a series of period-to-period and/or season-to-season
differences. If the mean, variance, and autocorrelations
of the original series are not constant in time, even after detrending,
perhaps the statistics
of the changes in the series between periods or between
seasons will be constant. Such a series is said to
be difference-stationary. (Sometimes it can be hard
to tell the difference between a series that is trend-stationary and
one that is difference-stationary, and a so-called unit root test may be used to get a more
definitive answer. We will return to this topic later in the
course.)
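To make the season-to-season difference concrete, here is a small Python sketch. It assumes quarterly data (a season length of 4) and a made-up series consisting of a trend plus a repeating seasonal pattern. The helper `seasonal_difference` is hypothetical.

```python
def seasonal_difference(y, season_length):
    """Return y[t] - y[t - season_length]: the change from one season
    to the same season one cycle later."""
    return [y[t] - y[t - season_length] for t in range(season_length, len(y))]

# Hypothetical quarterly series: upward trend plus a repeating seasonal pattern.
pattern = [4.0, -1.0, -5.0, 2.0]
y = [50.0 + 1.0 * t + pattern[t % 4] for t in range(16)]

sdiff = seasonal_difference(y, season_length=4)
print(sdiff)
```

For this artificial series the seasonal pattern cancels out of the season-to-season differences and only the constant yearly growth remains. Real data will not behave so neatly, but the statistics of the differences are often much closer to constant than those of the original series.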
The first difference of a time series is the series of changes
from one period to the next. If
Y(t) denotes the value of the time series Y at period t, then
the first difference of Y at period t is equal to Y(t)-Y(t-1).
In Statgraphics, the first difference of Y is expressed as DIFF(Y).
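Outside of Statgraphics the same calculation takes only a line or two. The Python sketch below computes Y(t)-Y(t-1) for a short made-up series. The function name `first_difference` is an assumption for the example and is not the name of a function in any particular package.

```python
def first_difference(y):
    """Return Y(t) - Y(t-1) for every period after the first.
    The result is one value shorter than y."""
    return [curr - prev for prev, curr in zip(y, y[1:])]

y = [112.0, 118.0, 115.0, 121.0, 126.0]   # made-up values
dy = first_difference(y)
print(dy)   # [6.0, -3.0, 6.0, 5.0]
```

Note that the differenced series has no value for the first period, because there is no earlier observation to subtract.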
If the first difference of Y is stationary and also completely
random (not autocorrelated), then Y is described by a random walk
model: each value is a random step away from the previous value.
If the first difference of Y is stationary but not completely
random--i.e., if its value at period t is autocorrelated with its
value at earlier periods--then a more sophisticated forecasting
model such as exponential smoothing or ARIMA may be appropriate.
(Note: if DIFF(Y) is stationary and random, this indicates that a
random walk model is appropriate for the original series Y, not that a random walk model should
be fitted to DIFF(Y). Fitting a random walk model to Y is
logically equivalent to fitting a mean (constant-only) model to
DIFF(Y).)
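The equivalence in the note above can be sketched in Python. The code fits a constant-only model to the first difference (the constant is simply the mean change per period) and uses it to forecast the original series, and it also computes the lag-1 autocorrelation of the differences as a rough check on whether they look random. The data and the function names are hypothetical, and the autocorrelation formula is the common sample estimator, which may differ slightly from what your software reports.

```python
def first_difference(y):
    return [curr - prev for prev, curr in zip(y, y[1:])]

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    denom = sum((v - m) ** 2 for v in x)
    num = sum((x[t] - m) * (x[t - lag] - m) for t in range(lag, n))
    return num / denom

def random_walk_forecast(y, horizon):
    """Forecast Y by fitting a mean (constant-only) model to DIFF(Y)."""
    dy = first_difference(y)
    mean_change = sum(dy) / len(dy)      # the fitted constant
    # Each step ahead adds one more mean change to the last observed value.
    return [y[-1] + k * mean_change for k in range(1, horizon + 1)]

y = [20.0, 21.5, 20.8, 22.1, 23.0, 22.4, 23.9, 24.6]   # hypothetical data
forecasts = random_walk_forecast(y, horizon=3)
r1 = autocorrelation(first_difference(y), lag=1)
print(forecasts, r1)
```

The forecasts start from the last observed value of Y and move away from it by the average past change, which is exactly what a random walk model predicts. If the constant is assumed to be zero, every forecast equals the last observed value. An autocorrelation far from zero in the differences would suggest that a more sophisticated model is needed.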
Here is a graph of the first difference of AUTOSALE/CPI, the deflated auto sales series. Notice that it now looks approximately stationary (at least the mean and variance are more-or-less constant) but it is not at all random (a strong seasonal pattern remains):
The following spreadsheet illustrates how the first difference is calculated for the deflated auto sales data: