**Stationarity and differencing**

Statistical stationarity

First difference (period-to-period change)

**Statistical stationarity:** A *stationary* time series is one whose statistical properties such as mean,
variance, autocorrelation, etc. are all constant over time. Most statistical forecasting
methods are based on the assumption that the time series can be rendered
approximately stationary (i.e., "stationarized") through the use of
mathematical transformations. A stationarized series is relatively easy to
predict: you simply predict that its statistical properties will be the same in
the future as they have been in the past! (Recall our famous forecasting quotes.) The predictions for
the stationarized series can then be "untransformed," by reversing
whatever mathematical transformations were previously used, to obtain
predictions for the original series. (The details are normally taken care of by
your software.) Thus, finding the sequence of transformations needed to
stationarize a time series often provides important clues in the search for an
appropriate forecasting model.
Stationarizing a time series through differencing (where needed) is an
important part of the process of fitting an **ARIMA model**, as discussed in the ARIMA pages
of these notes.
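To make the "untransform" step concrete, here is a minimal Python sketch. It assumes the only transformation used is a first difference (defined later on this page), and the helper names `difference` and `undifference` are hypothetical, not functions from Statgraphics or RegressIt. Cumulatively adding the changes back onto a starting value exactly reverses the differencing, which is how predictions made for the stationarized series are turned into predictions for the original series.

```python
from itertools import accumulate


def difference(y):
    """Period-to-period changes: y[t] - y[t-1] for t = 1, ..., n-1."""
    return [b - a for a, b in zip(y, y[1:])]


def undifference(start, changes):
    """Reverse the differencing: running sum of the changes, beginning at `start`."""
    return list(accumulate(changes, initial=start))


y = [100.0, 103.0, 101.0, 106.0]
d = difference(y)                      # [3.0, -2.0, 5.0]
assert undifference(y[0], d) == y      # the round trip recovers the original series

# Hypothetical predictions for the next two changes, made on the
# stationarized scale, converted back into predictions for y itself:
predicted_changes = [2.0, 2.0]
print(undifference(y[-1], predicted_changes)[1:])   # [108.0, 110.0]
```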

Another
reason for trying to stationarize a time series is to be able to obtain
meaningful sample statistics such as means, variances, and correlations with
other variables. Such statistics are useful as descriptors of future behavior *only*
if the series is stationary. For example, if the series is consistently
increasing over time, the sample mean and variance will grow with the size of
the sample, and they will always underestimate the mean and variance in future
periods. And if the mean and variance of a series are not well-defined, then
neither are its correlations with other variables. For this reason you should
be cautious about trying to extrapolate *regression* models fitted to
nonstationary data.
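A tiny numerical illustration of the point about trending series, using a made-up, perfectly linear series rather than real data: the sample mean keeps rising as more of the series is used, while the mean of the period-to-period changes is the same whatever the sample size.

```python
from statistics import mean

# Hypothetical series with a steady upward trend: y_t = 50 + 2t, t = 0..39
y = [50.0 + 2.0 * t for t in range(40)]
changes = [b - a for a, b in zip(y, y[1:])]

sample_means = {n: mean(y[:n]) for n in (10, 20, 40)}
change_means = {n: mean(changes[: n - 1]) for n in (10, 20, 40)}

print(sample_means)   # {10: 59.0, 20: 69.0, 40: 89.0}: grows with the sample size
print(change_means)   # {10: 2.0, 20: 2.0, 40: 2.0}: stable
print(y[-1] + 2.0)    # 130.0: the next value lies far above every sample mean
```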

Most
business and economic time series are far from stationary when expressed in
their original units of measurement, and even after deflation or seasonal
adjustment they will typically still exhibit trends, cycles, random-walking,
and other non-stationary behavior. If the series has a stable
long-run trend and tends to revert to the trend line following a disturbance,
it may be possible to stationarize it by de-trending (e.g., by fitting a trend
line and subtracting it out prior to fitting a model, or else by including the
time index as an independent variable in a regression or ARIMA model), perhaps
in conjunction with logging or deflating. Such a series is said to be **trend-stationary**.
However, sometimes even de-trending is not sufficient to make the
series stationary, in which case it may be necessary to transform it into a
series of period-to-period and/or season-to-season *differences*. If
the mean, variance, and autocorrelations of the original series are not
constant in time, even after detrending, perhaps the statistics of the *changes*
in the series between periods or between seasons *will* be
constant. Such a series is said to be **difference-stationary**.
(Sometimes it can be hard to tell the difference between a series that is
trend-stationary and one that is difference-stationary, and a so-called **unit
root test** may be used to get a more definitive answer. We will
return to this topic later in the course.)
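For a trend-stationary series, the "fit a trend line and subtract it out" option can be sketched in a few lines of plain Python. This assumes a straight-line trend in the time index fitted by ordinary least squares; `linear_detrend` is a hypothetical helper, and in practice your software would do this fit for you. If the series were difference-stationary instead, the residuals from such a line would tend to wander rather than revert to zero, and differencing would be the better route.

```python
def linear_detrend(y):
    """Fit the trend line y_t = a + b*t by ordinary least squares
    (t = 0, 1, ..., n-1) and return (a, b, residuals about the line)."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = sum(y) / n
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (yt - y_bar) for t, yt in enumerate(y))
    b = sxy / sxx                 # slope of the trend line
    a = y_bar - b * t_bar         # intercept
    residuals = [yt - (a + b * t) for t, yt in enumerate(y)]
    return a, b, residuals


# Hypothetical trend-stationary data: 10 + 3t plus deviations that revert to the line
y = [11.0, 12.0, 15.0, 20.0]
a, b, detrended = linear_detrend(y)
print(a, b, detrended)   # 10.0 3.0 [1.0, -1.0, -1.0, 1.0]
```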


The
**first difference** of a time series is the series of changes from one
period to the next. If $Y_t$ denotes the value of the time series Y at
period t, then the first difference of Y at period t is equal to $Y_t - Y_{t-1}$.
In Statgraphics, the first difference of Y is expressed as DIFF(Y), and in
RegressIt it is Y_DIFF1. If the first difference of Y is stationary and also *completely
random* (not autocorrelated), then Y is described by a random
walk model: each value is a random step away from the previous value. If
the first difference of Y is stationary but *not* completely random (i.e.,
if its value at period t is autocorrelated with its value at earlier
periods), then a more sophisticated forecasting model such as exponential
smoothing or ARIMA may be appropriate. (Note: if DIFF(Y) is
stationary and random, this indicates that a random walk model is appropriate
for the original series Y, *not* that a random walk model should be fitted
to DIFF(Y). Fitting a random walk model to Y is logically equivalent to
fitting a mean (constant-only) model to DIFF(Y).)
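A minimal Python sketch of the first difference and of the equivalence stated in the note above. It assumes the random walk includes a drift term, which is the constant estimated by the mean model; fixing that constant at zero instead gives the no-drift random walk, whose forecasts all equal the last observed value. The function names are hypothetical: `diff` with `lag=1` computes what DIFF(Y) or Y_DIFF1 computes, and the optional `lag` argument is an illustrative extension for season-to-season differences.

```python
from statistics import mean


def diff(y, lag=1):
    """Changes over `lag` periods: y[t] - y[t - lag].
    lag=1 is the first difference; a seasonal lag (e.g. 12 for monthly
    data) would give season-to-season differences instead."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]


def random_walk_forecast(y, horizon):
    """Random walk with drift, fitted as a mean (constant-only) model on diff(y):
    the constant is the average change, and each k-step-ahead forecast is the
    last observed value plus k times that constant."""
    drift = mean(diff(y))
    return [y[-1] + k * drift for k in range(1, horizon + 1)]


y = [5.0, 7.0, 6.0, 9.0, 11.0]
print(diff(y))                      # [2.0, -1.0, 3.0, 2.0]
print(random_walk_forecast(y, 2))   # [12.5, 14.0]
```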

Here is a
graph of the first difference of AUTOSALE/CPI, the deflated auto sales series.
Notice that it now looks approximately stationary (at least the mean and
variance are more-or-less constant) but it is not at all random (a strong
seasonal pattern remains):
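The visual judgment that the differenced series "is not at all random" can be backed up with sample autocorrelations: a random series has autocorrelations near zero at every lag, while a leftover seasonal pattern shows up as a large autocorrelation at the seasonal lag (12 for monthly data such as auto sales). The sketch below applies the standard sample autocorrelation formula to a made-up quarterly series (seasonal lag 4), not to the AUTOSALE data, and `autocorrelation` is a hypothetical helper name.

```python
def autocorrelation(x, k):
    """Sample autocorrelation of x at lag k, using deviations from the overall mean."""
    n = len(x)
    x_bar = sum(x) / n
    dev = [xi - x_bar for xi in x]
    denom = sum(d * d for d in dev)
    num = sum(dev[t] * dev[t - k] for t in range(k, n))
    return num / denom


# Hypothetical differenced quarterly series: the same seasonal shape every year
seasonal_diff_like = [3.0, -1.0, -2.0, 0.0] * 6

print(round(autocorrelation(seasonal_diff_like, 1), 3))   # -0.071
print(round(autocorrelation(seasonal_diff_like, 4), 3))   # 0.833
```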

The
following spreadsheet illustrates how the first difference is calculated for
the deflated auto sales data:
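The spreadsheet logic is simple: divide each month's sales by the CPI to get the deflated value, then subtract the previous row's deflated value from the current row's, leaving the first row blank because it has no predecessor. Since the actual spreadsheet values are not reproduced here, this Python sketch uses a few made-up numbers purely to show the mechanics.

```python
# Hypothetical numbers for illustration only; NOT the actual AUTOSALE or CPI values.
autosale = [20.0, 25.0, 33.0, 30.0]
cpi = [1.00, 1.25, 1.50, 1.50]

# AUTOSALE/CPI column: deflate each month's sales
deflated = [s / c for s, c in zip(autosale, cpi)]
# DIFF column: current deflated value minus previous one; first row left blank
first_diff = [None] + [b - a for a, b in zip(deflated, deflated[1:])]

print(f"{'t':>2} {'AUTOSALE':>9} {'CPI':>5} {'AUTOSALE/CPI':>13} {'DIFF':>6}")
for t, (s, c, d, fd) in enumerate(zip(autosale, cpi, deflated, first_diff), start=1):
    diff_cell = "" if fd is None else f"{fd:.2f}"
    print(f"{t:>2} {s:>9.2f} {c:>5.2f} {d:>13.2f} {diff_cell:>6}")
```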
