Random walk model


When faced
with a time series that shows irregular growth, such as Series #2 analyzed earlier, the best strategy
may not be to try to directly predict the *level* of the series at each
period (i.e., the quantity Y(t)). Instead, it may be better to try to predict
the *change* that occurs from one period to the next (i.e., the quantity Y(t)-Y(t-1)).
In other words, it may be helpful to look at the first
difference of the series, to see if a predictable pattern can be discerned
there. For practical purposes, it is just as good to predict the next change as
to predict the next level of the series, since the predicted change can always
be added to the current level to yield a predicted level. Here's a plot of the
first difference of the irregular growth series analyzed above:
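As a minimal sketch of the differencing idea, the following Python snippet computes the first difference of a short series and converts a predicted change back into a predicted level. The data values are made up for illustration (they are not Series #2), and the function name `first_difference` is hypothetical.

```python
def first_difference(y):
    """Return the period-to-period changes Y(t) - Y(t-1)."""
    return [y[t] - y[t - 1] for t in range(1, len(y))]

# Illustrative values only (not the Series #2 data).
series = [10.0, 10.4, 10.1, 10.9, 11.2]
changes = first_difference(series)

# Any forecast of the next change converts to a forecast of the next level
# by adding it to the current level. Here the predicted change is simply
# the mean of the observed changes.
predicted_change = sum(changes) / len(changes)
predicted_level = series[-1] + predicted_change
```

The differenced series is one observation shorter than the original, because the first period has no predecessor.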

Notice
that this looks stationary and quite random: a pattern that we previously
fitted with the mean model. Hence, the forecasting model suggested by this plot
is

$$\hat{Y}(t) - Y(t-1) = \alpha$$

...where $\hat{Y}(t)$ is the forecast for period t and
alpha ($\alpha$) is the mean of the first difference, i.e., the average change from one period
to the next. If we rearrange this equation to put the forecast by itself on the left,
we get:

$$\hat{Y}(t) = Y(t-1) + \alpha$$

In other
words, we predict that this period's value will equal last period's value plus
a constant representing the average change between periods. This is the
so-called "random walk" model: it assumes that, from one period to
the next, the original time series merely takes a random "step" away
from its last recorded position. (Think of an inebriated person who steps
randomly to the left or right at the same time as he steps forward: the path he
traces will be a random walk.)

If the
constant term (alpha) in the random walk model is *zero*, it is a **random
walk without drift**. This is the model that Statgraphics fits when you
specify a "Random walk" on the Model Specification panel in the
forecasting procedure. Here is a plot of Series #2 and the forecasts
produced by the random walk model:

Notice
that (a) the one-step forecasts within the sample merely "shadow" the
observed data, lagging exactly one period behind, and (b) the long-term
forecasts outside the sample follow a horizontal straight line anchored on the
last observed value. The error measures and residual randomness tests for this
model are very much superior to those of the linear trend model, as will be
seen below. However, the horizontal appearance of the long-term forecasts is
rather unsatisfactory if we believe that the upward trend observed in the past
is genuine.
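Both properties noted above follow directly from the forecasting rule, as this minimal Python sketch of a random walk without drift suggests. The data are illustrative (not Series #2) and the function name `random_walk_forecasts` is hypothetical.

```python
def random_walk_forecasts(y, horizon):
    """Random walk without drift.

    in_sample[i] is the one-step forecast for period i+1, which is just y[i],
    so the forecasts "shadow" the data one period behind.
    out_of_sample repeats the last observed value for every future period.
    """
    in_sample = y[:-1]
    out_of_sample = [y[-1]] * horizon
    return in_sample, out_of_sample

# Illustrative values only (not the Series #2 data).
series = [10.0, 10.4, 10.1, 10.9, 11.2]
in_sample, future = random_walk_forecasts(series, horizon=3)
```

Because every future forecast equals the last observed value, the long-term forecast plot is a horizontal line no matter how strongly the data trended in the past.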

**Random walk with drift:** If
the series being fitted by a random walk model has an average upward (or
downward) trend that is expected to continue in the future, you should include
a *non-zero* constant term in the model--i.e., assume that the random walk
undergoes "drift." To add a non-zero constant drift term to the
random walk model in Statgraphics, you can just check the "constant"
box on the Model Options panel after specifying a random walk model. This
works fine if you are not holding out any data for validation, but
unfortunately there is a bug in this feature that surfaces when you are holding
out data: the drift term is still estimated from the entire sample.
(You'll notice that the forecasts do not change at all when more or fewer data
points are held out.) To fit a random-walk-with-drift model with
data held out for validation, you must specify it as a special case of an ARIMA
model. ARIMA models are a very general class of forecasting models that
includes random walk models and more elaborate models whose forecasting
equations may include lags of the differenced time series (so-called
auto-regressive or "AR" terms) and/or lags of the forecast errors
(so-called moving-average or "MA" terms).

To specify
the random walk model with non-zero constant drift, (i) select
"ARIMA" as the model type, (ii) set the order of *non-seasonal
differencing* to 1, (iii) set all the AR, MA, SAR, and SMA terms to zero
(the default setting is AR=1: change this to zero), and (iv) leave the
"constant" box checked (i.e., *do* estimate a constant). By
choosing these settings, you are simply applying the constant (mean)
forecasting model to the first difference of the series, although Statgraphics
will "undifference" the forecasts for you in the plots and output
reports. In ARIMA terminology, this is a "(0,1,0) model with
constant," where the numbers in parentheses refer to the number of AR
terms, the number of nonseasonal differences, and the number of MA terms,
respectively. Here's the result of specifying this model for Series #2:

This
picture looks much the same as the previous one, except that the long-term
forecasts now trend upward. The *slope* of the forecasts is merely the *average
monthly difference* that was calculated inside the sample, which is
0.259231. (This is the "alpha" term in the forecasting equation, and
it shows up as the estimated constant in the Analysis
Summary report for the model.) This value is very close--but not quite
identical--to the slope of the forecasts in the linear
trend model, which was 0.258761. However, the *intercept* of the
out-of-sample forecasts of the random walk model is always *reanchored* so
that the forecasts extend from the last observed data point, rather than from some
point fixed in the past.
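The slope and re-anchoring described here can be sketched in a few lines of Python. This is an illustration under simple assumptions, not a reproduction of the Statgraphics output: the data are made up and the function name `drift_forecasts` is hypothetical.

```python
def drift_forecasts(y, horizon):
    """Random walk with drift.

    alpha is the mean of the first differences; the k-step-ahead forecast
    is the last observed value plus k times alpha.
    """
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    alpha = sum(diffs) / len(diffs)
    return alpha, [y[-1] + k * alpha for k in range(1, horizon + 1)]

# Illustrative values only (not the Series #2 data).
series = [10.0, 10.4, 10.1, 10.9, 11.2]
alpha, future = drift_forecasts(series, horizon=3)

# The differences telescope, so alpha depends only on the first and last
# observations: (Y(n) - Y(1)) / (n - 1).
endpoint_slope = (series[-1] - series[0]) / (len(series) - 1)
```

Since the estimated drift is determined by the first and last points in the sample alone, it generally differs slightly from a least-squares trend slope, which uses every point.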

If we look
at the Model Comparison report for the three models
(linear trend, random walk, and random walk with constant), we see that the
last model is indeed the best, both in-sample and out-of-sample.
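A comparison of this kind can be sketched by hand for the two random walk variants. The snippet below is a rough illustration, not the Statgraphics report: the data are made up, the helper names are hypothetical, and it assumes that errors in the validation period are one-step-ahead errors computed with a drift term estimated from the estimation period only.

```python
import math

def rmse(errors):
    """Root mean squared error."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def one_step_errors(y, alpha, start, stop):
    """Errors Y(t) - (Y(t-1) + alpha) for t = start, ..., stop-1 (start >= 1)."""
    return [y[t] - (y[t - 1] + alpha) for t in range(start, stop)]

# Illustrative values only (not the Series #2 data).
series = [10.0, 10.4, 10.1, 10.9, 11.2, 11.3, 11.9, 12.0]
n_est = 5  # first 5 points for estimation, last 3 held out for validation

# Drift estimated from the estimation period only (mean first difference).
est = series[:n_est]
alpha = (est[-1] - est[0]) / (n_est - 1)

results = {}
for name, a in [("random walk", 0.0), ("random walk with drift", alpha)]:
    results[name] = {
        "estimation_rmse": rmse(one_step_errors(series, a, 1, n_est)),
        "validation_rmse": rmse(one_step_errors(series, a, n_est, len(series))),
    }
```

Within the estimation period the drift model can never have a larger RMSE than the no-drift model, because the mean of the differences minimizes the sum of squared errors. The validation period carries no such guarantee, which is why holding out data is informative.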

An
advantage of using the ARIMA model option to fit a random walk model is that it
easily allows you to **add terms to correct the model for autocorrelation in
the residuals**, if this should be necessary. In particular, if the
random walk model has significant *positive autocorrelation in the residuals
at lag 1*, you should try setting AR=1, which yields a so-called
ARIMA(1,1,0) model. On the other hand, if the random walk model has
significant *negative autocorrelation in the residuals at lag 1*, you
should try setting MA=1, which yields a so-called ARIMA(0,1,1) model, which is
essentially the same as a simple exponential smoothing model. We will
discuss these model types and autocorrelation-correction strategies in more
depth later in the course.
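As a rough sketch of how the lag-1 check might be carried out by hand, the Python snippet below computes the lag-1 autocorrelation of a residual series and maps its sign to the suggestions above. The significance band of plus or minus 2 divided by the square root of n is a common rule of thumb and an assumption here, not something taken from the Statgraphics output; the function names and test data are hypothetical.

```python
import math

def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def suggest_correction(residuals):
    """Map the lag-1 residual autocorrelation to a suggested ARIMA term."""
    r1 = lag1_autocorrelation(residuals)
    band = 2.0 / math.sqrt(len(residuals))  # assumed rule-of-thumb threshold
    if r1 > band:
        return r1, "try AR=1, i.e. ARIMA(1,1,0)"
    if r1 < -band:
        return r1, "try MA=1, i.e. ARIMA(0,1,1)"
    return r1, "no lag-1 correction suggested"

# Artificial residual patterns, for illustration only.
alternating = [1.0, -1.0] * 10          # sign flips every period
persistent = [1.0] * 10 + [-1.0] * 10   # long runs of the same sign

r1_alt, advice_alt = suggest_correction(alternating)
r1_per, advice_per = suggest_correction(persistent)
```

Here the residuals would be the one-step forecast errors of the random walk model, i.e., the first differences minus the estimated drift.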