When faced with a time series that shows irregular growth, such as Series #2 analyzed earlier, the best strategy may not be to try to directly predict the level of the series at each period (i.e., the quantity Y(t)). Instead, it may be better to try to predict the change that occurs from one period to the next (i.e., the quantity Y(t)-Y(t-1)). In other words, it may be helpful to look at the first difference of the series, to see if a predictable pattern can be discerned there. For practical purposes, it is just as good to predict the next change as to predict the next level of the series, since the predicted change can always be added to the current level to yield a predicted level. Here's a plot of the first difference of the irregular growth series analyzed above:
Notice that this looks stationary and quite random: a pattern that we previously fitted with the mean model. Hence, the forecasting model suggested by this plot is:
Ŷ(t) - Y(t-1) = alpha
...where Ŷ(t) is the forecast of Y(t) and alpha is the mean of the first difference, i.e., the average change from one period to the next. If we rearrange this equation to put Ŷ(t) by itself on the left, we get:
Ŷ(t) = Y(t-1) + alpha
In other words, we predict that this period's value will equal last period's value plus a constant representing the average change between periods. This is the so-called "random walk" model: it assumes that, from one period to the next, the original time series merely takes a random "step" away from its last recorded position. (Think of an inebriated person who steps randomly to the left or right at the same time as he steps forward: the path he traces will be a random walk.)
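As a minimal sketch of the forecasting rule just described, the following Python snippet computes the first difference of a series, estimates alpha as the mean of those differences, and adds it to the last observed value. The function names and the data are hypothetical illustrations; they are not the Series #2 values and not part of Statgraphics.

```python
from statistics import mean


def first_difference(series):
    """Return the period-to-period changes Y(t) - Y(t-1)."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


def estimate_alpha(series):
    """Estimate alpha as the average change from one period to the next."""
    return mean(first_difference(series))


def one_step_forecast(series, alpha):
    """Forecast the next value as the last observed value plus alpha."""
    return series[-1] + alpha


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
y = [10.0, 10.5, 10.25, 11.0, 11.5]

alpha = estimate_alpha(y)
print(first_difference(y))          # [0.5, -0.25, 0.75, 0.5]
print(alpha)                        # 0.375
print(one_step_forecast(y, alpha))  # 11.875
```

Predicting the change and then adding it to the current level is all that is happening here, which is why forecasting the difference is as good as forecasting the level.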
If the constant term (alpha) in the random walk model is zero, it is a random walk without drift. This is the model that Statgraphics fits when you specify a "Random walk" on the Model Specification panel in the forecasting procedure. Here is a plot of Series #2 and the forecast produced by the random-walk model:
Notice that (a) the one-step forecasts within the sample merely "shadow" the observed data, lagging exactly one period behind, and (b) the long-term forecasts outside the sample follow a horizontal straight line anchored on the last observed value. The error measures and residual randomness tests for this model are very much superior to those of the linear trend model, as will be seen below. However, the horizontal appearance of the long-term forecasts is rather unsatisfactory if we believe that the upward trend observed in the past is genuine.
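The two behaviors noted above can be reproduced in a few lines. This is only an illustrative sketch with hypothetical data and function names: with no drift, each in-sample forecast is simply the previous observation, and every out-of-sample forecast equals the last observed value.

```python
def in_sample_forecasts(series):
    """One-step forecasts for periods 2..n: each equals the previous observation."""
    return series[:-1]


def long_term_forecasts(series, horizon):
    """Without drift, every future forecast equals the last observed value."""
    return [series[-1]] * horizon


# Hypothetical data, not the Series #2 values.
y = [10.0, 10.5, 10.25, 11.0, 11.5]

fitted = in_sample_forecasts(y)
# The one-step errors of this model are just the first differences of the series.
errors = [actual - forecast for actual, forecast in zip(y[1:], fitted)]

print(fitted)                     # [10.0, 10.5, 10.25, 11.0]
print(errors)                     # [0.5, -0.25, 0.75, 0.5]
print(long_term_forecasts(y, 3))  # [11.5, 11.5, 11.5]
```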
Random walk with drift: If the series being fitted by a random walk model has an average upward (or downward) trend that is expected to continue in the future, you should include a non-zero constant term in the model--i.e., assume that the random walk undergoes "drift." To add a non-zero constant drift term to the random walk model in Statgraphics, you can just check the "constant" box on the Model Options panel after specifying a random walk model. This works fine if you are not holding out any data for validation, but unfortunately there is a bug in this feature that surfaces when you are holding out data: the drift term is still estimated from the entire sample. (You'll notice that the forecasts do not change at all when more or fewer data points are held out.) To fit a random-walk-with-drift model with data held out for validation, you must specify it as a special case of an ARIMA model. ARIMA models are a very general class of forecasting models that includes random walk models and more elaborate models whose forecasting equations may include lags of the differenced time series (so-called auto-regressive or "AR" terms) and/or lags of the forecast errors (so-called moving-average or "MA" terms).
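To see why the drift estimate ought to change when data are held out, here is a small sketch that estimates the drift from the estimation period only and compares it with the estimate from the entire sample. The data and the function names are hypothetical, and the code makes no claim about how Statgraphics computes its estimates internally.

```python
from statistics import mean


def drift(series):
    """Average period-to-period change over the given series."""
    return mean([curr - prev for prev, curr in zip(series, series[1:])])


def drift_with_holdout(series, n_holdout):
    """Estimate the drift using only the points before the held-out tail."""
    n_estimation = len(series) - n_holdout
    if n_holdout < 0 or n_estimation < 2:
        raise ValueError("need at least two points in the estimation period")
    return drift(series[:n_estimation])


# Hypothetical data in which the last two points grow faster than the rest.
y = [10.0, 10.5, 10.25, 11.0, 11.5, 13.5, 15.5]

print(drift_with_holdout(y, 2))  # 0.375, from the first five points only
print(drift(y))                  # larger, because the held-out points are included
```

If the two numbers printed for your own data are identical no matter how many points are held out, the held-out points are not really being excluded from the estimate.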
To specify the random walk model with non-zero constant drift, (i) select "ARIMA" as the model type, (ii) set the order of non-seasonal differencing to 1, (iii) set all the AR, MA, SAR, and SMA terms to zero (the default setting is AR=1: change this to zero), and (iv) leave the "constant" box checked (i.e., do estimate a constant). By choosing these settings, you are simply applying the constant (mean) forecasting model to the first difference of the series, although Statgraphics will "undifference" the forecasts for you in the plots and output reports. In ARIMA terminology, this is a "(0,1,0) model with constant," where the numbers in parentheses refer to the number of AR terms, the number of nonseasonal differences, and the number of MA terms, respectively. Here's the result of specifying this model for Series #2:
This picture looks much the same as the previous one, except that the long-term forecasts now trend upward. The slope of the forecasts is merely the average monthly difference that was calculated inside the sample, which is 0.259231. (This is the "alpha" term in the forecasting equation, and it shows up as the estimated constant in the Analysis Summary report for the model.) This value is very close--but not quite identical--to the slope of the forecasts in the linear trend model, which was 0.258761. However, the intercept of the out-of-sample forecasts of the random walk model is always reanchored so that the forecasts extend from the last observed data point, rather than from some point fixed in the past.
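One way to see why the two slopes are close but not identical: the mean of the first differences telescopes to (last value minus first value) divided by (n - 1), so it depends only on the endpoints, whereas a least-squares trend line uses every point. The sketch below, with hypothetical data and function names, computes both slopes and the drift forecasts anchored on the last observation.

```python
from statistics import mean


def drift_slope(series):
    """Mean first difference, which telescopes to (last - first) / (n - 1)."""
    return (series[-1] - series[0]) / (len(series) - 1)


def trend_slope(series):
    """Least-squares slope of the series regressed on the time index 1..n."""
    t = list(range(1, len(series) + 1))
    t_bar, y_bar = mean(t), mean(series)
    num = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, series))
    den = sum((ti - t_bar) ** 2 for ti in t)
    return num / den


def drift_forecasts(series, horizon):
    """k-step-ahead forecasts: last observed value plus k times the drift."""
    alpha = drift_slope(series)
    return [series[-1] + k * alpha for k in range(1, horizon + 1)]


# Hypothetical data, not the Series #2 values.
y = [10.0, 10.5, 10.25, 11.0, 11.5]

print(drift_slope(y))         # 0.375
print(trend_slope(y))         # close to 0.35, so similar but not identical
print(drift_forecasts(y, 3))  # [11.875, 12.25, 12.625]
```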
If we look at the Model Comparison report for the three models (linear trend, random walk, and random walk with constant), we see that the last model is indeed the best, both in-sample and out-of-sample.
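A comparison of this kind can be sketched by computing the root-mean-squared error of the one-step forecasts separately for the estimation and validation periods. The code below is an illustration only: it uses hypothetical data, it assumes that validation errors are one-step-ahead errors with the drift estimated from the estimation period alone, and it covers just the two random walk models (the linear trend model is omitted to keep the example short).

```python
from math import sqrt
from statistics import mean


def rmse(errors):
    """Root-mean-squared error of a list of forecast errors."""
    return sqrt(mean([e * e for e in errors]))


def one_step_errors(series, alpha, start, end):
    """Errors Y(t) - (Y(t-1) + alpha) for 0-based positions start..end-1."""
    return [series[t] - (series[t - 1] + alpha) for t in range(start, end)]


def compare_models(series, n_holdout):
    """RMSE in the estimation and validation periods for both random walk models."""
    n_est = len(series) - n_holdout
    if n_holdout < 1 or n_est < 2:
        raise ValueError("need a non-empty validation period and two estimation points")
    estimation = series[:n_est]
    # Drift is estimated from the estimation period only (an assumption of this sketch).
    drift = (estimation[-1] - estimation[0]) / (n_est - 1)
    report = {}
    for name, alpha in (("random walk", 0.0), ("random walk with drift", drift)):
        report[name] = {
            "estimation_rmse": rmse(one_step_errors(series, alpha, 1, n_est)),
            "validation_rmse": rmse(one_step_errors(series, alpha, n_est, len(series))),
        }
    return report


# Hypothetical data with the last two points held out for validation.
y = [10.0, 10.5, 10.25, 11.0, 11.5, 12.0, 12.25]
for model, stats in compare_models(y, 2).items():
    print(model, stats)
```

In the estimation period the drift model can never have a larger RMSE than the no-drift model, because the mean of the differences is the constant that minimizes their squared deviations; the validation period is where the comparison is informative.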
An advantage to using the ARIMA model option to fit a random walk model is that it easily allows you to add terms to correct the model for autocorrelation in the residuals, if this should be necessary. In particular, if the random walk model has significant positive autocorrelation in the residuals at lag 1, you should try setting AR=1, which yields a so-called ARIMA(1,1,0) model. On the other hand, if the random walk model has significant negative autocorrelation in the residuals at lag 1, you should try setting MA=1, which yields a so-called ARIMA(0,1,1) model, which is essentially the same as a simple exponential smoothing model. We will discuss these model types and autocorrelation-correction strategies in more depth later in the course.
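The rule of thumb in the preceding paragraph can be sketched as follows. The lag-1 autocorrelation formula is the standard sample estimate; the significance band of plus or minus 2 divided by the square root of n is a common rough approximation that is assumed here and does not come from the text, and the function names and residuals are hypothetical.

```python
from math import sqrt
from statistics import mean


def lag1_autocorrelation(residuals):
    """Sample autocorrelation of the residuals at lag 1."""
    m = mean(residuals)
    dev = [r - m for r in residuals]
    num = sum(a * b for a, b in zip(dev, dev[1:]))
    den = sum(d * d for d in dev)  # zero if all residuals are equal
    return num / den


def suggest_next_model(residuals):
    """Suggest which term to try, based on the sign of a significant lag-1 value."""
    r1 = lag1_autocorrelation(residuals)
    bound = 2 / sqrt(len(residuals))  # rough 95% band: an assumption of this sketch
    if r1 > bound:
        return "ARIMA(1,1,0)"  # positive autocorrelation: try AR=1
    if r1 < -bound:
        return "ARIMA(0,1,1)"  # negative autocorrelation: try MA=1
    return "ARIMA(0,1,0)"      # no significant lag-1 autocorrelation


# Hypothetical residuals that flip sign every period (strong negative autocorrelation).
alternating = [1.0, -1.0] * 10
print(lag1_autocorrelation(alternating))  # -0.95
print(suggest_next_model(alternating))    # ARIMA(0,1,1)
```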