Averaging and exponential smoothing models

Spreadsheet implementation of seasonal adjustment and exponential smoothing

It is
straightforward to perform seasonal adjustment and fit exponential smoothing
models using a spreadsheet program such as Excel. The screen images and charts
below are taken from a spreadsheet which has been set up to illustrate
multiplicative seasonal adjustment and linear exponential smoothing on the
Outboard Marine data. To obtain a copy of the spreadsheet file itself, click here. The
version of linear exponential smoothing that will be used here for purposes of
demonstration is Brown’s version, merely because it can be implemented with
a single column of formulas and there is only one smoothing constant to
optimize. Usually it is better to
use Holt’s version that has separate smoothing constants for level and
trend.

The
forecasting process proceeds as follows: (i) first the data are seasonally
adjusted; (ii) then forecasts are generated for the seasonally adjusted data
via linear exponential smoothing; and (iii) finally the seasonally adjusted
forecasts are "reseasonalized" to obtain forecasts for the original
series. The seasonal adjustment process is carried out in the first six columns.

The first
step in seasonal adjustment is to compute a *centered moving average*
(performed here in column C). This can be done by simply taking the average of
two one-year-wide averages that are offset by one period relative to each
other. The next step is to compute the *ratio to moving average*--i.e.,
the original data divided by the moving average in each period--which is
performed here in column D. The estimated *seasonal index* for each season
is computed by first averaging all the ratios for that particular season, and
then renormalizing the ratios if necessary so that they sum to exactly 100%
times the number of periods in a season, or 400% in this case. This is done in
the first few rows of columns E and F. In this case the number of rows is
small, and for purposes of illustration I just manually typed the row numbers
for all the "June" ratios inside the AVERAGE( ) formula in cell E3,
and then I copied this formula into cells E4-E6. (There are better, more systematic ways
to average the ratios, which really ought to be used in practice to avoid
formula errors. In particular, you
could add a column that contains numeric codes for seasons and then use an
AVERAGEIF formula to average only the values for a particular season
number.) Cells E13-E16 contain cell
references pointing to the appropriate seasonal indices computed in cells
F3-F6. (Careful: you must be sure to line them up
correctly. In this case, the
formulas in cells E13 to E16 are "=F5", "=F6",
"=F3", "=F4", respectively.) Cells E17 and below contain cell
references pointing to the value in the same column 4 quarters earlier. Note
that you do NOT copy the seasonal index formula into ALL the other cells in the
same column--you just replicate it enough times to obtain the seasonal index
once for each season in the year. Then, in the remainder of the cells in the
column, you insert cell references pointing to the seasonal index used in the
same season in the preceding year. The *seasonally adjusted data* is
obtained by dividing the original data by the column of seasonal indices (here,
by dividing column B by column E). The centered moving average and the seasonally
adjusted data end up looking like this:

Note that
the moving average typically looks like a smoothed version of the seasonally
adjusted series.

The same
spreadsheet also shows the application of the linear exponential smoothing
model to the seasonally adjusted data, beginning in column G.

A value
for the smoothing constant (alpha) is entered above the forecast column (here,
in cell G9) and for convenience it is assigned the range name
"Alpha." (The name is assigned using the
"Insert/Name/Create" command.) The LES model is initialized by
setting the first two forecasts equal to the first actual value of the
seasonally adjusted series. The formula used here for the LES forecast is the single-equation
recursive form of Brown’s model:

This formula is entered in the cell corresponding to the third period (here,
cell G15) and copied down from there. Notice that the LES forecast for the
current period refers to the *two preceding observations *and the *two
preceding forecast errors,* as well as to the value of alpha. Thus, the
forecasting formula in row 15 refers only to data which were available in row
14 and earlier. (Of course, if we wished to use *simple* instead of linear
exponential smoothing, we could substitute the SES formula here instead. We could also use Holt’s rather
than Brown’s LES model, which would require two more columns of formulas
to calculate the level and trend that are used in the forecast.)

The errors
are computed in the next column (here, column H) by subtracting the forecasts
from the actual values. The** root mean squared error **is computed
as the square root of the variance of the errors plus the square of the mean.
(This follows from the mathematical identity: MSE= VARIANCE(errors)+
(AVERAGE(errors))^2.) In calculating the mean and variance of the errors in
this formula, the first two periods are excluded because the model does not
actually begin forecasting until the third period (row 15 on the spreadsheet).
The optimal value of alpha can be found either by manually changing alpha until
the minimum RMSE is found, or else you can use the "Solver" to
perform an exact minimization. The value of alpha that the Solver found is
shown here (alpha=0.471154).

It is
usually a good idea to plot the errors of the model (in transformed units) and
also to compute and plot their autocorrelations at lags up to one season. Here
is a time series plot of the (seasonally adjusted) errors:

The error
autocorrelations are computed by using the CORREL( ) function to compute the
correlations of the errors with themselves lagged by one or more
periods--details are shown in the spreadsheet model. Here is a plot of the
autocorrelations of the errors at the first five lags:

The spike
at lag 4 is slightly troublesome--it suggests that the seasonal adjustment
process has not been completely successful. However, it is actually only
marginally significant. The 95% significant limits for autocorrelations are
roughly plus-or-minus-2-over-the-square-root-of-n, where n is the sample size.
Here the sample size for the autocorrelations varies from 34 to 38 depending on
the lag, so the square-root-of-n is about 6, and hence the confidence limits
are roughly plus-or-minus 2/6, or 0.33. If you vary the value of alpha by hand,
you can observe the effect on the time series and autocorrelation plots of the
errors, as well as on the mean squared error.

At the
bottom of the spreadsheet, the forecasting formula is "bootstrapped"
into the future by merely substituting forecasts for actual values at the point
where the actual data runs out--i.e., where "the future" begins. (In
other words, in each cell where a future data value would occur, a cell
reference is inserted which points to the* forecast* made for that
period.) All the other formulas are simply copied down from above:

Notice
that the *errors* for forecasts of the future are all computed to be *zero*.
This does not mean the actual errors will be zero, but rather it merely
reflects the fact that for purposes of extrapolation we are assuming that the
future data will equal the forecasts. The resulting LES forecasts for the
seasonally adjusted data look like this:

Note that
with this particular value of alpha--the "optimal" value for
one-step-ahead prediction--the projected trend is slightly upward, reflecting
the local trend that was observed over the last 2 years or so. For other values
of alpha, a different trend projection might be obtained.

The final
step in building the forecasting model is to "reasonalize" the LES
forecasts by multiplying them by the appropriate seasonal indices. Thus, the
reseasonalized forecasts in column I are simply the product of the seasonal
indices in column E and the seasonally adjusted LES forecasts in column G.

It is
relatively easy to compute **confidence intervals **for one-step-ahead
forecasts made by this model: first compute the RMSE (root-mean-squared error,
which is just the square root of the MSE) and then compute a confidence
interval for the *seasonally adjusted* forecast by adding and subtracting
two times the RMSE. (Remember that a 95% confidence interval is roughly equal
to the point forecast "plus-or-minus-two-standard deviations,"
assuming the error distribution is approximately normal and the sample size is
20 or more. Here, the RMSE is the best estimate of the error standard
deviation.) The confidence limits for the seasonally adjusted forecast are then
*reseasonalized*, along with the forecast, by multiplying them by the
appropriate seasonal indices. In this case the RMSE is equal to **27.4** and
the seasonally adjusted forecast for the first future period (Dec-93) is **273.2**,
so the seasonally adjusted 95% confidence interval is from **273.2-2*27.4 =
218.4** to **273.2+2*27.4 = 328.0**. Multiplying these limits by
December's seasonal index of** 68.61%**, we obtain lower and upper
confidence limits of **149.8** and** 225.0** around the Dec-93 point
forecast of **187.4**.

Confidence limits for forecasts more than one period ahead will generally widen as the forecast horizon increases, due to uncertainty about the level and trend as well as the seasonal factors, but it is difficult to compute them in general by analytic methods. (The appropriate way to compute confidence limits for the LES forecast is by using ARIMA theory, but the uncertainty in the seasonal indices is another matter.) If you want a realistic confidence interval for a forecast more than one period ahead, taking all sources of error into account, your best bet is to use empirical methods: for example, to obtain a confidence interval for a 2-step ahead forecast, you could create another column on the spreadsheet to compute a 2-step-ahead forecast for every period (by bootstrapping the one-step-ahead forecast). Then compute the RMSE of the 2-step-ahead forecast errors and use this as the basis for a 2-step-ahead confidence interval.