As a first
step in improving on "naive" forecasting models, nonseasonal
patterns and trends can be extrapolated using a *moving-average* or *smoothing*
model. The basic assumption behind averaging and smoothing models is that the
time series is "locally stationary" with a slowly varying mean.
Hence, we take a *moving* (i.e., local) average to estimate the current
value of the mean, and use this as the forecast. This can be considered as a
compromise between the mean model and the random walk model. The moving average
is often called a "smoothed" version of the original series, since
short-term averaging has the effect of smoothing out the bumps in the original
series. By adjusting the degree of smoothing (i.e., the "width" of
the moving average), we can hope to strike some kind of optimal balance between
the performance of the mean and random walk models. The simplest kind of
averaging model is the....

**Simple
(equally-weighted) Moving Average:**

**Ý(t) = (Y(t-1) + Y(t-2) + . . . + Y(t-k))/k**

Here, the
one-period-ahead forecast Ý(t), made at time
t-1, equals the simple average of the last k observations. This average is
"centered" at period t-(k+1)/2, which implies that the estimate of
the local mean will tend to lag behind the true value of the local mean by
about (k+1)/2 periods. Thus, we say **the average age of the data in
the simple moving average is (k+1)/2** relative to the period for which the
forecast is computed: this is the amount of time by which forecasts will tend
to lag behind turning points in the data.
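The formula above can be sketched in a few lines of Python. This is a minimal illustration, not the Statgraphics implementation; the function name `sma_forecasts` and the sample data are made up for the example.

```python
def sma_forecasts(y, k):
    """One-step-ahead simple moving average forecasts.

    Returns a list the same length as y. Entry t is the forecast for y[t],
    computed as the mean of the k observations before it, or None where
    fewer than k earlier observations exist.
    """
    forecasts = [None] * len(y)
    for t in range(k, len(y)):
        forecasts[t] = sum(y[t - k:t]) / k
    return forecasts


# Hypothetical data, not the series discussed in the text.
y = [10.0, 12.0, 11.0, 13.0, 14.0, 13.0]
print(sma_forecasts(y, 3))   # 3-term moving average
print(sma_forecasts(y, 1))   # k = 1 reproduces the random walk forecast
```

With k = 1 each forecast is simply the previous observation, which is why the random walk model can be viewed as a moving average of 1 term.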

Here is an example of a series which
appears to exhibit random fluctuations around a slowly-varying mean. First,
let's try to fit it with a random walk model, which is equivalent to a simple
moving average of 1 term:

The random
walk model responds very quickly to changes in the series, but in so doing it
picks up much of the "noise" in the data (the random fluctuations) as
well as the "signal" (the local mean). If we instead try a simple
moving average of 5 terms, we get a smoother-looking set of forecasts:

The 5-term
simple moving average yields significantly smaller errors than the random walk
model in this case. The average age of the data in this forecast is 3 (=(5+1)/2), so that it tends to lag behind turning points by
about three periods. (For example, a downturn seems to have occurred at period
21, but the forecasts do not turn around until several periods later.)

Notice that the **long-term
forecasts** from the SMA model are a *horizontal straight line*, just as
in the random walk model. Thus, the SMA model assumes that there is *no trend*
in the data. However, whereas the forecasts from the random walk model are
simply equal to the last observed value, the forecasts from the SMA model are
equal to a *weighted average of recent values*.

Interestingly,
the **confidence limits** computed by Statgraphics for the long-term
forecasts of the simple moving average do *not* get wider as the
forecasting horizon increases. This is obviously not correct! Unfortunately,
there is no underlying statistical theory that tells us how the confidence
intervals ought to widen for this model. If you were going to use this model in
practice, you would be well advised to use an *empirical estimate* of the
confidence limits for the longer-horizon forecasts. For example, you could set
up a spreadsheet in which the SMA model would be used to forecast 2 steps
ahead, 3 steps ahead, etc., within the historical data sample. You could then
compute the sample standard deviations of the errors at each forecast horizon,
and then construct confidence intervals for longer-term forecasts by adding and
subtracting multiples of the appropriate standard deviation.
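The spreadsheet procedure just described might be sketched in Python as follows. The function names, the multiplier z = 1.96 (which assumes roughly normal errors), and the sample data are assumptions made for illustration only.

```python
import statistics


def sma_horizon_error_sd(y, k, max_horizon):
    """Sample standard deviation of SMA forecast errors at each horizon.

    For every forecast origin with at least k observations up to and
    including it, the SMA forecast (the mean of the last k values) is
    compared with the value observed h steps later, for h = 1..max_horizon.
    Each horizon needs at least two errors for a sample standard deviation.
    """
    sd_by_horizon = {}
    for h in range(1, max_horizon + 1):
        errors = []
        for origin in range(k - 1, len(y) - h):
            forecast = sum(y[origin - k + 1:origin + 1]) / k
            errors.append(y[origin + h] - forecast)
        sd_by_horizon[h] = statistics.stdev(errors)
    return sd_by_horizon


def empirical_limits(forecast, sd, z=1.96):
    """Forecast plus and minus z standard deviations."""
    return forecast - z * sd, forecast + z * sd


# Hypothetical data, not the series discussed in the text.
y = [52.0, 48.0, 55.0, 50.0, 47.0, 53.0, 58.0, 54.0, 51.0, 56.0, 60.0, 57.0]
k = 5
sds = sma_horizon_error_sd(y, k, max_horizon=3)
last_forecast = sum(y[-k:]) / k   # the same flat forecast at every horizon
for h, sd in sds.items():
    print(h, empirical_limits(last_forecast, sd))
```

Because the standard deviation is estimated separately at each horizon, the limits are free to widen with the horizon even though the point forecast stays flat.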

If we try
a 9-term simple moving average, we get even smoother forecasts and more of a
lagging effect:

The
average age is now 5 periods (=(9+1)/2). If we take a
19-term moving average, the average age increases to 10:

Notice
that, indeed, the forecasts are now lagging behind turning points by about 10
periods.

**Brown's
Simple Exponential Smoothing (exponentially weighted moving average)**

The simple
moving average model described above has the undesirable property that it
treats the last k observations equally and completely ignores all preceding
observations. Intuitively, past data should be discounted in a more gradual
fashion--for example, the most recent observation should get a little more
weight than 2nd most recent, and the 2nd most recent should get a little more
weight than the 3rd most recent, and so on. The **simple exponential smoothing
(SES) model** accomplishes this. Let α denote a "smoothing
constant" (a number between 0 and 1) and let S(t)
denote the value of the smoothed series at period t. The following formula is
used recursively to update the smoothed series as new observations are
recorded:

**S(t) = αY(t) + (1-α)S(t-1)**

Thus, the
current smoothed value is an interpolation between the previous smoothed value
and the current observation, where α controls the closeness of the
interpolated value to the most recent observation. The *forecast* for the *next*
period is simply the current smoothed value:

**Ý(t+1) = S(t)**

(Note: we
will henceforth use the symbol Ý to stand for a forecast of the time
series Y, because Ý is the nearest thing to a
"y-hat" that can be displayed on a web page.) Equivalently, we can
express the next forecast directly in terms of previous forecasts and previous
observations, in any of the following ways:

**Ý(t+1) = αY(t) + (1-α)Ý(t)** ...forecast = interpolation
between previous *forecast* and previous *observation*

**Ý(t+1) = Ý(t) + αe(t)** ...forecast = previous
*forecast* plus fraction α of previous error, where e(t) = Y(t) - Ý(t)

**Ý(t+1) = Y(t) - (1-α)e(t)** ...forecast = previous *observation*
minus fraction 1-α of previous error

**Ý(t+1) = α[Y(t) + (1-α)Y(t-1) + ((1-α)^2)Y(t-2) + ((1-α)^3)Y(t-3) + . . . ]** ...forecast = *exponentially
weighted (i.e. discounted) moving average* with discount factor 1-α

The
preceding four equations are all *mathematically equivalent*--any one of
them can be obtained by rearrangement of any of the others. The first equation
above is probably the easiest to use if you are implementing the model on a
spreadsheet: the forecasting formula fits in a single cell and contains cell
references pointing to the previous forecast, the previous observation, and the
cell where the value of α is stored.
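As a rough check on this equivalence, the sketch below computes the smoothed series from the recursive definition and the forecasts from the "previous forecast plus fraction of previous error" form, then confirms that they coincide. The function names, the sample data, and the choice to start both recursions at the first observation are assumptions made for illustration.

```python
def ses_smoothed(y, alpha, s0):
    """Smoothed series S(t) = alpha*Y(t) + (1-alpha)*S(t-1), starting from s0."""
    smoothed, prev = [], s0
    for obs in y:
        prev = alpha * obs + (1 - alpha) * prev
        smoothed.append(prev)
    return smoothed


def ses_forecasts_error_form(y, alpha, first_forecast):
    """Forecasts via forecast(t+1) = forecast(t) + alpha*e(t).

    Returns len(y) + 1 values: one forecast per observation, plus the
    forecast for the period after the last observation.
    """
    forecasts = [first_forecast]
    for obs in y:
        error = obs - forecasts[-1]
        forecasts.append(forecasts[-1] + alpha * error)
    return forecasts


# Hypothetical data; both recursions are started at the first observation.
y = [10.0, 12.0, 11.0, 13.0]
s = ses_smoothed(y, 0.5, y[0])
f = ses_forecasts_error_form(y, 0.5, y[0])
print(s)        # smoothed values S(t)
print(f[1:])    # forecasts for the following period: the same numbers
```

Each forecast for period t+1 equals the smoothed value S(t), as long as the two recursions share the same starting value.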

Note that
if α=1,
the SES model is equivalent to a random walk model (without growth). If α=0, the SES model
is equivalent to the mean model, assuming that the first smoothed value is set
equal to the mean.

**The
average age of the data in the simple-exponential-smoothing forecast is 1/α relative to the period for which the
forecast is computed. (This is not supposed to be obvious, but it can easily be
shown by evaluating an infinite series.) Hence, the simple exponential smoothing
forecast tends to lag behind turning points by about 1/α periods. For
example, when α =
0.5 the lag is 2 periods; when α = 0.2 the lag is 5 periods; when α = 0.1 the lag is 10
periods, and so on.**
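The infinite series in question weights each age by the weight its observation receives: Y(t-j) has weight α(1-α)^j in the forecast for period t+1 and is j+1 periods old, so the average age is the sum over j of (j+1)α(1-α)^j, which works out to 1/α. A small numerical check, with the function name and truncation length chosen only for this example:

```python
def ses_average_age(alpha, n_terms=10_000):
    """Weighted average age of the data in an SES forecast.

    Observation Y(t-j) gets weight alpha*(1-alpha)**j and is j+1 periods
    old relative to the forecast period t+1. The infinite series is
    truncated at n_terms, which is ample unless alpha is very small.
    """
    return sum((j + 1) * alpha * (1 - alpha) ** j for j in range(n_terms))


for alpha in (0.5, 0.2, 0.1):
    print(alpha, ses_average_age(alpha), 1 / alpha)
```

The two printed columns should agree to within floating-point rounding.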

For a
given average age (i.e., amount of lag), the simple exponential smoothing (SES)
forecast is somewhat superior to the simple moving average (SMA) forecast
because *it places relatively more weight on the most recent observation*--i.e.,
it is slightly more "responsive" to changes occurring
in the recent past.

Another
important advantage of the SES model over the SMA model is that the SES model
uses a smoothing parameter which is continuously variable, so it can easily
be optimized by using a "solver" algorithm to minimize the mean squared
error. The optimal value of α in the SES model for this series turns out
to be 0.2961, as shown here:

The
average age of the data in this forecast is 1/0.2961 = 3.4 periods, which is
similar to that of a 6-term simple moving average.
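In place of a spreadsheet solver, the optimization can be approximated with a plain grid search over the smoothing constant. This sketch is not what Statgraphics does internally; the function names, the grid resolution, and the start-up rule (first forecast equal to the first observation, with that zero error left out of the average) are assumptions.

```python
def ses_mse(y, alpha):
    """Mean squared one-step-ahead error of SES.

    The forecast for the second observation is set to the first
    observation; errors are averaged from the second observation on.
    """
    forecast = y[0]
    squared_errors = []
    for obs in y[1:]:
        error = obs - forecast
        squared_errors.append(error ** 2)
        forecast += alpha * error   # forecast(t+1) = forecast(t) + alpha*e(t)
    return sum(squared_errors) / len(squared_errors)


def best_alpha(y, steps=1000):
    """Grid search over alpha in (0, 1]; returns the (alpha, mse) pair with the smallest MSE."""
    candidates = [i / steps for i in range(1, steps + 1)]
    return min(((a, ses_mse(y, a)) for a in candidates), key=lambda pair: pair[1])


# Hypothetical data, not the series discussed in the text.
y = [52.0, 48.0, 55.0, 50.0, 47.0, 53.0, 58.0, 54.0, 51.0, 56.0, 60.0, 57.0]
alpha, mse = best_alpha(y)
print(alpha, mse, 1 / alpha)   # fitted alpha, its MSE, implied average age
```

A grid search is crude but makes no assumption about the shape of the error surface; a solver or a one-dimensional minimizer would refine the answer further.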

The
long-term forecasts from the SES model are a *horizontal straight line*,
as in the SMA model and the random walk model without growth. However, note that
the confidence intervals computed by Statgraphics now diverge in a
reasonable-looking fashion, and that they are substantially narrower than the
confidence intervals for the random walk model. The SES model assumes that the
series is somewhat "more predictable" than does the random walk
model.

An SES
model is actually a special case of an ARIMA model, so the statistical theory
of ARIMA models provides a sound basis for calculating confidence intervals for
the SES model. In particular, **an SES model is an ARIMA model with one nonseasonal difference, an MA(1)
term, and no constant term**, otherwise known as an "ARIMA(0,1,1)
model without constant". The MA(1) coefficient in
the ARIMA model corresponds to the quantity 1-α in the SES model.
For example, if you fit an ARIMA(0,1,1) model without
constant to the series analyzed here, the estimated MA(1) coefficient turns out
to be 0.7029, which is almost exactly one minus 0.2961.
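One consequence of the ARIMA(0,1,1) representation is a closed form for how the forecast error grows with the horizon: if σ is the standard deviation of the one-step-ahead errors, the variance of the k-step-ahead error is σ²(1 + (k-1)α²). The sketch below evaluates it. It assumes the model is correctly specified and ignores uncertainty in the estimated α; the function name and the unit value of σ are illustrative.

```python
import math


def ses_forecast_sd(sigma, alpha, horizon):
    """Standard deviation of the k-step-ahead SES forecast error.

    Uses the ARIMA(0,1,1) result Var = sigma^2 * (1 + (k-1)*alpha^2),
    where sigma is the standard deviation of the one-step-ahead errors.
    """
    return sigma * math.sqrt(1 + (horizon - 1) * alpha ** 2)


# Relative widening for the alpha quoted in the text versus a random walk
# (alpha = 1), with sigma set to 1 so the numbers are pure multiples.
for k in (1, 2, 5, 10):
    print(k, ses_forecast_sd(1.0, 0.2961, k), ses_forecast_sd(1.0, 1.0, k))
```

With α = 1 the formula reduces to σ times the square root of k, the familiar random walk result, which is why the SES intervals widen more slowly than the random walk intervals for any α below 1.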

It is
possible to add the assumption of a **non-zero constant linear trend** to an
SES model. To do this in Statgraphics, just specify an ARIMA model with one nonseasonal difference and an MA(1)
term *with* a constant, i.e., an ARIMA(0,1,1) model *with* constant.
The long-term forecasts will then have a trend which is equal to the average
trend observed over the entire estimation period. You cannot do this in
conjunction with seasonal adjustment, because the seasonal adjustment options
are disabled when the model type is set to ARIMA. However, you can add a **constant
long-term exponential trend** to a simple exponential smoothing model (with
or without seasonal adjustment) by using the **inflation adjustment** option
in the Forecasting procedure. The appropriate "inflation"
(percentage growth) rate per period can be estimated as the slope coefficient
in a linear trend model fitted to the data in conjunction with a natural
logarithm transformation, or it can be based on other, independent information
concerning long-term growth prospects.

**Brown's
Linear (i.e., double) Exponential Smoothing**

If the *trend*
as well as the mean is varying slowly over time, a higher-order smoothing model
is needed to track the varying trend. The
simplest time-varying trend model is Brown's **linear exponential smoothing
(LES)** model, which uses *two* different smoothed series that are
centered at different points in time. The forecasting formula is based on an
extrapolation of a line through the two centers. (Alternatively, a double
application of the *simple* moving average method can be used to track
time-varying trends--see pages 154-158 in your textbook.)

The
algebraic form of the linear exponential smoothing model, like that of the
simple exponential smoothing model, can be expressed in a number of different
but equivalent forms. The "standard" form of this model is usually
expressed as follows: Let S' denote the *singly-smoothed* series obtained
by applying simple exponential smoothing to series Y. That is, the value of S'
at period t is given by:

**S'(t) = αY(t) + (1-α)S'(t-1)**

(Recall
that, under simple exponential smoothing, we would just let Ý(t+1)
= S'(t) at this point.) Then let S'' denote the *doubly-smoothed*
series obtained by applying simple exponential smoothing (using the same α) to series S':

**S''(t) = αS'(t) + (1-α)S''(t-1)**

Finally,
the forecast Ý(t+1) is given by:

**Ý(t+1) = a(t) + b(t)**

where:

**a(t) = 2S'(t) - S''(t)** ...the
estimated *level* at period t

**b(t) = (α/(1-α))(S'(t) - S''(t))** ...the estimated *trend* at period t.

Forecasts with
longer lead times made at period t are obtained by adding multiples of the
trend term. For example, the k-period-ahead forecast (i.e., the forecast for Y(t+k) made at period t) would be
equal to a(t)+kb(t). For purposes of model-fitting (i.e., calculating
forecasts, residuals, and residual statistics over the estimation period), the
model can be started up by setting S'(1)=S''(1)=Y(1),
i.e., set both smoothed series equal to the observed value at t=1.
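The standard form translates fairly directly into code. The following is a minimal sketch using the start-up rule just described; the function names and the two-point data set are hypothetical.

```python
def brown_les(y, alpha):
    """Brown's linear exponential smoothing in its standard form.

    Returns (levels, trends), where levels[i] = a(t) and trends[i] = b(t)
    for t = i + 1, using the start-up S'(1) = S''(1) = Y(1).
    Requires 0 < alpha < 1, since the trend formula divides by 1 - alpha.
    """
    s1 = s2 = y[0]
    levels, trends = [y[0]], [0.0]   # a(1) = Y(1), b(1) = 0 under this start-up
    for obs in y[1:]:
        s1 = alpha * obs + (1 - alpha) * s1   # singly smoothed S'(t)
        s2 = alpha * s1 + (1 - alpha) * s2    # doubly smoothed S''(t)
        levels.append(2 * s1 - s2)
        trends.append(alpha / (1 - alpha) * (s1 - s2))
    return levels, trends


def les_forecast(levels, trends, k=1):
    """k-period-ahead forecast made at the last period: a(t) + k*b(t)."""
    return levels[-1] + k * trends[-1]


# Hypothetical two-point series.
levels, trends = brown_les([10.0, 12.0], 0.5)
print(levels, trends)
print(les_forecast(levels, trends, k=1), les_forecast(levels, trends, k=3))
```

Longer-horizon forecasts lie on a straight line through the latest level with slope equal to the latest trend estimate.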

A **mathematically
equivalent form** of Brown's linear exponential smoothing model, which
emphasizes its non-stationary character and is easier to implement on a
spreadsheet, is the following:

**Ý(t) = 2Y(t-1) - Y(t-2) - 2(1-α)e(t-1) + ((1-α)^2)e(t-2)**

or equivalently:

**Ý(t) - Y(t-1) = Y(t-1) - Y(t-2) - 2(1-α)e(t-1) + ((1-α)^2)e(t-2)**

In other
words, the *predicted difference* at period t (namely Ý(t) - Y(t-1)) is equal to the previous observed
difference (namely Y(t-1) - Y(t-2)) minus a weighted difference of the two
previous forecast errors.

Caution:
this form of the model is rather tricky to start up at the beginning of the
estimation period. The following convention is recommended: first set Ý(1) = Y(1), which yields e(1) = 0 (i.e., cheat a
bit, and let the first forecast equal the actual first observation), then also
set Ý(2) = Y(1), which yields e(2) = Y(2) - Y(1), then continue from
this point using the equation above. This would yield the same fitted values as
the formula based on S' and S'' if the latter were started up using S'(1) = S''(1) = Y(1).
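The claim that the two forms give the same fitted values under these start-up conventions can be checked numerically. In this sketch the function names and data are hypothetical; both functions return one forecast per observation, with the first forecast set equal to the first observation.

```python
def les_error_form(y, alpha):
    """One-step forecasts from the error form of Brown's LES.

    forecasts[i] is the forecast for y[i]. Start-up follows the convention
    forecast(1) = forecast(2) = Y(1), so e(1) = 0 and e(2) = Y(2) - Y(1).
    Needs at least two observations.
    """
    beta = 1 - alpha
    forecasts = [y[0], y[0]]
    errors = [0.0, y[1] - y[0]]
    for t in range(2, len(y)):
        f = (2 * y[t - 1] - y[t - 2]
             - 2 * beta * errors[t - 1] + beta ** 2 * errors[t - 2])
        forecasts.append(f)
        errors.append(y[t] - f)
    return forecasts


def les_smoothed_form(y, alpha):
    """The same forecasts from S' and S'', started at S'(1) = S''(1) = Y(1)."""
    s1 = s2 = y[0]
    forecasts = [y[0]]   # forecast(1) set to Y(1) by convention
    for obs in y[:-1]:
        s1 = alpha * obs + (1 - alpha) * s1
        s2 = alpha * s1 + (1 - alpha) * s2
        # a(t) + b(t), the forecast for the next period
        forecasts.append(2 * s1 - s2 + alpha / (1 - alpha) * (s1 - s2))
    return forecasts


# Hypothetical data, not the series discussed in the text.
y = [10.0, 12.0, 11.0, 15.0, 14.0, 18.0]
for a, b in zip(les_error_form(y, 0.3), les_smoothed_form(y, 0.3)):
    print(round(a, 6), round(b, 6))
```

The two columns should match to within rounding, so either form can be used for fitting.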

Once
again, you can use your spreadsheet's "solver" or any nonlinear
least squares algorithm to optimize the value of α. The optimal value of α in the LES model
fitted to this series by Statgraphics is 0.1607. Note that the long-term
forecasts of the LES model for this time series appear to track the local trend
observed in the last 10 periods. Also, the confidence intervals for the LES
model widen faster than those of the SES model.

What's
best for this particular time series? Here is a model comparison report for the
models described above. It appears that the SES model performs better than the
SMA models, and the LES model is close behind. Whether you choose SES or LES in
this case would depend on whether you really believe that the series has a
local trend.

Models

------

(A) Random walk

(B) Simple moving average of 5 terms

(C) Simple moving average of 9 terms

(D) Simple exponential smoothing with alpha = 0.2961

(E) Brown's linear exp. smoothing with alpha = 0.1607

Estimation Period

| Model | MSE | MAE | MAPE | ME | MPE |
| --- | --- | --- | --- | --- | --- |
| (A) | 14825.3 | 93.2708 | 23.6152 | 1.04531 | -5.21856 |
| (B) | 10329.9 | 80.6686 | 20.2747 | 1.35328 | -5.32013 |
| (C) | 10826.1 | 80.2773 | 20.1534 | 6.89349 | -4.66414 |
| (D) | 9776.75 | 75.0504 | 18.987 | 3.27046 | -4.84999 |
| (E) | 10320.8 | 77.5989 | 19.3382 | 0.553851 | -4.67831 |
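For reference, the five statistics in the report can be computed from paired actual and forecast values roughly as follows. The conventions used here (error = actual minus forecast, percentage errors taken relative to the actual value, plain averages with no degrees-of-freedom adjustment) are assumptions and may differ in detail from what Statgraphics reports; the function name and data are hypothetical.

```python
def error_stats(actual, forecast):
    """MSE, MAE, MAPE, ME and MPE for paired actuals and forecasts.

    Errors are defined as actual minus forecast; percentage errors are
    100 * error / actual, so every actual value must be non-zero.
    """
    errors = [a - f for a, f in zip(actual, forecast)]
    pct = [100.0 * e / a for e, a in zip(errors, actual)]
    n = len(errors)
    return {
        "MSE": sum(e ** 2 for e in errors) / n,    # mean squared error
        "MAE": sum(abs(e) for e in errors) / n,    # mean absolute error
        "MAPE": sum(abs(p) for p in pct) / n,      # mean absolute percentage error
        "ME": sum(errors) / n,                     # mean error (bias)
        "MPE": sum(pct) / n,                       # mean percentage error
    }


# Hypothetical values.
print(error_stats([100.0, 200.0], [110.0, 180.0]))
```

MSE, MAE and MAPE measure the size of the errors, while ME and MPE measure bias, since positive and negative errors cancel in them.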

**Brown's quadratic
(i.e., triple) smoothing model** ...uses *three* smoothed series
centered at different points in time and extrapolates a *parabola* through
the three centers. This is rarely used in practice, though, since true
quadratic trends are rare and the model is highly unstable.

**Which type of
trend-extrapolation is best: horizontal, linear, or quadratic?** Empirical evidence
suggests that, if the data have already been adjusted (if necessary) for
inflation, then it may be imprudent to extrapolate short-term linear (or worse,
quadratic) trends very far into the future. Trends evident today may slacken in
the future due to varied causes such as product obsolescence, increased
competition, and cyclical downturns or upturns in an industry. For this reason,
*simple* exponential smoothing often performs better out-of-sample
than might otherwise be expected, despite its "naive" horizontal
trend extrapolation. **Damped trend** modifications of the linear exponential
smoothing model are often used in practice to introduce a note of conservatism
into its trend projections--alas, these are not available in Statgraphics.

In principle, it is
possible to calculate **confidence intervals** around long-term forecasts
produced by exponential smoothing models, by considering them as special cases
of ARIMA models. (Beware: not all software does this correctly. In particular,
a number of popular automatic forecasting programs use highly suspect methods
for calculating confidence intervals for exponential smoothing forecasts.) The
width of the confidence intervals depends on (i) the
RMS error of the model, (ii) the value of α, (iii) the level of smoothing
(single, double, or triple), and (iv) the number of periods ahead you are
forecasting. In general, the intervals spread out faster as α gets larger and/or
as the order of smoothing increases from single to double to triple. We will
revisit this subject when we discuss ARIMA models later in the course.