**ARIMA models for time series forecasting**

Notes on nonseasonal ARIMA models (pdf file)

Slides on seasonal and nonseasonal ARIMA models (pdf file)

Introduction to ARIMA: nonseasonal models

Identifying the order of differencing in an ARIMA model

Identifying the numbers of AR or MA terms in an ARIMA model

Estimation of ARIMA models

Seasonal differencing in ARIMA models

Seasonal random walk: ARIMA(0,0,0)x(0,1,0)

Seasonal random trend: ARIMA(0,1,0)x(0,1,0)

General seasonal models: ARIMA(0,1,1)x(0,1,1) etc.

Summary of rules for identifying ARIMA models

ARIMA models with regressors

The mathematical structure of ARIMA models (pdf file)

**Estimation of ARIMA models**

Linear versus nonlinear least squares

Mean versus constant

Backforecasting

**Linear versus nonlinear least squares**

ARIMA models that include only AR terms are special cases of linear regression models and hence can be fitted by ordinary least squares.

- AR forecasts are a linear function of the coefficients as well as a linear function of past data.
- In principle, least-squares estimates of AR coefficients can be exactly calculated from autocorrelations in a single "iteration".
- In practice, you can fit an AR model in the Multiple Regression procedure--just regress DIFF(Y) (or whatever) on lags of itself. (But you would get slightly different results from the ARIMA procedure--see below!)
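As a concrete sketch of the point above, a pure AR model can be fitted by ordinary least squares, just as in a Multiple Regression procedure. The series, AR order, and coefficient values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(2) process: y_t = 0.5*y_{t-1} - 0.3*y_{t-2} + e_t
n = 500
y = np.zeros(n)
e = rng.normal(size=n)
for t in range(2, n):
    y[t] = 0.5 * y[t - 1] - 0.3 * y[t - 2] + e[t]

# Ordinary least squares: regress y_t on a constant and its first two lags
# (the first two observations are dropped, since their lags are unknown)
X = np.column_stack([np.ones(n - 2), y[1:-1], y[:-2]])
target = y[2:]
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
mu_hat, phi1_hat, phi2_hat = coef
```

The estimates land close to the simulated coefficients; an ARIMA procedure would give slightly different numbers because of how it handles the startup observations, as discussed under backforecasting below.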

ARIMA models that include MA terms are similar to regression models, but they can't be fitted by ordinary least squares:

- Forecasts are a linear function of past data, but they are *nonlinear* functions of the coefficients--e.g., an ARIMA(0,1,1) model without constant is an exponentially weighted moving average:

  Ŷ_{t} = (1 - θ_{1})[Y_{t-1} + θ_{1}Y_{t-2} + θ_{1}^{2}Y_{t-3} + …]

  ...in which the forecasts are a nonlinear function of the MA(1) parameter ("theta").
- Another way to look at the problem: you can't fit MA models using ordinary multiple regression because there's no way to specify ERRORS as an independent variable--the errors are not known until the model is fitted! They need to be calculated *sequentially*, period by period, given the current parameter estimates.
- MA models therefore require a nonlinear estimation algorithm, similar to the "Solver" algorithm in Excel.
- The algorithm uses a search process that typically requires 5 to 10 iterations and occasionally may not converge.
- You can adjust the tolerances for determining step sizes and stopping criteria for the search (although the default values are usually OK).
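The sequential-error idea can be sketched in a few lines. The code below is illustrative only: the data are simulated, and a crude grid search stands in for a real nonlinear optimizer like Excel's Solver. It computes the one-step errors of an MA(1) model period by period and then searches for the theta that minimizes the sum of squared errors:

```python
import numpy as np

# Simulate an MA(1) process without constant: y_t = e_t - theta*e_{t-1}
rng = np.random.default_rng(1)
theta_true = 0.6
n = 400
e = rng.normal(size=n)
y = np.empty(n)
y[0] = e[0]
for t in range(1, n):
    y[t] = e[t] - theta_true * e[t - 1]

def sse(theta, y):
    """Sum of squared one-step errors, computed sequentially:
    e_t = y_t + theta * e_{t-1}, with the prior error assumed to be 0."""
    err = np.zeros(len(y))
    err[0] = y[0]
    for t in range(1, len(y)):
        err[t] = y[t] + theta * err[t - 1]
    return np.sum(err ** 2)

# "Solver"-style search, reduced to a grid scan over the invertible range
grid = np.linspace(-0.95, 0.95, 191)
theta_hat = grid[np.argmin([sse(th, y) for th in grid])]
```

Notice that changing theta changes every error in the sequence, which is exactly why the problem cannot be posed as an ordinary regression.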

**Mean versus constant**

The "mean" and the "constant" in ARIMA model-fitting results are different numbers whenever the model includes AR terms. Suppose that you fit an ARIMA model to Y in which p is the number of autoregressive terms. (Assume for convenience that there are no MA terms.) Let y denote the differenced (stationarized) version of Y, e.g., y_{t} = Y_{t} - Y_{t-1} if one nonseasonal difference was used. Then the AR(p) forecasting equation for y is:

ŷ_{t} = μ + ϕ_{1}y_{t-1} + ϕ_{2}y_{t-2} + … + ϕ_{p}y_{t-p}

This is just an ordinary multiple regression model in which μ is the constant term, ϕ_{1} is the coefficient of the first lag of y, and so on.

Now, internally, the software converts this slope-intercept form of the regression equation to an equivalent form in terms of *deviations from the mean*. Let m denote the mean of the stationarized series y. Then the p-order autoregressive equation can be written in terms of deviations from the mean as:

ŷ_{t} = m + ϕ_{1}(y_{t-1} - m) + ϕ_{2}(y_{t-2} - m) + … + ϕ_{p}(y_{t-p} - m)

By collecting all of the constant terms in this equation--the terms that do not involve lagged values of y--we see that it is equivalent to the original slope-intercept form of the equation if:

μ = m(1 - ϕ_{1} - ϕ_{2} - … - ϕ_{p})

or in words:

**CONSTANT = MEAN x (1 - sum of AR coefficients)**

The software actually estimates m (along with the other model parameters) and reports this as the MEAN in the model-fitting results, along with its standard error and t-statistic, etc. The CONSTANT (μ) is then calculated according to the formula above. If the model does *not* contain any AR terms, the MEAN and the CONSTANT are identical.
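A quick numeric check of the formula, using made-up values for the reported MEAN and two AR coefficients:

```python
# CONSTANT = MEAN * (1 - sum of AR coefficients), with illustrative values
m = 10.0            # the MEAN of the stationarized series, as reported
phi = [0.5, -0.3]   # hypothetical estimated AR coefficients
mu = m * (1 - sum(phi))   # the CONSTANT: 10 * (1 - 0.2) = 8.0
```

With no AR terms the sum is zero and the two numbers coincide, as stated above.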

In a model with one order of nonseasonal differencing (only), the MEAN is the trend factor (average period-to-period change). In a model with one order of *seasonal* differencing (only), the MEAN is the *annual* trend factor (average year-to-year change).
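A small numeric illustration of the nonseasonal case, using a made-up series with an upward trend:

```python
import numpy as np

# With one nonseasonal difference, the MEAN of the stationarized series
# is the average period-to-period change, i.e. the trend factor.
Y = np.array([100.0, 103.0, 105.0, 109.0, 112.0])
y = np.diff(Y)        # stationarized series: [3, 2, 4, 3]
trend = y.mean()      # equals (Y[-1] - Y[0]) / (len(Y) - 1) = 3.0
```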

**Backforecasting**

- The basic problem: an ARIMA model (or other time series model) predicts future values of the time series from past values--but how should the forecasting equation be *initialized* to make a forecast for the very first observation? (Actually, AR models can be initialized by dropping the first few observations--although this is inefficient and wastes data--but MA models require an estimate of a prior error before they can make the first forecast.)
- *Strange but true*: a **stationary** time series looks the same going *forward* or *backward* in time, therefore the same model that predicts the *future* of a series can also be used to predict its *past*.
- The solution: to squeeze the most information out of the available data, the best way to initialize an ARIMA model (or any time series forecasting model) is to use *backward forecasting* ("backforecasting") to obtain estimates of data values prior to period 1.
- When you use the backforecasting option in ARIMA estimation, the search algorithm actually makes two passes through the data on each iteration: first a backward pass is made to estimate prior data values using the current parameter estimates, then the estimated prior data values are used to initialize the forecasting equation for a forward pass through the data.
- If you DON'T use the backforecasting option, the forecasting equation is initialized by assuming that prior values of the stationarized series were equal to the *mean*.
- If you DO use the backforecasting option, then the backforecasts that are used to initialize the model are *implicit parameters of the model*, which must be estimated along with the AR and MA coefficients. The number of additional implicit parameters is roughly equal to the highest lag in the model--usually 2 or 3 for a nonseasonal model, and s+1 or 2s+1 for a seasonal model with seasonality s. (If the model includes both a seasonal difference and a seasonal AR or MA term, it needs *two seasons' worth* of prior values to start up!)
- Note that with either backforecasting option, an AR model is estimated in a different way than it would be estimated in the Multiple Regression procedure (missing values are not merely ignored--they are replaced either with an estimate of the mean or with backforecasts), hence *an AR model fitted in the ARIMA procedure will never yield exactly the same parameter estimates as an AR model fitted in the Multiple Regression procedure*.
- Conventional wisdom: turn backforecasting OFF when you are unsure whether the current model is valid, and turn it ON to get final parameter estimates once you're reasonably sure the model is valid.
- If the model is mis-specified, backforecasting may lead to failures of the parameter estimates to converge and/or to unit-root problems.
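The two-pass idea above can be sketched for an MA(1) model. This is an illustrative approximation of backforecasting, not any particular package's implementation: the backward pass runs the same error recursion over the reversed series (a stationary series looks the same in reverse), and its final error initializes the forward pass in place of the default value of zero:

```python
import numpy as np

def forward_errors(theta, y, e0=0.0):
    """One-step errors of an MA(1) model without constant, computed
    sequentially: e_t = y_t + theta * e_{t-1}, starting from e0."""
    err = np.empty(len(y))
    prev = e0
    for t, yt in enumerate(y):
        err[t] = yt + theta * prev
        prev = err[t]
    return err

def backforecast_e0(theta, y):
    # Backward pass: run the same recursion on the reversed data; the
    # last backward error estimates the error just before period 1.
    return forward_errors(theta, y[::-1])[-1]

rng = np.random.default_rng(2)
e = rng.normal(size=300)
y = e[1:] - 0.5 * e[:-1]          # simulated MA(1) data, theta = 0.5

e0_hat = backforecast_e0(0.5, y)                        # backward pass
sse_back = np.sum(forward_errors(0.5, y, e0=e0_hat)**2) # forward pass
sse_zero = np.sum(forward_errors(0.5, y)**2)            # no backforecast
```

In a full estimation routine, both passes would be repeated on every iteration of the nonlinear search, so the backforecast e0 is effectively an implicit parameter estimated along with theta.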

Go to next topic: Seasonal differencing in ARIMA models