ARIMA models for time series forecasting
Estimation of ARIMA models
Linear versus nonlinear least squares
Mean versus constant
Backforecasting
Linear versus nonlinear least squares
ARIMA models that include only AR terms are special cases of linear regression models, hence they can be fitted by ordinary least squares.
- AR forecasts are a linear function of the coefficients as well as a linear function of past data.
- In principle, least-squares estimates of AR coefficients can be exactly calculated from autocorrelations in a single "iteration".
- In practice, you can fit an AR model in the Multiple Regression procedure: just regress DIFF(Y) (or whatever stationarized series you are using) on lags of itself. (But you would get slightly different results from the ARIMA procedure; see the discussion of backforecasting below.)
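The regression view of an AR model can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure any particular package uses: the function names `solve_linear_system` and `fit_ar_ols` are hypothetical, the data are simulated, and the first p observations are simply dropped to form the lagged regressors.

```python
import random

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def fit_ar_ols(y, p):
    """Regress y[t] on a constant and y[t-1], ..., y[t-p].

    Returns [constant, phi_1, ..., phi_p] from the normal equations.
    """
    rows = [[1.0] + [y[t - k] for k in range(1, p + 1)] for t in range(p, len(y))]
    targets = [y[t] for t in range(p, len(y))]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Simulated stationary AR(1) series: y[t] = 1.0 + 0.6 * y[t-1] + noise
rng = random.Random(0)
y = [0.0]
for _ in range(2000):
    y.append(1.0 + 0.6 * y[-1] + rng.gauss(0.0, 1.0))

constant, phi1 = fit_ar_ols(y, p=1)
print(constant, phi1)  # estimates should land near 1.0 and 0.6
```

Because the coefficients enter the forecast linearly, a single linear solve is enough and no iterative search is needed.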
ARIMA models that include MA terms are similar to regression models, but can't be fitted by ordinary least squares:
- Forecasts are a linear function of past data, but they are nonlinear functions of the coefficients. For example, an ARIMA(0,1,1) model without constant is an exponentially weighted moving average:
$$\hat{Y}_t = (1 - \theta_1)\left[Y_{t-1} + \theta_1 Y_{t-2} + \theta_1^2 Y_{t-3} + \dots\right]$$
...in which the forecasts are a nonlinear function of the MA(1) parameter $\theta_1$ ("theta").
- Another way to look at the problem: you can't fit MA models using ordinary multiple regression because there's no way to specify ERRORS as an independent variable. The errors are not known until the model is fitted! They need to be calculated sequentially, period by period, given the current parameter estimates.
- MA models therefore require a nonlinear estimation algorithm, similar to the "Solver" algorithm in Excel.
- The algorithm uses a search process that typically requires 5 to 10 iterations and occasionally may not converge.
- You can adjust the tolerances for determining step sizes and stopping criteria for the search (although default values are usually OK).
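The sequential calculation of errors and the search over the MA coefficient can be sketched as follows for an ARIMA(0,1,1) model without constant. This is an illustrative sketch under stated assumptions, not the algorithm used by any particular ARIMA procedure: the error before the first period is taken to be zero, the search is a coarse grid followed by a golden-section refinement, and the names `ima11_sse`, `golden_section_min`, and `fit_ima11` are hypothetical.

```python
import random

def ima11_sse(Y, theta):
    """Sum of squared one-step-ahead errors for ARIMA(0,1,1) without constant."""
    error = 0.0  # assumption: the error before the first period is zero
    sse = 0.0
    for t in range(1, len(Y)):
        forecast = Y[t - 1] - theta * error  # needs the previous period's error
        error = Y[t] - forecast              # known only after forecasting
        sse += error * error
    return sse

def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimize f on [lo, hi], assuming f has a single minimum there."""
    inv_phi = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    iterations = 0
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
        iterations += 1
    return (a + b) / 2, iterations

def fit_ima11(Y):
    """Coarse grid over theta, then refine around the best grid point."""
    grid = [i / 20 for i in range(-19, 20)]  # -0.95, -0.90, ..., 0.95
    best = min(grid, key=lambda th: ima11_sse(Y, th))
    return golden_section_min(lambda th: ima11_sse(Y, th), best - 0.05, best + 0.05)

# Simulated data: Y[t] = Y[t-1] + e[t] - 0.5 * e[t-1]
rng = random.Random(1)
true_theta = 0.5
Y, prev_e = [0.0], 0.0
for _ in range(1000):
    e = rng.gauss(0.0, 1.0)
    Y.append(Y[-1] + e - true_theta * prev_e)
    prev_e = e

theta_hat, iterations = fit_ima11(Y)
print(theta_hat, iterations)  # theta_hat should be in the neighborhood of 0.5
```

Every evaluation of the sum of squared errors requires a full pass through the data with the current value of theta, which is why the fit is iterative rather than a single regression.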
"Mean"
versus "constant"
The "mean" and the "constant" in ARIMA model-fitting results are different numbers whenever the model includes AR terms. Suppose that you fit an ARIMA model to Y in which p is the number of autoregressive terms. (Assume for convenience that there are no MA terms.) Let y denote the differenced (stationarized) version of Y, e.g., $y_t = Y_t - Y_{t-1}$ if one nonseasonal difference was used. Then the AR(p) forecasting equation for y is:
$$\hat{y}_t = \mu + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p}$$
This is just an ordinary multiple regression model in which $\mu$ is the constant term, $\phi_1$ is the coefficient of the first lag of y, and so on.
Now, internally, the software converts this slope-intercept form of the regression equation to an equivalent form in terms of deviations from the mean. Let m denote the mean of the stationarized series y. Then the pth-order autoregressive equation can be written in terms of deviations from the mean as:
$$\hat{y}_t = m + \phi_1 (y_{t-1} - m) + \phi_2 (y_{t-2} - m) + \dots + \phi_p (y_{t-p} - m)$$
By collecting all the constant terms in this equation, we see it is equivalent to the original form of the equation if:
$$\mu = m(1 - \phi_1 - \phi_2 - \dots - \phi_p)$$
or in words:
CONSTANT = MEAN x (1 - sum of AR coefficients)
The software actually estimates m (along with the other model parameters) and reports this as the MEAN in the model-fitting results, along with its standard error and t-statistic, etc. The CONSTANT (μ) is then calculated according to the formula above. If the model does not contain any AR terms, the MEAN and the CONSTANT are identical.
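As a quick numerical check of the relationship between the two numbers, the sketch below computes the constant from a mean and a set of AR coefficients and confirms that the slope-intercept form and the deviations-from-the-mean form give the same forecast. The values (mean 10, coefficients 0.5 and 0.25) and the function names are made up for illustration.

```python
def constant_from_mean(mean, ar_coefficients):
    """CONSTANT = MEAN x (1 - sum of AR coefficients)."""
    return mean * (1.0 - sum(ar_coefficients))

def forecast_intercept_form(constant, ar_coefficients, recent):
    """recent[0] is y[t-1], recent[1] is y[t-2], and so on."""
    return constant + sum(phi * v for phi, v in zip(ar_coefficients, recent))

def forecast_deviation_form(mean, ar_coefficients, recent):
    """Same forecast, written in terms of deviations from the mean."""
    return mean + sum(phi * (v - mean) for phi, v in zip(ar_coefficients, recent))

mean = 10.0
phis = [0.5, 0.25]                          # hypothetical AR(2) coefficients
constant = constant_from_mean(mean, phis)   # 10 * (1 - 0.75) = 2.5
recent = [12.0, 9.0]                        # y[t-1], y[t-2]

print(constant)                                          # 2.5
print(forecast_intercept_form(constant, phis, recent))   # 10.75
print(forecast_deviation_form(mean, phis, recent))       # 10.75
```

With no AR terms the sum of coefficients is zero, so `constant_from_mean(mean, [])` simply returns the mean.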
In a model with one order of nonseasonal differencing (only), the MEAN is the trend factor (average period-to-period change). In a model with one order of seasonal differencing (only), the MEAN is the annual trend factor (average year-to-year change).
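The interpretation of the mean as a trend factor can be illustrated with the plain sample mean of a differenced series. The numbers below are invented, the helper `difference` is hypothetical, and the seasonal example assumes quarterly data (a lag of 4). In a fitted model the MEAN is estimated together with any AR and MA coefficients, so it need not equal this sample average exactly.

```python
def difference(Y, lag=1):
    """Return Y[t] - Y[t-lag] for every t where both values exist."""
    return [Y[t] - Y[t - lag] for t in range(lag, len(Y))]

# Nonseasonal difference: the mean is the average period-to-period change.
Y = [100.0, 103.0, 101.0, 106.0, 110.0]
d = difference(Y)                       # [3.0, -2.0, 5.0, 4.0]
mean_change = sum(d) / len(d)
print(mean_change)                      # 2.5, i.e. (110 - 100) / 4

# Seasonal difference on two years of quarterly data:
# the mean is the average year-to-year change.
Q = [10.0, 20.0, 30.0, 40.0, 12.0, 23.0, 34.0, 45.0]
sd = difference(Q, lag=4)               # [2.0, 3.0, 4.0, 5.0]
mean_annual_change = sum(sd) / len(sd)
print(mean_annual_change)               # 3.5
```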
"Backforecasting"
- The basic problem: an ARIMA model (or other time series model) predicts future values of the time series from past values, but how should the forecasting equation be initialized to make a forecast for the very first observation? (Actually, AR models can be initialized by dropping the first few observations, although this is inefficient and wastes data, but MA models require an estimate of a prior error before they can make the first forecast.)
- Strange but true: a stationary time series looks the same going forward or backward in time, therefore...
- The same model that predicts the future of a series can also be used to predict its past.
- The solution: to squeeze the most information out of the available data, the best way to initialize an ARIMA model (or any time series forecasting model) is to use backward forecasting ("backforecasting") to obtain estimates of data values prior to period 1.
- When you use the backforecasting option in ARIMA estimation, the search algorithm actually makes two passes through the data on each iteration: first a backward pass is made to estimate prior data values using the current parameter estimates, then the estimated prior data values are used to initialize the forecasting equation for a forward pass through the data.
- If you DON'T use the backforecasting option, the forecasting equation is initialized by assuming that prior values of the stationarized series were equal to the mean.
- If you DO use the backforecasting option, then the backforecasts that are used to initialize the model are implicit parameters of the model, which must be estimated along with the AR and MA coefficients. The number of additional implicit parameters is roughly equal to the highest lag in the model: usually 2 or 3 for a nonseasonal model, and s+1 or 2s+1 for a seasonal model with seasonality=s. (If the model includes both a seasonal difference and a seasonal AR or MA term, it needs two seasons' worth of prior values to start up!)
- Note that whether backforecasting is on or off, an AR model is estimated in a different way than it would be estimated in the Multiple Regression procedure (missing values are not merely ignored: they are replaced either with an estimate of the mean or with backforecasts), hence an AR model fitted in the ARIMA procedure will never yield exactly the same parameter estimates as an AR model fitted in the Multiple Regression procedure.
- Conventional wisdom: turn backforecasting OFF when you are unsure if the current model is valid, and turn it ON to get final parameter estimates once you're reasonably sure the model is valid.
- If the model is mis-specified, backforecasting may lead to failures of the parameter estimates to converge and/or to unit-root problems.
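The two-pass idea can be sketched for the simplest case, an MA(1) model applied to a stationarized series with its mean already subtracted. This follows the classic textbook backcasting recipe and may differ in detail from what any particular ARIMA procedure does. It assumes the sign convention of the exponentially weighted moving average formula earlier on this page (the forecast of the stationarized series is minus theta times the previous error), and the function names are hypothetical.

```python
def forward_errors(y, theta, initial_error):
    """Forward pass: e[t] = y[t] + theta * e[t-1], starting from initial_error."""
    errors, prev = [], initial_error
    for value in y:
        prev = value + theta * prev
        errors.append(prev)
    return errors

def backforecast_initial_error(y, theta):
    """Backward pass: run the same MA(1) model on the reversed series.

    The backward error just past the end of the data is assumed to be zero.
    The backforecast of the value before period 1 is -theta times the
    backward error at period 1. With errors before that taken as zero, the
    forward error in period 0 equals this backforecast.
    """
    backward_error = 0.0
    for value in reversed(y):
        backward_error = value + theta * backward_error
    return -theta * backward_error

y = [1.0, 0.0, 0.0]   # toy zero-mean stationarized series
theta = 0.5           # current parameter estimate within the search

# Backforecasting OFF: the prior value is set to the mean (zero), so e[0] = 0.
print(forward_errors(y, theta, 0.0))   # [1.0, 0.5, 0.25]

# Backforecasting ON: backward pass first, then the forward pass.
e0 = backforecast_initial_error(y, theta)
print(e0)                              # -0.5
print(forward_errors(y, theta, e0))    # [0.75, 0.375, 0.1875]
```

In a full estimation run both passes would be repeated for each trial value of theta, since the backforecast depends on the current parameter estimate.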