FORECAST PRO Example #1: Expert-aided analysis of data from assignment #3

The variables entered in the initial tableau were S (the dependent variable), ADV[-1], CADV[-1], CC[-1], Y[-1], P[-1], and _CONST.
(Note: ADV[-1] is ADV lagged by one period, etc.) Eight values were held out for validation.
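
As a minimal sketch of what the [-1] notation and the holdout mean, the Python below builds a one-period lag of a series and sets aside the last 8 rows. The series values and the helper names `lag` and `split_holdout` are hypothetical; they are not the assignment data and not part of Forecast Pro.

```python
def lag(series, k):
    """Return the series shifted back k periods; the first k entries are None."""
    if k <= 0:
        return list(series)
    return [None] * k + list(series[:-k])

def split_holdout(rows, n_holdout):
    """Split rows into (estimation, validation), holding out the last n_holdout."""
    return rows[:-n_holdout], rows[-n_holdout:]

# Hypothetical advertising series, for illustration only.
adv = [10, 12, 11, 15, 14, 13, 16, 18, 17, 19, 20, 22]
adv_lag1 = lag(adv, 1)  # plays the role of ADV[-1]

# Keep only the periods where the lagged value is defined.
rows = [(t, x) for t, x in enumerate(adv_lag1) if x is not None]
fit_rows, holdout_rows = split_holdout(rows, 8)
print(len(fit_rows), len(holdout_rows))
```

The first period is lost because its lagged value does not exist, which is why each extra lag shortens the usable sample by one row.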

```
Forecast Pro for Windows Standard Edition Version 2.00
Sun Oct 06 10:54:27 1996

Expert data exploration of dependent variable S
---------------------------------------------------------------------
Length 29  Minimum 4.396  Maximum 6.676
Mean 5.481 Standard deviation 0.580

Classical decomposition (multiplicative)
Trend-cycle: 24.00%  Seasonal: 5.14%  Irregular: 70.86%

There are 5 strongly significant regressors.
_CONST
ADV[-1]
CADV[-1]
CC[-1]
P[-1]

Series is trended and seasonal.
```

Seasonal? (I don't know where this comment came from--the estimated seasonal variance component is only 5%!)

```
Recommended model: Dynamic Regression
```

"Dynamic regression" simply means a time series regression model--i.e., a model that may end up including lagged variables and/or lagged error terms. Let's start by running an ordinary multiple regression with all the variables, obtaining the following standard output:

```
Forecast Model for S
Regression(6 regressors, 0 lagged errors)

Term          Coefficient  Std. Error  t-Statistic  Significance
---------------------------------------------------------------------
_CONST         3.450625     0.990532     3.483609     0.997993
ADV[-1]        0.009580     0.003431     2.791892     0.989639
CADV[-1]      -0.000602     0.000755    -0.797894     0.566907 +++
CC[-1]         0.003256     0.001341     2.427545     0.976562
P[-1]          0.033864     0.015039     2.251762     0.965812
Y[-1]         -0.002405     0.001313    -1.831939     0.920057 +++

Marked regressors are insignificant.
```

Note: the "significance" values reported here are one minus the usual p-values, so a value less than 0.95 means NOT significant at the 5% level. (Confusing!)
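
To make the conversion concrete, here is a small sketch that turns the "Significance" column of the output above into conventional p-values and lists the terms that fail at the 5% level. The function name `to_p_value` is just an illustrative choice.

```python
# "Significance" column copied from the first regression output above.
significance = {
    "_CONST": 0.997993,
    "ADV[-1]": 0.989639,
    "CADV[-1]": 0.566907,
    "CC[-1]": 0.976562,
    "P[-1]": 0.965812,
    "Y[-1]": 0.920057,
}

def to_p_value(sig):
    """Convert a Forecast Pro style significance to a conventional p-value."""
    return 1.0 - sig

p_values = {term: to_p_value(s) for term, s in significance.items()}
insignificant = [term for term, p in p_values.items() if p > 0.05]

for term, p in p_values.items():
    print(f"{term:10s} p = {p:.4f}")
print("Not significant at 5%:", insignificant)
```

The two terms it lists are the same ones the program marks with +++.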

```
Standard Diagnostics
-------------------------------------------------------------
Sample size 29                   Number of parameters 6
Mean 5.481                       Standard deviation 0.5907
R-square 0.5208                  Adjusted R-square 0.4166
Durbin-Watson 1.901              Ljung-Box(13)=16.37 P=0.7705
Forecast error 0.4512            BIC 0.5692 (Best so far)
MAPE 0.05933                     RMSE 0.4018
MAD 0.3208
```

The sample "mean" and "standard deviation" are statistics of the dependent variable (here, S). The "forecast error" is the standard error of the estimate (RMSE in units fitted, adjusted for degrees of freedom). The Bayesian Information Criterion (BIC) is the forecast error magnified by a penalty factor for the number of parameters estimated--theoretically it is the best bottom-line figure for comparisons between models of the same general type with different numbers of parameters. MAPE, MAD (=MAE), and RMSE are calculated in original units, if different from the units fitted. (Here they are in the same units because no nonlinear transformation was used.) RMSE does not appear to include an adjustment for the number of parameters fitted, which is why it differs from the "forecast error": with n = 29 observations and k = 6 parameters, 0.4018 × sqrt(29/23) ≈ 0.4512.
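
The relationships among these diagnostics can be checked with a few lines of arithmetic. The sketch below recomputes the forecast error, BIC, and adjusted R-square from the printed RMSE, R-square, sample size, and parameter count. The BIC formula used here, RMSE × n^(k/2n), is an inference from the printed numbers (it suggests the penalty is applied to the unadjusted RMSE) and not something taken from the Forecast Pro documentation.

```python
import math

def forecast_error(rmse, n, k):
    """RMSE rescaled by sqrt(n / (n - k)), a degrees-of-freedom adjustment."""
    return rmse * math.sqrt(n / (n - k))

def bic(rmse, n, k):
    """RMSE times n ** (k / (2n)); inferred from the printed values, not documented."""
    return rmse * n ** (k / (2 * n))

def adjusted_r2(r2, n, k):
    """Adjusted R-square, where k counts all parameters including the constant."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# Values printed in the Standard Diagnostics block above.
n, k = 29, 6
rmse, r2 = 0.4018, 0.5208

print(round(forecast_error(rmse, n, k), 4))
print(round(bic(rmse, n, k), 4))
print(round(adjusted_r2(r2, n, k), 4))
```

The same three functions can be applied to the diagnostics of the later models by changing `n`, `k`, `rmse`, and `r2`.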

```
Rolling simulation results
                    Cumulative        Cumulative
H   N       MAD      Average   MAPE    Average
---------------------------------------------------------------------
1   8      0.446      0.446    0.080    0.080
2   7      0.364      0.408    0.064    0.073
3   6      0.301      0.377    0.052    0.067
4   5      0.334      0.369    0.058    0.065
5   4      0.322      0.363    0.053    0.063
6   3      0.411      0.367    0.068    0.064
7   2      0.398      0.369    0.069    0.064
8   1      0.536      0.374    0.089    0.065
```

Wait a minute: this regression model cannot really forecast more than one period ahead, because its regressors are lagged by only one period and their values further into the future are not yet known! According to the software developer, "Rolling simulation is not meaningful for regression models." I believe that in this case the first row of statistics represents the validation period for the model, and the other rows are just double-counting some of the same errors.
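
For readers wondering how the "Cumulative Average" column relates to the per-horizon figures, the sketch below reproduces it as a running average weighted by N. With 8 held-out values there are 8 one-step errors, 7 two-step errors, and so on, which is why N shrinks down the table. The weighting scheme is inferred from the printed numbers, not from the software documentation.

```python
# (N, MAD) pairs for horizons H = 1..8 from the rolling simulation table above.
rows = [(8, 0.446), (7, 0.364), (6, 0.301), (5, 0.334),
        (4, 0.322), (3, 0.411), (2, 0.398), (1, 0.536)]

def cumulative_weighted_average(rows):
    """Running average of the statistic, weighting each horizon by its N."""
    averages, total, count = [], 0.0, 0
    for n, value in rows:
        total += n * value
        count += n
        averages.append(total / count)
    return averages

for h, avg in enumerate(cumulative_weighted_average(rows), start=1):
    print(h, round(avg, 3))
```

The results agree with the printed column to within the rounding of the three-decimal inputs.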

The following diagnostic test for lagged variables not currently in the model is really useful, however:

```
Variable specification test battery
------------------------------------------------------------------------
_CONST[-1]                        ChiSq( 1)=0.53  Percentile=0.5334
ADV[-2]                                     1.69             0.8062
CADV[-2]                                    0.53             0.5342
CC[-2]                                      0.53             0.5334
P[-2]                                       0.56             0.5439
Y[-2]                                      12.65             0.9996 **
_TREND                                     12.15             0.9995 **

Try adding Y[-2] to model.
```

Good suggestion--will do! Now here are some additional useful tests to see if ARIMA corrections would be helpful in the model--i.e., lags of the dependent variable and/or lags of the errors. Here, S[-1] is the dependent variable lagged by one period, and _AUTO[-1] refers to the errors lagged by one period, etc.

```
Dynamics test battery
------------------------------------------------------------------------
S[- 1]                            ChiSq( 1)=0.83  Percentile=0.6373
S[- 2]                                      0.98             0.6777
S[- 3]                                      0.83             0.6382
S[- 4]                                      2.01             0.8433
S[- 8]                                      2.85             0.9084

_AUTO[- 1]                        ChiSq( 1)=0.59  Percentile=0.5573
_AUTO[- 2]                                  1.03             0.6904
_AUTO[- 3]                                  2.84             0.9082
_AUTO[- 4]                                  1.96             0.8382
_AUTO[- 8]                                  4.21             0.9599 *

Dynamics tests successful.

```

OK, nothing else surprising turned up. Note, however, that no tests were performed to determine the relative stationarity of the variables--for example, the fact that P ought to be differenced to be comparable to the other variables does not register.
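
The "Percentile" column in the two test batteries above is the chi-square cumulative probability of each statistic, on the same one-minus-p scale as the regression "significance" values. For one degree of freedom it can be computed with the error function, as in the sketch below. The flag thresholds of 0.95 for * and 0.99 for ** are an assumption that fits the printed marks; they are not taken from the documentation.

```python
import math

def chi2_1df_percentile(x):
    """CDF of a chi-square variable with 1 degree of freedom, evaluated at x."""
    return math.erf(math.sqrt(x / 2.0))

# (term, ChiSq(1) statistic) pairs copied from the test batteries above.
tests = [("Y[-2]", 12.65), ("_TREND", 12.15), ("ADV[-2]", 1.69), ("_AUTO[- 8]", 4.21)]

for term, stat in tests:
    pct = chi2_1df_percentile(stat)
    # Assumed marking rule: ** above 0.99, * above 0.95.
    flag = "**" if pct > 0.99 else "*" if pct > 0.95 else ""
    print(f"{term:12s} {stat:6.2f} {pct:.4f} {flag}")
```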

Let's now try adding Y[-2], as recommended above, before removing any insignificant variables:

```
Forecast Model for S
Regression(7 regressors, 0 lagged errors)

Term          Coefficient  Std. Error  t-Statistic  Significance
---------------------------------------------------------------------
_CONST         3.763999     0.662905     5.678036     0.999990
ADV[-1]        0.008084     0.002304     3.508493     0.998017
CADV[-1]      -0.000561     0.000503    -1.114696     0.722990 +++
CC[-1]         0.001823     0.000932     1.956479     0.936779 +++
P[-1]         -0.005059     0.012307    -0.411072     0.315003 +++
Y[-1]         -0.001514     0.000891    -1.700453     0.896859 +++
Y[-2]          0.005299     0.000972     5.453561     0.999982

Marked regressors are insignificant.

Standard Diagnostics
----------------------------------------------------------------
Sample size 29                   Number of parameters 7
Mean 5.481                       Standard deviation 0.5907
R-square 0.7962                  Adjusted R-square 0.7407
Durbin-Watson 2.306              ** Ljung-Box(12)=32.29 P=0.9988
Forecast error 0.3008            BIC 0.3934 (Best so far)
MAPE 0.03753                     RMSE 0.262
MAD 0.198

Rolling simulation results
                    Cumulative        Cumulative
H   N       MAD      Average   MAPE    Average
---------------------------------------------------------------------
1   8      0.124      0.124    0.022    0.022
2   7      0.122      0.123    0.022    0.022
3   6      0.094      0.115    0.016    0.020
4   5      0.100      0.112    0.017    0.020
5   4      0.096      0.110    0.016    0.019
6   3      0.111      0.110    0.019    0.019
7   2      0.095      0.109    0.017    0.019
8   1      0.081      0.108    0.014    0.019
```

Now we'll try removing the insignificant variables one at a time--i.e., manually perform backward stepwise regression from this point:

```
Forecast Model for S
Regression(6 regressors, 0 lagged errors)

Term          Coefficient  Std. Error  t-Statistic  Significance
---------------------------------------------------------------------
_CONST         3.560179     0.431978     8.241575     1.000000
ADV[-1]        0.008105     0.002262     3.583611     0.998428
CADV[-1]      -0.000550     0.000493    -1.113756     0.723112 +++
CC[-1]         0.001932     0.000878     2.200628     0.961929
Y[-1]         -0.001683     0.000776    -2.169706     0.959392
Y[-2]          0.005068     0.000777     6.520354     0.999999

Marked regressors are insignificant.

Standard Diagnostics
----------------------------------------------------------------
Sample size 29                   Number of parameters 6
Mean 5.481                       Standard deviation 0.5907
R-square 0.7947                  Adjusted R-square 0.75
Durbin-Watson 2.281              ** Ljung-Box(13)=29.76 P=0.9949
Forecast error 0.2953            BIC 0.3726 (Best so far)
MAPE 0.03892                     RMSE 0.263
MAD 0.2052

Rolling simulation results
                    Cumulative        Cumulative
H   N       MAD      Average   MAPE    Average
---------------------------------------------------------------------
1   8      0.145      0.145    0.026    0.026
2   7      0.136      0.141    0.025    0.026
3   6      0.101      0.129    0.018    0.023
4   5      0.102      0.124    0.018    0.022
5   4      0.087      0.119    0.015    0.021
6   3      0.094      0.117    0.016    0.021
7   2      0.085      0.115    0.016    0.021
8   1      0.010      0.112    0.002    0.020

Forecast Model for S
Regression(5 regressors, 0 lagged errors)

Term          Coefficient  Std. Error  t-Statistic  Significance
---------------------------------------------------------------------
_CONST         3.449871     0.422572     8.163984     1.000000
ADV[-1]        0.008161     0.002272     3.591697     0.998532
CC[-1]         0.001916     0.000882     2.171779     0.960020
Y[-1]         -0.001833     0.000768    -2.387589     0.974816
Y[-2]          0.005118     0.000780     6.563664     0.999999

Standard Diagnostics
---------------------------------------------------------------
Sample size 29                   Number of parameters 5
Mean 5.481                       Standard deviation 0.5907
R-square 0.7836                  Adjusted R-square 0.7475
Durbin-Watson 2.196              * Ljung-Box(14)=28.72 P=0.9886
Forecast error 0.2968            BIC 0.3609 (Best so far)
MAPE 0.04002                     RMSE 0.27
MAD 0.2117

Rolling simulation results
                    Cumulative        Cumulative
H   N       MAD      Average   MAPE    Average
---------------------------------------------------------------------
1   8      0.146      0.146    0.026    0.026
2   7      0.148      0.147    0.026    0.026
3   6      0.135      0.143    0.024    0.025
4   5      0.125      0.140    0.022    0.025
5   4      0.138      0.140    0.024    0.025
6   3      0.173      0.143    0.030    0.025
7   2      0.177      0.145    0.033    0.025
8   1      0.104      0.143    0.017    0.025

Variable specification test battery
------------------------------------------------------------------------
CADV[-1]                          ChiSq( 1)=1.18  Percentile=0.7220
P[-1]                                       0.12             0.2711
_CONST[-1]                                  0.05             0.1787
ADV[-2]                                     0.58             0.5527
CC[-2]                                      0.06             0.1856
Y[-3]                                       0.38             0.4650
_TREND                                      1.62             0.7968

Variable specification tests successful.

Dynamics test battery
------------------------------------------------------------------------
S[- 1]                            ChiSq( 1)=0.24  Percentile=0.3754
S[- 2]                                      0.34             0.4379
S[- 3]                                      0.67             0.5855
S[- 4]                                      0.20             0.3490
S[- 8]                                      1.09             0.7036

_AUTO[- 1]                        ChiSq( 1)=0.52  Percentile=0.5293
_AUTO[- 2]                                  5.45             0.9805 *
_AUTO[- 3]                                  0.61             0.5645
_AUTO[- 4]                                  1.87             0.8283
_AUTO[- 8]                                  1.14             0.7146

Dynamics tests successful.
```

Done! Notice that this is the model that we obtained by automatic backward stepwise regression in Statgraphics, starting with 2 lags of all variables. The nice thing about Forecast Pro's analysis is that it tested for lags of all variables and lags of the errors which we didn't use in the original model (although it didn't test for some other things, like the usefulness of differencing any of the variables).
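
The manual elimination sequence followed above can be sketched as a simple rule: at each step, drop the term whose "significance" is lowest among those below 0.95, then refit. The sketch below only applies the selection rule to the printed significance columns; it does not refit anything. Always keeping `_CONST` is an assumption of this sketch.

```python
def next_to_drop(significance, threshold=0.95, keep=("_CONST",)):
    """Return the least significant term below threshold, or None if all pass."""
    candidates = {term: s for term, s in significance.items()
                  if s < threshold and term not in keep}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)

# Significance columns copied from the 7-, 6-, and 5-regressor outputs above.
model7 = {"_CONST": 0.999990, "ADV[-1]": 0.998017, "CADV[-1]": 0.722990,
          "CC[-1]": 0.936779, "P[-1]": 0.315003, "Y[-1]": 0.896859,
          "Y[-2]": 0.999982}
model6 = {"_CONST": 1.000000, "ADV[-1]": 0.998428, "CADV[-1]": 0.723112,
          "CC[-1]": 0.961929, "Y[-1]": 0.959392, "Y[-2]": 0.999999}
model5 = {"_CONST": 1.000000, "ADV[-1]": 0.998532, "CC[-1]": 0.960020,
          "Y[-1]": 0.974816, "Y[-2]": 0.999999}

for name, model in [("7 regressors", model7), ("6 regressors", model6),
                    ("5 regressors", model5)]:
    print(name, "->", next_to_drop(model))
```

Note that CC[-1] and Y[-1] were marked insignificant in the 7-regressor model but became significant once P[-1] was removed, which is why the terms are dropped one at a time.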

For comparison, here are the results of fitting and validating the same model in the GLM procedure in Statgraphics. The estimated coefficients are of course the same. Also, the validation period statistics seem to agree with those in the first row of the "rolling simulation results" table, as we had guessed earlier. (MAE=MAD=0.146, MAPE=2.6%)

```
--------------------------------------------------------------------------------------------
                                          Standard
Parameter                  Estimate         Error     Lower Limit    Upper Limit      V.I.F.
--------------------------------------------------------------------------------------------
CONSTANT                    3.44987       0.422572        2.57772        4.32202
LAG(Y,1)                 -0.0018332    0.000767803    -0.00341787   -0.000248527     1.08898
LAG(Y,2)                 0.00511806    0.000779756     0.00350872      0.0067274     1.08911
LAG(CC,1)                0.00191563    0.000882057   0.0000951528     0.00373611     1.15488
LAG(ADV,1)               0.00816141      0.0022723      0.0034716      0.0128512     1.21309
--------------------------------------------------------------------------------------------

R-Squared = 78.3605 percent
R-Squared (adjusted for d.f.) = 74.7539 percent
Standard Error of Est. = 0.29679
Mean absolute error = 0.211697
Durbin-Watson statistic = 2.19583

Residual Analysis
---------------------------------
      Estimation       Validation
n     29               8
MSE   0.088084         0.0261489
MAE   0.211697         0.146084
MAPE  4.00221          2.60524
ME    3.98149E-16      -0.0704765
MPE   -0.261927        -1.41513
```
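
For reference, the error measures in the Residual Analysis table can be defined as in the sketch below. The data are hypothetical, and two conventions here are assumptions: errors are taken as actual minus forecast, and every measure is a plain average over n. The estimation-period MSE printed above seems instead to be adjusted for degrees of freedom, since its square root equals the Standard Error of Est.

```python
import math

def error_measures(actual, forecast):
    """Return MSE, MAE, MAPE, ME, and MPE, with errors defined as actual - forecast."""
    errors = [a - f for a, f in zip(actual, forecast)]
    n = len(errors)
    return {
        "MSE": sum(e * e for e in errors) / n,
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n,
        "ME": sum(errors) / n,
        "MPE": 100 * sum(e / a for e, a in zip(errors, actual)) / n,
    }

# Hypothetical validation-period values, for illustration only.
actual = [5.0, 5.2, 5.5, 5.1]
forecast = [5.1, 5.0, 5.6, 5.3]
stats = error_measures(actual, forecast)
for name, value in stats.items():
    print(f"{name:5s} {value:.4f}")

# Square roots of the two MSE figures printed in the table above.
print(round(math.sqrt(0.088084), 5))   # estimation period
print(round(math.sqrt(0.0261489), 4))  # validation period
```

A negative ME or MPE under this sign convention means the forecasts were too high on average.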