Time Series Forecasting in Statgraphics

The Forecasting procedure in the Time Series module is one of the most innovative features of Statgraphics. Here are some basic instructions on how to use it effectively.

Data Input Panel
Model Specification Panel
Reports and Graphs
Where's the residual probability plot?
Model Comparisons
Transformed versus untransformed units
If you are fitting a regression model....
Bugs?

Data Input Panel: The Forecasting procedure is found under Special/Time Series Analysis on the main menu. When you first enter this procedure, you will be presented with a Data Input panel asking you to specify the Sampling Interval (e.g., months, days), the Starting Date (e.g., 1/81 for January 1981), and the Seasonality (12 for monthly data, 4 for quarterly data, 1 for annual or non-seasonal data). You can also specify the number of forecasts you want to generate beyond the end of the sample, and the number of observations in the original sample to withhold for validation--i.e., out-of-sample testing. If you have sufficient data, it is always a good idea to withhold a sizable chunk of the data (at least one year's worth or 20% of the original sample, whichever is larger) for validation.
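The holdout rule of thumb above is easy to compute by hand, but here is a minimal Python sketch of it for reference. The function name holdout_size is made up for illustration (it is not part of Statgraphics), and the sketch assumes that "one year's worth" of data equals one full seasonal cycle, i.e., the Seasonality value you entered on the Data Input panel.

```python
def holdout_size(n_obs: int, seasonality: int) -> int:
    """Suggested number of observations to withhold for validation.

    Rule of thumb: one year's worth of data or 20% of the sample,
    whichever is larger. ASSUMPTION: one year's worth equals one full
    seasonal cycle (12 for monthly data, 4 for quarterly, 1 for annual).
    """
    if n_obs <= 0 or seasonality < 1:
        raise ValueError("n_obs must be positive and seasonality at least 1")
    one_year = seasonality
    # Integer ceiling of 20% of the sample, avoiding floating-point rounding.
    twenty_percent = (n_obs + 4) // 5
    return max(one_year, twenty_percent)


if __name__ == "__main__":
    print(holdout_size(120, 12))  # ten years of monthly data
    print(holdout_size(40, 12))   # a bit more than three years of monthly data
```

With a short sample the one-year minimum dominates. With a long sample the 20% rule takes over.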

Model Specification Panel: After specifying the data input options as described above, you will initially get a single text report, namely the Analysis Summary, which in this case gives a description of the model fitted (what model?) and its estimated parameters. Now the fun begins. Click the right mouse button and you will get a full-screen Model Specification panel providing options for choosing a forecasting model. You can actually analyze and compare up to five models at once. The models are code-lettered A through E, and you switch between models using the A-E "radio buttons" in the upper left of the screen. (When you first enter the procedure, you are looking at model A, whose default specification is random-walk.) One of the great things about this procedure is that it lets you specify a variety of model features in one place--e.g., deflation, seasonal adjustment, logarithm or power transformations, exponential smoothing or ARIMA terms, regressors, etc. (By the end of the term, you will know what all these model features are and how to use them.) Some features may be grayed-out depending on the model type that is selected: for example, the order of differencing and AR/MA parameters are only activated when an ARIMA model is specified. Regressors are an option only if a mean or trend-line or ARIMA model is specified. Seasonal adjustment and seasonal differencing are options only if you set the "Seasonality" to something other than 1 on the Data Input panel. Choose the features you wish to use in one or more models, and then click OK to continue.

Reports and Graphs: When the Analysis Summary report reappears, use the buttons on the Analysis Window Toolbar to select the full range of reports and graphs that are available. (Click the second and third buttons from the left on the toolbar, etc.) Most of the graphs and text reports refer only to the "current" model (A, B, or whatever), i.e., the one that was last displayed on the Model Specification panel. To see the reports and graphs for a different model, click the right mouse button again to return to the Model Specification panel, click the radio button for the desired model, and hit OK to redraw the reports and graphs. When looking at the graphs for a given model, you should ask: do the forecasts and fitted values look reasonable? Does the model seem to be satisfactorily tracking the qualitative features of the data (trends, seasonality, etc.)? The text reports provide the numerical details and statistical tests to corroborate what is suggested by the graphs. Do the diagnostic tests indicate that the model's underlying assumptions (residual independence, homoscedasticity, etc.) are satisfied?

Where's the residual probability plot? When you look at the array of residual plots initially generated by the forecasting procedure, there's no sign of a residual probability plot. (This plot is used to verify the assumption that the residuals are normally distributed, which justifies the use of least-squares parameter estimation and the calculation of confidence intervals. Ideally the points on the probability plot lie along a diagonal line.) There is a normal probability plot; it's just not obvious where to find it. Maximize the residual time series plot (the third plot from the top), and then click the right mouse button to get the menu of options. The options include horizontal and vertical probability plots, as well as the default time series plot of the residuals. This is another reminder that Statgraphics has more features than are apparent on the menus: you often have to test the right-mouse-button options to find out the full range of features available for each procedure.
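If you are curious what a normal probability plot actually plots, the following Python sketch computes the points: the sorted residuals paired with the standard normal quantiles they would be expected to match. This is only an illustration, not a description of Statgraphics' internals. In particular, the plotting-position formula (i - 0.375)/(n + 0.25) is one common convention chosen here as an assumption, and the function name is hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(residuals):
    """Return (theoretical normal quantile, sorted residual) pairs.

    ASSUMPTION: plotting positions p_i = (i - 0.375) / (n + 0.25);
    other conventions exist and Statgraphics may use a different one.
    """
    n = len(residuals)
    if n == 0:
        raise ValueError("need at least one residual")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, r in enumerate(sorted(residuals), start=1):
        p = (i - 0.375) / (n + 0.25)
        points.append((std_normal.inv_cdf(p), r))
    return points


if __name__ == "__main__":
    # Made-up residuals, for illustration only.
    for quantile, residual in normal_probability_points([0.3, -1.2, 0.8, -0.1, 0.4]):
        print(f"{quantile:8.3f} {residual:8.3f}")
```

If the residuals are approximately normal, these pairs fall close to a straight line when plotted against each other.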

Model Comparisons: One text report that is particularly interesting (and unique to Statgraphics) is the Model Comparison report (the third text report from the top), which gives side-by-side statistical comparisons of all models fitted, both in-sample and out-of-sample, plus a summary table of residual diagnostic test results. This report is a very powerful tool for comparing models, although you should be careful not to get too caught up in hair-splitting differences. Look to see which models are best in-sample and which are best out-of-sample in terms of mean squared error, mean absolute percentage error, etc. (The out-of-sample performance is the real proof of the pudding, of course, but you need to be circumspect if the holdout sample is small.) Also, compare the out-of-sample error statistics to the in-sample statistics within models: if they are not similar, the data may be over-fitted. Finally, see whether the results of the residual randomness and stationarity tests are satisfactory (ideally an OK or * symbol, indicating no statistically significant violations of model assumptions). If many red flags appear (** or *** symbols), you probably should take a closer look at the Residual Randomness Tests report and the residual plots themselves. In general, the smaller and more random the errors, the better, but you should not always slavishly pick the model that is "best in the rankings": there are other factors to weigh as well. Does the model make sense from a business perspective? Is it easy to use and understand? Do the diagnostic tests suggest that there might be an even better model out there somewhere? Do the test results appear to be correct? (See the warning below about bugs.)
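To make the in-sample versus out-of-sample comparison concrete, here is a small Python sketch that computes mean squared error and mean absolute percentage error separately for the estimation and validation periods. It uses the random walk model (forecast = previous value) because that model has no parameters to estimate. The function names and the data are made up for illustration, and the sketch assumes that validation-period errors are one-step-ahead errors; it is not a reproduction of the Model Comparison report.

```python
def one_step_errors(series):
    """(actual, error) pairs for a random walk: forecast = previous value."""
    return [(series[t], series[t] - series[t - 1]) for t in range(1, len(series))]


def error_stats(pairs):
    """Mean squared error and mean absolute percentage error.

    ASSUMPTION: all actual values are nonzero, so MAPE is defined.
    """
    n = len(pairs)
    mse = sum(e * e for _, e in pairs) / n
    mape = 100.0 * sum(abs(e / y) for y, e in pairs) / n
    return {"MSE": mse, "MAPE": mape}


def compare_periods(series, n_holdout):
    """Error statistics for the estimation period and the validation period."""
    pairs = one_step_errors(series)
    split = len(pairs) - n_holdout
    if split < 1 or n_holdout < 1:
        raise ValueError("need at least one error in each period")
    return error_stats(pairs[:split]), error_stats(pairs[split:])


if __name__ == "__main__":
    # Made-up data, for illustration only.
    sales = [100, 102, 101, 105, 107, 110, 108, 112, 115, 113]
    in_sample, out_of_sample = compare_periods(sales, n_holdout=3)
    print("estimation period:", in_sample)
    print("validation period:", out_of_sample)
```

If the validation-period statistics are much worse than the estimation-period ones for a model with fitted parameters, that is the over-fitting warning sign described above. With only three holdout points, as here, the comparison is too noisy to take very seriously.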

Transformed versus untransformed units: If a nonlinear transformation such as deflation or logging was used in the model, the residual plots and diagnostics are given in transformed (e.g., deflated or logged) units, whereas the forecast plots and the error summary statistics in the Model Comparison report are given in untransformed (i.e., original) units. The reason for this is that least-squares estimation takes place in transformed units, hence it is there that tests of residual autocorrelation, heteroscedasticity, etc., should be applied. The forecasts and error summary statistics, on the other hand, need to be comparable between models that may have used different transformations, so they are given in original units to ensure a common currency.
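The distinction between the two kinds of units can be seen in a few lines of Python. The sketch below fits nothing: it simply applies a random walk to the logged series, then reports each period's residual in logged units alongside the corresponding forecast error in original units. The function name is hypothetical, and back-transforming the forecast with a plain exp() is an assumption made for illustration, not a statement of how Statgraphics un-transforms its forecasts.

```python
import math


def log_random_walk_report(series):
    """(residual in logged units, error in original units) for each period.

    The model works in logged units: log forecast = previous log value.
    ASSUMPTION: the forecast is converted back to original units with a
    plain exp(), with no further adjustment.
    """
    rows = []
    for t in range(1, len(series)):
        log_forecast = math.log(series[t - 1])
        residual_logged = math.log(series[t]) - log_forecast  # transformed units
        forecast = math.exp(log_forecast)                      # original units
        error_original = series[t] - forecast                  # original units
        rows.append((residual_logged, error_original))
    return rows


if __name__ == "__main__":
    # Made-up series growing 10% per period, for illustration only.
    for residual, error in log_random_walk_report([100.0, 110.0, 121.0, 133.1]):
        print(f"logged residual {residual:.4f}   original-units error {error:.2f}")
```

In this example the residuals are constant in logged units while the errors in original units keep growing, which is why the diagnostic tests belong in transformed units and the between-model error comparisons belong in original units.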

If you are fitting a regression model: There are two ways to fit multiple regression models in Statgraphics: either use the regular multiple regression procedure that is found under Relate/Multiple_Regression on the main menu, or use the regression option that is part of the Time Series/Forecasting procedure. Each has its advantages and disadvantages. The advantages of the "regular" multiple regression procedure are that it is capable of doing stepwise variable selection and it is unperturbed by missing values within the sample. The disadvantages are that it does not provide all the residual diagnostics (e.g., autocorrelation plots and tests) that you would like to have when working with time series data, and it doesn't do out-of-sample validation (a big minus), autocorrelated error structures, or between-model comparisons. The advantages of the "time series" regression option are that it provides time-series diagnostics, out-of-sample validation, autocorrelated error structures, and side-by-side comparisons with other forecasting models. The disadvantages are that it doesn't do any automatic variable selection (not a big loss) and that it doesn't tolerate missing values within the estimation period (a nuisance, but not fatal).

If you are going to use the regression procedure in the Time Series/Forecasting module, then you need to have observations of all the independent variables available for any periods in which forecasts are to be generated. For example, if you have 100 observations (rows) of the dependent variable, and you wish to generate regression forecasts for the next 5 periods, then you must have 105 observations of all the independent variables; otherwise you will get an error message saying "Can't have missing values in the estimation or forecast period." If these forecasts are of a what-if nature--i.e., if you want to compute what the forecasts would be if the independent variables took on hypothetical values--then you should go to the data spreadsheet and enter the desired future values of the independent variables in the next few rows before running the Forecasting procedure.
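The row-count requirement can be checked before you run the procedure. The Python sketch below is a hypothetical helper, not a Statgraphics feature: it takes the number of observations of the dependent variable, the number of forecasts wanted, and the independent variables as lists (with None standing for an empty cell), and reports which rows of which variables would trigger the missing-value error.

```python
def missing_regressor_rows(n_obs, n_forecasts, regressors):
    """Map each regressor name to the 1-based rows where it is missing.

    Every regressor must be present in rows 1 .. n_obs + n_forecasts.
    None stands for an empty spreadsheet cell; a column that is too
    short is treated as missing in the rows it does not reach.
    """
    required = n_obs + n_forecasts
    problems = {}
    for name, values in regressors.items():
        padded = list(values[:required]) + [None] * max(0, required - len(values))
        missing = [row for row, v in enumerate(padded, start=1) if v is None]
        if missing:
            problems[name] = missing
    return problems


if __name__ == "__main__":
    # Made-up variables: 100 observed rows, 5 forecasts wanted.
    regressors = {
        "price": [9.99] * 105,  # future (what-if) values already entered
        "promo": [0.0] * 100,   # stops at the end of the sample
    }
    print(missing_regressor_rows(100, 5, regressors))
```

In this example "price" passes, while "promo" is reported as missing in rows 101 through 105, which are exactly the rows where the what-if values would need to be entered.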

Also, as noted above, the time series regression procedure does not allow missing values of any of the variables within the estimation period, either. (There is no good reason for this, and I hope it will be fixed in future releases.) So, if some of your variables contain missing values--e.g., if they do not all begin and end at the same points in time--you may need to delete the offending rows at either end. (If the missing observations are in the middle of the sample, you probably should not delete their rows: it might be better to replace them either by the variable's mean value or by interpolations between the points on either side.)
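Here is a minimal Python sketch of the two repairs just described, assuming each variable is a list with None marking a missing value and that every variable has at least one observed value. The function names are hypothetical. The first function finds the range of rows over which all variables have begun and none has ended (rows outside it are the ones to delete), and the second fills gaps in the middle of a variable by linear interpolation between the points on either side.

```python
def common_range(columns):
    """Return (first_row, last_row), 0-based and inclusive, spanning the rows
    where every column has started and none has ended.

    ASSUMPTION: every column contains at least one non-missing value.
    """
    starts = [min(i for i, v in enumerate(col) if v is not None) for col in columns]
    ends = [max(i for i, v in enumerate(col) if v is not None) for col in columns]
    return max(starts), min(ends)


def fill_interior_gaps(values):
    """Linearly interpolate missing values lying between two known values.

    Missing values at either end are left as None: those rows are
    candidates for deletion, not interpolation.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(1, gap):
            step = (filled[right] - filled[left]) * k / gap
            filled[left + k] = filled[left] + step
    return filled


if __name__ == "__main__":
    # Made-up columns, for illustration only.
    x1 = [None, 10.0, None, None, 16.0, 17.0]
    x2 = [5.0, 6.0, 7.0, 8.0, 9.0, None]
    first, last = common_range([x1, x2])
    print("keep rows", first, "through", last)
    print(fill_interior_gaps(x1)[first:last + 1])
```

Replacing the gaps by the variable's mean value instead would be a one-line change. Interpolation is shown here because it respects any trend in the variable.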

Bugs? Version 5 of Statgraphics has fixed some long-standing bugs (e.g., concerning the residual cross-correlation plot and ARIMA models with regressors) and has also added some new features, including the addition of a "constant" (i.e., drift or trend) option to the random walk model. Unfortunately, there is a bug in the random-walk-with-drift model: if you try to hold out data for validation, the drift term is still estimated from the entire sample. Thus, if you want to fit a random-walk-with-drift model with hold-out data, you still need to specify it as an ARIMA(0,1,0)+constant model.
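To see why this matters, recall that in a random-walk-with-drift model the least-squares estimate of the drift is just the average period-to-period change over the data used for estimation. The Python sketch below (the function name is hypothetical and the data are made up) computes that average with and without the holdout rows, to show how different the two answers can be. It illustrates the arithmetic only; it does not reproduce Statgraphics' estimation code.

```python
def drift_estimate(series, n_holdout=0):
    """Average first difference over the estimation period.

    The last n_holdout observations are excluded, as they should be
    when data are withheld for validation.
    """
    estimation = series[:len(series) - n_holdout]
    if len(estimation) < 2:
        raise ValueError("need at least two observations to estimate drift")
    diffs = [b - a for a, b in zip(estimation, estimation[1:])]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Made-up series whose last two (held-out) values jump sharply.
    y = [10, 12, 14, 16, 30, 50]
    print("drift from estimation period only:", drift_estimate(y, n_holdout=2))
    print("drift from the entire sample:     ", drift_estimate(y))
```

When the drift is estimated from the entire sample, the held-out data leak into the model, so the validation-period error statistics look better than they should.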

Last update March 18, 2002