What's the bottom line? How to compare models


After fitting a number of different regression or time series forecasting models to a given data set, you have many criteria by which they can be compared:

·         Error measures in the estimation period: RMSE (and the standard error of the regression), MAE, and MAPE

·         Error measures in the validation period, if the data were split into estimation and validation samples

·         Residual diagnostic tests

·         Qualitative considerations, such as the intuitive reasonableness and simplicity of the model

With so many plots and statistics and considerations to worry about, it's sometimes hard to know which comparisons are most important. What's the real bottom line?


If there is any one statistic that normally takes precedence over the others, it is the root mean squared error, which is the square root of the mean squared error. When it is adjusted for the degrees of freedom for error (sample size minus number of model coefficients), it is known as the standard error of the regression or standard error of the estimate in regression analysis or as the estimated white noise standard deviation in ARIMA analysis. This is the statistic whose value is minimized during the parameter estimation process, and it is the statistic that determines the width of the confidence intervals for predictions. A 95% confidence interval for a forecast is approximately equal to the point forecast "plus or minus 2 standard errors"--i.e., plus or minus 2 times the standard error of the regression.
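To make these definitions concrete, here is a minimal sketch in Python (using NumPy, with made-up numbers for the actuals, forecasts, and coefficient count) of how the RMSE, the standard error of the regression, and the rough "plus or minus 2 standard errors" interval are computed:

```python
import numpy as np

# Hypothetical data: actual values and a model's point forecasts.
y     = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])
y_hat = np.array([110.5, 120.2, 128.9, 131.0, 123.4, 133.1, 145.0, 150.2])

n = len(y)   # sample size
k = 2        # number of estimated coefficients (e.g., intercept + slope; assumed here)

residuals = y - y_hat
rmse   = np.sqrt(np.mean(residuals ** 2))            # root mean squared error
se_reg = np.sqrt(np.sum(residuals ** 2) / (n - k))   # standard error of the regression
                                                     # (adjusted for degrees of freedom)

# Rough 95% interval for a new forecast: point forecast +/- 2 standard errors.
point_forecast = 152.0   # hypothetical point forecast for the next period
lower, upper = point_forecast - 2 * se_reg, point_forecast + 2 * se_reg

print(f"RMSE = {rmse:.2f}, SE of regression = {se_reg:.2f}")
print(f"Approximate 95% interval: [{lower:.2f}, {upper:.2f}]")
```

Note that the "2 standard errors" rule is only a rough approximation: an exact interval would also account for uncertainty in the estimated coefficients and use the appropriate t-distribution quantile.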

Having planted this stake in the ground, I should add several observations and qualifications:


So... the bottom line is that you should put the most weight on the error measures in the estimation period--most often the RMSE (or standard error of the regression, which is RMSE adjusted for the relative complexity of the model), but sometimes MAE or MAPE--when comparing among models. (If your software is capable of computing them, you may also want to look at Mallows' Cp, AIC, or BIC, which penalize model complexity more heavily; see the sketch below.) But you should keep an eye on the validation-period results, residual diagnostic tests, and qualitative considerations such as the intuitive reasonableness and simplicity of your model. The residual diagnostic tests are not the bottom line--you should never choose Model A over Model B merely because Model A got more "OK's" on its residual tests. (What would you rather have: smaller errors or more random-looking errors?) A model that fails some of the residual tests or reality checks in only a minor way is probably subject to further improvement, whereas it is the model that flunks such tests in a major way that cannot be trusted.
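If your software reports residuals but not AIC or BIC, they are easy to compute yourself. Here is a minimal Python sketch (the residual values and coefficient counts are made up for illustration) using the standard Gaussian-likelihood formulas, which are valid up to an additive constant that is the same for all models fit to the same data:

```python
import numpy as np

def aic_bic(residuals, k):
    """Gaussian AIC and BIC from a model's residuals and its number of
    estimated coefficients k, up to a shared additive constant."""
    n = len(residuals)
    rss = np.sum(residuals ** 2)          # residual sum of squares
    aic = n * np.log(rss / n) + 2 * k             # AIC: penalty of 2 per coefficient
    bic = n * np.log(rss / n) + k * np.log(n)     # BIC: penalty of ln(n) per coefficient
    return aic, bic

# Hypothetical comparison: Model B fits slightly better but uses more coefficients.
res_a = np.array([1.2, -0.8, 0.5, -1.1, 0.9, -0.4, 0.7, -0.6])   # Model A residuals
res_b = np.array([1.0, -0.7, 0.4, -1.0, 0.8, -0.3, 0.6, -0.5])   # Model B residuals

for name, res, k in [("Model A", res_a, 2), ("Model B", res_b, 4)]:
    aic, bic = aic_bic(res, k)
    print(f"{name}: AIC = {aic:.2f}, BIC = {bic:.2f}  (lower is better)")
```

Because the additive constant cancels when models are fit to the same data, only the differences in AIC or BIC between models are meaningful, not their absolute values.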

The validation-period results are not necessarily the last word either, because of the issue of sample size: if Model A is slightly better in a validation period of size 10 while Model B is much better over an estimation period of size 40, I would study the data closely to try to ascertain whether Model A merely "got lucky" in the validation period.

Finally, remember to K.I.S.S. (keep it simple...) If two models are generally similar in terms of their error statistics and other diagnostics, you should prefer the one that is simpler and/or easier to understand. The simpler model is likely to be closer to the truth, and it will usually be more easily accepted by others.


Related topics:

·         Introduction to linear regression

·         What to look for in regression output

·         Testing the assumptions of linear regression

·         Additional notes on regression analysis