The mean (constant, intercept-only) model for forecasting

Simple forecasting models

Statistics review and the simplest forecasting model: the sample mean (pdf)
Notes on the random walk model (pdf)
Mean (constant) model
Linear trend model
Random walk model
Geometric random walk model
Three types of forecasts: estimation, validation, and the future

Mean (constant) model

The most elementary case of statistical forecasting is that of predicting a variable whose values are independently and identically randomly distributed ("i.i.d"). For example, consider a variable X1 for which a sample of size 30 has been observed and whose values are plotted below versus the row number in the data file:

(The file containing this data and the model below can be found here.) Suppose that a 31^st value is going to be observed at some point in the future, and we would like to make a forecast for it in advance. How should we do it? If the values of X1 are independently and identically distributed, then the most natural forecast to use is the sample mean of the historical data. Why the sample mean? Because by definition it is an unbiased predictor and also it minimizes the mean squared forecasting error regardless of the shape of the probability distribution. The sample mean has the property that it is the value around which the sum of squared deviations of the sample data is minimized. This is not obvious, but it is easily proved with a little calculus. This very simple forecasting model will be called the "mean model" or "constant model." The sample mean of X1 is 38.5, so according to the mean model, we should predict that its 31^st value (and all future values) will be 38.5.

Now, how accurate is this estimate of the mean, based on the limited sample of data? That is measured by the standard error of the mean, which is the estimated standard deviation of the error in the sample mean. The standard error of the mean is equal to the sample standard deviation divided by the square root of the sample size. Here the sample standard deviation is 12.0, and the sample size is 30, so the standard error of the mean is 12/SQRT(30) = 2.2. By virtue of the Central Limit Theorem, we can assume the error in the estimating the mean to be approximately normally distributed for a sample this large, regardless of the distribution of the random variable itself, and on this basis we can construct confidence intervals for the mean. In general, a confidence interval for a parameter estimate is equal to its point value plus or minus an appropriate number of standard errors. The appropriate number is the so-called critical t-value and it is determined by the desired confidence level and by the number of degrees of freedom for error (the sample size minus the number of parameters estimated from it). The T.INV.2T function in Excel can be used to compute critical t-values for 2-tailed confidence intervals. For a 95% confidence interval for the mean, the critical t-value is T.INV.2T(0.05, n-1) where n is the sample size. Here the sample size is 30, so the critical t-value for a 95% confidence interval is T.INV.2T(0.05, 29), which is 2.05. A 95% confidence interval for the mean is therefore 38.5 plus or minus 2.05 times 2.2, which is [34.0, 43.0].

For samples of this size or larger, the critical t-value for a 95% confidence interval is always very close to 2 (it approaches 1.96 in the limit as the sample size goes to infinity), so a 95% confidence interval is roughly "plus or minus 2 standard errors" under a wide range of conditions. (Return to top of page.)

Next, how accurate is the estimated mean as a forecast for the next value of X1 that will be observed? In general, when forecasts are being made for future values of random variables, there are two sources of error: (i) intrinsically unexplainable variations ("noise") in the data, and (ii) errors in the parameter estimates upon which the forecasts are based. For the mean model, the magnitudes of these two sources of error are measured by the sample standard deviation and the standard error of the mean respectively, and together they determine the standard error of the forecast, which is the estimated standard deviation of the error in the forecast. (The error in the forecast is defined as the observed value minus the forecast.) Specifically, the standard error of a forecast from the mean model is equal to the square root of {the square of the sample standard deviation plus the square of the standard error of the mean}. Because the standard error of the mean is just the sample standard deviation divided by the square root of n, it follows that, for the mean model, the standard error of the forecast is equal to the sample standard deviation multiplied by the square root of 1+1/n. In the case of X1, the standard error of the forecast is therefore 12.0 x SQRT(1+1/30) = 12.2, which is only slightly larger than the sample standard deviation.

The standard error of the forecast can used to determine confidence intervals for forecasts in the usual way: a confidence interval for a forecast is the point forecast plus-or-minus the appropriate critical t-value times the standard error of the forecast. The critical t-value is the same as the one used to calculate a confidence interval for the mean. So, the 95% confidence interval for the forecast in this model is 38.5 plus-or-minus 2.05 times 12.2, which is [13.5, 63.5] Here we must be a bit careful, though. The formula for calculating a symmetric 2-tailed confidence interval is based on an assumption of normally distributed errors. But the error in a forecast based on the mean model is normally distributed only if the variable itself is normally distributed. (We cannot automatically appeal to the Central Limit Theorem here, because what is being predicted is a single observation, not the mean of many observations.) So, to be careful, we ought to look at the probability distribution of the forecast errors, as revealed by charts and statistics that test for normality of the distribution.

The accompanying Excel file in which this analysis was carried out with RegressIt includes the following table and chart which show the forecasts for X1 produced by the model, together with the 95% confidence interval around the forecast for observation #31. (It also includes various plots and statistical tests of the errors.)

The forecast and the confidence interval look somewhat reasonable (if a little boring), although they rest on strong assumptions, namely that the values of X1 are independently and identically normally distributed. It would be good to look deeper into these assumptions, and this will be done in the section on the linear trend model.

A few words about confidence levels: It is conventional to use 95% as the default confidence level when reporting confidence intervals for parameter estimates or forecasts, although there is no magical signficance attached to this value. It is merely an arbitrary standard of "very confident but not certain". The number 0.95 is close to 1 but not so close as to be visually indistinguishable, and a 1-out-of-20 chance of a surprise is not too tiny to think carefully about. (Most persons have trouble in appreciating the relative importance of very small probabilities, though, such as a 1-out-of-100 or 1-out-of-1000 chance.) Also, 2 is a nice round number for a critical t-value. But in some situations it may be helpful to compute and plot confidence intervals for some other confidence level, say, 90% or 99% or only 50%, particularly when the setting is one of forecasting rather than inference about parameter values. The critical t-value for a 50% confidence interval is approximately 2/3, so a 50% confidence interval is one-third the width of a 95% confidence interval. Here's what the forecast chart for the mean model for X1 looks like with 50% confidence limits:

The nice thing about a 50% confidence interval is that it is a "coin flip" as to whether the true value will fall inside or outside of it, which is extremely easy to think about. Also, confidence intervals for forecasts at high levels of confidence tend to be so wide as to not be very informative on a plot, particularly when sample sizes are small. The 95% confidence interval for a forecast from the mean model is often approximately the range of the sample data, as in the first chart above. 50% intervals are often more helpful as visual reference points, particularly when comparing the degree of overlap between forecasts produced by different models. In general, the consequences of error in the decision problem at hand, as well as the expectations of the audience, should be taken into account when choosing a confidence level to emphasize. (Return to top of page.)

The mean model may seem overly simplistic (always expect the average!), but it is actually the foundation of the more sophisticated models that are mostly commonly used. It is the starting point for regression analysis: the forecasting equation for a regression model includes a constant term plus multiples of one or more other variables, and fitting a regression model can be viewed as a process of estimating several means simultaneously from the same data, namely the "mean effects" of the predictor variables as well as the overall mean. In fact, in the special case of regression models that contain only dummy variables (such as ANOVA models), literally all that is going on is the estimation of the mean of the dependent variable under different conditions, under an assumption that the errors have the same standard deviation under all those conditions. Believe it or not, if you understand the mathematics of parameter estimation, calculation of forecasts and confidence intervals, and testing goodness of fit for the mean model, you are almost halfway to understanding how to do the same things for regression models.

The mean model is also the starting point for constructing forecasting models for time series data, including random walk and ARIMA models. If we can find some mathematical transformation (e.g., differencing, logging, deflating, etc.) that converts the original time series into a sequence of values that are independently and identically distributed, we can use the mean model to obtain forecasts and confidence limits for the transformed series, and then reverse the transformation to obtain corresponding forecasts and confidence limits for the original series. (Return to top of page.)

For a much more detailed discussion of this topic, see the handout: “Review of basic statistics and the simplest forecasting model: the mean model”. For a concise summary of the math, see the page on mathematics of simple regression.

Go to next topic: linear trend model.