We will be interested in forecasting Rt as a function of lagged information Zt-1. It is logical to start with a linear regression model. Later we discuss the generalization of this linear model using nonparametric density estimation techniques.
The linear regression model with a single explanatory variable is:
Rt = d0(Z0) + d1(Z1,t-1) + residualt [1]
where d0, d1 are regression coefficients.
This is often presented as
Rt = d0 + d1(Z1,t-1) + residualt [2]
The d0 is interpreted as the intercept and the d1 as the slope coefficient. Equations [1] and [2] are identical. Remember we have a single explanatory variable. It turns out that, in the standard implementation of regression, Z contains two variables: Z1 might be an interest rate level and Z0 is a constant vector of ones. In a spreadsheet, one can think of the first column as the returns, say from January 1970 through December 1994, the second column as a "1" in every row, and the third column as the interest rate from December 1969 through November 1994 (it is lagged one month). Notice I have no time subscript on Z0 because it is just a column of ones.
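To make the layout concrete, here is a minimal Python sketch of the same setup. All of the numbers below are made up for illustration; the sample size, coefficients, and simulated series are assumptions, not the data described above.

```python
# Sketch of the spreadsheet layout described above, with made-up numbers.
# Column 1: the returns R; column 2: a constant of ones (Z0);
# column 3: the interest rate lagged one period (Z1 at t-1).
import numpy as np

rng = np.random.default_rng(0)
T = 300                                            # e.g. 300 monthly observations (illustrative)
rate = 0.05 + 0.01 * rng.standard_normal(T + 1)    # interest rate, one extra early observation
lagged_rate = rate[:-1]                            # the lagged series lines up with the returns
R = 0.01 + 0.5 * lagged_rate + 0.02 * rng.standard_normal(T)  # made-up returns

Z = np.column_stack([np.ones(T), lagged_rate])     # [Z0, Z1 at t-1]
d, *_ = np.linalg.lstsq(Z, R, rcond=None)          # least squares: d = INV(Z'Z) Z'R
print("intercept d0:", d[0], "slope d1:", d[1])
```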
Suppose we ran the following regression:
Rt = d0(Z0) + residualt [3]
This is a regression on the column of ones. What is d0 in this case? It is just the average return. It is also an equally weighted average return. According to regression theory, the coefficient is
d0 = INV(Z'Z)Z'R [4]
where Z is just a column of ones. This can be broken down into two parts.
INV(Z'Z) = INV(#obs) = 1/#obs
Z'R = SUM(returns)
Hence, it is obvious that d0 is the average return, i.e. the sum of the returns divided by the number of observations!
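A tiny numerical check of this point, using a handful of made-up monthly returns:

```python
import numpy as np

R = np.array([0.02, -0.01, 0.03, 0.00, 0.01])  # a few made-up monthly returns
Z = np.ones((len(R), 1))                       # Z0: a column of ones

d0 = np.linalg.inv(Z.T @ Z) @ Z.T @ R          # INV(Z'Z) Z'R, as in [4]
print(d0[0], R.mean())                         # both are the average return
```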
Why are we focusing on this trivial regression? Well, the traditional style of asset management uses average returns (as well as variances and covariances) as inputs to mean-variance optimization. Sometimes, moving-window averages (MA) are used, say over the last five years. In this case, Z0 would have zeros in the initial rows and "1"s in the last 60 rows (assuming monthly data is used). Sometimes, exponentially weighted moving averages (EWMA) are used. Again, we can set Z0 to handle this.
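Here is a short sketch of the moving-window idea, again with made-up returns. The exponentially weighted average at the end is computed directly as a weighted average for comparison; the decay factor of 0.97 is an arbitrary illustrative choice, not a recommendation.

```python
import numpy as np

rng = np.random.default_rng(1)
R = 0.01 + 0.04 * rng.standard_normal(300)     # made-up monthly returns

# Moving-window average: Z0 has zeros in the early rows and ones in the
# last 60 rows, so the coefficient is the average of the last 60 returns.
Z0 = np.zeros(len(R))
Z0[-60:] = 1.0
d0 = (Z0 @ R) / (Z0 @ Z0)                      # INV(Z0'Z0) Z0'R
print(d0, R[-60:].mean())                      # identical

# A direct exponentially weighted moving average for comparison
# (the decay factor 0.97 is an arbitrary illustrative choice).
lam = 0.97
w = lam ** np.arange(len(R))[::-1]             # heavier weight on recent months
print((w @ R) / w.sum())
```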
What is the R-square of the regression in [3]? Remember, the definition of R-square is the variance of the regression fitted values divided by the variance of the dependent variable. An R-square of 1.0 or 100% implies that the fitted values exactly coincide with the realized returns.
R-square = Var(fitted)/Var(R) = Var(d0)/Var(R) = 0
The R-square is zero. Why? The variance of a constant, d0, is exactly zero. Remember the definition of variance: it is the average squared deviation of the variable from its mean. Since the fitted value is always equal to the same constant, there is no variance.
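The same conclusion can be verified numerically; the returns below are made up:

```python
import numpy as np

R = np.array([0.02, -0.01, 0.03, 0.00, 0.01])  # made-up returns
fitted = np.full_like(R, R.mean())             # fitted values from [3]: every one equals d0

print(fitted.var() / R.var())                  # Var(fitted)/Var(R) = 0.0
```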
Another way of looking at this exercise is to note that those using this style of model are assuming that no other Z variable influences future returns. In fact, in running this special regression (and, indeed, you do not need to run a regression, you simply need to push the average button), they are assuming the d1 and other coefficients are exactly equal to zero.
Using the average as a forecast forces the asset manager to implement a strategy with a zero R-square. This is not necessarily a desirable strategy. Indeed, it implies that no other information affects expected returns. It implies that expected returns are constant (at least over the 60-month window of the MA).
Using a more general regression model, we can incorporate predictability. We can execute statistical tests to ensure that the predictability is genuine rather than an artifact of data snooping. The research protocol details procedures that guard against the potential misspecifications discussed below.
Heteroskedasticity occurs when the variance of the error term changes through time or across a cross-section of data. As a result, the least squares estimator will be unbiased but inefficient, i.e. you get the right point estimates for the parameters on average, but the variances of the estimated parameters are not the minimum possible variances.
The correction for heteroskedasticity is straightforward and involves weighted least squares. There are a number of approaches to this correction. Basically, each variable is transformed (often by dividing by a measure of the error's standard deviation), and the regression is reestimated.
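A minimal sketch of this kind of transformation, assuming the error standard deviation for each observation is known up to proportionality (here, purely for illustration, it is taken to be proportional to the explanatory variable; the data and the function name weighted_least_squares are hypothetical):

```python
import numpy as np

def weighted_least_squares(y, X, sigma):
    """Divide every variable by the error standard deviation for that
    observation, then re-run ordinary least squares on the transformed data."""
    b, *_ = np.linalg.lstsq(X / sigma[:, None], y / sigma, rcond=None)
    return b

# Illustrative use with made-up data whose error standard deviation grows with x.
rng = np.random.default_rng(2)
n = 200
x = rng.uniform(1.0, 5.0, n)
y = 1.0 + 2.0 * x + x * rng.standard_normal(n)   # error std proportional to x
X = np.column_stack([np.ones(n), x])
print(weighted_least_squares(y, X, sigma=x))     # weights treated as known here
```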
Autocorrelation or serial correlation is commonplace in time series regressions. Autocorrelation implies that the errors in previous periods carry over to the present period. Like heteroskedasticity, an autocorrelated regression will have unbiased but inefficient estimators. In fact, the variance of the regression coefficients will be underestimated, leading one to falsely believe some parameters are statistically significant. Furthermore, if the model is used to forecast, the predictions will be inefficient (i.e. unnecessarily large sampling variances because we are not using important information -- in the previous error terms).
The solution strategy is to transform the regression variables. Suppose we have the following bivariate regression:
Yt = d0 + d1 Xt + resid1t
and suppose the error follows a first-order autoregressive process:
resid1t = r0 + r1(resid1t-1) + resid2t
where r1 is less than one in absolute value and resid2 is normally distributed with constant variance. Transform the regression by subtracting r1 times its own lagged value (a quasi-difference rather than a simple first difference):
Yt - r1Yt-1 = d0(1 - r1) + d1(Xt - r1Xt-1) + (resid1t - r1resid1t-1)
or, since resid1t - r1resid1t-1 = r0 + resid2t,
Y*t = d0* + d1X*t + resid2t
where Y*t = Yt - r1Yt-1, X*t = Xt - r1Xt-1, and d0* = d0(1 - r1) + r0.
Now the model is properly specified and the estimation can proceed as usual.
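A minimal sketch of this transformation, with simulated data and with r1 treated as known (in practice r1 would be estimated from the OLS residuals, Cochrane-Orcutt style; the function name quasi_difference and all numbers are illustrative):

```python
import numpy as np

def quasi_difference(y, x, r1):
    """Y*_t = Y_t - r1*Y_{t-1} and X*_t = X_t - r1*X_{t-1};
    the first observation is lost in the transformation."""
    return y[1:] - r1 * y[:-1], x[1:] - r1 * x[:-1]

# Made-up data with an AR(1) error (r0 = 0, r1 = 0.7).
rng = np.random.default_rng(3)
T = 300
x = rng.standard_normal(T)
u = np.zeros(T)
for t in range(1, T):
    u[t] = 0.7 * u[t - 1] + rng.standard_normal()
y = 1.0 + 2.0 * x + u

# In practice r1 is estimated from the OLS residuals; here it is taken as given.
y_star, x_star = quasi_difference(y, x, r1=0.7)
Z = np.column_stack([np.ones(len(x_star)), x_star])
d, *_ = np.linalg.lstsq(Z, y_star, rcond=None)
print(d)                                         # roughly [d0*(1 - r1), d1] = [0.3, 2.0]
```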
One of the basic assumptions was that no exact linear dependence exists among the independent variables. The reason this is important is that we need to invert the matrix X'X to get the least squares estimator of the coefficients. The problem is not the case where two independent variables are exactly the same -- in that case X'X cannot be inverted and the computer simply cannot estimate the coefficients, so the problem announces itself. Multicollinearity arises when some of the independent variables are close to being the same.
The main consequence of multicollinearity is that the precision of the estimates deteriorates. It becomes very difficult to determine the relative influences of the independent variables. Investigators may be falsely led to drop variables whose coefficients appear insignificantly different from zero. Furthermore, the coefficient estimates could be sensitive to the block of data used, i.e. the first subperiod could deliver parameter estimates that are different from those of the second subperiod.
The usual solution strategy is to calculate the correlation matrix of all the independent variables. If two variables have a high degree of correlation, judgement should be used to determine which one to drop from the regression.
Another possibility is orthogonalization. Suppose Z1 and Z2 are correlated but you do not want to drop one of the variables. One can regress Z2 on Z1 and save the residuals. A regression could then be run on Z1 and these residuals. The interpretation of the residuals is that they are the part of Z2 that is uncorrelated with Z1.
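A short sketch of both solution strategies -- inspecting the correlation matrix and orthogonalizing -- using made-up, highly correlated variables (the function name orthogonalize is hypothetical):

```python
import numpy as np

def orthogonalize(z2, z1):
    """Regress Z2 on Z1 (with a constant) and return the residuals:
    the part of Z2 that is uncorrelated with Z1."""
    A = np.column_stack([np.ones(len(z1)), z1])
    b, *_ = np.linalg.lstsq(A, z2, rcond=None)
    return z2 - A @ b

# Made-up, highly correlated explanatory variables.
rng = np.random.default_rng(4)
z1 = rng.standard_normal(200)
z2 = 0.9 * z1 + 0.1 * rng.standard_normal(200)
print(np.corrcoef(z1, z2))                       # step 1: inspect the correlation matrix

z2_resid = orthogonalize(z2, z1)
print(np.corrcoef(z1, z2_resid)[0, 1])           # essentially zero by construction
# The returns could now be regressed on a constant, z1, and z2_resid.
```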
Omitting a relevant variable is a common specification error. In general, the parameter estimates will be biased as a result of omitting important variables. The only case where the bias disappears is if the omitted variable is uncorrelated with the included variables -- this case, however, is unlikely. If the omitted variable has a positive covariance with variable Xi (and enters the true model with a positive coefficient), then the parameter estimate di will be biased upward. The omitted variable problem also affects efficiency. Inference about the coefficients will be wrong because the residual variance is biased upward.
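A small simulation makes the direction of the bias concrete. All numbers are made up; the true coefficient on x1 is 1.0.

```python
import numpy as np

# The true model has two regressors; x2 is omitted from the estimation.
# Because x2 is positively correlated with x1 and enters with a positive
# coefficient, the estimate on x1 is biased upward.
rng = np.random.default_rng(5)
n = 5000
x1 = rng.standard_normal(n)
x2 = 0.8 * x1 + rng.standard_normal(n)           # positively correlated with x1
y = 1.0 + 1.0 * x1 + 1.0 * x2 + rng.standard_normal(n)

X_short = np.column_stack([np.ones(n), x1])      # x2 omitted
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)
print(b_short[1])                                # about 1.8 rather than the true 1.0
```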
The errors in the variables problem arises when one or more of the independent variables are measured with error. In this case, the parameter estimates will be biased and the degree of bias depends on the variance of the measurement error.
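A small simulation of this effect, with made-up data and illustrative noise levels; the slope estimate is pulled toward zero as the measurement-error variance rises:

```python
import numpy as np

# The regressor is observed with noise; the slope estimate shrinks toward
# zero, and the shrinkage grows with the measurement-error variance.
rng = np.random.default_rng(6)
n = 5000
x_true = rng.standard_normal(n)
y = 1.0 + 2.0 * x_true + rng.standard_normal(n)

for noise_sd in (0.0, 0.5, 1.0):                 # illustrative error magnitudes
    x_obs = x_true + noise_sd * rng.standard_normal(n)
    X = np.column_stack([np.ones(n), x_obs])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(noise_sd, b[1])                        # slope falls from about 2.0 toward 1.0
```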
The usual solution strategy is to opt for an instrumental variables estimator rather than ordinary least squares. The properties of this estimator are beyond the scope of this note.
Conditional heteroskedasticity occurs when the variance of the error term changes through time. Many financial time-series exhibit heteroskedasticity; examples include interest rates and stock returns, whose volatilities change through time. Using ordinary least squares will deliver the correct estimates for the coefficients, but the standard errors and t-statistics will be incorrect. If you are drawing inferences about the coefficients, then you must have the correct standard errors.
It is important to check for heteroskedasticity and to correct for it when it exists. Unfortunately, most software packages (like Statgraphics) do not deliver corrected standard errors. However, the Fuqua version of Statgraphics has been modified to allow us to obtain heteroskedasticity-consistent standard errors.
The best method of detection involves saving the residuals from a regression and plotting the residuals against time. If there is an obvious pattern, then it is likely that there is a conditional heteroskedasticity problem. Here are two more sophisticated tests.
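A minimal sketch of this detection step, using simulated data whose error variance grows over time (all numbers are illustrative):

```python
import numpy as np
import matplotlib.pyplot as plt

# Made-up regression whose error variance grows over time.
rng = np.random.default_rng(7)
T = 300
x = rng.standard_normal(T)
y = 1.0 + 2.0 * x + np.linspace(0.5, 2.0, T) * rng.standard_normal(T)

X = np.column_stack([np.ones(T), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b                                # save the residuals ...

plt.plot(resid)                                  # ... and plot them against time;
plt.xlabel("time")                               # a widening band suggests heteroskedasticity
plt.ylabel("residual")
plt.show()
```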
This is a test for Autoregressive Conditional Heteroskedasticity or ARCH. There is no statistics package available at this time that corrects for this type of heteroskedasticity. But you should be aware of this form -- since many financial time-series exhibit ARCH disturbances.
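As an illustration only, here is a sketch of one common LM-style version of an ARCH test: regress the squared residuals on a few of their own lags and compare T times the R-square to a chi-square critical value. The function name arch_lm_test and the lag length q = 4 are assumptions for the sketch, not steps taken from this note.

```python
import numpy as np

def arch_lm_test(resid, q=4):
    """Regress squared residuals on q of their own lags; under the null of
    no ARCH, T * R-square is approximately chi-square with q degrees of freedom."""
    e2 = resid ** 2
    T = len(e2) - q
    y = e2[q:]
    lags = [e2[q - j - 1: len(e2) - j - 1] for j in range(q)]
    X = np.column_stack([np.ones(T)] + lags)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    r2 = 1.0 - ((y - X @ b) ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return T * r2                                # compare with a chi-square(q) critical value

# With residuals that have no ARCH, the statistic should be small:
rng = np.random.default_rng(8)
print(arch_lm_test(rng.standard_normal(500), q=4))
```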
This is a popular test for Conditional Heteroskedasticity. The steps are as follows:
There are several important references for corrections. White (1980, Econometrica) provides the most widely used correction (it is implemented in Statgraphics, SAS, and RATS). Hansen (1982, Econometrica) provides a more general correction of which White is a special case. Newey and West (1987, Econometrica) provide an alternative to Hansen (in some situations Hansen's correction will not work; however, you will know that it has not worked because the estimation routine will fail). Finally, Andrews (1991, Econometrica) provides the state-of-the-art correction. My recent research employs the Andrews correction.
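For concreteness, here is a minimal sketch of the basic White (1980) heteroskedasticity-consistent covariance estimator. It includes no small-sample adjustment and no autocorrelation correction of the Hansen, Newey-West, or Andrews type; the function name is hypothetical.

```python
import numpy as np

def white_standard_errors(y, X):
    """OLS coefficients with White (1980) heteroskedasticity-consistent
    standard errors: Var(b) = INV(X'X) X'diag(e^2)X INV(X'X)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    meat = X.T @ (X * (e ** 2)[:, None])
    cov = XtX_inv @ meat @ XtX_inv
    return b, np.sqrt(np.diag(cov))

# Usage: b, se = white_standard_errors(R, Z) for whatever returns R and
# regressor matrix Z (including the column of ones) are being studied.
```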