Global Financial Management

Quantitative Performance Evaluation

Copyright 1997 by Campbell R. Harvey and Stephen Gray. All rights reserved. No part of this lecture may be reproduced without the permission of the authors.

Latest Revision: January 4, 1997

9.0  Overview

This class considers the efficiency of capital markets and how to measure the performance of individual stocks and mutual funds over time. Three notions of efficiency are introduced based on the type of information being used to forecast returns. A number of technical trading rules are introduced and their success is evaluated. Three traditional measures of portfolio performance are reviewed, and some more recent measures of performance are introduced. The relative strengths and weaknesses of the various measures are compared and contrasted.

9.1  Objectives

After completing this class, you should be able to:

Understand the different types of market efficiency.

9.2  Efficient Capital Markets

According to Fama: "An efficient capital market is a market that is efficient in processing information. The prices of securities at any time are based on correct evaluation of all information available at that time. In an efficient capital market, prices fully reflect available information."

We will be concerned with how well the market absorbs new information. To tell whether the market adjusts to information, you need a pricing model. For example, in the last lecture, we considered the performance of a mutual fund: Franklin Income. We assumed that the beta of Franklin was 1.00. We then showed that Franklin's performance was below the market's risk-adjusted expectation. Suppose instead that we estimated the beta to be 0.5. Then Franklin made a very large abnormal return. Is this evidence against market efficiency? Franklin is just a portfolio of NYSE stocks. How could the fund managers consistently pick the right stocks over 15 years? We cannot conclusively state that the market is inefficient, because the measured abnormal return depends on the pricing model and on the beta we plug into it.

As a result, we must always condition our statements about market efficiency in terms of the model that we are using to test efficiency.

Hence, any test of efficiency is a joint test of efficiency and the pricing model. This means that, given a certain pricing model, you might find evidence against market efficiency. But an alternative explanation is that the market is efficient and you are using the wrong pricing model. This is a common dilemma in testing joint hypotheses.
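
The joint-hypothesis problem can be illustrated with a small sketch. The figures below are hypothetical (they are not the Franklin Income numbers from the previous lecture); the point is only that the same realized return looks like underperformance under one beta and outperformance under another.

```python
def capm_expected_return(rf, market_return, beta):
    """Return predicted by the CAPM for a given beta."""
    return rf + beta * (market_return - rf)


def abnormal_return(fund_return, rf, market_return, beta):
    """Realized fund return minus the CAPM benchmark return."""
    return fund_return - capm_expected_return(rf, market_return, beta)


# Hypothetical annual figures, chosen only for illustration.
rf, market_return, fund_return = 0.06, 0.14, 0.12

for beta in (1.0, 0.5):
    ar = abnormal_return(fund_return, rf, market_return, beta)
    print(f"assumed beta = {beta:.1f}  ->  abnormal return = {ar:+.3f}")
```

With these made-up inputs the abnormal return is negative when beta is assumed to be 1.0 and positive when it is assumed to be 0.5, so the verdict on the fund (and on efficiency) depends on the model.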

9.3  Three Versions of Market Efficiency

As previously defined, in an efficient market, prices are set to fully reflect all relevant information. There are three types of efficiency which are based on different notions of what type of information is understood to be relevant.

9.4  Weak-Form Efficiency

No investor can earn abnormal returns by developing trading rules based on historical price or return information. Information from past returns is available at a very low cost. It makes sense that these past returns cannot be used to forecast future returns. Most people would agree that weak-form efficiency holds.

The investment strategies of charting and technical analysis are based on the premise of weak-form inefficiency. These techniques involve searching for patterns or regularities in past prices in order to forecast future prices. The process begins by collecting a past series of prices such as in Figure 1, which shows the high, low and closing prices for IBM over the recent past.

One strand of technical analysis involves identifying common patterns in past prices. Figure 2 shows a classic head and shoulders pattern. The argument is as follows: After some random movements, the stock begins to gather strength. This results in some profit-taking and the price settles back. There is further strength in the stock and the price reaches a new high. Further profit-taking reduces the price. After a moderate rally, support for the stock dies away and the price falls. Once the price falls below the dashed line connecting the shoulders, the price is expected to break out and fall abruptly. These arguments are, of course, complete and utter nonsense.

Another example of a technical trading rule is the Bollinger Band. This is simply a band with upper boundary two standard deviations above a 15-day moving average and lower boundary two standard deviations below a 15-day moving average. When the actual stock price moves outside the band, a breakout from a resistance level is signaled and the stock should be bought or sold depending on whether the upper or lower band is broken.
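
The band described above can be sketched in a few lines. This is a minimal illustration, assuming the 15-day window includes the current day and using the population standard deviation; the price series and the function name `bollinger_signals` are hypothetical. The sketch only flags which band is broken and does not take a view on whether to buy or sell.

```python
from statistics import mean, pstdev


def bollinger_signals(prices, window=15, width=2.0):
    """Flag days on which the price lies outside the moving-average band."""
    signals = []
    for t in range(window - 1, len(prices)):
        recent = prices[t - window + 1 : t + 1]  # trailing window incl. day t
        mid = mean(recent)                       # 15-day moving average
        sd = pstdev(recent)                      # assumption: population std. dev.
        upper, lower = mid + width * sd, mid - width * sd
        if prices[t] > upper:
            signals.append((t, "above upper band"))
        elif prices[t] < lower:
            signals.append((t, "below lower band"))
    return signals


# Hypothetical prices: a quiet range followed by a jump on the last day.
prices = [100 + (i % 3) for i in range(20)] + [110]
signals = bollinger_signals(prices)
print(signals)
```

Only the final day, when the price jumps out of its earlier range, is flagged as a break of the upper band.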

Figure 1

Figure 2: Head and Shoulders Pattern

9.5  Semistrong-Form Efficiency

No investor can earn abnormal returns from trading rules based on any publicly available information. Examples of publicly available information are: annual reports of companies, investment advisory data such as "Heard on the Street" in the Wall Street Journal, or ticker tape information. All of this information is relatively inexpensive to collect.

9.6  Strong-Form Efficiency

No investor can earn abnormal returns using any information -- private or public. Private information is usually very costly to collect. Almost nobody believes that markets are strongly efficient.

9.7  A Test of Weak-Form Efficiency

If a market exhibits weak-form efficiency, then it is impossible to earn abnormal returns by developing a forecasting model based on past returns. The first thing we have to deal with is the definition of abnormal returns. In the context of the Capital Asset Pricing Model, an abnormal return is defined as the return in excess of what was expected according to the CAPM equation. The CAPM suggests that the only variables that we need in calculating the expected return on security i are: the riskfree rate (a constant), the expected excess return on the market, and the security's beta (a constant). A similar implication holds for the APT. The expected excess return is a function of the factor loading times the factor premium.

$$E[r_i] - r_f = \beta_i \left( E[r_m] - r_f \right)$$

or with a more general factor model,

$$E[r_i] - r_f = \sum_{k=1}^{K} \beta_{ik} \lambda_k$$

where the $\beta_{ik}$ are factor loadings and the $\lambda_k$ are factor premia. There are no additional variables that should be included in this equation.

We will carry out a test of weak-form market efficiency by trying to forecast IBM's stock return with a past IBM stock return. The model we will consider is:

$$r_{IBM,t} = \delta_0 + \delta_1 r_{IBM,t-1} + \varepsilon_t$$

The weak-form hypothesis tells us that the delta1 coefficient (the slope on the lagged return) should be indistinguishable from zero. Consider the results of the following regressions. I have provided some results for IBM from 1926--1984. The returns data come from the Center for Research in Security Prices (CRSP).

The results indicate that the delta1 coefficient is two standard errors from zero only in the overall period. In all the sub-periods, the coefficient is indistinguishable from zero. The best R2 from all of the regressions is 1%. It is hard to see a successful trading rule being built from a model that explains only 1% of the variation in returns. The regressions with the IBM stock returns do not provide any evidence against the weak-form version of the efficient market hypothesis.
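
A regression of this kind can be sketched as follows. The returns here are simulated independent draws standing in for the CRSP data (they are not IBM returns), and the helper `ols` is a hypothetical name. The sketch regresses each return on the previous period's return and reports the slope, its t-statistic, and the R2.

```python
import random
from math import sqrt


def ols(x, y):
    """OLS of y on a constant and x.

    Returns (intercept, slope, slope standard error, R-squared).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    se_slope = sqrt(sse / (n - 2) / sxx)
    return intercept, slope, se_slope, 1.0 - sse / sst


# Simulated i.i.d. monthly returns (an assumption, in place of real data).
random.seed(0)
returns = [random.gauss(0.01, 0.06) for _ in range(600)]

lagged, current = returns[:-1], returns[1:]
delta0, delta1, se_delta1, r2 = ols(lagged, current)
print(f"delta1 = {delta1:.3f}, t = {delta1 / se_delta1:.2f}, R2 = {r2:.4f}")
```

Because the simulated returns are independent by construction, the slope should be small relative to its standard error and the R2 close to zero, which is the pattern the weak-form hypothesis predicts.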

9.8  Mutual Fund Performance

In a previous lecture we informally evaluated the performance of the Franklin Income fund. Based on our asset pricing model (the CAPM) we can make statements about the risk-adjusted performance of this mutual fund. In this section, we develop some specific measures to evaluate the performance of a mutual fund or a portfolio.

There are two major types of investment funds: closed-end and open-end. After the initial offering of shares, the management of a closed-end fund cannot offer additional shares to the public, and the shares trade among investors in the secondary market. The management of an open-end fund can offer new shares at any time, and investors can redeem their shares, which reduces the number of shares outstanding. Open-end funds are known as mutual funds.

Most investment management groups run several funds. The Franklin Group has about 20 funds ranging from a gold fund to a utilities fund. The financial press reports a number of statistics for each fund which usually include

Figures 3 to 5 illustrate the risk and return characteristics of various mutual fund types. Income funds are the most conservative, with a relatively low market risk (beta) and total risk (standard deviation of return). Maximum capital gain or aggressive growth funds have the highest market risk and a high total risk. Consistent with modern finance theory, higher risk appears to be rewarded with higher return.

Figure 3

Figure 4

Figure 5

9.9  Jensen's Performance Measure

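The first measure of performance is called the Jensen index or Jensen's alpha. The alpha is simply the intercept from a regression of fund excess returns on market excess returns:

$$r_{it} - r_f = \alpha_i + \beta_i \left( r_{mt} - r_f \right) + \varepsilon_{it}$$

The CAPM predicted excess return is:

$$E[r_{it}] - r_f = \beta_i \left( E[r_{mt}] - r_f \right)$$

So the alpha is:

$$\alpha_i = \left( E[r_{it}] - r_f \right) - \beta_i \left( E[r_{mt}] - r_f \right)$$

Another way to think of this regression is to note that any expected return can also be written as a realized return, $r_{it}$ or $r_{mt}$, plus an error term, $\eta_{it}$ or $\eta_{mt}$, distributed with a mean of zero. The CAPM implies the following relationship, where alpha is zero by definition:

$$\left( r_{it} + \eta_{it} \right) - r_f = \alpha_i + \beta_i \left[ \left( r_{mt} + \eta_{mt} \right) - r_f \right], \qquad \alpha_i = 0$$

Rearranging terms yields

$$r_{it} - r_f = \alpha_i + \beta_i \left( r_{mt} - r_f \right) + \varepsilon_{it}, \qquad \varepsilon_{it} = \beta_i \eta_{mt} - \eta_{it}$$

According to the CAPM, the intercept alpha should be zero. The extent to which alpha differs from zero measures the extent to which the CAPM is unable to account for the returns of the asset (or mutual fund) i. That is, alpha measures abnormal performance relative to the CAPM.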

Thus, the alpha measures the risk adjusted performance of a security. The alpha is the Jensen index. In his 1969 paper, Michael Jensen runs time series regressions for 115 open-end mutual funds from 1945 to 1964. He found that the average alpha was negative, -0.011. I have provided a graph of the distribution of alpha from another study. The distribution is centered around zero. This suggests that mutual funds on average do not do better than the market on a risk adjusted basis. I have also provided a graph of the t-statistics for the alphas. Only 5% of the funds had a t-statistic that is greater than two. This provides support for the semi-strong version of the efficient market hypothesis. The mutual funds have not shown any abnormal performance. If we take into consideration transactions costs, the mutual funds have done worse than a passive market portfolio.
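
The time-series regression behind the Jensen index can be sketched as below. The data are simulated for a hypothetical fund (true beta of 0.8 and a small negative monthly alpha); nothing here reproduces Jensen's sample. The function name `jensen_alpha` is an assumption for illustration.

```python
import random
from math import sqrt


def jensen_alpha(fund_excess, market_excess):
    """Regress fund excess returns on market excess returns.

    Returns (alpha, beta, t-statistic of alpha).
    """
    n = len(market_excess)
    mx = sum(market_excess) / n
    my = sum(fund_excess) / n
    sxx = sum((x - mx) ** 2 for x in market_excess)
    sxy = sum((x - mx) * (y - my) for x, y in zip(market_excess, fund_excess))
    beta = sxy / sxx
    alpha = my - beta * mx                      # regression intercept
    sse = sum((y - alpha - beta * x) ** 2
              for x, y in zip(market_excess, fund_excess))
    s2 = sse / (n - 2)                          # residual variance
    se_alpha = sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return alpha, beta, alpha / se_alpha


# Simulated monthly excess returns (hypothetical fund, not real data).
random.seed(1)
market_excess = [random.gauss(0.006, 0.045) for _ in range(240)]
fund_excess = [-0.001 + 0.8 * x + random.gauss(0.0, 0.02) for x in market_excess]

alpha, beta, t_alpha = jensen_alpha(fund_excess, market_excess)
print(f"alpha = {alpha:.4f}, beta = {beta:.2f}, t(alpha) = {t_alpha:.2f}")
```

A t-statistic on alpha of less than two in absolute value would be read, as in the text, as no reliable evidence of abnormal performance.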

Figure 6

I have also included some charts which update Jensen's study, based on work by Campbell Harvey and Ravi Bansal, "Performance Evaluation in the Presence of Dynamic Trading Strategies." I provide both a histogram of the alphas and a histogram of the t-statistics. The alpha* is a new measure of performance evaluation which is detailed in our paper. Instead of assuming that some market index like the S&P 500 is efficient, we choose an index which is exactly efficient.

Figure 7

Figure 8

9.10  Stock Selection Ability

Suppose a fund manager is able to pick winners among individual stocks on a risk-adjusted basis. This is known as stock selection ability. Following the strategy of investing in stocks that the manager believes will outperform the market should result in a fund with a relatively constant beta and a positive alpha. Jensen's alpha is ideally suited to measuring this type of superior performance. I have included a diagram of three scenarios of superior market performance. In all of the panels, the excess returns on the fund are plotted against the excess returns on the market. Notice that the regression line in the first panel has a positive intercept. This is the measure of abnormal performance.

Figure 9

9.11  Market Timing Ability

Alternatively, the fund manager may have no ability to pick winners among individual stocks, but may have ability in forecasting market-wide movements (e.g., a shift from a bull to bear market). This strategy is known as market timing. The second panel shows an example of market timing. If the portfolio manager knows when the stock market is going to go up, he will shift into high beta stocks. These stocks will go up even further than the market. If the portfolio manager knows the market is going to go down, he will switch into low beta stocks which will go down less than the market. This accounts for the bowed shape of the data. Notice that the Jensen measure is positive signaling superior performance.
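
The bowed shape in the second panel can be reproduced with a small simulation. This is a stylized sketch under strong assumptions: the manager times the market perfectly, holds a portfolio with a hypothetical beta of 1.5 when the market's excess return is positive and 0.5 when it is negative, and has no stock selection ability. A straight line is then fitted through the resulting points.

```python
import random


def ols_line(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


HIGH_BETA, LOW_BETA = 1.5, 0.5  # hypothetical betas chosen for illustration

random.seed(2)
market_excess = [random.gauss(0.005, 0.04) for _ in range(240)]

# Perfect timer: high beta in up markets, low beta in down markets.
fund_excess = [(HIGH_BETA if x > 0 else LOW_BETA) * x for x in market_excess]

alpha, beta = ols_line(market_excess, fund_excess)
print(f"fitted alpha = {alpha:.4f}, fitted beta = {beta:.2f}")
```

The fitted intercept is positive even though the manager never picks individual winners, and the fitted beta lies between the two betas actually held.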

Figure 10

The third panel shows another market timing example. In this case, the manager is so good that there are no negative returns. Perhaps the manager invests in T-bills whenever the market is falling. The manager exhibits good market timing abilities. When the market goes up, the fund goes up by more than the market, indicating a shift into high beta stocks. It is important to notice that the Jensen measure or intercept is nearly zero or negative (when estimated with a linear regression). So even though this manager has exhibited strong market timing abilities, the performance evaluation criterion suggests he is not doing a superior job. This is a major problem with the Jensen measure.

Figure 11

9.12  Sharpe's Measure of Investment Performance

We have already been introduced to the Sharpe measure of investment performance. The term

$$\frac{E[r_i] - r_f}{\sigma_i}$$

is known as the Sharpe measure or the Sharpe index. This index uses the Capital Market Line as the benchmark. Below is a graph depicting the expected return--standard deviation space. The Sharpe measure is the slope of the line from $r_f$ to the fund (the rise is $E[r_i] - r_f$ and the run is $\sigma_i$). The intercept is the riskfree rate, $r_f$. The higher the Sharpe measure is, the better the security looks.

Note that if the mutual fund is positioned on the Capital Market Line, then the fund has neutral performance. This makes sense under the CAPM, because, on the basis of public information alone, any investor can construct a portfolio that is positioned on the CML. The higher the Sharpe measure the larger the excess return per unit of standard deviation. Higher Sharpe measures are associated with superior performance.
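
As a rough sketch, the Sharpe measure can be computed from a series of realized returns as follows. The return series and the riskfree rate are hypothetical, and the sample standard deviation is used as the estimate of total risk.

```python
from statistics import mean, stdev


def sharpe_measure(returns, rf):
    """Average excess return per unit of standard deviation of return."""
    return (mean(returns) - rf) / stdev(returns)


# Hypothetical monthly returns and riskfree rate, for illustration only.
rf = 0.004
fund_returns = [0.03, -0.01, 0.02, 0.04, -0.02, 0.03]
market_returns = [0.04, -0.03, 0.02, 0.06, -0.04, 0.03]

fund_sharpe = sharpe_measure(fund_returns, rf)
market_sharpe = sharpe_measure(market_returns, rf)  # realized slope of the CML
print(f"fund: {fund_sharpe:.3f}, market: {market_sharpe:.3f}")
```

In this made-up sample the fund's Sharpe measure exceeds the market's, so ex post it plots above the Capital Market Line.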

Figure 12

Note, however, that the Sharpe measure is an ex-post measure, dealing with actual realized returns. Under the CAPM, no asset or fund can plot above the CML ex-ante, because this would imply that the market portfolio is not efficient (i.e., we could do better than the market by investing in the riskless asset and the fund that plots above the CML) so that

$$\frac{E[r_i] - r_f}{\sigma_i} \le \frac{E[r_m] - r_f}{\sigma_m}$$

holds for all assets and funds i. The same does not hold for ex-post returns. The CAPM does not prevent a particular asset or fund from performing above expectations in a particular period, so that it is possible for

$$\frac{\bar{r}_i - r_f}{\hat{\sigma}_i} > \frac{\bar{r}_m - r_f}{\hat{\sigma}_m}$$

where $\bar{r}$ and $\hat{\sigma}$ denote the realized average return and the realized standard deviation.

9.13  Treynor's Performance Measure

The Treynor index uses the Security Market Line as a benchmark. Remember that the Security Market Line is drawn in return--beta space. The measure is defined as:

$$T_i = \frac{E[r_i] - r_f}{\beta_i}$$

The index measures the excess return per unit of risk taken. The index has a geometric interpretation that is similar to the Sharpe index. The Treynor index measures the slope of a line that starts at the riskfree rate and connects with the point that marks the fund's beta and expected return.

The higher the Treynor index, the more return the fund is making per unit of market risk it is taking. The benchmark line is the security market line, which has a slope of E[rm]-rf. If the fund has a Treynor index equal to E[rm]-rf, then the fund exhibits neutral performance. A higher Treynor index indicates superior performance.
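
A short sketch of the comparison against the SML slope follows. The fund names, average returns, and betas are hypothetical.

```python
def treynor_index(mean_return, rf, beta):
    """Excess return per unit of beta."""
    return (mean_return - rf) / beta


# Hypothetical annual figures, for illustration only.
rf, market_mean = 0.05, 0.13
funds = {"Fund A": (0.12, 0.7), "Fund B": (0.15, 1.4)}  # (mean return, beta)

# The market has a beta of one, so its Treynor index is the SML slope.
benchmark = treynor_index(market_mean, rf, 1.0)

for name, (mean_return, beta) in funds.items():
    t = treynor_index(mean_return, rf, beta)
    verdict = "above" if t > benchmark else "at or below"
    print(f"{name}: Treynor = {t:.3f} ({verdict} the SML slope of {benchmark:.3f})")
```

Here Fund B earns the higher raw return, but once returns are scaled by beta only Fund A beats the benchmark slope.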

A plot of the SML and individual mutual funds is provided.

Figure 13

Figure 14

Note that the SML is drawn with excess returns on the y-axis. This does not change the magnitude of the Treynor index. The SML is the dashed line that starts at zero in the excess return axis. Notice that the mutual funds are distributed randomly above and below the SML. This suggests that mutual funds did not systematically outperform the market.

9.14  Graham-Harvey Performance Measures

Graham and Harvey propose two additional measures of performance that are related to the Sharpe measure. Given a mutual fund's volatility, we create a portfolio of the S&P 500 futures and a money market account that exactly replicates the fund volatility. Graham-Harvey measure 1 is the difference between the mutual fund return and the "passive" futures return.

The second measure is the following. Given the volatility of the S&P 500 futures, we lever the mutual fund volatility up or down (by borrowing and lending respectively) to exactly match the S&P 500 volatility. The difference between the levered fund return and the S&P 500 return is the Graham-Harvey measure 2. A description of the Graham-Harvey method is detailed in a recent Forbes article.

Forbes, June 19, 1995, pages 160-161.
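
The two volatility-matching steps can be sketched as below. This is a simplified reading of the description above, not the authors' implementation: it assumes the money market return is riskless and constant, so that a portfolio's volatility scales in proportion to its weight in the risky asset. All input figures are hypothetical.

```python
def graham_harvey_measures(fund_mean, fund_vol, market_mean, market_vol, rf):
    """Return (GH1, GH2) under the simplifying assumptions stated above."""
    # GH1: mix the market and the money market account to match fund volatility.
    w_market = fund_vol / market_vol
    passive_return = rf + w_market * (market_mean - rf)
    gh1 = fund_mean - passive_return

    # GH2: lever the fund up or down to match market volatility.
    w_fund = market_vol / fund_vol
    levered_fund_return = rf + w_fund * (fund_mean - rf)
    gh2 = levered_fund_return - market_mean
    return gh1, gh2


# Hypothetical annual figures, for illustration only.
gh1, gh2 = graham_harvey_measures(
    fund_mean=0.12, fund_vol=0.10, market_mean=0.13, market_vol=0.16, rf=0.05
)
print(f"GH1 = {gh1:.4f}, GH2 = {gh2:.4f}")
```

Under these assumptions the two measures always have the same sign, and both are zero when the fund and the market have the same Sharpe measure.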

Figure 15

9.15  Mutual Fund Performance: Conclusion

I have presented a number of different ways to evaluate the performance of mutual funds. There is a considerable literature that suggests that after expenses the mutual funds have not outperformed the market on a risk adjusted basis. Perhaps the most interesting issue has to do with the persistence of returns. The following chart is from Jensen (1969).

Number of Consecutive Years Fund Performance Exceeded That of a Passive Portfolio with Equal Market Risk

Proportion of Group with Performance Exceeding That of a Passive Portfolio with Equal Market Risk in the Subsequent Year (%)

Notice that it is fairly random whether the funds do better than a passive portfolio constructed to have the same market risk as the fund. The evidence suggests that fund managers are extracting management fees and delivering performance that could be equaled on average by holding a combination of Treasury bills and the market portfolio.

9.16  The Value Line Enigma

While the mutual fund performance supports the idea of markets being semi-strong efficient, the recommendations of the Value Line Investment Survey provide a counterexample. Value Line ranks stocks on the basis of timeliness. The scale runs from 1 to 5. A rank of 1 means that Value Line expects the stock to appreciate in value. A rank of 5 means that Value Line believes that it is unlikely that there will be any appreciation in the stock value. All of the information that Value Line uses in determining the forecast comes from publicly available information. A graph of Value Line's performance is attached. One of the first studies of the quality of the Value Line recommendations was done by Black (1970). He found that by purchasing a portfolio of the rank 1 stocks and short-selling a portfolio of rank 5 stocks, an investor could earn 20% per year on a risk-adjusted basis. This ran counter to the efficient markets hypothesis. Copeland and Mayers (1982) investigate whether abnormal returns could be made by following Value Line's recommendations. They find that buying portfolio 1 and selling portfolio 5 earns 6.8% per year. The betas for portfolio 1 and portfolio 5 are very close to one. Hence, we can consider the 6.8% a risk-adjusted return. The return is much lower than the 20% that Black (1970) found. In fact, Copeland and Mayers find that the size of the abnormal return has been consistently dropping through time.

Some of the strongest evidence that Copeland and Mayers gather is from the change portfolios. They form portfolios that buy stocks that Value Line has moved up in the rankings and sell stocks that Value Line has moved down in the rankings. There are significant abnormal returns if this strategy is followed. Furthermore, these returns tend to persist over time. Finally, the existence of transactions costs cannot explain the size of these abnormal returns.

The Value Line performance appears to run counter to the semi-strong version of the efficient markets hypothesis. But we cannot say for sure that the market is not efficient. This is because we are jointly testing the model (CAPM) and market efficiency. If the model is incorrect, then we may be incorrectly rejecting market efficiency.

However, it is not clear that anything is going on with the Value Line ratings. The mutual funds which follow Value Line strategies have shown no evidence of outperforming naive benchmarks.

9.17  Performance of Investment Newsletters

Graham and Harvey study the performance of over 200 investment newsletters which recommend mixes of equities and cash. The following details the performance of all newsletter portfolios.

Figure 16

We also examined newsletters which existed for four years or more in the sample (some only existed for a few months).

Figure 17

One question that we examine is whether equity weights increase before positive market returns. This is the definition of market timing. There is no evidence of timing the positive returns.

Figure 18

We also examined whether equity weights decrease before negative market returns. This is also market timing. There is no evidence of timing the negative returns.

Figure 19

9.18  Persistence in Abnormal Performance

Suppose we estimate the alphas for a number of funds over a three year period. Can we use those alphas to predict next period's alphas? Does superior performance in the past indicate a high probability of superior performance in the future? Does poor performance in the past indicate future poor performance?

Consider the following cross-sectional predictive regression:

$$\alpha_{i,t+1} = \theta_0 + \theta_1 \alpha_{i,t} + u_{i,t+1}$$

A positive theta1 will indicate that performance persists.

Research suggests that there is evidence of persistence. However, most of the persistence is coming from poor performance. That is, poor performers are likely to be poor performers in the future.
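
The cross-sectional regression can be sketched on simulated data. The data-generating process below is purely hypothetical: it builds in persistence only for funds with negative past alphas, to mimic the pattern described above, and is not an estimate from any actual study.

```python
import random


def ols_line(x, y):
    """Return (intercept, slope) from a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(3)
past_alpha = [random.gauss(0.0, 0.02) for _ in range(500)]

# Hypothetical process: only poor past performance carries over.
future_alpha = [(0.5 * a if a < 0 else 0.0) + random.gauss(0.0, 0.01)
                for a in past_alpha]

theta0, theta1 = ols_line(past_alpha, future_alpha)

# Re-run the regression on past losers and past winners separately.
losers = [(a, f) for a, f in zip(past_alpha, future_alpha) if a < 0]
winners = [(a, f) for a, f in zip(past_alpha, future_alpha) if a >= 0]
theta1_losers = ols_line(*zip(*losers))[1]
theta1_winners = ols_line(*zip(*winners))[1]

print(f"theta1 (all funds) = {theta1:.2f}")
print(f"theta1 (past losers) = {theta1_losers:.2f}, "
      f"theta1 (past winners) = {theta1_winners:.2f}")
```

Splitting the sample in this way is one simple check on whether a positive theta1 in the pooled regression is driven mainly by the poor performers.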

9.19  Conditional Performance Evaluation

Recent research in performance evaluation has centered on conditional probabilities and conditional expectations. Instead of looking at the average return of a security, we can look at the expected return given certain information, such as the state of the economy, the financial situation of the company, and recent performance.

9.20  Predictability and Efficiency

There are many new studies that show that stock returns at time t can be forecast with information available at time t-1. For example, Harvey (1989) shows that up to 18% of the variation in U.S. stock portfolios can be forecast on a monthly basis. Harvey (1991) finds similar results with international data.

Many people argue that predictability implies inefficiency. However, this is not necessarily true. Consider the conditional CAPM examined by Harvey (1989). The CAPM restricts the conditionally expected returns to be linearly related to conditionally expected excess returns on a market wide portfolio. The coefficient in the linear relation is the asset's beta or the ratio of the conditional covariance with the market to the conditional variance of the market.

Predictability could be driven by changing risk (changing conditional covariance), changing market volatility or changes in the expected returns of the market as a whole. If these changes in risk and volatility are predictable, then it follows that the stock returns for asset j are also predictable.
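
The verbal description of the conditional CAPM can be written out as follows. The notation is an assumption made for this sketch: $Z_{t-1}$ stands for the information available at time t-1, and returns are measured in excess of the riskfree rate.

```latex
E[r_{jt} - r_f \mid Z_{t-1}]
  = \frac{\mathrm{Cov}[r_{jt}, r_{mt} \mid Z_{t-1}]}
         {\mathrm{Var}[r_{mt} \mid Z_{t-1}]}
    \; E[r_{mt} - r_f \mid Z_{t-1}]
```

Each of the three pieces on the right-hand side (the conditional covariance, the conditional market variance, and the conditional expected market excess return) can move with $Z_{t-1}$, so the left-hand side can be predictable without any inefficiency.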

Ferson and Harvey (1991a) and (1991b) find that most of the predictability in stock and bond returns can be attributed to predictable shifts in risks and the market-wide reward for risks (risk premiums). They consider a multi-risk formulation and find that the risk and risk premium associated with the market return (as in the single-risk CAPM) is by far the most important component in explaining the predictability of returns. In another decomposition, they find that changing risk premiums -- rather than changing betas -- account for the bulk of the predictability. Their evidence suggests that predictable returns result from predictable shifts in risk and risk premiums rather than some market inefficiency.

9.21  Predictability and Asset Allocation

The new conditional approach to asset pricing and asset allocation has an important implication for performance evaluation. Suppose there is a portfolio manager who is using today's available information to determine expected returns, conditional variances and covariances. Based upon these conditional measures, she constructs an efficient portfolio. Every month the programs are rerun and she adjusts her portfolio to be on the conditional mean-variance frontier. This is clearly as good as one can do in portfolio management.

Now suppose that we evaluate her performance. It is not fair to evaluate this person based upon realized risk -- such as the method of Sharpe, Treynor, Jensen or Graham-Harvey. The portfolio strategy was based upon the forecast of risk. Realized risk or average risk over the evaluation period is unlikely to be the same as the conditional risk.

This is the key insight. One may reject the mean-variance efficiency of a portfolio based upon average measures. This rejection does not imply that the portfolio is conditionally mean-variance inefficient. In other words, we may incorrectly fire the portfolio manager if we are using the average risk and average return measures.

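Conditional portfolio evaluation is possible. Harvey (1989) first proposes the conditional analogue to Jensen's (1969) measure:

$$E[r_{it} - r_f \mid Z_{t-1}] = \alpha_i + \frac{\mathrm{Cov}[r_{it}, r_{mt} \mid Z_{t-1}]}{\mathrm{Var}[r_{mt} \mid Z_{t-1}]} \, E[r_{mt} - r_f \mid Z_{t-1}]$$

where $Z_{t-1}$ denotes the information available at time t-1 and alphai represents the performance in excess of the conditionally expected performance based upon the conditional risk of the portfolio.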

9.22  Bansal and Harvey (1995)

The idea pursued in Bansal and Harvey (1995) is that it is naive to evaluate managers relative to passive indices. That is, the managers should be using information to dynamically rebalance their portfolios to capture any predictability in asset returns.

When we allow for dynamic strategies, the performance of the managers becomes even more disappointing. The following is the distribution of the conditional alphas. Notice that our sample has an extreme survivorship bias. We sample the mutual funds which have existed from 1968-1993. This should positively bias performance! However, there is no evidence of this.

Figure 20

The t-ratios for the alphas are presented below.

Figure 21

We also ask the question of whether managers do better (in risk adjusted terms) in recessions or expansions. We find that managers tend to underperform the most during expansions. The CAPM alphas can be seen in Figure 22.

Figure 22

Using our r* benchmark, a similar conclusion is reached.

Figure 23

Finally, we draw the mean-variance frontier which includes the passive strategies as well as dynamic strategies which are functions of known information (e.g., increase equity weight for time t if the interest rate drops during month t-1).

We draw a four standard error range around the frontier and not one fund penetrates the lower bound. Notice that this frontier is drawn with three assets: the T-bill, T-bond and S&P 500. The idea is that these benchmark assets can be easily traded in the futures market. Along with the three benchmarks, we include dynamic strategies which are simply the benchmark at time t multiplied by information (such as the interest rate) at time t-1.

Figure 23