Graham-Harvey Performance Metrics

John R. Graham

Fuqua School of Business, Duke University, Durham, NC 27708

Campbell R. Harvey

Fuqua School of Business, Duke University, Durham, NC 27708
National Bureau of Economic Research, Cambridge, MA 02123

Motivating new performance measures

There is considerable dissatisfaction with traditional measures of performance evaluation. There is plenty of evidence that the Capital Asset Pricing Model has serious shortcomings. Comparing investment managers to their peers is also unsatisfactory: a manager could look good simply because the group as a whole looks bad.

John Graham and Campbell Harvey have developed some simple new performance metrics based on their 1994 NBER working paper, "Market timing ability and volatility implied in investment newsletters' asset allocation recommendations," their 1996 Journal of Financial Economics paper (Volume 42, pp. 397-422), and their 1997 Financial Analysts Journal article.

These measures have attracted considerable attention from investment managers and the business press. See Forbes, "The Graham-Harvey Test," June 19, 1995.


Note: students and Duke alumni need to send me a separate email verifying their academic status to receive the software.

Traditional mean-variance intuition

One approach to performance evaluation is to plot the mean and standard deviation of each fund's return, as in traditional mean-variance analysis. Using this approach, we can assess where each fund lies relative to a benchmark. From the graph, we can tell whether a fund is "dominated" by the benchmark (i.e., the fund has a lower return and higher volatility). However, it is more difficult to judge performance if the fund has a lower return and lower volatility. We start our analysis with the following intuition. To make the fund and the benchmark comparable, we can (1) lever the benchmark up or down to match the fund's volatility (so we compare apples with apples), or (2) lever the fund up or down to match the volatility of the benchmark (so all funds are compared on the same basis).

New Performance Measures

The idea of our Graham-Harvey "Measure 1" (GH1) is to lever or unlever the S&P 500 futures to have exactly the same volatility as the fund over the evaluation period. GH1 is the difference between the fund return and the return on the volatility-matched futures portfolio. Figure 1 details the geometry of the measure. In the figure, a strategy that unlevers the S&P 500 (by combining the S&P 500 with Treasury bills to match the volatility of Fund A) has a much higher return than Fund A. Hence, GH1 for Fund A is negative, indicating underperformance. Fund B earns more than a levered S&P 500 futures position with the same volatility and receives a positive GH1. The intuition is simple. If the investor had a target level of volatility equal to that of Fund A, then the investor would have been much better off holding a fixed-weight combination of S&P 500 futures and Treasury bills than holding the fund (or implementing the fund's portfolio recommendations every month).
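One way to write the GH1 volatility match, as a sketch in our own notation: let $r_p$, $r_m$ and $r_f$ be the fund, S&P 500 and Treasury bill returns over the evaluation period, with volatilities $\sigma_p$, $\sigma_m$, $\sigma_f$ and S&P 500/bill covariance $\sigma_{mf}$. Letting the bill return vary (rather than treating it as riskless) is our assumption, in the spirit of the frontier curvature discussed in the M-squared endnote. A futures position of weight $w$ plus bills then earns $r_f + w(r_m - r_f) = w\,r_m + (1-w)\,r_f$, and $w \ge 0$ solves

$$w^2\sigma_m^2 + (1-w)^2\sigma_f^2 + 2w(1-w)\,\sigma_{mf} = \sigma_p^2,$$

$$\text{GH1} = \bar r_p - \left[\,w\,\bar r_m + (1-w)\,\bar r_f\,\right].$$

If the bill is treated as riskless ($\sigma_f = \sigma_{mf} = 0$), the first condition collapses to $w = \sigma_p / \sigma_m$.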

Our Graham-Harvey "Measure 2" (GH2) is related but different. In this measure, we lever up or down the fund's recommended investment strategy (using a Treasury bill), so that the strategy has exactly the same volatility as the S&P 500. Figure 2 shows the geometry of this measure. If Fund A is levered up to achieve the same volatility as the S&P 500 over the evaluation period, it has a lower average return than a simple buy-and-hold in the S&P 500. Hence, the GH2 measure is negative. In contrast, if we lever Fund B downwards (by combining the fund strategy with a cash investment) to achieve the same volatility as the S&P 500, the unlevered fund return is greater than the buy-and-hold S&P 500 and the performance measure is positive. For Fund B, investors would have been better off acting on the fund recommendations compared to a buy-and-hold strategy.
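Both measures reduce to the same volatility-matching step, applied to the S&P 500 for GH1 and to the fund for GH2. Below is a minimal NumPy sketch, assuming simple per-period returns, population (ddof=0) moments over the evaluation period, and that a futures position plus T-bills is equivalent to holding $w$ in the S&P 500 and $1-w$ in T-bills. The function names, the quadratic solver, and the data are our own hypothetical illustration, not the authors' software.

```python
import numpy as np


def vol_matching_weight(risky, cash, target_vol):
    """Return w such that w*risky + (1 - w)*cash has std equal to target_vol.

    The T-bill series may have its own variance and covary with the risky
    series (our assumption), so the risky/cash frontier can curve.
    """
    var_r = np.var(risky)
    var_c = np.var(cash)
    cov_rc = np.mean((risky - risky.mean()) * (cash - cash.mean()))
    # Var(w*risky + (1-w)*cash) = a*w**2 + b*w + var_c
    a = var_r + var_c - 2 * cov_rc
    b = 2 * (cov_rc - var_c)
    c = var_c - target_vol**2
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("target volatility is below the minimum attainable")
    # Larger root: the branch with a long position in the risky series
    return (-b + np.sqrt(disc)) / (2 * a)


def gh1(fund, sp500, tbill):
    """Fund mean minus the mean of the S&P 500/T-bill mix with the fund's volatility."""
    w = vol_matching_weight(sp500, tbill, np.std(fund))
    return fund.mean() - (w * sp500 + (1 - w) * tbill).mean()


def gh2(fund, sp500, tbill):
    """Mean of the fund/T-bill mix with S&P 500 volatility minus the S&P 500 mean."""
    v = vol_matching_weight(fund, tbill, np.std(sp500))
    return (v * fund + (1 - v) * tbill).mean() - sp500.mean()


# Hypothetical monthly returns (not data from the papers)
rng = np.random.default_rng(0)
n = 240
tbill = 0.004 + 0.001 * rng.standard_normal(n)
sp500 = 0.008 + 0.045 * rng.standard_normal(n)
fund = 0.5 * sp500 + 0.5 * tbill  # constant 50% equity weight

print(gh1(fund, sp500, tbill), gh2(fund, sp500, tbill))  # both essentially zero
```

A constant-weight mix of the S&P 500 and bills lies on the constructed frontier, so it scores zero on both measures. Only strategies whose weights change through time can land above or below that frontier.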

The measures provide different perspectives. Over the evaluation period, Measure 1 draws an efficient frontier using the S&P 500 and cash and checks whether the fund lies above or below this constructed frontier. The volatility-matching approach displayed in Figure 1 compares the fund return to that of a volatility-matched benchmark over exactly the same sample period. Measure 2 compares all funds at a common level of volatility: the S&P 500 buy-and-hold volatility. All funds are on the same footing with GH2. The only potential disadvantage of GH2 is that it assumes the investor can lever an investment fund's return to have the same volatility as the market.

New Performance Measures vs. Traditional Measures

How are our new performance measures related to traditional measures? Consider the alpha from Sharpe's Capital Asset Pricing Model (CAPM). In the CAPM framework, the manager's excess return is regressed on the market excess return. Roughly, the beta picks up the average level of market exposure, and the alpha represents the extra return that the manager earns over and above a position with that (fixed) average market exposure. This is closely related to GH1, where we scale the market position so that the benchmark has exactly the same variance as the fund. The difference is the following. In GH1, the benchmark (market index and cash) is constructed to have exactly the same volatility as the fund. In the CAPM, the benchmark portfolio (beta times the market index) generally has a lower volatility than the fund: the fund's variance equals beta squared times the variance of the market index return (the benchmark) plus the variance of the idiosyncratic return. In contrast, GH1 exactly matches the total volatility of the fund.
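The variance decomposition above can be checked on simulated data. This sketch uses hypothetical excess returns and an ordinary least-squares CAPM regression via `np.polyfit`; it shows that variances (not standard deviations) add, and that the beta-scaled benchmark is less volatile than the fund.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 240
mkt = 0.006 + 0.045 * rng.standard_normal(n)              # hypothetical market excess returns
fund = 0.001 + 0.8 * mkt + 0.02 * rng.standard_normal(n)  # hypothetical fund excess returns

# CAPM regression: np.polyfit returns [slope, intercept] for degree 1
beta, alpha = np.polyfit(mkt, fund, 1)
resid = fund - (alpha + beta * mkt)

# OLS residuals are uncorrelated with the regressor, so variances add
var_fund = np.var(fund)
var_parts = beta**2 * np.var(mkt) + np.var(resid)
print(np.isclose(var_fund, var_parts))  # True

# The CAPM benchmark (beta times the market) is less volatile than the fund,
# whereas a GH1 benchmark is scaled to the fund's total volatility
print(abs(beta) * np.std(mkt) < np.std(fund))  # True
```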

To see the difference from another perspective, suppose a fund follows a purely random strategy that switches between 200% long and 200% short in the market. Also suppose that the return on this random strategy happens to be one percent above the risk-free rate. If the CAPM beta is zero (as it roughly would be for random switching), then the alpha is 1% and this strategy would be identified as superior. In contrast, GH1 would find a portfolio of S&P 500 futures and cash with identical realized variance. Because the fund is always exposed to the market at twice the usual magnitude, this benchmark would likely have about twice the volatility (roughly four times the variance) of the market. Hence, the random strategy would be compared to a buy-and-hold portfolio with double the volatility of the market, and it would be a significant underperformer according to the GH1 measure.
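A quick simulation of the random 200% long/short strategy illustrates both halves of the argument: its estimated beta is close to zero, yet its volatility is about twice the market's. The return process and parameters are hypothetical, and cash returns are ignored by working with excess returns.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
mkt_excess = 0.006 + 0.045 * rng.standard_normal(n)  # hypothetical market excess returns
weights = rng.choice([2.0, -2.0], size=n)             # random 200% long / 200% short
fund_excess = weights * mkt_excess

# CAPM beta: covariance with the market over market variance (both ddof=1)
beta = np.cov(fund_excess, mkt_excess)[0, 1] / np.var(mkt_excess, ddof=1)
vol_ratio = np.std(fund_excess) / np.std(mkt_excess)
print(f"beta ~ {beta:.3f}, volatility ratio ~ {vol_ratio:.3f}")
```

A volatility-matched GH1 benchmark would therefore hold roughly two units of S&P 500 futures per unit of capital, and earn about twice the market premium, which a strategy earning 1% over the risk-free rate cannot match.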

An Economic Interpretation of the Graham-Harvey Performance Measures

Our performance measures focus on long-run performance. We demonstrate that there is a direct link between market timing and long-run performance. In particular, Measure 1 compares the returns on the fund portfolios whose weights change through time with the returns on a constant-weight portfolio with equal volatility. We show that Measure 1 consists of two components, each of which has a direct link to market timing: (i) covariance between equity weights and market returns, and (ii) a factor that penalizes changes in equity weights that do not time the market.

It is best to think of this in terms of market timing. The idea of market timing is to reduce equity exposure before market declines and to increase exposure before market rallies. The successful timer's average return should be greater than the return on the constant-weight portfolio. Indeed, ignoring the cash returns, the following expression should be positive for a successful market timer:

$$E[w_i r_m] - E[w_i]\,E[r_m] \qquad (1)$$

where $w_i$ represents fund $i$'s equity weights and $r_m$ is the market equity return. The first term is the average return on the fund's equity position, with $w_i$ changing through time as recommendations change. The second term is the return on a constant-weight strategy whose constant is the fund's average market exposure. Note that (1) is simply the definition of the covariance between weights and market returns. A positive covariance defines successful market timing [weights increasing (decreasing) ahead of market rallies (declines)]. By definition, a positive covariance implies that the variable-weight fund strategy has a higher average return than the constant-weight strategy. Thus, the component of Measure 1 that measures the covariance between equity weights and market returns is a direct measure of market timing.
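Expression (1) is easy to compute directly from sample means. The small example below uses six hypothetical months of weights and market returns; it checks that (1) equals the population covariance and that the timer's average return exceeds that of the constant-weight strategy by exactly that amount. Cash returns are ignored, as in the text.

```python
import numpy as np


def timing_covariance(weights, mkt):
    """Expression (1): E[w r_m] - E[w] E[r_m], using sample means."""
    return np.mean(weights * mkt) - np.mean(weights) * np.mean(mkt)


# Hypothetical monthly data; each weight is set before that month's return
mkt = np.array([0.03, -0.04, 0.02, 0.05, -0.01, -0.03])
timer = np.array([0.9, 0.3, 0.7, 0.8, 0.5, 0.4])  # higher exposure before rallies
constant = np.full_like(timer, timer.mean())      # same average exposure

gap = np.mean(timer * mkt) - np.mean(constant * mkt)  # average-return advantage
print(timing_covariance(timer, mkt) > 0)  # True
print(np.isclose(gap, timing_covariance(timer, mkt)))  # True
print(np.isclose(timing_covariance(timer, mkt), np.cov(timer, mkt, ddof=0)[0, 1]))  # True
```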

The second component of Measure 1 penalizes managers for changes in equity weights that do not time the market. To see why this makes sense, notice that the variance of a manager's returns has two sources (when weights are independent of returns): the variance of equity returns and the variance of the weights. That is, a manager who changes weights randomly adds volatility to the portfolio's returns simply by changing the equity weights. The component of Measure 1 that penalizes strategies that change weights for the wrong reasons essentially says, "If you are changing weights, and random weight changes clearly add to variance, then you had better be changing weights to earn a higher return; that is, you had better be timing the market."
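The two variance sources can be seen in a simulation. Assuming weights drawn independently of returns (our assumption, with hypothetical parameters), the variance of the equity position is $\mathrm{Var}(w\,r) = E[w]^2\,\mathrm{Var}(r) + \mathrm{Var}(w)\,E[r^2]$, so random weight changes add variance relative to a constant weight with the same average.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
mkt = 0.008 + 0.045 * rng.standard_normal(n)  # hypothetical equity returns
random_w = rng.uniform(0.2, 0.8, size=n)      # weights independent of returns
constant_w = np.full(n, 0.5)                  # same average weight

var_random = np.var(random_w * mkt)
var_constant = np.var(constant_w * mkt)

# For independent w and r: Var(w r) = E[w]^2 Var(r) + Var(w) E[r^2]
predicted = random_w.mean() ** 2 * np.var(mkt) + np.var(random_w) * np.mean(mkt**2)

print(var_random > var_constant)  # True: random weight changes add variance
print(var_random / predicted)     # close to 1
```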

What could cause a fund to change equity weights? Given a level of risk aversion, weights would change if (i) the manager believes there are time-varying expected returns; and/or (ii) the manager believes there are time-varying market volatilities. Let's concentrate on (i). With time-varying expected returns, the manager would increase (decrease) weights when the expected market return is above (below) the average expected return. A random shift in the weights may increase volatility. However, a carefully planned shift in the weights to time the market may not. Further, the fund's average return should increase if the investor has some ability to detect time-varying expected market returns. That is, if you are changing weights in a way that successfully times the market, then it is possible to increase average returns and perhaps even decrease variance.
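To illustrate case (i), the sketch below assumes a hypothetical market whose expected return varies over time, and a manager who observes that variation and tilts the equity weight toward it. Compared with a constant weight at the same average exposure (cash returns ignored), the informed weight changes raise the average return, and the gain equals the covariance in expression (1).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
expected = 0.008 + 0.01 * rng.standard_normal(n)  # hypothetical time-varying expected return
mkt = expected + 0.04 * rng.standard_normal(n)    # realized market return
signal = expected - 0.008                         # manager sees the expected-return deviation

timer_w = np.clip(0.5 + 20 * signal, 0.0, 1.0)    # more equity when expected return is high
const_w = np.full(n, timer_w.mean())              # constant weight, same average exposure

timer_mean = np.mean(timer_w * mkt)
const_mean = np.mean(const_w * mkt)
print(timer_mean > const_mean)  # True: informed weight changes raise the average return
```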

In terms of the familiar mean-variance analysis, the successful market timer should lie above the efficient frontier when he increases market exposure before upturns and decreases market exposure before downturns. Remember that the mean-variance frontier is traditionally drawn with fixed investment weights. So even though changing weights contribute to variance, the positive covariance between the weights and the market returns could actually decrease fund volatility for the successful market timer. Thus, a successful market timer will have a positive value for Measure 1, which indicates that he lies above the efficient frontier. The successful timer lies above the frontier because (i) there is a positive covariance between his equity weights and the market, and/or (ii) he is not unduly penalized for changes in equity weights that do not time the market.

We have applied our ideas to evaluate the strategies of market-timing investment newsletters. In principle, our metrics could be applied to any well-diversified managed portfolio.

Forbes featured an analysis of our measure applied to 326 mutual funds. Using our Measure 1, Forbes identified a collection of funds that outearn the market with lower volatility. The funds Forbes selected this way also dominate those selected using the Sharpe ratio alone. This evidence suggests that performance is persistent. In addition, there appears to be a benefit to using our measures to forecast future performance from past performance.

You Should Consider Using GH1 and GH2

We propose two measures for evaluating the performance of asset allocation recommendations from investment funds. The idea of our first measure is to compare the fund's return to a portfolio of S&P 500 futures and cash that has the same volatility over the evaluation period. The benchmark has fixed investment weights. Most fund strategies have variable weights. Presumably, if the manager is successfully timing the market (increasing weights before market upticks and decreasing them before market downticks), the manager should be able to outperform this passive benchmark.

A second measure volatility-adjusts the fund's strategy. We construct a portfolio of the fund return and a Treasury bill that has exactly the same volatility as the S&P 500. The difference between the returns on the volatility-adjusted strategy and the S&P 500 defines Measure 2.



Endnotes on M-squared and GH2

Franco Modigliani and Leah Modigliani (Journal of Portfolio Management, 1997) apply a measure similar to GH2 to a sample of mutual funds. However, they do not allow for curvature in the efficient frontier. That is, they assume that the cash return has zero variance and zero covariance with other assets. Indeed, their measure is a simple transformation of the famous Sharpe ratio. Applying the M-squared measure to evaluate mutual funds gives exactly the same ordering as the Sharpe ratio.
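The ordering claim follows because, with riskless cash, levering a fund to market volatility gives $r_f + S_p\,\sigma_m$, where $S_p$ is the fund's Sharpe ratio: a positive affine transformation that cannot reorder funds. The sketch below checks this on four hypothetical funds, assuming a constant risk-free rate per period.

```python
import numpy as np


def sharpe(fund, rf):
    """Mean excess return over its standard deviation (rf is a constant)."""
    excess = fund - rf
    return excess.mean() / excess.std()


def m_squared(fund, market, rf):
    """Fund levered with riskless cash to market volatility: rf + Sharpe * sigma_m."""
    return rf + sharpe(fund, rf) * market.std()


rng = np.random.default_rng(5)
rf = 0.004
market = 0.009 + 0.045 * rng.standard_normal(240)
# Hypothetical funds: (market exposure, idiosyncratic volatility)
funds = [rf + b * (market - rf) + s * rng.standard_normal(240)
         for b, s in [(0.5, 0.01), (1.2, 0.02), (0.8, 0.005), (1.0, 0.03)]]

sr = np.array([sharpe(f, rf) for f in funds])
m2 = np.array([m_squared(f, market, rf) for f in funds])
print(np.array_equal(np.argsort(sr), np.argsort(m2)))  # True: identical rankings
```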

However, the M-squared assumption of zero correlation between the cash (interest rate) return and the asset being evaluated holds only if the maturity of the cash instrument exactly coincides with the evaluation period. Indeed, it is reasonably well known that interest rate changes are negatively correlated with both stock and bond returns. Further, relative to GH2, the assumption could lead to misleading inferences about the performance of low-volatility funds, where substantial leverage is needed to reach the S&P 500 volatility. In a sample of well-diversified funds, this is not an issue. However, applying GH2 to a broader class of asset returns raises certain issues. For example, substantial leverage would be needed to lever a money market fund to the volatility of the S&P 500.
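To make the leverage point concrete, this sketch compares the M-squared leverage $\sigma_m/\sigma_p$ with the leverage that exactly matches S&P 500 volatility when the cash return has volatility $\sigma_f$ and correlation $\rho$ with the fund. All volatilities are hypothetical monthly figures, and `leverage_to_target` is our own illustrative helper.

```python
import math


def leverage_to_target(sigma_p, sigma_f, rho, sigma_target):
    """Weight v on the fund so that v*fund + (1-v)*cash has volatility sigma_target.

    sigma_f and rho describe the cash return; sigma_f = 0 is the M-squared case.
    """
    cov = rho * sigma_p * sigma_f
    # Var(v*fund + (1-v)*cash) = a*v**2 + b*v + sigma_f**2
    a = sigma_p**2 + sigma_f**2 - 2 * cov
    b = 2 * (cov - sigma_f**2)
    c = sigma_f**2 - sigma_target**2
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)  # larger (long) root


sigma_m = 0.045  # hypothetical S&P 500 volatility
sigma_p = 0.005  # hypothetical low-volatility fund
print("M-squared leverage:", sigma_m / sigma_p)
for rho in (-0.5, 0.0, 0.5):
    print(rho, leverage_to_target(sigma_p, 0.002, rho, sigma_m))
```

With these inputs the required leverage moves from about 7.4 ($\rho = -0.5$) through about 8.5 ($\rho = 0$) to about 10.3 ($\rho = 0.5$), against 9 under the riskless-cash assumption, so the levered fund's average return, and hence its measured performance, shifts with the correlation.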

An accompanying movie demonstrates the differences between the GH2 and the M-squared measures. Depending on the level of correlation, the M-squared measure could be highly misleading.