There is considerable dissatisfaction with traditional measures of performance evaluation. There is plenty of evidence that the Capital Asset Pricing Model has serious shortcomings. It is also somewhat unsatisfactory to compare investment managers to their peers: an investment manager could look good just because the group as a whole looks bad.
John Graham and Campbell Harvey have developed some simple new performance metrics based on their 1994 NBER working paper, "Market timing ability and volatility implied in investment newsletters' asset allocation recommendations", their 1996 Journal of Financial Economics paper (Volume 42, pp. 397-422) and their 1997 Financial Analysts Journal article.
These measures have attracted considerable attention from investment managers and the business press. See Forbes, "The Graham-Harvey Test", June 19, 1995.
Note: students and Duke alumni need to send me a separate email verifying their academic status to receive the software.
One approach to performance evaluation is to plot the mean and the standard deviation of each fund using traditional mean-variance analysis. Using this approach, we can assess where each fund lies compared to a benchmark. From the graph, we can tell if one fund is "dominated" by the benchmark (i.e., the fund has lower return and higher volatility). However, it is more difficult to discern performance if the fund has lower return and lower volatility. We start our analysis with the following intuition. To make the fund and the benchmark comparable, we can (1) lever up/down the benchmark to match the fund's volatility (so we can compare apples with apples) or (2) lever up/down the fund to match the volatility of the benchmark (here all funds are compared on the same basis).
The idea of our Graham-Harvey "Measure 1" (GH1) is to lever or unlever the S&P 500 futures to have the exact same volatility as the fund over the evaluation period. GH1 is the difference between the fund return and the return on the volatility-matched futures portfolio. Figure 1 details the geometry of the measure. In the figure, a strategy that unlevers the S&P 500 (by combining the S&P 500 with the Treasury bill to match the volatility of Fund A) has a much higher return than Fund A. Hence, GH1 for Fund A is negative, indicating underperformance. Fund B achieves greater performance than a levered S&P futures position and receives a positive GH1. The intuition is simple. If the investor had a target level of volatility equal to that of Fund A, then the investor would have been much better off holding a fixed-weight combination of S&P 500 futures and Treasury bills than holding the fund (or implementing the fund's portfolio recommendations every month).
Our Graham-Harvey "Measure 2" (GH2) is related but different. In this measure, we lever up or down the fund's recommended investment strategy (using a Treasury bill), so that the strategy has exactly the same volatility as the S&P 500. Figure 2 shows the geometry of this measure. If Fund A is levered up to achieve the same volatility as the S&P 500 over the evaluation period, it has a lower average return than a simple buy-and-hold in the S&P 500. Hence, the GH2 measure is negative. In contrast, if we lever Fund B downwards (by combining the fund strategy with a cash investment) to achieve the same volatility as the S&P 500, the unlevered fund return is greater than the buy-and-hold S&P 500 and the performance measure is positive. For Fund B, investors would have been better off acting on the fund recommendations compared to a buy-and-hold strategy.
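The two volatility-matching steps described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' software: the function names (cov, matched_mix, gh1, gh2) and the return series are made up, population moments are used throughout, and the benchmark is treated as a fixed-weight mix of an asset and a Treasury bill whose weight is found by solving the variance equation directly.

```python
import math
from statistics import mean, pvariance


def cov(xs, ys):
    """Population covariance of two equal-length return series."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def matched_mix(asset, cash, target_var):
    """Fixed-weight mix x*asset + (1-x)*cash whose variance equals target_var."""
    va, vc, cac = pvariance(asset), pvariance(cash), cov(asset, cash)
    # Var(x*a + (1-x)*c) = target_var is a quadratic in x: qa*x^2 + qb*x + qc = 0
    qa = va + vc - 2 * cac
    qb = 2 * (cac - vc)
    qc = vc - target_var
    disc = qb * qb - 4 * qa * qc
    if qa <= 0 or disc < 0:
        raise ValueError("no fixed weight reaches the target volatility")
    x = (-qb + math.sqrt(disc)) / (2 * qa)  # larger root: the long position
    return x, [x * a + (1 - x) * c for a, c in zip(asset, cash)]


def gh1(fund, market, tbill):
    """Fund mean return minus mean return of the volatility-matched market/T-bill mix."""
    _, benchmark = matched_mix(market, tbill, pvariance(fund))
    return mean(fund) - mean(benchmark)


def gh2(fund, market, tbill):
    """Mean return of the fund levered to market volatility minus the market mean return."""
    _, levered_fund = matched_mix(fund, tbill, pvariance(market))
    return mean(levered_fund) - mean(market)


# Made-up monthly returns, for illustration only.
market = [0.02, -0.01, 0.03, -0.02, 0.01, 0.04]
tbill = [0.004] * 6
# A hypothetical fund: half market, half T-bills, minus 0.1% a month in costs.
fund = [0.5 * m + 0.5 * b - 0.001 for m, b in zip(market, tbill)]

print(gh1(fund, market, tbill))  # negative: the fund trails the unlevered benchmark
print(gh2(fund, market, tbill))  # negative: the levered fund trails buy-and-hold
```

In this toy example the fund resembles Fund A in the figures: it has half the market's volatility, so GH1 unlevers the market to a 50% weight while GH2 levers the fund two to one, and both measures come out negative because of the monthly cost drag.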
The measures provide different perspectives. Over the evaluation period, Measure 1 just
draws an efficient frontier using the S&P and cash and checks to see if the fund lies
above or below this constructed frontier. The volatility matching approach displayed in
Figure 1 compares the fund return to that for a volatility-matched benchmark over the exact
same sample period.
Measure 2 compares all funds to a common level of volatility: the S&P 500 buy-and-hold
volatility. All funds are on the same footing with GH2. The only potential disadvantage of GH2
is that it assumes the investor has the ability to lever an investment fund return to have the
same volatility as the market.
To see the difference from another perspective, suppose there exists a fund with a purely
random strategy that switches between 200% long in the market and 200% short in the
market. Also, suppose that the return from this random strategy happens to be one percent
above the risk free rate. If the CAPM beta is zero, then the alpha is 1% and this strategy
would be identified as superior. In contrast, GH1 would find a portfolio of S&P 500 futures
and cash that has identical realized variance. This strategy would likely have about twice the
volatility of the market (roughly four times its variance). Hence, the random strategy would be
compared to a buy-and-hold portfolio with double the volatility of the market. The random strategy
would be a significant underperformer according to the GH1 measure.
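A small simulation makes the random-strategy example concrete. It is a sketch under stated assumptions: market excess returns are drawn from a normal distribution with made-up parameters, the long/short switch is a fair coin flip independent of the market, and cash returns are ignored. None of these numbers come from the original study.

```python
import random
from statistics import mean, pstdev, pvariance

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Assumed market excess returns: normal, 0.8% mean and 4.5% st. dev. per month.
market = [rng.gauss(0.008, 0.045) for _ in range(n)]
# Random timer: 200% long or 200% short with equal odds, independent of the market.
weights = [rng.choice([2.0, -2.0]) for _ in range(n)]
strategy = [w * r for w, r in zip(weights, market)]

m_bar, s_bar = mean(market), mean(strategy)
# CAPM beta: covariance of strategy and market divided by market variance.
beta = sum((s - s_bar) * (r - m_bar) for s, r in zip(strategy, market)) / (n * pvariance(market))
vol_ratio = pstdev(strategy) / pstdev(market)

print(f"CAPM beta of the random strategy: {beta:.3f}")
print(f"volatility relative to the market: {vol_ratio:.2f}")
```

The beta comes out close to zero, so a CAPM regression attributes whatever average return the strategy earns to alpha, while the strategy's standard deviation is roughly twice the market's, which is the volatility GH1 would require its futures-and-cash benchmark to match.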
Our performance measures focus on long-run performance. We demonstrate that there is a
direct link between market timing and long-run performance. In particular, Measure 1
compares the returns on the fund portfolios whose weights change through time with
the returns on a constant-weight portfolio with equal volatility. We show that Measure 1
consists of two components, each of which has a direct link to market timing: (i) covariance
between equity weights and market returns, and (ii) a factor that penalizes changes in equity
weights that do not time the market.
It is best to think of this in terms of market timing.
The idea of market timing is to reduce equity exposure before market declines and to increase
exposure before market rallies. The successful timer's average return should be greater than
the return on the constant-weight portfolio. Indeed, ignoring the cash returns, the following
expression should be positive for a successful market timer:
$$\frac{1}{T}\sum_{t=1}^{T} w_{i,t}\, r_{m,t} \;-\; \bar{w}_i\, \bar{r}_m \qquad (1)$$
where $w_{i,t}$ represents fund i's equity weight in period t, $r_{m,t}$ is the market equity
return, and bars denote averages over the T periods of the evaluation period. The first
term is the average fund performance where $w_{i,t}$ is changing through time as
recommendations change. The second term represents a return on a constant weight strategy
where the constant is just the average market exposure of the fund. However, (1) is just
the definition of the covariance between weights and market returns. A positive covariance
defines successful market timing [weights increasing (decreasing) during market rallies
(declines)]. By definition, a positive covariance implies that the variable weight fund
strategy has a higher average return than the constant weight strategy. Thus, the component
of Measure 1 that measures the covariance between equity weights and market returns is a
direct measure of market timing.
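The covariance identity described above is easy to check numerically. The sketch below uses four made-up periods in which a hypothetical timer is fully invested in up months and mostly in cash in down months; the variable names are illustrative and cash returns are ignored, as in the text.

```python
from statistics import mean

# Hypothetical equity weights and the market returns earned on them.
weights = [1.0, 0.2, 1.0, 0.2]
market = [0.03, -0.02, 0.04, -0.01]

# Average return of the variable-weight strategy (cash returns ignored).
variable = mean(w * r for w, r in zip(weights, market))
# Return of the constant-weight strategy at the fund's average exposure.
constant = mean(weights) * mean(market)
timing = variable - constant

# Population covariance between weights and market returns.
w_bar, r_bar = mean(weights), mean(market)
covariance = mean((w - w_bar) * (r - r_bar) for w, r in zip(weights, market))

print(timing, covariance)  # the two agree up to rounding and are positive here
```

Reversing the weights, so that the fund is fully invested in the down months, flips the sign of both numbers.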
The second component of Measure 1 penalizes managers for changes in equity weights that
do not time the market. To see why this makes sense, notice that the variance of a manager's
returns has two sources (when returns and weights are uncorrelated): the variance of equity
returns and the variance of the weights. That is, a manager that randomly changes weights
induces volatility into its portfolio returns simply by changing the equity weights. The
component of Measure 1 that penalizes strategies that are changing weights for the wrong
reasons essentially says, "if you are changing weights and given that it is obvious that random
weight changes contribute to variance, then you better be changing weights to achieve a
higher return, i.e. you better be timing the market."
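The two sources of variance can be written out explicitly. The derivation below is a sketch under a stronger assumption than the text states: the equity weight w and the market return r_m are taken to be independent (not merely uncorrelated), and cash returns are ignored. The notation is introduced here for illustration.

```latex
\begin{aligned}
\operatorname{Var}(w\,r_m)
  &= E[w^2]\,E[r_m^2] - \bigl(E[w]\,E[r_m]\bigr)^2 \\
  &= \underbrace{E[w]^2\,\operatorname{Var}(r_m)}_{\text{constant-weight strategy}}
   + \underbrace{\operatorname{Var}(w)\,E[r_m^2]}_{\text{added by changing weights}}
\end{aligned}
```

The first term is the variance of a strategy that holds the average weight throughout. The second term is non-negative and is zero only when the weight never changes, so weight changes that carry no timing information can only add volatility.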
What could cause a fund to change equity weights? Given a level of risk aversion,
weights would change if (i) the manager believes there are time-varying expected returns;
and/or (ii) if the manager believes there are time-varying market volatilities. Let's
concentrate on (i). With time-varying expected returns, the manager would increase
(decrease) weights when the expected market return is above (below) the average expected
return. A random shift in the weight may increase volatility. However, a carefully planned
shift in the weights to time the market may not. Further, expected returns should increase if
the investor has some ability to detect time-varying expected market returns. That is, if you
are changing weights in a way which successfully times the market, then it is possible to
increase average returns and perhaps even decrease variance.
In terms of the familiar mean-variance analysis, the successful market timer should be above
the efficient frontier when he increases market exposure before upturns and decreases market
exposure before downturns. Remember that the mean-variance frontier is traditionally drawn
with fixed investment weights. So even though changing weights are contributing to variance,
the positive covariance between the weights and the market returns could actually decrease
fund volatility for the successful market timer. Thus, a successful market timer will have
a positive value for Measure 1, which indicates that he lies above the efficient frontier. The
successful timer will lie above the frontier because (1) there is a positive covariance between his
equity weights and the market, and/or (2) he is not unduly penalized for changes in equity
weights that do not time the market.
We have applied our ideas to evaluate the strategies of market timing investment newsletters. In
principle, our metrics could be applied to any well diversified managed portfolio.
Forbes featured an analysis of our measure applied to 326 mutual funds.
Using our Measure 1, Forbes identified a collection of funds that outearn the market
and yield a lower volatility. Also, Forbes' funds dominate those selected by using just the
Sharpe ratio. This evidence suggests that performance is persistent. In addition, there appears
to be a benefit to using our measures to assess future performance based on past performance.
We propose two measures for evaluating the performance of asset allocation recommendations
from investment funds. The idea of our first measure is to compare the fund's
return to a portfolio of S&P 500 futures and cash which has the same volatility over
the evaluation period. The benchmark has fixed investment weights. Most fund strategies
have variable weights. Presumably, if the manager is successfully timing the market (increase
weights before market upticks and decrease weights before market downticks), the manager
should be able to outperform this passive benchmark.
A second measure volatility-adjusts the fund's strategy. We construct a portfolio of the
fund return and a Treasury bill that has exactly the same volatility as the S&P 500. The
difference between the returns on the volatility-adjusted strategy and the S&P 500 defines
Measure 2.
New Performance Measures vs. Traditional Measures
How are our new performance measures related to traditional measures? Consider the alpha
from the Capital Asset Pricing Model (CAPM) of Sharpe. In the CAPM environment, the
manager's excess return is regressed on the market excess return. Roughly, the beta picks up
the average level of market exposure. The alpha represents the extra return that the manager
earns over and above a position with a (fixed) average market exposure. This is very much
related to GH1 where we adjust the market variance to have exactly the same variance as the
fund. The difference is the following. In GH1, the benchmark (market index and cash)
will be constructed to have exactly the same volatility as the fund. In the CAPM,
the benchmark portfolio (beta times the market index) will have a different volatility than the
fund. Using the CAPM, the fund variance equals beta squared times the variance
of the market index return (the benchmark) plus the variance of the idiosyncratic
return. In contrast, GH1 exactly matches the total volatility of the fund.
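The gap between the CAPM benchmark's volatility and the fund's total volatility can be seen in a short regression sketch. The return series are made up and are assumed to be excess returns over the Treasury bill; the variable names are illustrative, and population moments are used so that the in-sample decomposition holds exactly.

```python
from statistics import mean, pvariance

# Made-up monthly excess returns (over the T-bill), for illustration only.
market = [0.02, -0.01, 0.03, -0.02, 0.01, 0.04]
fund = [0.015, 0.004, 0.010, -0.020, 0.012, 0.018]

m_bar, f_bar = mean(market), mean(fund)
cov_fm = mean((f - f_bar) * (m - m_bar) for f, m in zip(fund, market))

beta = cov_fm / pvariance(market)   # average market exposure
alpha = f_bar - beta * m_bar        # CAPM (Jensen) alpha
residuals = [f - alpha - beta * m for f, m in zip(fund, market)]

capm_benchmark_var = beta ** 2 * pvariance(market)
fund_var = pvariance(fund)

# Total variance splits into a market part and an idiosyncratic part,
# so the CAPM benchmark is never more volatile than the fund itself.
print(fund_var, capm_benchmark_var + pvariance(residuals))
print(capm_benchmark_var ** 0.5, fund_var ** 0.5)
```

Whenever the fund has any idiosyncratic variance, the beta-weighted benchmark is less volatile than the fund, whereas the GH1 benchmark is built to carry the fund's total volatility.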
An Economic Interpretation of the Graham-Harvey Performance Measures
You Should Consider Using GH1 and GH2
ORDER EXCEL VERSION OF PERFORMANCE MEASURES
However, the assumption that M-squared makes about the zero correlation of the interest return and the asset being evaluated is only true if the maturity of the cash instrument exactly coincides with the evaluation period. Indeed, it is reasonably well known that there is a negative correlation between interest rate changes and both stock and bond returns. Further, relative to GH2, the assumption could result in misleading inference about the performance of low-volatility funds where substantial leverage is needed to achieve the S&P 500 volatility. In a sample of well-diversified funds, this is not an issue. However, in applying GH2 to a broader class of asset returns, certain issues arise. For example, substantial leverage would have to be employed to lever a money market fund to achieve the volatility of the S&P 500.
The following movie demonstrates the differences between the GH2 and the M-squared measure. Depending on the level of correlation, the M-squared measure could be highly misleading.