How to choose forecasting models
Political and ethical issues in forecasting
By its very nature, forecasting is often a politically charged activity. Strategic
decisions, resource allocations, and careers may hinge on the way things are predicted
rather than the way they actually turn out. The forecaster is often placed in
the uncomfortable position of being the one who bears bad news or who
contradicts the official positions of others in the organization. Furthermore,
as you are well aware by now, it is often possible to bias a forecast in one
direction or another by unduly restricting the class of models or the set of
explanatory variables which are investigated, by varying the length of the
sample which is fitted, by deciding to include or suppress influential
observations, by focusing on short-term trends rather than long-term trends or
vice versa, and so on. This is precisely why, in this course, we have
emphasized practices designed to ferret out biases in model selection, such as
using naive (e.g., random-walk) models as reference points, paying close
attention to residual diagnostics, using out-of-sample validation, determining
the relevant forecasting horizon, and taking into account the plausibility of
the assumptions that underlie a given forecasting model. However, the fact
remains that an analyst who is clever and unscrupulous--or merely careless and
easily intimidated--can consciously or unconsciously steer a forecast in a
politically convenient direction. ("There are lies, damned lies, and
statistics.") The following scenarios might be used as a basis for class
discussions. Read them and think about how you would respond in each
situation, and what justification you would offer for your position.
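Two of the safeguards mentioned above, a naive random-walk reference point and out-of-sample validation, can be sketched in a few lines. This is only a minimal illustration: the sales figures, the three-period holdout, and the choice of a straight-line trend as the candidate model are all assumptions made up for the example, not data or methods from any of the scenarios.

```python
# Hypothetical monthly sales figures, invented purely for illustration.
sales = [100, 102, 101, 105, 107, 106, 110, 112, 111, 115, 117, 116]

HOLDOUT = 3  # number of most recent periods withheld for validation
train, holdout = sales[:-HOLDOUT], sales[-HOLDOUT:]


def mean_absolute_error(actual, predicted):
    """Average size of the forecast errors, ignoring sign."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def random_walk_forecast(history, horizon):
    """Naive benchmark: every future value equals the last observed value."""
    return [history[-1]] * horizon


def linear_trend_forecast(history, horizon):
    """Candidate model: least-squares straight line extrapolated forward."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n + h) for h in range(horizon)]


# Both models are fitted on the training period only and scored on the holdout.
naive_mae = mean_absolute_error(holdout, random_walk_forecast(train, HOLDOUT))
trend_mae = mean_absolute_error(holdout, linear_trend_forecast(train, HOLDOUT))

print(f"Random-walk MAE on holdout:  {naive_mae:.2f}")
print(f"Linear-trend MAE on holdout: {trend_mae:.2f}")
```

Neither model sees the holdout data before it is scored, so a candidate that cannot beat the naive benchmark here deserves skepticism no matter how well it fits the training period.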
Scenario 1.
In preparation for an upcoming meeting to discuss the corporation's five-year
strategic plan, you have been asked to prepare a forecast for the sales growth
in certain product lines. You undertake elaborate statistical data analysis,
fitting time series models and regression models to examine the effects of
industry trends, product life cycles, market share, demographics, and so on.
Finally you come up with a model you feel you can trust, which shows that sales
will increase at the rate of about 10 percent in each of the next few years.
The 95% confidence interval ranges from 6 percent to 14 percent. Two days
before the meeting, you learn that the Vice President for Sales has conducted
her own field study, in which individual sales representatives were asked to
give their own best estimates of sales in the coming year. Aggregation of these
results has led to a prediction that sales will be up by only about 5 percent
next year. She has just gotten wind of your study and calls you in to complain
that your forecasts may be used by management to set unrealistic quotas for the
sales force.
Scenario 2.
You are a risk analyst for a large casualty insurance company which handles
workers' compensation and liability insurance for numerous Fortune 100
corporations. Such corporations, by virtue of their size, do not need to pool
their insurance risks with other organizations. (They can appeal to the Law of
Large Numbers.) Hence, they are covered by "retrospectively rated"
insurance plans which are tantamount to self-insurance but offer significant
tax advantages. Under such a plan, the insured corporation pays the insurance
company for its actual losses, with a markup for the insurance company's
claims-handling costs and profit. These plans are complicated by the fact that
casualty losses "develop" over time. For example, the corporation's
actual losses for the 2010 calendar year will not be precisely known for many
years: weeks or months may elapse between the time an accident occurs and the
time a claim is filed. More weeks or months may elapse before a claims analyst
makes an initial estimate of the expected loss to the company, and much more
time may elapse before the claim is settled: large claims often go into
litigation, where they may remain for years. Even after a claim is settled
(i.e., after a schedule of compensation to the injured party is agreed on), the
actual amount of monetary compensation may be uncertain, depending on the
success of surgical operations, rehabilitation, retraining, and so on. At the
end of the insured period, the insurance company prepares a forecast of the
total amount of losses which will eventually have to be paid for accidents
which occurred during that period. Such a forecast is typically based on the
data actually in hand at that point (i.e., claims actually filed and/or
settled) together with historical information on "loss development"
patterns. The insured corporation must at this point pay the forecasted
losses, plus markup. The forecast will be readjusted every year thereafter as
more data accumulates, and the insured corporation will then pay more money in
or get some money back depending on whether the forecast is revised up or down.
Suppose that you are a recent graduate who has just been hired as an analyst by the
insurance company, and your first task is to prepare a revised forecast for the
losses which one of their major clients incurred in 2010 and 2011. Your
analysis, which takes into account some recent and unfavorable legal precedents
pertaining to outstanding liability claims, shows that the combined losses for
those two years had been underestimated by 15 million dollars by your
predecessor in this job. In other words, based on your new forecast, the
client must immediately hand over an additional 15 million dollars (plus
markup). The account executive is furious: "We can't tell them that!
They'll cancel the account. Our product-safety programs were supposed to be
improving their liability rating. Look, most of these big claims are in litigation.
We can't be sure how they're all going to turn out. Why don't we just go ahead
and stick with the industry-average loss development factors we used last
year?"
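To see why the choice of loss development factors matters so much in Scenario 2, consider a deliberately simplified sketch in which the forecast of ultimate losses is the losses reported to date multiplied by a single development factor. The reported amount and both factors below are hypothetical numbers chosen only so that the gap comes out to 15 million dollars; real loss development analysis is considerably more detailed than this.

```python
def ultimate_loss(reported_to_date, development_factor):
    """Forecast of eventual total losses from losses reported so far."""
    return reported_to_date * development_factor


# All figures are hypothetical and in millions of dollars.
reported = 40.0          # losses reported to date for the two accident years
industry_factor = 1.5    # assumed industry-average development factor
revised_factor = 1.875   # assumed factor after the unfavorable precedents

old_forecast = ultimate_loss(reported, industry_factor)
new_forecast = ultimate_loss(reported, revised_factor)
additional_payment = new_forecast - old_forecast  # before markup

print(f"Forecast with industry-average factor: {old_forecast:.1f}")
print(f"Forecast with revised factor:          {new_forecast:.1f}")
print(f"Additional amount owed by the client:  {additional_payment:.1f}")
```

The data in hand are identical in both forecasts. The entire difference comes from an assumption about future development, which is exactly the quantity the account executive is asking you to leave unchanged.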
Scenario 3.
You recently supervised a project to build a sales forecasting model for one of
your consulting firm's major clients. The model is hierarchical in structure,
and produces forecasts at the corporate level, the division level, the regional
level, and the store level: it uses seasonal adjustment and exponential
smoothing, with adjustment and smoothing factors estimated separately for
different regions. It was developed using a statistical modeling language linked
to a large database of sales data supplied by the client. Last week you flew to
the client's headquarters and presented the model to an assembly of regional
vice presidents and managers. Many of them were upset with your results, which
disagree with their own private estimates and are likely to affect their budgets
adversely. The client's top management insisted that they would back you up.
However, this morning when you arrive at work you find a crowd of people
outside your office. They are a team of auditors hired by the client to audit
your forecasting model. They tell you they'd like a conference room with an internet
connection in which they can hang out for a few days, and they'd like to see
your notes documenting your model-selection process, printouts of your
statistical reports, the computer files containing your modeling code, and the
files containing the raw data.
(Scenario 3A: You are an auditor working for an accounting firm that has
just been asked to review a forecasting model used by one of your clients. What should you look for?)
Scenario 4.
The advertising agency for which you work is trying to renew its contract with
an important client. The client has been balking, claiming that your ad
campaigns have been less successful than promised, and that sales growth has
been disappointing. Your boss gives you this mandate: "Run me some numbers
to show these guys that our ads are really working. Their slow-down in sales
could be part of an industry-wide pattern. We have a bunch of comparative data
on what's been happening in markets where our ads have been shown and markets where
they haven't." You analyze the data and discover that the effect of your
ads on sales has on the whole been insignificant. However, there is one market
in which sales showed a huge upward spike in the week in which your ads began
showing there. Privately, you believe this was due to the NFL playoff game which just happened to be held
there at the same time. However, if this market is simply aggregated with all
the others without calling attention to that fact....
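The arithmetic behind Scenario 4 is easy to sketch. The figures below are invented for illustration (percentage changes in weekly sales for five markets that received the ads and five that did not), with market "E" playing the role of the market that hosted the playoff game. A simple difference in means stands in for whatever comparison you would actually run.

```python
from statistics import mean

# Hypothetical percentage changes in sales during the week the ads began.
ad_markets = {"A": 1.0, "B": -0.5, "C": 0.5, "D": 0.0, "E": 24.0}
control_markets = [0.5, 0.0, -0.5, 1.0, 0.25]

SPIKE_MARKET = "E"  # assumed to be the market with the unrelated event


def average_lift(treated, control):
    """Difference between mean sales change in ad and non-ad markets."""
    return mean(treated) - mean(control)


lift_all = average_lift(list(ad_markets.values()), control_markets)
lift_without_spike = average_lift(
    [change for name, change in ad_markets.items() if name != SPIKE_MARKET],
    control_markets,
)

print(f"Apparent lift, all markets aggregated: {lift_all:.2f} points")
print(f"Apparent lift, spike market excluded:  {lift_without_spike:.2f} points")
```

With these made-up numbers, the whole apparent effect rests on a single observation. Reporting both figures, along with the reason the market is unusual, lets the reader judge the evidence for themselves.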
Scenario 5.
You are a professor who teaches statistical forecasting at a leading business
school. One day you receive a phone call from a former student saying "my
boss has asked me to run some numbers that show our ad campaign is working, but
I'm having trouble finding a statistically significant relationship. What
trick can I use to get the p-value below 0.05...?" (This really happened to me. Don't let the caller be you
next time!)
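One reason such "tricks" are so dangerous is that simply trying enough alternative specifications will eventually produce a p-value below 0.05 even when no relationship exists. The sketch below computes that chance under two simplifying assumptions: there is truly no effect, and the tests are independent of one another. Real specification searches are usually correlated, so treat the numbers as a rough guide only.

```python
def prob_spurious_significance(num_tests, alpha=0.05):
    """Chance that at least one of num_tests independent tests of a true
    null hypothesis yields a p-value below alpha."""
    return 1 - (1 - alpha) ** num_tests


for num_tests in (1, 5, 10, 20):
    chance = prob_spurious_significance(num_tests)
    print(f"{num_tests:2d} specifications tried: {chance:.0%} chance of a false 'finding'")
```

Under these assumptions, fourteen tries are enough to make a spurious "significant" result more likely than not.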