How to choose forecasting models

Political and ethical issues in forecasting

By its very nature, forecasting is often a politically charged activity. Strategic decisions, resource allocations, and careers may hinge on the way things are predicted rather than the way they actually turn out. The forecaster is often placed in the uncomfortable position of being the one who bears bad news or who contradicts the official positions of others in the organization. Furthermore, as you are well aware by now, it is often possible to bias a forecast in one direction or another by unduly restricting the class of models or the set of explanatory variables which are investigated, by varying the length of the sample which is fitted, by deciding to include or suppress influential observations, by focusing on short-term trends rather than long-term trends or vice versa, and so on. This is precisely why, in this course, we have emphasized practices designed to ferret out biases in model selection, such as using naive (e.g., random-walk) models as reference points, paying close attention to residual diagnostics, using out-of-sample validation, determining the relevant forecasting horizon, and taking into account the plausibility of the assumptions that underlie a given forecasting model. However, the fact remains that an analyst who is clever and unscrupulous--or merely careless and easily intimidated--can consciously or unconsciously steer a forecast in a politically convenient direction. ("There are lies, damned lies, and statistics.")

The following scenarios might be used as a basis for class discussions. Read them and think about how you would respond in each situation, and what justification you would offer for your position.

Scenario 1. In preparation for an upcoming meeting to discuss the corporation's five-year strategic plan, you have been asked to prepare a forecast for the sales growth in certain product lines. You undertake elaborate statistical data analysis, fitting time series models and regression models to examine the effects of industry trends, product life cycles, market share, demographics, and so on. Finally you come up with a model you feel you can trust, which shows that sales will increase at the rate of about 10 percent in each of the next few years. The 95% confidence interval ranges from 6 percent to 14 percent. Two days before the meeting, you learn that the Vice President for Sales has conducted her own field study, in which individual sales representatives were asked to give their own best estimates of sales in the coming year. Aggregation of these results has led to a prediction that sales will be up by only about 5 percent next year. She has just gotten wind of your study and calls you in to complain that your forecasts may be used by management to set unrealistic quotas for the sales force.
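
As a starting point for discussing Scenario 1, it may help to see how far apart the two numbers are in statistical terms. The sketch below backs out the standard error implied by the 95% interval and expresses the field estimate in standard-error units. It assumes the interval is symmetric and based on a normal distribution, and it assumes the two figures refer to comparable one-year growth rates; neither assumption is stated in the scenario, and the function name is made up for this illustration.

```python
from statistics import NormalDist

def implied_standard_error(lower, upper, level=0.95):
    """Standard error implied by a symmetric normal confidence interval (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return (upper - lower) / (2 * z)

model_forecast = 10.0   # percent growth, from the scenario
field_estimate = 5.0    # percent growth, from the sales force survey

se = implied_standard_error(6.0, 14.0)
z_score = (field_estimate - model_forecast) / se

print(f"implied standard error: {se:.2f} percentage points")
print(f"field estimate is {z_score:.2f} standard errors from the model forecast")
```

A gap of this size does not say which forecast is right. It only says that the two cannot both be well calibrated, so the disagreement deserves an explanation before the meeting.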

Scenario 2. You are a risk analyst for a large casualty insurance company which handles workers' compensation and liability insurance for numerous Fortune 100 corporations. Such corporations, by virtue of their size, do not need to pool their insurance risks with other organizations. (They can appeal to the Law of Large Numbers.) Hence, they are covered by "retrospectively rated" insurance plans which are tantamount to self-insurance but offer significant tax advantages. Under such a plan, the insured corporation pays the insurance company for its actual losses, with a markup for the insurance company's claims-handling costs and profit. These plans are complicated by the fact that casualty losses "develop" over time. For example, the corporation's actual losses for the 2010 calendar year will not be precisely known for many years: weeks or months may elapse between the time an accident occurs and the time a claim is filed. More weeks or months may elapse before a claims analyst makes an initial estimate of the expected loss to the company, and much more time may elapse before the claim is settled: large claims often go into litigation, where they may remain for years. Even after a claim is settled (i.e., after a schedule of compensation to the injured party is agreed on), the actual amount of monetary compensation may be uncertain, depending on the success of surgical operations, rehabilitation, retraining, and so on. At the end of the insured period, the insurance company prepares a forecast of the total amount of losses which will eventually have to be paid for accidents which occurred during that period. Such a forecast is typically based on the data actually in hand at that point (i.e., claims actually filed and/or settled) together with historical information on "loss development" patterns. The insured corporation must at this point pay the forecasted losses, plus markup. The forecast will be readjusted every year thereafter as more data accumulates, and the insured corporation will then pay more money in or get some money back depending on whether the forecast is revised up or down.

Suppose that you are a recent graduate who has just been hired as an analyst by the insurance company, and your first task is to prepare a revised forecast for the losses which one of their major clients incurred in 2010 and 2011. Your analysis, which takes into account some recent and unfavorable legal precedents pertaining to outstanding liability claims, shows that the combined losses for those two years had been underestimated by 15 million dollars by your predecessor in this job. In other words, based on your new forecast, the client must immediately hand over an additional 15 million dollars (plus markup). The account executive is furious: "We can't tell them that! They'll cancel the account. Our product-safety programs were supposed to be improving their liability rating. Look, most of these big claims are in litigation. We can't be sure how they're all going to turn out. Why don't we just go ahead and stick with the industry-average loss development factors we used last year?"
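
The arithmetic behind the dispute in Scenario 2 can be sketched roughly as follows. One common way to use loss development patterns is to multiply the losses reported to date by a cumulative development factor to get a forecast of ultimate losses. Every number below is hypothetical: the reported amount and both factors were picked only so that the gap comes out to the 15 million dollars mentioned in the scenario, and the function name is invented for this illustration.

```python
def project_ultimate_losses(reported_to_date, development_factor):
    """Forecast ultimate losses as reported losses times a cumulative development factor."""
    return reported_to_date * development_factor

# Hypothetical figures, in millions of dollars (not taken from the scenario).
reported_to_date = 50.0
industry_average_factor = 1.50  # assumed factor used in last year's forecast
revised_factor = 1.80           # assumed factor reflecting the unfavorable legal precedents

old_forecast = project_ultimate_losses(reported_to_date, industry_average_factor)
new_forecast = project_ultimate_losses(reported_to_date, revised_factor)
shortfall = new_forecast - old_forecast

print(f"forecast with industry-average factor: {old_forecast:.1f}")
print(f"forecast with revised factor:          {new_forecast:.1f}")
print(f"additional amount owed by the client:  {shortfall:.1f}")
```

The point of the sketch is that the whole forecast is proportional to the chosen factor, so the choice between an industry average and a client-specific estimate is not a technicality.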

Scenario 3. You recently supervised a project to build a sales forecasting model for one of your consulting firm's major clients. The model is hierarchical in structure, and produces forecasts at the corporate level, the division level, the regional level, and the store level: it uses seasonal adjustment and exponential smoothing, with adjustment and smoothing factors estimated separately for different regions. It was developed using a statistical modeling language linked to a large database of sales data supplied by the client. Last week you flew to the client's headquarters and presented the model to an assembly of regional vice presidents and managers. Many of them were upset with your results, which disagree with their own private estimates and are likely to affect their budgets adversely. The client's top management insisted that they would back you up. However, this morning when you arrive at work you find a crowd of people outside your office. They are a team of auditors hired by the client to audit your forecasting model. They tell you they'd like a conference room with an internet connection in which they can hang out for a few days, and they'd like to see your notes documenting your model-selection process, printouts of your statistical reports, the computer files containing your modeling code, and the files containing the raw data.  (Scenario 3A: You are an auditor working for an accounting firm that has just been asked to review a forecasting model used by one of your clients.  What should you look for?)

Scenario 4. The advertising agency for which you work is trying to renew its contract with an important client. The client has been balking, claiming that your ad campaigns have been less successful than promised, and that sales growth has been disappointing. Your boss gives you this mandate: "Run me some numbers to show these guys that our ads are really working. Their slow-down in sales could be part of an industry-wide pattern. We have a bunch of comparative data on what's been happening in markets where our ads have been shown and markets where they haven't." You analyze the data and discover that the effect of your ads on sales has on the whole been insignificant. However, there is one market in which sales showed a huge upward spike in the week in which your ads began showing there. Privately, you believe this was due to the NFL playoff game which just happened to be held there at the same time. However, if this market is simply aggregated with all the others without calling attention to that fact....
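
To make the temptation in Scenario 4 concrete, here is a small sketch of how a single influential observation can manufacture an apparent effect in aggregated data. The market labels and sales-lift figures are entirely made up for illustration, as is the function name; market "E" stands in for the market with the playoff game.

```python
from statistics import mean

# Hypothetical percentage change in weekly sales, by market (invented numbers).
lift_with_ads = {"A": 0.4, "B": -0.3, "C": 0.2, "D": 0.1, "E": 12.0}  # "E": playoff market
lift_without_ads = {"F": 0.2, "G": 0.0, "H": -0.1, "I": 0.3}

def average_ad_effect(treated, control, exclude=()):
    """Difference in mean sales lift between ad and no-ad markets, optionally dropping some."""
    treated_values = [lift for market, lift in treated.items() if market not in exclude]
    return mean(treated_values) - mean(control.values())

effect_all_markets = average_ad_effect(lift_with_ads, lift_without_ads)
effect_without_spike = average_ad_effect(lift_with_ads, lift_without_ads, exclude=("E",))

print(f"apparent ad effect, all markets aggregated: {effect_all_markets:.2f} points")
print(f"apparent ad effect, playoff market set aside: {effect_without_spike:.2f} points")
```

With these invented numbers the entire "effect" comes from one market. Reporting both figures, and saying why they differ, is the kind of disclosure the scenario asks you to think about.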

Scenario 5. You are a professor who teaches statistical forecasting at a leading business school. One day you receive a phone call from a former student saying "My boss has asked me to run some numbers that show our ad campaign is working, but I'm having trouble finding a statistically significant relationship. What trick can I use to get the p-value below 0.05...?" (This really happened to me. Don’t let the caller be you next time!)
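
One reason the request in Scenario 5 is troubling is that "tricks" usually amount to trying many specifications until one of them crosses the threshold. The sketch below computes the chance of getting at least one p-value below 0.05 when there is no real effect at all, under the simplifying assumption that the attempts are independent tests. In practice, respecifications of the same data are correlated, so the exact numbers would differ. The function name is invented for this illustration.

```python
def chance_of_spurious_significance(num_attempts, alpha=0.05):
    """Probability that at least one of several independent null tests has p < alpha."""
    return 1.0 - (1.0 - alpha) ** num_attempts

for attempts in (1, 5, 20):
    probability = chance_of_spurious_significance(attempts)
    print(f"{attempts:2d} attempts: {probability:.1%} chance of a 'significant' result")
```

Under that assumption, twenty attempts on pure noise produce a "significant" result more often than not, which is why a p-value found by searching does not mean what it appears to mean.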