Bashing forecasters from footie to FTSE; is it BBQ time for Paul the Octopus?
Bashing forecasts and forecasting seems to be in vogue lately. Not only have recent years seen a number of political upsets, but the World Cup saw similarly notable surprises (and not just England winning a penalty shootout).
But is the criticism of forecasting fair? Should we junk forecasts altogether and bring back Paul the Octopus, whose football ‘predictions’ brought him global fame as some kind of marine oracle?
Then there was circus chimp Lusha, the Russian stock-picking expert, and Raven, a six-year-old female chimp who added 79% to her MonkeyDex portfolio in 1999 and a further, whopping 213% the following year, making her the 22nd most profitable money manager in the US.
Or is the real problem with the way forecasts are being interpreted?
To answer that, we need to think about what a forecast or a prediction is. Essentially, it’s an assessment of the likelihood of an outcome (or outcomes) at some point in the future.
And how those estimates are communicated and reported can have a big impact on how they are perceived. As in the case of Paul, this can be harmless fun.
Quite often the outcomes will not be binary. To take the World Cup as an example, ahead of the quarter finals, there were eight possible outcomes for the tournament victor, corresponding to the eight remaining teams.
Suppose a model gives World Cup win probabilities for each team of Brazil 25%, France 20%, and so on.
In this case, Brazil would have been favourite because 25% is the highest probability.
But if there’s a 25% chance of something happening, there’s a 75% chance of it not happening.
This means, perhaps counterintuitively, that Brazil might have been favourites to win the World Cup, but also that Brazil would probably not win the World Cup; isn’t hindsight a wonderful thing!
But the two statements are not contradictory – where no team has a greater than even probability of victory, the favourite is really the least unlikely.
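The point can be made concrete with a few lines of Python. Only the Brazil (25%) and France (20%) figures appear above; the other six probabilities here are invented purely so that the eight outcomes sum to one.

```python
# Illustrative pre-quarter-final win probabilities. Only Brazil and
# France match the figures in the text; the rest are assumptions
# chosen so the eight mutually exclusive outcomes sum to 1.
probs = {
    "Brazil": 0.25, "France": 0.20, "Belgium": 0.12, "England": 0.12,
    "Croatia": 0.10, "Uruguay": 0.08, "Sweden": 0.07, "Russia": 0.06,
}

# The favourite is simply the team with the highest probability...
favourite = max(probs, key=probs.get)
print(favourite, probs[favourite])  # Brazil 0.25

# ...yet the favourite still probably loses: the "least unlikely" team.
print("P(favourite does not win):", 1 - probs[favourite])  # 0.75
```

No team clears 50%, so picking the favourite is a statement about relative, not absolute, likelihood.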
So when the UBS model made Germany favourites, and the predictions were in one case headlined as ‘Germany will win…’, the headline misrepresented the forecast.
‘Will’ implies certainty – the UBS model didn’t even make Germany likely winners – it merely gave them the shortest odds (a 24% probability). In this case, the problem is clearly interpretation.
What about a binary outcome, such as a knockout-stage match or a referendum result? In this case, with only two possible outcomes, the favourite is necessarily the one with a probability of more than 50%, and therefore likelier to happen than not.
But how likely is likely?
The issue of how forecasts are expressed is perhaps the statistical equivalent of the Müller-Lyer illusion (the optical illusion where arrows of the same length appear different lengths depending on the direction of the fins).
Suppose it’s EU referendum night and two on-the-day polls have put Remain on an average of 53% excluding refusers and non-voters.
Based on history, we might assume that the result has an approximately normal distribution, and 95% of the time should be within plus or minus 8 percentage points of what the polls say.
You might choose to express this as a 95% chance that Remain gets between 45% and 61%, which sounds massively hedged.
Alternatively, you might say that there is a 77% chance of Remain winning, which sounds like a confident prediction.
Yet these two ways of expressing a probability are actually saying the same thing. They sound very different partly because not all points within the range are equally likely (something often overlooked), and partly because people seem to misinterpret probabilities between about 60% and 90% – likelier-than-not, but nowhere near certain, yet often treated as though near certain.
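A minimal sketch of the arithmetic, assuming (as above) that the result is normally distributed around the poll average of 53%, with 95% of outcomes within plus or minus 8 points:

```python
from statistics import NormalDist

mean = 53.0          # Remain's poll average, excluding refusers/non-voters
sigma = 8.0 / 1.96   # 95% interval half-width of 8 points implies this SD

dist = NormalDist(mu=mean, sigma=sigma)

# The hedged-sounding statement: 95% chance Remain lands in [45%, 61%].
print(f"95% interval: {mean - 8:.0f}% to {mean + 8:.0f}%")

# The confident-sounding statement: probability Remain clears 50%.
p_win = 1 - dist.cdf(50.0)
print(f"P(Remain > 50%) = {p_win:.0%}")  # about 77%
```

The same distribution produces both statements; only the framing differs.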
Empirically, the probability of a penalty being scored is right in the middle of this range, at 70% to 80% depending on whether or not it’s in a shootout and which competition it’s in.
This sort of probability is often treated as a very confident prediction of what might happen. But talk in terms of things that actually occur that proportion of the time, and it sounds very different.
For example, you don’t need to be Carlos Bacca, or to have looked at the data, to know that a great many penalties are in fact missed.
It’s clear, then, that a lot of the problems with forecasting can be explained by communication and reporting issues, rather than methodology.
There is, of course, still a question mark over forecast performance in terms of the number of ostensibly low-probability events that have occurred recently.
But that doesn’t mean an individual forecast is necessarily flawed simply because a low-probability event happens.
Forecasts that don’t provide any indication of the range of uncertainty, and only give mid points – such as (usually) forecasts of the temperature later this week, or of GDP growth for the rest of the year – are much harder to evaluate.
Without further information, it’s hard to say much beyond whether they consistently miss one way or the other.
More generally, forecasting methods can be improved, but the way forecasts are communicated (both by forecasters themselves and by the media) can be improved far more.
Forecasts are not useless, as long as their limitations are properly understood. End users should as always exercise due scepticism, but not dismiss forecasting out of hand.
Either way, it’s time to put the octopus on the barbecue.
Matt Singh is the founder of Number Cruncher Politics