Thursday, September 10, 2015

Why Unlikely Events Are Not Unlikely

A recent Facebook post starts:
Is it really a coincidence that so many unprecedented weather events are happening this year
with a link to a news story about a "once in 50 years" rain in Japan. It is an argument I frequently see made, explicitly or implicitly. Lots of unlikely things are happening and there must be a reason. When the subject is climate change the unlikely things are mostly about climate.

It looks convincing until you think about it. The world is large. There are lots of different places in it where, if an unusual weather event happens, it is likely to show up in the news. There are at least four categories of unusual weather events that could happen (unusually hot, unusually cold, unusually large amount of rain, unusually small amount of rain) and probably a few others I haven't thought of. A year contains four seasons and twelve months and a record in any of them is newsworthy; a recent news story, for example, claimed that this August was the hottest August in the tropics on record.

For a very rough estimate of how many chances there are each year for an unlikely event to happen and make the news, I calculate:
(100 countries prominent enough + 100 cities prominent enough + 10 geographic regions (tropics, poles, North America, ...) + 50 U.S. states = 260 places)
x (12 months + 4 seasons = 16 periods)
x 4 kinds of events that would qualify
= 16,640 opportunities each year for an unlikely weather event to occur and be reported.

So we would expect more than 300 once-in-50-years events to happen each year (16,640/50 is about 333) and about sixteen one-in-a-thousand events.
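The arithmetic behind those two figures can be checked in a few lines. This is only a sketch: every count is the post's own rough guess, and the helper name expected_events is made up for illustration. An expected count is just the number of opportunities divided by the return period, which holds whether or not the opportunities are independent.

```python
# Rough counts taken from the post; each one is a guess, not a measurement.
places = 100 + 100 + 10 + 50   # countries, cities, regions, U.S. states
periods = 12 + 4               # months plus seasons
kinds = 4                      # hot, cold, wet, dry

opportunities = places * periods * kinds


def expected_events(return_period_years, n=opportunities):
    """Expected yearly count of '1 in N years' events over n opportunities.

    Hypothetical helper: expectations add up, so no independence is needed.
    """
    return n / return_period_years


print(opportunities)           # 16640
print(expected_events(50))     # 332.8
print(expected_events(1000))   # 16.64
```

Doubling or halving any of the guessed counts scales the answers in proportion, so the conclusion that such events should be common does not hinge on the exact inputs.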

My guess is that those numbers are too low: the story about floods in Japan does not make it clear whether the one in fifty years record is for the whole country or only one region. But they at least show why we should expect lots of unlikely things to happen each year.

If you flip a coin ten times and get ten heads, you should be surprised. If you flip sixteen thousand coins ten times each, you can expect to get ten heads about sixteen times—and should not be surprised when you do.
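A quick check of the coin example, as a sketch that assumes fair coins and independent flips (the variable names are mine, not part of the original argument):

```python
p_ten_heads = 0.5 ** 10        # chance one coin gives ten heads in ten flips
coins = 16_000                 # "sixteen thousand coins" from the example

# Expected number of coins that come up heads all ten times.
expected_all_heads = coins * p_ten_heads

# Chance that at least one coin does so; this step assumes independence.
p_at_least_one = 1 - (1 - p_ten_heads) ** coins

print(p_ten_heads)             # 0.0009765625, i.e. 1/1024
print(expected_all_heads)      # 15.625
print(p_at_least_one > 0.999999)
```

So a single run of ten heads is a roughly one-in-a-thousand event, while seeing none at all among sixteen thousand coins would be the real surprise.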

17 comments:

mlorrey said...

Awesome post David. Imma pass this to Anthony Watts.

Brandon Berg said...

This also applies on the cosmic scale, and explains how abiogenesis can happen. It may be fantastically unlikely for inorganic chemicals to give rise to self-replicating molecules and thence to life, but given billions of years and 10^29 stars in the universe (and this is assuming only one universe), fantastically unlikely things are pretty much guaranteed to happen sooner or later.

Colombo said...

Well, I respectfully disagree. People have a right to be surprised if surprise is what they are looking for.
Maths seem to freeze romanticism sometimes.
The two main functions of lies in society are to prevent boredom and some forms of violence.
Besides, journalists also need to make a living. Otherwise, they would be forced to become novelists, with a high probability of producing horrible material.

Anonymous said...

Quite true: if you have a large enough set of potentially-surprising events, it's almost certain that some of them will be "surprising" (aka "statistically significant").

I took a statistical-thinking class at work a few months ago, in which the main takeaway was "Decide what questions you're trying to ask BEFORE selecting the data with which to answer them. The fewer questions the better, because each additional question you ask uses up some entropy 'pixie dust'. If you're not sure what questions to ask, divide your data into an 'exploration' data set and a 'confirmation' data set: use the former to find potentially significant results and the latter to test them for significance."

The interesting questions, with regard to climate change, are things like "is the number of 'surprising' weather events significantly different from what one would expect?" and "is the number of 'surprising' weather events significantly trending?" (I would expect the number of 'surprising' weather events per year now to be higher than a century ago, due to better data collection and a faster, cheaper news cycle, but perhaps not higher than 30 years ago.) These questions can be answered rigorously: the large number of potentially-surprising weather events also means that the predicted variance in the number of surprising events per year is fairly low, so one might well be able to find significance, if one didn't ask too many different questions of this form. But I have no idea what the actual answer is.
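The low-variance point can be illustrated with a binomial sketch. It borrows the post's rough figure of 16,640 opportunities per year and assumes, purely for illustration, that each opportunity independently has a 1-in-50 chance; real weather records are correlated, which would widen the spread.

```python
import math

n = 16_640     # opportunities per year, the post's rough estimate
p = 1 / 50     # chance of a "once in 50 years" event per opportunity

mean = n * p
std = math.sqrt(n * p * (1 - p))   # binomial standard deviation (independence assumed)
relative_spread = std / mean

print(round(mean, 1))              # 332.8
print(round(std, 1))               # 18.1
print(round(relative_spread, 3))   # 0.054
```

Under those assumptions the yearly count would typically vary by only about 5 percent, so a sustained shift much larger than that would stand out.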

Max said...

hudebnik, unfortunately that procedure gets less and less effective over time for historical data. Because the "exploration" is influenced, consciously or unconsciously, by past "confirmed" results on the same data. And so, only the passage of time provides a true test.

John C. Webb said...

This comment has been removed by the author.

Miko said...

This argument requires quite a few independence assumptions, and I think almost all of them are wrong.

LH said...

@Miko: How does this argument require "quite a few independence assumptions?"

It seems to me the argument only relies on two points: (1) that there is a large set of potential record categories, and (2) that in a sufficiently large sample of data one would expect to find at least some instances of low-probability events. That should be true regardless of any other properties of the distribution.

Watch professional sports and listen to the commentary. Baseball in particular seems to provide many illustrations of this argument. You won't watch long before a commentator explains how the batter is about to break the record for most consecutive Tuesday afternoon doubles among National League teams for a right-handed batter against a left-handed pitcher born outside of the United States after the Paris Peace Accords. That example depends somewhat on specificity rather than size of the category set, but the reason behind that specificity is to increase the size of the set so that a record might be found and displayed as a point of interest to the fan, just as "extreme" or "unlikely" weather events are frequently used to promote certain lines of thinking. The individual events might be unlikely, but that there should be some unlikely events occurring is not unlikely in the least.

John C. Webb said...

This comment has been removed by the author.

Tibor said...

Concerning the coin tosses - this reminds me of my bachelor thesis which was about the k-geometric distribution, so more or less how many times you have to toss a possibly biased coin so that you get k heads in a row. What makes it interesting from the mathematical point of view is that there are no nice formulas for its distribution function so you have to make do with various approximations if you want to use it in practice. What makes it interesting in general is how counterintuitive it is.

If you have people who are not statisticians or probabilists make up 120 tosses of a coin so that it is supposed to simulate actual tossing, they will usually not come up with a series of more than 3 heads or 3 tails in a row. Intuitively, more than that is just unlikely. But what is unlikely in reality is exactly that there is no sequence of at least 4 tosses with the same result in a row, and a sequence of 120 toss results which does not have one is in all likelihood fake. To be more specific, the probability of at least 4 heads in a row in a sequence of 120 tosses of an unbiased coin is approximately 0.987 (or about 98.7% if you, like most people, enjoy multiplying things by 100 :) ), and the probability of at least 4 identical results in a row, heads or tails, is higher still. A good way to teach this is to have half of the people (say children in a class) do actual tosses and record them on a piece of paper whereas you have the other half make the results up. Unless they have a very good intuitive grasp of statistics, it will then be very easy for the lecturer to spot the fake series. I should mention that this experiment was probably (according to Paul Révész) first done by someone called T. Varga.
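The run probability quoted above can be computed exactly with a small dynamic program rather than an approximation. This is a sketch, not the method from the thesis; prob_run and its parameters are names invented here. In this sketch the 0.987 figure matches a run of heads alone, while allowing a run of either face gives a value even closer to 1.

```python
def prob_run(n, k, either_face):
    """Probability that n fair tosses contain a run of at least k.

    either_face=False counts runs of heads only;
    either_face=True counts runs of heads or of tails.
    """
    if either_face:
        # A run of k equal tosses is a run of k-1 "same as previous" marks
        # among the n-1 comparisons between neighbouring tosses, and those
        # marks are themselves fair and independent.
        return prob_run(n - 1, k - 1, either_face=False)
    # state[j]: probability that no run of k heads has occurred yet and
    # the sequence currently ends in exactly j heads (0 <= j < k).
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        new = [0.0] * k
        new[0] = 0.5 * sum(state)          # a tail resets the streak
        for j in range(k - 1):
            new[j + 1] = 0.5 * state[j]    # a head extends the streak
        state = new                        # streaks reaching k are dropped
    return 1.0 - sum(state)


print(round(prob_run(120, 4, either_face=False), 3))   # 0.987
print(prob_run(120, 4, either_face=True) > 0.9999)     # True
```

Either way, a made-up sequence of 120 tosses with no run longer than 3 is the suspicious one.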

Of course these real world events are not really independent (though they can be made identically distributed if you only consider "1 in x years" events with a fixed x). On the other hand the correlation between a heavy rain in one region in Japan and a drought in Italy (for example) is probably not too high, so this can serve as a good sanity check.

Colombo said...

John C. Webb, Do you have a sense of humor?

John C. Webb said...

Colombo- Obviously not. Sorry.

James Picone said...

http://www.researchgate.net/publication/234114398_Global_increase_in_record-breaking_monthly-mean_temperatures seems relevant here. Also http://onlinelibrary.wiley.com/doi/10.1029/2009GL040736/abstract .

People have done stats on this. Extreme stuff is happening more often than would be expected in the no-warming scenario.

I vaguely recall seeing a WUWT post where they were crowing that there was no trend in hot records, and being amused that they didn't spot the problem with that...

David Friedman said...

James: "Extreme stuff" or extreme high temperatures? Given that temperatures have been trending up for about a century, I would expect extreme highs to be more common, extreme lows less common, which I think is what has happened. The IPCC also claims an increase in extreme heavy rainfall.

But I see a lot of claims about extreme events in general. I believe the current IPCC report doesn't claim increased drought or hurricane frequency or strength so far.

In any case, the point of my post was not whether climate change of one sort or another was happening, it was what did or didn't count as evidence, on that question or others. In my observation, both sides of the climate debate offer arguments most of which are weak.

James Picone said...

Extreme high temperatures there.

IIRC hurricanes are trendless at present, which I think is broadly what the IPCC says. My impression is there's significant disagreement on what the likely effects on hurricanes are.

I was under the impression that there does seem to be a slight increase in drought, but I can't point to studies offhand.

Tamino has a number of posts on wildfire in the US that show weak upwards trends, don't know how global that is.

I vaguely recall seeing graphs of 'weather disasters' produced by insurance companies that seemed to show upwards trends.

I'm reasonably certain I've seen some discussions on tornadoes that had weak upwards trends.

Floods might be another disaster of interest to look into, given the expected effects on rainfall and sea level. Can't think of any studies offhand.

I agree that discussing such events without statistics is a Bad Argument, with the exception of events that are so out of the ordinary that it just wouldn't happen without climate change (heatwaves that would have been six-sigma in the preindustrial, etc.)

David Friedman said...

High temperatures are also a straightforward prediction, whereas all of the other increases require a more complicated argument.

I wouldn't expect sea level to have much effect on flooding so far, given how small the increase has been. I'm not sure what the evidence is on heavy rains, but that's at least one of the IPCC predictions.