### Teaching Statistics with World of Warcraft

In an earlier post I proposed an economics course built around World of Warcraft. I have much less experience teaching statistics than teaching economics and I suspect the game is less suited for the former than the latter purpose. But it does occur to me that it provides quite a lot of opportunities for observing data and trying to infer patterns from it and so could be used to both explain and apply statistical inference. And I suspect that, as in the case of economics, application to a world with which the student was familiar and involved and to problems of actual interest to him would have a significant positive effect on attention and understanding.

Consider the question of whether a process is actually random. Human beings have very sensitive pattern recognition software—so sensitive that it often sees patterns that are not there. There is a tradeoff, as any statistician knows, between type 1 and type 2 errors, between seeing something that isn't there and failing to see something that is. In the environment humans evolved in, there were good reasons to prefer the first sort of error to the second. Mistaking a tree branch for a lurking predator is a less costly mistake than misidentifying a lurking predator as a tree branch. One result is that gamblers routinely see patterns in random events—"hot dice," a "loose" slot machine, or the like.

Players in World of Warcraft see such patterns too. But in that case, the situation is made more complicated and more interesting by the fact that the "random" events might not be random, might be the deliberate result of programming. In the real world it is usually safe to assume that the dice which you have used in the past will continue to produce the same results, about a 1/6 chance of each of the numbers 1-6, in the future. But in the game it is always possible that the odds have changed, that the latest update increased the drop rate for the items you are questing for from one in four to one in two, even one in one. It is even possible, although not I think likely, that some mischievous programmer has introduced serial correlation into otherwise random events, that the dice really are sometimes hot and sometimes cold.

A few days ago I was on a quest which required me to acquire five copies of an item. The item was dropped by a particular sort of creature. Past experience suggested a drop rate of about one in four. I killed four creatures, got four drops, and began to wonder if something had changed.

It occurred to me that the question was one to which statistics, specifically Bayesian statistics, was applicable. Many students, indeed many people who use statistics, have a very imperfect idea of what statistical results mean, a point that recently came up in the comment thread to a post here when someone quoted the report of the IPCC explaining the meaning of its confidence results and getting it wrong. My recent experience in World of Warcraft provided a nice example of how one should go about getting the information that people mistakenly believe a confidence result provides.

The null hypothesis is that the drop rate has not changed—each creature I kill has one chance in four of dropping what I want. The alternative hypothesis is that the latest update has raised the rate to one in one. A confidence result tells us how likely it is that, if the null hypothesis is true, the evidence for the alternative hypothesis will be at least as good as it is. Elementary probability theory tells us that, if the null hypothesis is correct, the chance of getting four drops out of four is only one in 256. Hence my experiment confirms the alternative hypothesis at (better than) the .01 level.
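The arithmetic is easy to check. Here is a minimal Python sketch of the significance calculation, assuming (as above) a one-in-four drop rate under the null hypothesis and independent kills:

```python
# Probability of 4 drops in 4 kills under the null hypothesis
# (drop rate still 1/4) -- i.e., the p-value of the observation.
p_null = 0.25
p_value = p_null ** 4

print(p_value)         # 0.00390625, i.e. 1/256
print(p_value < 0.01)  # True: significant at the .01 level
```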

Does that mean that the odds that the drop rate has been raised to one in one are better than 99 to 1? That is how, in my experience, people commonly interpret such results—as when the IPCC report explained that "very high confidence represents at least a 9 out of 10 chance of being correct; high confidence represents about an 8 out of 10 chance of being correct."

It does not. 1/256 is not the probability that the drop rate has changed, it is the probability that I would get four drops out of four if it had not changed. To get from there to the probability that it had—the probability that would be relevant if, for example, I wanted to bet someone that the fifth kill would give me my final drop—I need some additional information. I need to know how likely it is, prior to my doing the experiment, that the drop rate has been changed. That prior probability, plus the result of my experiment, plus Bayes Theorem, gives me the posterior probability that I want.

Suppose we determine, by reading the patch notes of past patches or by getting a Blizzard programmer drunk and interrogating him, that any particular drop rate has a one in ten thousand chance of being changed in any particular patch. The probability of getting my result via a change in the drop rate is then .0001 (the probability of the change) times 1 (the probability of the result if the change occurred--for simplicity I am assuming that if there was a change it raised the drop rate to 1). The probability of getting it without a change, by random chance, is .9999 (the probability that there was no change) times 1/256 (the probability of the result if there was no change). The second number is about forty times as large as the first, so the odds that the drop rate is still the same are about forty to one.
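The update in the paragraph above can be written out in a few lines of Python. The one-in-ten-thousand prior is, as noted, only an illustrative assumption:

```python
# Posterior odds that the drop rate is unchanged, via Bayes' Theorem.
prior_change = 1 / 10_000        # assumed prior probability of a patch change
prior_same = 1 - prior_change

like_change = 1.0                # P(4 drops out of 4 | rate raised to one in one)
like_same = (1 / 4) ** 4         # P(4 drops out of 4 | rate still 1/4) = 1/256

odds_same = (prior_same * like_same) / (prior_change * like_change)
print(odds_same)                 # ~39: about forty to one that nothing changed

posterior_same = odds_same / (1 + odds_same)
print(round(posterior_same, 3))  # ~0.975
```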

And I suspect, although I may be mistaken, that the odds that a student who spent his spare time playing World of Warcraft would find the explanation interesting and manage to follow it are higher than if I were making the same argument in the context of an imaginary series of coin tosses, as I usually do.

## 16 Comments:

I'm actually taking a statistics class now, and I had forgotten how that whole type I/II error worked. Now I think I will remember - thanks.

David wrote: "... The second number is about forty times as large as the first, so the odds that the drop rate is still the same are about forty to one."

Well, as I understand it, the odds that the drop rate is still the same *compared to its having changed to that particular rate* (1:1, or 1/2) are about forty to one, but there are many rates it could have changed to. Unfortunately they have a low probability individually of being chosen by the programmers (1/10,000).

If confronted with the four drops in a row, I wouldn't be comparing a particular changed rate to the current one, but maybe, for example, a range, such as "those rates from 1/4 to 1/2 with some fixed increments between them."

I've never really understood Bayesian statistics, and I think this business of prior probabilities is a big reason for that. What's an objective basis for assigning prior probabilities? If we can just assign any subjective assumption we like, then, for example, Pascal's wager looks a lot better: He had a very high prior probability that Catholicism was true, and a very low prior probability that Judaism, Islam, Lutheranism, Calvinism, or Mithraism was true, so a Bayesian argument might support betting on the Catholic God . . . for Pascal . . . and not be vulnerable to the classic criticism that it provides equally strong proofs of the desirability of worshiping many incompatible gods. But I take Pascal's wager to be an intellectual sucker bet, and any methodology that seems to legitimize it strikes me as suspect. Have I gotten a completely wrong impression about Bayesian methodology in some way, or does it actually lead down this road?

I'd have to say that in the WOW example given, the prior probability argument is really screwed up. Simple reason: one thing you know has changed: you've gone on a quest for that particular item.

What are the chances that there was already something in the program that makes creatures more likely to drop something you're questing for? Probably pretty good.

What are the chances that there may be a source of change that you didn't consider when calculating prior probabilities? As the previous paragraph demonstrates, 1:1.

So what are the chances that your prior probability number is accurate? Must be 0:1.

So how useful is this sample calculation, really?

Chris

David, it seems to me that there is also a selection bias here in that you wouldn't have considered the possibility of a new drop rate if you hadn't gotten four items in a row.

The probabilities you gave are correct if you now go out and kill four monsters to test the drop rate.

If you only take notice of unusual events then, with probability 1, you will find yourself contemplating an unusual event.

Bryan Eastin

"What are the chances that there was already something in the program that makes creatures more likely to drop something you're questing for? Probably pretty good."

As a former WoW player, I don't remember a single instance of this ever happening (except for items that drop only when on quests). So, I think it is low enough in prior probability to ignore it.

"David, it seems to me that there is also a selection bias here in that you wouldn't have considered the possibility of a new drop rate if you hadn't gotten four items in a row."

A failure to consider this might lead someone to overestimate the prior. But it does not actually affect the Bayesian calculation so long as the prior is correct.

BTW, the hypothesis "the drop rate is one in four" and the hypothesis "the drop rate has not changed" are not the same hypothesis.

Your analysis assumes the former hypothesis as the null hypothesis. The latter hypothesis might be more appropriate, though. Also, it might be better to compare it to the hypothesis "there is a new constant drop rate" rather than the hypothesis "the drop rate is now 1".

To test these, you would need a prior distribution over the constant drop rates (probably the same one would work for all). Then for the null hypothesis modify the distribution based on all data received (before and after the possible adjustment) and determine the probability of receiving all that data based on the modified distribution. For the change hypothesis, do that separately both before and after the supposed change, and combine with the penalty to the prior from the unlikeliness of Blizzard changing the drop rate.

... and then, I realize that the hypothesis you are interested in is probably not that the drop rate was changed, but that it was raised. Which means that the assumption that the drop rates before and after the change were independent, if it ever was a good one, is now bad. The most general (but not very helpful) approach would be to just have some distribution over the (before,after) drop rate pairs.

...but it won't change the outcome much to just use the "change" hypothesis instead of the "increase" hypothesis, because p(change)=p(increase)+p(decrease) and p(decrease) is low.
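A minimal sketch of the approach described in the preceding comments: compare the "rate unchanged" hypothesis against a *distribution* over possible new rates rather than a single value. The grid of candidate rates and the uniform prior over it are illustrative assumptions, not from the post:

```python
# Compare "rate still 1/4" against "rate changed to something unknown",
# averaging the likelihood over an assumed prior on the new rate.
def marginal_likelihood(rates, weights, successes, trials):
    """Average P(data | rate) over a prior on rates (binomial likelihood)."""
    return sum(w * r**successes * (1 - r)**(trials - successes)
               for r, w in zip(rates, weights))

# Data: four drops in four kills.
like_old = 0.25 ** 4                    # P(data | rate unchanged) = 1/256

# Assumed prior over possible new rates: uniform on a coarse grid.
grid = [i / 20 for i in range(1, 20)]   # 0.05, 0.10, ..., 0.95
weights = [1 / len(grid)] * len(grid)
like_new = marginal_likelihood(grid, weights, 4, 4)

p_change = 1 / 10_000                   # assumed prior of any change at all
odds_same = ((1 - p_change) * like_old) / (p_change * like_new)
print(odds_same)                        # still heavily favors "no change"
```

Averaging over a range of possible new rates lowers the likelihood of the change hypothesis compared with assuming the most favorable rate of one in one, so the odds favor "no change" even more strongly than in the post's calculation.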

Quick pointer to outsiders: presumably the quest required looting 4 items that are quest-only, and after getting the 4th the quest completed, preventing any more from dropping. Perhaps he was comparing the drop rate on that quest with the drop rate he used to get on a previous character doing the same quest.

Increasing drop rates is not that unusual. Some quests that are notoriously "out of line" with the rest in terms of drop rate get a bump to appeal to casual players. I can think of five separate quests in Lotro that went from <1/8 to >1/2 over a span of several major patches (several months). Quests that are "in line" are unlikely to be fixed. The bigger the deviation, the more likely the programmers are to step in (whether they are playing the game themselves or are tired of bug reports).

Just as a followup:

The quest was a daily, so I had done it lots of times before, giving me a pretty good estimate of the drop rate. I actually needed five drops, and the fifth try didn't yield one, which eliminates the hypothesis that the rate is now one in one--but I was describing the calculation as it would have been done before that.

After making the post, it occurred to me that there was another explanation that I should have considered. The daily quest is done in order to get reputation with a particular group. Perhaps when your reputation level goes up from honored to revered, the drop rate on that quest goes up too.

I'm not sure if my four out of four result was just after my reputation went up or not, since that possibility hadn't occurred to me at that point. I'll be watching drop rates for a while to see if they have indeed sharply increased for that quest.

William Stoddard asks about how you get your priors. In the post I suggested some possibilities. There isn't a general answer--the point is that without a prior you can't get a posterior probability from the experiment.

If there's no general answer on how to get a prior... and if you can't compute a posterior without a prior... then it seems you can never be sure you have a meaningful posterior, except perhaps in very limited cases.

(In the above example, someone who was not familiar with WOW might easily compute a prior several orders of magnitude different from someone who was familiar with WOW.)

What does this say about the use of "statistically significant" in scientific research? So far, my takeaway message is that you have to know how likely the null hypothesis is before you can tell whether you've found significant evidence for a deviation from it.

Or in other words, "Extraordinary claims require extraordinary evidence."

Chris

"So far, my takeaway message is that you have to know how likely the null hypothesis is before you can tell whether you've found significant evidence for a deviation from it."

Sort of.

The significance level tells you how good the evidence is. But to reach a conclusion, you need to know both how strong the evidence is and how strong the evidence has to be to make you accept the conclusion.

The prior is something of a problem, and gets you deep into Knight's uncertainty versus risk. If you want to convince [sensible, mathematically literate] people of something, you need them to have "reasonable" priors -- ones that aren't too close to zero anywhere you need them not to be too close to zero -- and/or you need a lot of data to overwhelm low prior values.

With many precise relationships among uncertain variables, I often try to solve for the one I have the least confidence in. In this case, if, as a practical matter, what I want to know is whether the probability that it's still 1/4 is more or less than 60%, I can figure out that that requires a prior of around .997. In some situations, you'll find that this is unreasonably high or low, and it won't matter what its exact value is. (If you want a precise value, though, you're subject to a different set of biases when you operate this way.)
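The "solve for the prior" step above can be sketched in Python, using the likelihoods from the post and the 60% posterior target from the comment:

```python
# What prior probability of "no change" makes the posterior exactly 60%?
like_same, like_change = 1 / 256, 1.0   # likelihoods of 4-for-4, as in the post

target_posterior = 0.60
target_odds = target_posterior / (1 - target_posterior)   # 1.5

# posterior_odds = prior_odds * (like_same / like_change), so invert:
prior_odds = target_odds * like_change / like_same        # 1.5 * 256 = 384
prior_same = prior_odds / (1 + prior_odds)
print(round(prior_same, 3))   # ~0.997, matching the figure above
```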

clarification of post#2:

If confronted with the four drops in a row, I wouldn't compare a particular changed rate to the current rate, but would compare maybe, for example, a range, such as "those rates from 1/4 to 1/2 with some fixed increments between them," to the current rate.

Wow! (No pun intended.)

First off, Mr. Friedman, I have just discovered your blog by the glorious "unschooling" method you talk about: I was watching a lecture by Dr. Murray Rothbard at Mises.org (for fun!), and through a string of links found a wiki about you and your anarcho-capitalist theories. What a blessing it is to see an intelligent professor who understands how MMOs are fantastic teaching tools!

I have always wanted to read a study that compares the economy of WoW to that of the United States. Do you know of any such studies? It's amazing how "free market" people tend to be in a simulation, compared to their actual political stance on economic policy in the "real world". I always wanted to see a study showing differing behavior in economic choices in WoW compared to real life.

I wonder how "scarcity" and money inflation play into effect in WoW (since the only scarce resources seem to be personal time and labor; all gold, loot, mobs, etc. respawn faster than consumption).

Here is a great example of how an oppressive monetary policy fails without a monopolization of force:

I played Warhammer Online for a spell, and in the game guilds could collect a "tax" from their members. Well, the guild I joined had a 100% tax! After a few kills and loots, I asked why the guild bank (the central government) got all of the loot I worked for. They replied that they would distribute all the gold evenly, so I left!

It's amazing to see how coercive, non-voluntary taxes can only last with a monopolization on the legal use of force. The guild had no way of punishing me for seceding. Imagine if I didn't want to pay the government the tax they want. I would be coerced into complying, or suffer some consequence.

I know this was long, but I would like to thank you (and this community) for proving to me that MMO players can be logical, intelligent people, and that education CAN be achieved outside the status quo.

I would be happy to hear any response or get links to economic research done comparing video games to real life.

I may be wrong, I sometimes am, but I don't see how the analysis works out:

P(four drops in a row)
= P(four drops in a row | change in software) × P(change in software)
  + P(four drops in a row | no change in software) × P(no change in software)
= (1)(1/10,000) + (1/256)(9,999/10,000)
≈ 0.004, which is about 1/256.

Another way to look at the phenomenon is to ask the following probabilistic question:

If the monster is killed 100 times what is the probability that you never get 4 drops of the object in a row?

It reminds me of Ramsey theory. The more times you do the action the more likely it is that you'll get what looks apparently to be an unlikely string of consecutive occurrences -- although this is in fact not that unlikely.
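A quick Monte Carlo sketch of that question, using the post's one-in-four drop rate; the 100-kill window comes from the comment above, and the trial count is arbitrary:

```python
import random

# Monte Carlo estimate: with a 1/4 drop rate, how often do 100 kills
# contain at least one run of four drops in a row?
def has_run_of_four(p=0.25, kills=100):
    streak = 0
    for _ in range(kills):
        streak = streak + 1 if random.random() < p else 0
        if streak >= 4:
            return True
    return False

random.seed(0)  # reproducible illustrative run
trials = 20_000
hits = sum(has_run_of_four() for _ in range(trials))
print(f"P(at least one run of four in 100 kills) ~ {hits / trials:.3f}")
```

The estimate comes out well above the naive 1/256 per-window figure, which is the commenter's point: over many kills, an apparently unlikely streak becomes quite likely.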
