Just for a change from politics and geekishness, an interesting puzzle:

We do ten experiments. A scientist observes the results, constructs a theory consistent with them, and uses it to predict the results of the next ten. We do them and the results fit his predictions. A second scientist now constructs a theory consistent with the results of all twenty experiments.

The two theories give different predictions for the next experiment. Which do we believe? Why?

In case the puzzle isn't obvious, let me offer the straightforward argument for what I believe is the wrong answer:

Imagine a large room filled with barrels, each of which contains a lot of boxes, each of which contains several pieces of paper; each piece of paper has a scientific theory written on it. The first ten experiments let us eliminate all but one barrel, the one containing the theories consistent with the first ten experiments. The first scientist has reached into that barrel, pulled out a box, opened it, pulled out a piece of paper, and offered its theory as his.

The second ten experiments narrow the possibilities down to one box in the barrel--the one containing theories consistent with the results of the second set of experiments. The second scientist, having the results of both sets of experiments, knows which box to go to; he opens it and pulls out a piece of paper. Both pieces of paper, both theories, were selected from the same box, the one that we know contains the correct theory, since the correct theory must be consistent with all the experiments. Having been pulled from the same box, both theories should have the same probability of being the right one.

What is wrong with this argument?

(Hint: I find the concept of "false contagion" in statistics useful in making sense of the puzzle)


## 39 comments:

Obviously we believe the ten-experiment theory, since it has successfully predicted ten experiments. The twenty-experiment theory has no successful predictions behind it.

I agree with the answer Russ gives, but he offers no justification. The question is *why* we prefer prediction to explanation. Don't both simply tell us "this theory is consistent with the evidence"?

Yes, but you can always come up with an ad hoc theory (in fact, an infinite number of them) that explains the evidence you already have. You can only have confidence in a theory if it predicts well out of sample, with data that it hasn't been rigged to fit.

Consider this analogy: Alex shoots at the side of a barn, then walks over to it and paints a target centered on where the bullet hit. Beth paints her target first, then fires at it and hits it dead center. Both got a bull's-eye, but whom do we conclude is the better marksman?

Apply Bayes Theorem. Both predictions agreed with all the evidence as of their formulation.

Theory 1 correctly predicted 10 experiments, which might have disproved it, so that provides strength to the hypothesis that Theory 1 is correct.

Theory 2 hasn't done anything yet.

So the a posteriori probability of Theory 1 is higher than the a priori probability. For Theory 2, a priori is a posteriori.

Alex is the better marksman, he hit a target that didn't even exist yet :-)

A sample ad-hoc theory:

"My theory (which is mine, and I made it up) is that Experiment 1 will show Result A, Experiment 2 will show Result B, ... and Experiment 20 will show Result T. All other experiments will show Result U."

This can obviously be made to match the data observed to date, yet there is no reason whatsoever to expect the untested 21st clause to make correct predictions.

This is related in some way to Occam's Razor. If we had a *simple, elegant* theory that was consistent with the data observed to date, we *would* have some faith in its predictive power -- not as much as for a theory that has already made correct predictions, but some, because Occam's Razor would exclude from the barrel grossly ad-hoc theories like the one above.

This has a sinister resemblance to the Monty Hall paradox... which, if the analogy were perfect, might suggest 'switching' to the newer theory would offer better results.

However, Monty with his inside knowledge can always open a door that fails to falsify the original choice, while presumably the second ten experiments had some chance of falsifying the first theory.

In practice I think the decision would depend on details that have been boiled out of the problem: if one theory were 'more compact' I would slightly prefer it until disproven.

If both are equally 'compact', I don't think any preference between the two is justified. If the second theory had been thought up after the first 10 experiments -- and it may have only been overlooked due to random and arbitrary factors -- it would have also had a good prediction history on the second 10 experiments.

In Texas, the sharpshooter who shoots first and paints the target later is the better marksman. :-)

I agree with the first theory, but for a completely different reason.

To use the barrel metaphor, the papers must contain not scientific theories but *possible* scientific theories. (If some papers can be eliminated, then some are not theories that stand up to real-world evidence.)

Thus, I postulate that first scientist's theory, since it was created with half the experimental data, proposes a simpler model. And all else being equal, the simpler model seems more likely to be correct.

(Disclaimer: I'm not a scientist, economist, *or* lawyer.)

We believe (tentatively) the one that gives the right answer for the next series of experiments. Obviously.

Before the next series of experiments, we will probably choose the one which we like better, for whatever reason.

Note that the first theory could well be Ptolemaic epicycles, and the second Newtonian mechanics - which would you choose?

Given only the information you've presented, we prefer the first theory.

We presume that the space of all theories is large, the space of theories that will predict the first 10 results is smaller, and the space of theories that will predict the first and subsequent 10 results smaller still. We have evidence that the first scientist had some insight into the problem - the theory she selected from the first 10 results was close enough to true to get the next 10 results right. We have no reason to expect that she had *exactly* enough insight into the problem to get only the second 10 results right.

See Einstein's Arrogance.

The second scientist has not demonstrated any more discrimination over the search space than a theory that would predict only the available evidence.

It seems unlikely that there wouldn't be other, more powerful tests than this that would allow us to choose a preferred theory, such as Occam's Razor.

M@ offers essentially my answer. The reason we prefer prediction is that it provides evidence about the person who created the theory. We have some reason to believe that the person who predicted has some sort of insight, some ability to see real world patterns.

For those not familiar with false contagion ...

You have an urn with both black and white balls. You draw a ball--it's black. What does that tell you about the probability that the next ball will be black?

The obvious answer is nothing--the draws are independent. That's true if you start out knowing what percentage of the balls are black. But if you start with some probability distribution over the percentage, then the black ball causes you to revise that distribution in the direction of a larger fraction black, which increases the probability that the next ball will be black.
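A small sketch of the urn example, under assumptions that are mine rather than the post's: draws are with replacement (so draws are independent once the composition is known), and the "unknown composition" case uses a hypothetical uniform prior over five possible fractions of black balls. The predictive probability is the posterior-weighted average of the black fraction.

```python
from fractions import Fraction


def prob_next_black(prior, black_seen, white_seen):
    """P(next draw is black) after some draws, assuming draws with replacement.

    `prior` maps each candidate fraction of black balls to its prior probability.
    """
    # Posterior weight of each candidate fraction (up to a normalising constant).
    weights = {f: w * f**black_seen * (1 - f) ** white_seen for f, w in prior.items()}
    total = sum(weights.values())
    # Predictive probability = posterior-weighted average of the black fraction.
    return sum(f * w for f, w in weights.items()) / total


# Known composition: exactly half the balls are black.
known = {Fraction(1, 2): Fraction(1)}

# Unknown composition (hypothetical prior): 0, 1/4, 1/2, 3/4 or all black, equally likely.
unknown = {Fraction(k, 4): Fraction(1, 5) for k in range(5)}

print(prob_next_black(known, 1, 0))    # stays 1/2
print(prob_next_black(unknown, 0, 0))  # 1/2 before any draw
print(prob_next_black(unknown, 1, 0))  # 3/4 after one black ball
```

With the composition known, the black ball changes nothing; with it uncertain, the same observation raises the chance of another black ball from 1/2 to 3/4.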

Similarly here. If we thought the scientists were simply picking theories at random, then my wrong argument would be right and explanation would be as good as prediction. But if scientists vary in their ability to create correct theories, then we have reason to think the scientist who predicted has more such ability than the one who didn't, making his next prediction more likely to be correct.

David, so your answer depends on two scientists being involved?

Suppose a single scientist wants to search for patterns in a data set. Two options:

1) Divide the data in two. Look for patterns in the first half. When you find one, test against the second half; reject theory if it doesn't fit equally well with the second half.

2) Look for patterns in the entire data set.

Which is better?
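To make the two options concrete, here is a toy sketch. The "pattern search" is a hypothetical stand-in chosen for illustration: it looks for the shortest repeating period in a sequence of 0/1 results. Option 1 finds a pattern in the first half and keeps it only if it also fits the second half; option 2 searches the whole data set at once.

```python
def shortest_period(seq):
    """Smallest p with seq[i] == seq[i % p] for every i (p = len(seq) always works)."""
    for p in range(1, len(seq) + 1):
        if all(seq[i] == seq[i % p] for i in range(len(seq))):
            return p
    return 0  # only reached for an empty sequence


def option_1(data):
    """Find a pattern in the first half; reject it if it fails on the second half."""
    first = data[: len(data) // 2]
    p = shortest_period(first)
    fits_rest = all(data[i] == first[i % p] for i in range(len(data)))
    return p if fits_rest else None


def option_2(data):
    """Find a pattern using the entire data set."""
    return shortest_period(data)


clean = [0, 1, 1] * 6 + [0, 1]  # a genuine period-3 pattern, 20 results
broken = [0, 1] * 5 + [0] * 10  # a first-half pattern that then stops
print(option_1(clean), option_2(clean))
print(option_1(broken), option_2(broken))
```

On `clean` both options find period 3. On `broken`, option 1 rejects its pattern, while option 2 still "succeeds" with a period of 19, which is little more than a lookup table of the data.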

It's late and I'm sleepy, this may sound silly in the morning but...

There's a set of theories with a probability prior on each. The first scientist finds the most likely theory given experiments 1-10, and it also happens to be the most likely given 10-20 (confirmation of prediction).

The second scientist finds the most likely given 1-20, but it may not be the most likely given 1-10 as well.

There is more evidence in the first case than in the second case.

This is a question of science, not some game show statistics puzzle. Each scientist is in possession of a theory that correctly explains twenty experiments. All else being equal there is no reason to prefer one theory over the other until an experiment to distinguish them is performed.

I cannot agree with the conclusion that the first theory is preferable.

I suppose in most nontrivial cases, not all consistent hypotheses that explain the experiments are equal; some are more likely than others. A competent scientist would be able to use the additional information present in the second batch of experiments to choose a more likely hypothesis.

If the scientists in question were replaced with average people, perhaps I could agree. To me, a scientist is a person with some basic competence in the scientific method.

The problem here is that no-one has defined what 'consistent' means. Does it mean P(data | model) is sufficiently high? If we marginalise any uncertainty in the parameters of each model then we can simply calculate P(x1,x2 | model) for each theory, either directly in the case of M2 or via P(x1,x2|M1) = P(x2|x1,M1)P(x1|M1). Whichever is greater wins, arguments about model complexity are redundant if the complexity is properly integrated out.
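The identity P(x1,x2|M1) = P(x2|x1,M1)P(x1|M1) can be checked on a toy model. This sketch assumes (the comment does not say so) binary outcomes and a model whose unknown success probability theta has a uniform prior, so the marginal likelihood has a closed form; the sequential version multiplies one-step-ahead predictive probabilities (Laplace's rule of succession). A hypothetical fair-coin model with no free parameter is included for the comparison the comment describes.

```python
from fractions import Fraction
from math import factorial


def evidence_direct(seq):
    """P(seq | M_uniform): closed-form integral of theta^h (1 - theta)^t
    under a uniform prior on theta."""
    h = sum(seq)
    t = len(seq) - h
    return Fraction(factorial(h) * factorial(t), factorial(h + t + 1))


def sequential_prob(seq, past=()):
    """P(seq | past, M_uniform), built one observation at a time."""
    ones, n = sum(past), len(past)
    prob = Fraction(1)
    for x in seq:
        p_one = Fraction(ones + 1, n + 2)  # Laplace's rule of succession
        prob *= p_one if x == 1 else 1 - p_one
        ones += x
        n += 1
    return prob


def evidence_fair(seq):
    """P(seq | M_fair) for a model that fixes theta = 1/2 (nothing to integrate)."""
    return Fraction(1, 2) ** len(seq)


x1 = [1, 1, 0, 1, 1]  # hypothetical first batch of results
x2 = [1, 1, 1, 0, 1]  # hypothetical second batch

joint = evidence_direct(x1 + x2)
# Chain rule: P(x1, x2 | M) = P(x2 | x1, M) * P(x1 | M)
print(joint == sequential_prob(x2, past=x1) * sequential_prob(x1))
print(joint, evidence_fair(x1 + x2))  # compare the two models' evidence
```

Here the model with a free parameter has the larger evidence (1/495 against 1/1024), with its extra flexibility already paid for by integrating over theta; with an even five-five split the fair-coin model would win instead (1/2772 against 1/1024).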

"We have some reason to believe that the person who predicted has some sort of insight, some ability to see real world patterns."

Suppose you're now told that the second scientist is the same person as the first scientist. Do you change your preference? It seems like you would, because now it is the scientist who demonstrated ability that came up with the second theory.

Assuming you did change your preference, now suppose you *are* that scientist. Are you certain you would prefer your second theory to your first one? Maybe you would think "Hmm, sure, I had more data to work with the second time, and the hindsight of my work on the first theory. As a result, my second theory seems simpler/more robust/more fundamental, because otherwise I would have just stayed with the first one. I would *like* the next experiment to confirm my second theory, not my first one. But do I really put more trust in the second one? No; it hasn't so far withstood any test at all."

There is not enough information to answer the question. To answer the question, I would want this additional information:

How was the system being experimented on chosen? For example, if it was chosen by the second scientist so he could win the contest, then of course we should believe the second scientist's prediction, which might imply believing his theory.

Do we know the physical system they are experimenting with, or is it a black box with an unknown process occurring on the inside?

Am I allowed to use information about the two theories to decide which prediction to believe?

Case 1: The people involved are playing games. Maybe the person picking the system is trying to get a specific outcome, or maybe the scientists are not doing their best to make correct predictions. In this case, base my actions on game-playing with the people involved, and ignore science, the "scientists", and the theories.

For the remaining cases, I'll assume that it's a randomly chosen system and that the scientists are trying to make correct predictions.

Case 2: I understand the situation better than the scientists. Ignore the scientists and believe my best theory consistent with the observations.

For the remaining cases, I'll assume that I'm not up to that and it's better for me to believe the theory from one of the scientists. I'll also assume I have no other information about the competence of the scientists.

Case 3: I don't get to look at the theories before picking one. Go with the first scientist. I can model a scientist as either producing a theory that makes useful predictions with probability p, or a garbage theory consisting of a list of the past observations + random picks about the future with probability 1-p. The 10 correct predictions increase the a-posteriori probability that scientist #1 made a theory that makes useful predictions. (So the requirement that you predict not-yet-known data in science is required when you can't evaluate the quality of a theory and you have to guess how the scientists work. This is from Tyrell McAllister's post to this question reposted to Overcoming Bias.)
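Case 3's model can be written out directly. A minimal sketch, with simplifying assumptions of my own: each experiment has two equally likely outcomes, a "useful" theory predicts every experiment correctly, and a "garbage" theory guesses each future result at random. The prior p that a scientist produces a useful theory is a hypothetical parameter.

```python
from fractions import Fraction


def prob_useful(p, correct_predictions):
    """Posterior probability that a theory is 'useful' after it has made
    `correct_predictions` correct binary predictions.

    Useful theories are assumed always right; garbage theories are right
    with probability 1/2 on each prediction, independently.
    """
    p_garbage_lucky = Fraction(1, 2) ** correct_predictions
    return p / (p + (1 - p) * p_garbage_lucky)


prior = Fraction(1, 10)  # hypothetical: 1 in 10 theories is useful
print(prob_useful(prior, 10))  # first scientist: 10 correct predictions
print(prob_useful(prior, 0))   # second scientist: no predictions yet
```

Even with a pessimistic prior of 1 in 10, ten correct predictions push the first theory to 1024/1033 (about 99%), while the second theory stays at the prior of 1/10.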

Case 4: It's a known naturally occurring physical system, and I can look at the theories. Choose the theory T such that the explanation of (everything else I have observed in life + theory T being true for the experimental apparatus) is simplest. In practice, this probably means to pick the theory that makes the most sense, given what else I know. (This is the real answer, given the required assumptions to make the question interesting. I'm sorry it's ugly.)

Case 5: It's a black box, with a good isolation between the rest of my life and the contents of the box, and I can look at the theories. Assume that each theory is the simplest theory that makes its predictions. That is, the simplest explanation of (T is true) is T. Pick the theory that seems simpler. This doesn't quite specify what I pick because it doesn't say what language I use to represent theories and how I measure complexity of statements in that language. I hope that in practice most reasonable specification languages will give similar results. (This case shows what additional assumptions you have to make to get the classic result that simple explanations make better predictions.)

Interesting question. Thanks for asking.

The second hypothesis set could trivially fit the data by repeating it and making random predictions for the future.

Random predictions are unlikely to be correct. The first set has made correct predictions -- this weakens the potential that it fit by chance.

The possibility that the first set is trivially right is more excludable than the possibility that the second is trivial. The first is better.

A scientist observes a normally distributed variable A with standard deviation 10 and mean either -1 or 1. There are two theories, the mean is -1 or the mean is 1. A priori they are equally likely.

The first scientist runs 10 experiments and draws a conclusion. He'll be right 62.41% of the time. He runs 10 more experiments and his theory is confirmed. It is now 85.87% likely that he is right.

The second scientist runs 20 experiments and draws a conclusion. There's only a 67.26% chance that he is right.

That is assuming that we don't know whether the scientists are looking at the same experiments.

Now assume they are looking at the same experiments. Since experiments 10-20 confirmed the theory, the theory is still the most likely among all possible theories. The second scientist should then pick the same theory to describe the data set. In this case there is no difference.
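The two single-stage figures can be checked with the normal CDF. A sketch, assuming the scientist's rule is simply to pick the sign of the sample mean; it covers only the 62.41% and 67.26% figures, not the 85.87% one, which depends on how "confirmed" is defined.

```python
import math


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_correct(n, sigma=10.0, mu=1.0):
    """P(choosing the right sign of the mean) after n observations.

    The sample mean is N(+/-mu, sigma**2 / n); the scientist picks the
    sign of the sample mean, which is right when it lands on the true side of 0.
    """
    return normal_cdf(mu / (sigma / math.sqrt(n)))


print(f"10 experiments: {prob_correct(10):.2%}")
print(f"20 experiments: {prob_correct(20):.2%}")
```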

maysonic writes: For me, I just used the number of experiments as a barometer for the simplicity of the proposed theory.

If you started explaining Ptolemaic epicycles to me, I'd probably say "holy crap that's complicated!", and look for something -- anything -- simpler. :-)

Let's fast forward this scenario to the present information age. Suppose we're building a system (let's say a website) which has to respond automatically to different user behavior. In our initial small-scale study, we gather one thousand data points and construct a model of user behavior that is consistent with all the data.

Suppose we then launch our system and subsequently gather one million data points. Our existing model is consistent with 75% of the new data.

In our space of models, we can find one that is consistent with 80% of the new data. Which would you prefer? At what point does matching the collected data coincide with predictive power?

You evaluate theories based on your priors about the domain, the fit with experimental evidence, and metarules like Occam's Razor.

The origin of the theories isn't a relevant factor.

Since your puzzle explicitly didn't provide the theories themselves, there is no correct answer. One theory or the other might be better, depending on what the theories actually are.

At best, you can use their origin stories to indirectly evaluate the likelihood that, if you actually had been able to evaluate the theories directly, you'd prefer A over B (or vice versa). (For example, theory A only using 10 points, and correctly predicting the next 10, suggests that it isn't overfitting the data.) But that's only very weak evidence.

(Also: your sample explanation is misleading because there are an infinite number of theories consistent with any set of data, not merely a small box with a few slips of paper.)

The answer cannot be determined without further assumptions. To give a principled answer, we need a prior distribution on how likely we think various theories are to be correct, and we need information about the scientists and how they arrive at theories.

For example, suppose the experiment consists of flipping a coin. Based on previous research with such coins, we are sure the probability of heads is either 99% or 99.9999%, but we consider it 50% likely to be either one.

Suppose the first ten results are HTHHHHHHHH, and the first scientist predicts that it is the 99.9999% heads coin. The next ten results are all heads, which is in agreement with this theory. Then the second scientist claims it is far more likely to be the 99% heads coin. Of course this second scientist is more likely to be right.

You might object that this is just because the first scientist made a mistake, but that's the point. Of course the second scientist is saying the first made a mistake, and that in fact given his/her prior another theory is more likely. In order to evaluate such a claim, we need to know more about the prior distribution and where it came from. This cannot be answered in the abstract.
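The coin example is easy to check with Bayes' rule. A sketch using exact fractions; the two hypotheses and the 50/50 prior come from the comment, and the code simply multiplies each hypothesis's prior by the likelihood of the observed flips.

```python
from fractions import Fraction


def posterior(priors, heads_probs, outcomes):
    """Posterior over coin hypotheses after a string of 'H'/'T' outcomes."""
    weights = []
    for prior, p in zip(priors, heads_probs):
        likelihood = Fraction(1)
        for outcome in outcomes:
            likelihood *= p if outcome == "H" else 1 - p
        weights.append(prior * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


priors = [Fraction(1, 2), Fraction(1, 2)]
heads_probs = [Fraction(99, 100), Fraction(999_999, 1_000_000)]  # 99% vs 99.9999%

first_ten = "HTHHHHHHHH"
all_twenty = first_ten + "H" * 10

for label, data in [("after 10", first_ten), ("after 20", all_twenty)]:
    p99, _ = posterior(priors, heads_probs, data)
    print(f"{label}: P(99% coin) = {float(p99):.5f}")
```

A single tail is ten thousand times less likely under the 99.9999% coin than under the 99% coin, so the first scientist's successful prediction of ten more heads does little to rescue his theory.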

The fallacy is in the idea that we should 'believe' either theory. Because this is science, and as long as both theories explain the facts, they are equally valid.

If you want to dismiss a theory, do another experiment first and see which one correctly predicted the result. Until you do, they're both good.

Shadowfirebird said:

"as long as both theories explain the facts, they are equally valid."

That's not correct. Consistency with observations is merely the first, very weak, step towards developing a good predictive theory. You're underestimating how many (infinite!) theories are consistent with any data set.

The hypothesis space of possible theories has priors on it (e.g. Occam's Razor). These are important.

(Real world example: heliocentric solar system was not better at astronomical predictions than epicycles. It was just much, much simpler.)

There are an almost(?) infinite number of hypotheses, sure, but not all of them are equal.

I admit that given the limited information we had, I assumed that both theories were plausible, rational, and equal. For example, if one theory were a superset of the other, then presumably Occam's Razor would tell us something about which theory to trust.

(Are you saying that there is no experiment that can disprove an epicyclic solar system? I confess I find that hard to believe...)

Why is everyone getting philosophical? I think this is an economics question.

But first, some philosophy:

Neither theory has been falsified, so, a priori, there's no reason for preferring one over the other.

It seems, intuitively, that the predictive success of the first theory lends it more credence, but that's illusory. Just as the second theory could make completely random predictions, so could the first. Maybe it just got lucky with the first 10, or maybe it makes random predictions after the first 20 results. There's no way of knowing, so the set of possible predictions of theory 1 is the same as the set of possible predictions of theory 2.

Therefore, they are a priori equal.

Again, it seems that theory 1 is preferable, because if it were a lesser theory it would have had a good chance of being weeded out by the experiments. However, this also applies to theory 2: if it were a lesser theory, it wouldn't be compatible with all twenty bits of evidence. In fact, theory 1 could be completely random, while theory 2 is based on a pattern in the first ten results that repeats in the second ten. There's no way of knowing.

Let's say there are two possible results of each experiment. Then only 1/1024 of the hypotheses will correctly predict results 11-20 (2^10 = 1024). So it seems that theory 1 has a better chance than theory 2, because 1023 bad theories have been weeded out. But why should only predictions be subject to this? Theory 2 is consistent with 20 results, making it the one out of 2^20 that does. (And theory 1 is also one out of 2^20 -- in fact, the same one as theory 2, because they're the same theory as far as the first 20 experiments are concerned.)
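The counting argument can be checked by brute force on a scaled-down version. In this sketch the sizes and observed results are hypothetical: a "theory" is just a lookup table of predicted results, like the ad hoc theory described earlier in the thread, and with 3 + 3 experiments instead of 10 + 10 the surviving fraction is 1/8 instead of 1/1024.

```python
from itertools import product

N_FIRST, N_SECOND = 3, 3  # scaled down from 10 and 10 so enumeration is cheap
observed = (0, 1, 1, 0, 0, 1)  # hypothetical results of the first six experiments

# A "theory" is a lookup table: one predicted result per experiment,
# including one further experiment that has not been run yet.
theories = list(product((0, 1), repeat=N_FIRST + N_SECOND + 1))

fit_first = [t for t in theories if t[:N_FIRST] == observed[:N_FIRST]]
fit_all = [t for t in fit_first if t[: N_FIRST + N_SECOND] == observed]
next_predictions = sorted(t[-1] for t in fit_all)

print(len(theories), len(fit_first), len(fit_all), next_predictions)
```

Of the 16 lookup tables consistent with the first three results, 2 survive the next three, and those two split evenly on the experiment not yet run, so consistency alone says nothing about it.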

I don't see how your false contagion example, David, would affect this. Both say the same thing about the first 20 experiments, and both could predict anything after these 20.

Now, to answer the question:

To find the preferable answer, you offer a $100 award to each of the scientists if the theory they choose correctly predicts the results. (Each scientist, of course, has the option of choosing the alternate theory.) (You can increase or decrease the reward depending on how much you care.)

If both scientists choose the same theory, that's the one you believe.

If they don't, you're back to square one: there's no reason to prefer one over the other.

See -- no philosophy required. : )

Shadowfirebird wrote:

"I assumed that both theories were plausible, rational, and equal."

That would sure make a huge difference. Yet the puzzle, as stated, said nothing about this. What if the puzzle said: "both scientists use exactly the same search process through the hypothesis space, only one had only 10 data points while the other had 20"? These sorts of things would tell us something about the theories themselves. Which, in the end, is all that matters.

But all the puzzle says is that the theories are "consistent" with the 20 points. That describes too big a space to say much of anything reasonable about which theory is better.

Shadowfirebird wrote:

"(Are you saying that there is no experiment that can disprove an epicyclic solar system? I confess I find that hard to believe...)"

I meant only to note that, at the time Copernicus proposed his theory, it did not yield substantially better predictions than the existing (geocentric) one. The initial argument was *not*: you should adopt this new theory in order to better match astronomical predictions. (It's a different story in the modern era. We have much more data now than Copernicus had to work with.)

Σκεπτικός said:

"But first, some philosophy: Neither theory has been falsified, so, a priori, there's no reason for preferring one over the other."

Alas, not at all true. Lack of falsification (aka consistency with data) is only the very first step in evaluating the quality of a theory. In fact, there *are* priors on the space of possible theories (e.g. Occam's Razor). So there are *plenty* of reasons for preferring one theory over another -- even if they make identical predictions.

But you have to see the actual theories in order to judge them correctly.

Was I too succinct in my justification? A theory is supposed to predict. A theory that has already predicted is obviously better than one with no evidence of success.

And that's why we still use the Ptolemaic system in astronomy, right, Russ?

Forget that Galileo guy. He really thinks his theory is better just because it fits the facts, when the Ptolemaic system has such a long history of correct predictions? Psh- what a loser.

I'd like to draw an analogy using mathematics.

Let's suppose I look at a trend in the data, some function y(x), and recognize that it resembles an arctan function. I perform a 4 parameter curve fit, scaling and shifting the arctan curve in both x and y directions. Since the arctan function fits the data so well, I have some amount of confidence that it's the correct theory.

Now the same curve fits the next 10 data points, which reinforces my original hypothesis. My arctan assumption was no fluke.

Now a second scientist may come along and knowing that any set of points can be fit with a polynomial, naively fits a 20-term polynomial through the data. It's obvious that such a polynomial is likely to be poorly behaved outside of the data set from which it was derived and has virtually no chance of being a good predictive model.

If we know nothing about the two functions other than that the first one successfully predicted 10 results, we must have more confidence in that model, because there are infinitely many solutions that can be derived post-hoc with no capacity whatsoever to predict a future result. Granted, there could be some merit to that second theory, but we have nothing to support it.

Perhaps there can be a set of 'elegance' criteria that can be applied to a theory, such as how few parameters need to be invented to make the model fit. But given the question as posed, you can't make any such judgment.
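The polynomial half of this analogy can be sketched in plain Python. The assumptions are mine, not the commenter's: the data are arctan(x) at x = 1..20 plus a small, deterministic ±0.01 "measurement error", and the second scientist's model is the unique 20-term (degree-19) polynomial through all 20 points, evaluated with Lagrange's formula. The 4-parameter arctan fit itself is not reproduced; the true arctan curve stands in for it.

```python
import math


def lagrange_eval(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through (xs, ys) at x."""
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        weight = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                weight *= (x - xm) / (xj - xm)
        total += yj * weight
    return total


# Hypothetical data: an arctan trend sampled at x = 1..20 with +/-0.01 noise.
xs = [float(x) for x in range(1, 21)]
ys = [math.atan(x) + 0.01 * (-1) ** i for i, x in enumerate(xs)]

# The 20-term polynomial reproduces every observed point (up to rounding)...
max_fit_error = max(abs(lagrange_eval(xs, ys, x) - y) for x, y in zip(xs, ys))

# ...but its prediction for the 21st point is another matter.
prediction = lagrange_eval(xs, ys, 21.0)
truth = math.atan(21.0)
print(f"max in-sample error: {max_fit_error:.2e}")
print(f"predicted y(21) = {prediction:.1f}, true arctan(21) = {truth:.3f}")
```

The polynomial matches all 20 observations, yet its prediction for x = 21 is off by roughly ten thousand: extrapolating a high-degree interpolant amplifies even tiny errors in the data.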

This is not just another instance of the Monty Hall problem?

(Since we don't know anything about the scientist or the solution, the safest bet is to treat the selection process as random, etc.)

I agree with the guy using a Greek-character 'nym: the two theories are a priori equal at the present moment, and there is no reason to consider preferring one over the other until an experimental result occurs that fits one theory better than the other.

If that fact bothered me, then my next step would be to try to devise an experiment that can be expected to produce different results if theory A is correct than if theory B is correct.

If I can't come up with any such experiment, then the two theories may, in fact, be synonymous. (If someone can prove that no such experiment is ever possible, then they ARE synonymous.)

[Sorry, I wasn't quite finished.]

Doug D.'s "who is the better marksman" misses the point. The fact that theory A was shared with others before the second set of 10 experiments was conducted does not mean it is a better theory than B. It merely means that its author has gone through more of the peer-review process than has the author of B.

The peer-review process does not constitute validity or a source of validity; it is merely a rhetorical exercise (an act of preparing for future argumentation, if you will) intended to demonstrate to others (if they're willing to accept the fallacy of argument from authority) that a theory is valid.

None of this is intended to cast doubt on the worth of the peer-review process as part of a group process of discovering and testing scientific fact. But the person who "reasons" as Doug does is confusing the attempt to demonstrate validity with validity itself.

I have a serious reality conundrum that has to do with history and energy. Do you seriously know why the steam engine has become obsolete? My take is that it is because it is an "external combustion heat recycling engine", instead of an "internal combustion heat discarding engine". Think about it. We never see updated working examples of steam engines. The two kinds of engines are similar enough that they could actually toggle back and forth within one engine cycle. Is the myth of the enhanced-efficiency heat recycling engine true or not, and what would it mean if it were true?

It's a really big question and would change the world a great deal. That's weird.
