Ideas: King Tut, Statistics, and a Pet Peeve

Tuesday, August 31, 2010


I recently came across a National Geographic article describing the results from DNA analysis of a number of mummies, including King Tut. It was an interesting article, but there was one thing in it that annoyed me. In describing their results, the author said that the DNA analysis showed that there was a 99.9% probability of a particular relationship between two of the mummies.

That statement was false. I know it was false because that sort of analysis cannot produce that sort of result. Like many other people who use statistics without understanding it, the author was confusing the information the statistical analysis produced with the information he wanted it to produce.

A confidence result in classical statistics tells you how likely it is that you would get the result you got if the assumption you were testing was false or, more precisely, if a particular alternative assumption, called the null hypothesis, was true. His 99.9% means that if the null hypothesis (presumably that the two mummies were not closely related; the article doesn't say) was true, there is no more than a 0.1% chance that the genetic evidence that they were related would be as good as it is.

Unfortunately for the author of that article and many others, the probability of getting their result if their assumption is false is not the same thing as the probability that their assumption is false, given that they got their result. The latter is what they want, and what the assertion of a 99.9% probability for the relationship claimed. But it wasn't what they got.
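In symbols, using notation added here for illustration (E for the evidence, H_0 for the null hypothesis, H_1 for the alternative), the significance level roughly bounds P(E | H_0), while the article's claim is about P(H_0 | E). Bayes' theorem links the two, but only through a prior P(H_0) and the probability of the evidence under the alternative, neither of which a significance test supplies:

```latex
P(H_0 \mid E) = \frac{P(E \mid H_0)\,P(H_0)}{P(E \mid H_0)\,P(H_0) + P(E \mid H_1)\,\bigl(1 - P(H_0)\bigr)}
```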

To see the difference, consider a much simpler experiment. I pull a coin out of my pocket without looking at it. My theory is that it is a two-headed coin; the null hypothesis is that it is a fair coin.

I flip it twice and it comes up heads both times. If it is a fair coin, the probability of that outcome is only 25%. If it is a two-headed coin, it's 100%. If the probability of the result given the null assumption were the same thing as the probability of the null assumption given the result, that would mean that the odds were now three to one (a 75% probability) that the coin was double-headed. I don't think so.
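The coin example can be worked through with Bayes' theorem. A minimal Python sketch, assuming only two possibilities (a fair coin or a two-headed one) and using hypothetical priors, since the post gives none:

```python
def posterior_two_headed(prior: float, heads_in_a_row: int) -> float:
    """P(two-headed | all heads) for a coin that is either two-headed or fair."""
    like_two_headed = 1.0               # P(all heads | two-headed)
    like_fair = 0.5 ** heads_in_a_row   # P(all heads | fair): what a significance test reports
    numerator = prior * like_two_headed
    return numerator / (numerator + (1.0 - prior) * like_fair)

# Naive reading: 1 - P(result | fair coin), which ignores the prior entirely.
print("naive:", 1 - 0.5 ** 2)
for prior in (0.5, 0.01, 0.0001):       # hypothetical priors for "two-headed"
    print(prior, round(posterior_two_headed(prior, 2), 4))
```

With a 50% prior the posterior is 0.8 (not the naive 0.75); with a one-in-ten-thousand prior, two heads leave the coin almost certainly fair, with roughly a 0.0004 probability of being two-headed.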

Readers interested in what it takes to actually generate a probability estimate for an assumption being true are invited to read up on Bayesian probability.

After writing this post, I discovered that I had mentioned the same point some time back in a different context.

34 comments:

jimbino said...: David, what do you think of the fashionable statement that, in your town, X% of the traffic accidents were "alcohol related"?

And if that were shown to be true, what does it mean that 100% of the accidents were "water related"?; 3:24 PM, August 31, 2010
DR said...: The inferential estimate is trivially equivalent to the Bayesian if your prior on the null hypothesis is 50%, which is not a bad estimate when you don't have much of a prior to begin with. Even if your prior on the null is 99%, a 4-sigma inferential event will still be a 2.3-sigma Bayesian event.

Yes, Bayesian estimates are theoretically how things should be done in statistics, but the Internet rationality community throws the word around way too much. In practical statistics, Bayesian estimates tend to be computationally intractable as well as overly noisy due to sensitivity to prior assumptions.

In most cases, unless one has a strong prior, non-Bayesian methods will work sufficiently well. And even then, there are often corrections that are a lot more practical in real statistics.; 4:46 PM, August 31, 2010
Darf Ferrara said...: Why couldn't the author be giving a Bayesian point estimate?; 5:42 PM, August 31, 2010
Henry Troup said...: I recall someone doing the equivalent calculation for the usual sort of forensic DNA and showing that if the defense sets up the correct question, what a "bare" DNA match gives is about a 1 in 3 probability, not the much different number likely to be quoted by the prosecution.; 7:00 PM, August 31, 2010
RKN said...: Do you have a link to the paper?

My hunch is that the significance cutoff of their DNA similarity measure was one in a thousand (p<0.001 = 0.1/100). When the observed measure of the mummy pair turned up less than p, they concluded, "Look! - there's less than a one in a thousand chance these two could be unrelated."

Technically speaking, that's not statistically equivalent to the statement, "The mummies are probably (99.9%) related," but I think I understand what they meant. Assuming my hypothesis is correct ;-); 8:35 PM, August 31, 2010
SheetWise said...: Jimbino,

Do we know what percentage of drivers, at any one time, are "alcohol impaired" by legal standards? Also, I believe most states consider an accident alcohol related even if the impaired driver was the victim, and not legally at fault -- which skews the numbers a bit ...

On to your question. It means that water is a gateway drug, and responsible for (100-X)% of traffic accidents.; 4:13 AM, September 01, 2010
jimbino said...: Right, SheetWise:

If a sober driver kills a drunk sleeping on a park bench it's an "alcohol related" accident.

99% of lies are promulgated by our government; the moon is 1/4 the size of the earth and further away; once every minute an American woman has an abortion -- we should find her and stop her!; 6:49 AM, September 01, 2010
David Friedman said...: "When the observed measure of the mummy pair turned up less than p, they concluded, "Look! - there's less than a one in a thousand chance these two could be unrelated.""

That is not what a significance result of less than .001 means, for the reason I explained in my post. It means that if they were unrelated, there is less than a .001 chance that the evidence they were related would be as good as it is. That's an entirely different thing from the chance that they are unrelated being less than .001. Consider my two-headed coin example.; 8:13 PM, September 01, 2010
RKN said...: "That is not what a significance result of less than .001 means, for the reason I explained in my post. It means that if they were unrelated, there is less than a .001 chance that the evidence they were related would be as good as it is."

That's right, which would have allowed them to reject the null hypothesis that the two are unrelated. This is not equivalent to the probability that the null is false, or, as I went on to say in my reply, the probability that an alternative hypothesis is true, as the author evidently claimed. Statements of significance in this case are always about the data, not, per se, the hypothesis.

However, rejection of the null is so often used to indicate the likelihood of plausibility of the alternative post-experimental hypothesis, that I was willing to cut the authors some slack, with the caveat that I didn't read the paper.; 12:26 AM, September 02, 2010
RKN said...: "That statement was false. I know it was false because that sort of analysis cannot produce that sort of result. Like many other people who use statistics without understanding it, the author was confusing the information the statistical analysis produced with the information he wanted it to produce."

You may want to reconsider what you "think" you know is false and retract your accusation of the author.

Here's the link to the NatGeo article I think you referred to:

http://ngm.nationalgeographic.com/2010/09/tut-dna/hawass-text/1

Like you, I was puzzled how a mistaken statistical claim could have passed peer review. A little googling turned up the research published in JAMA that was the basis for the "pop" science in NatGeo:

http://jama.ama-assn.org/cgi/content/full/303/7/638?home

Note that the first author of the paper is also the author of the NatGeo article.

A reading of the research Methods prompted more googling. Turns out the "sort" of statistical analysis you assumed they used was not what they used.

The claim of 99.99% relatedness (page 5 of the NatGeo link) is based on the "combined paternity index," obtained by measuring base-pair identities at multiple (16 in the paper) genetic loci between the sample and samples obtained from the mother and the alleged father. The measurement at each locus is treated as an independent event. The reported value of "99.99%" is the convention used for reporting a "non-exclusion" event. The actual percent likelihood of relatedness is higher still, and can be found in the JAMA paper.
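As a rough sketch of the arithmetic described here (assuming, as is conventional in paternity testing, that the loci are treated as independent), the combined paternity index is the product of the per-locus paternity indices. The per-locus values below are made up for illustration and are not from the JAMA paper. The product is a likelihood ratio comparing the alleged father with an unrelated random man; turning it into a probability of paternity still requires a prior.

```python
import math

def combined_paternity_index(locus_indices: list[float]) -> float:
    """Combine per-locus paternity indices by multiplication (loci treated as independent)."""
    return math.prod(locus_indices)

# Hypothetical per-locus paternity indices, one per tested locus (not data from the paper).
hypothetical_pis = [2.1, 1.4, 3.0, 0.9, 5.2, 2.7, 1.8, 4.4]
cpi = combined_paternity_index(hypothetical_pis)
print(f"CPI = {cpi:.1f}")
```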

They did not misinterpret p-values in classical statistics and make a false claim. Evidently, they used well-established standards in DNA paternity testing for calculating and reporting their values, an overview of which can be found here:

http://www.orchidcellmark.ca/site/info/interpreting-results.html; 5:11 PM, September 04, 2010
David Friedman said...: Some comments are getting to me but not appearing here, including a long one from RKN defending the article; I don't know why. In response to that:

I've looked at the webbed JAMA piece

http://jama.ama-assn.org/cgi/content/full/303/7/638

and things it links to, and so far as I can tell my initial reaction was correct. One of the links is to a piece from Orchid Cellmark on interpreting paternity results. It contains:

"The prior probability in a paternity test is 0.50 which is 50%. That means that, without testing any of the parties, there is a 50% chance that any untested man is the father and a 50% chance that he is not the father."

That may describe their analysis, but it does not describe the real world. The piece also has:

"Please note that all DNA paternity reports issued by Orchid Cellmark assume that any alternative alleged fathers are unrelated to the tested man."

Taken literally, that's impossible, since all humans are related. It's particularly inappropriate when analyzing the DNA of a bunch of Egyptian mummies, very likely royal, since they are quite likely to be related to each other.

I also looked at the eSupplement, and again found nothing to contradict my conclusion that the research they did could not have produced a probability that A was the father of B.; 5:16 PM, September 04, 2010
RKN said...: "The prior probability in a paternity test is 0.50 which is 50%. That means that, without testing any of the parties, there is a 50% chance that any untested man is the father and a 50% chance that he is not the father."

"That may describe their analysis, but it does not describe the real world."

That wasn't the basis of your original accusation. It was that the author didn't understand statistical analysis and was confused by what the results meant.

"Taken literally, that's impossible, since all humans are related."

Of course, but what does that have to do with a genetic test for paternity, or the related conclusion in the paper you pointed to and said was false because, according to you, the authors didn't understand statistics?

"It's particularly inappropriate when analyzing the DNA of a bunch of Egyptian mummies, very likely royal, since they are quite likely to be related to each other."

Yes, they are related, and some are fathers of others, which is one question the research addressed. Which still leaves you to explain why the "combined paternity index" score and the subsequent percent chance of relatedness, widely used in paternity determinations, is incorrect, and why anyone who utilizes it doesn't understand statistics.

"I also looked at the eSupplement, and again found nothing to contradict my conclusion that the research they did could not have produced a probability that A was the father of B."

A better question would be did you find anything to support your accusation? I didn't.

It rather seems to me you didn't and still don't understand the basis for a DNA paternity test, the relevant molecular genetics, or the specifics of how the statistical analysis is conducted.

You could e-mail the author or even the journal if you actually thought you were correct, and cause the paper to be retracted, since one of the principal findings of the research is the very thing you're saying is false.

Incidentally, I wonder why you didn't post my entire reply, which I e-mailed to you. For onlookers with nothing better to do, you can find my reply in its entirety here:

http://tinyurl.com/3ao2fhf; 7:38 PM, September 04, 2010
David Friedman said...: To RKN:

If you believe they have correctly calculated the probability that the two people are related, perhaps you could explain how--by what series of steps? That should be a good deal simpler than my explaining why they didn't.

You ask what the fact that all humans are related has to do with the calculation. It's relevant to the probability that the DNA evidence will look as good as it does if the two people are not (say) father and son. That probability will depend on the relationships among the population from which the people are being drawn--it can't be determined a priori. The higher it is, the weaker the evidence for paternity provided by the test.
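The point about relatedness can be illustrated with per-locus likelihood ratios. A minimal sketch, assuming a heterozygous alleged father and using hypothetical numbers: the probability that an unrelated man passes on the child's paternal allele is taken as its population frequency, and the larger "relative" value is simply assumed rather than derived from a kinship model. Giving every locus the same values is a further simplification.

```python
import math

def paternity_index(p_given_father: float, p_given_alternative: float) -> float:
    """Per-locus likelihood ratio: P(child's paternal allele | alleged father)
    divided by P(child's paternal allele | alternative man)."""
    return p_given_father / p_given_alternative

N_LOCI = 16          # number of loci RKN reports for the JAMA paper
X = 0.5              # a heterozygous alleged father passes on the shared allele half the time
Y_UNRELATED = 0.1    # hypothetical: population frequency of that allele (unrelated alternative)
Y_RELATIVE = 0.3     # hypothetical: higher chance if the alternative man is a close relative

# Simplification: every locus is given the same values.
cpi_unrelated = math.prod(paternity_index(X, Y_UNRELATED) for _ in range(N_LOCI))
cpi_relative = math.prod(paternity_index(X, Y_RELATIVE) for _ in range(N_LOCI))
print(f"CPI vs unrelated man: {cpi_unrelated:.3g}; vs close relative: {cpi_relative:.3g}")
```

The same genotypes yield a much smaller index when the plausible alternative fathers are related to the alleged father, which is the sense in which the strength of the evidence depends on the population being sampled.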

I think I already explained why anyone who reports the results as a probability that two people have a particular relationship is misunderstanding the meaning of the statistical result.; 8:36 PM, September 04, 2010
RKN said...: "If you believe they have correctly calculated the probability that the two people are related, perhaps you could explain how--by what series of steps? That should be a good deal simpler than my explaining why they didn't."

The series of steps used to establish paternity can be found in the Methods section of the JAMA paper under "Kinship Analysis".

Incidentally, I find it curious that you place the onus on me to explicate the details of their analytical method when you have already said "their statement is false and their analysis can't produce that sort of result."

Should I interpret this latest as a tacit admission that you may have been wrong?

"You ask what the fact that all humans are related has to do with the calculation. It's relevant to the probability that the DNA evidence will look as good as it does if the two people are not (say) father and son. That probability will depend on the relationships among the population from which the people are being drawn--it can't be determined a priori. The higher it is, the weaker the evidence for paternity provided by the test."

I think you misunderstand the experiment that was performed. They wanted to verify the paternity relationship, if any, of the individual in KV55. You don't have to compare the observed values measured at each genetic locus of the sample to those measured in a large, random population of males in order to do that. The Y-chromosome markers (microsatellite repeats) are sufficiently unique to establish paternity to the confidence level reported. At least that's my understanding. I sent an e-mail to the corresponding author asking for clarification on this.; 11:28 PM, September 04, 2010
Jonathan said...: I've looked into this matter of comments not appearing, because it happened to me too. Blogger has recently implemented a new spam-blocking system (see Blogger Buzz), which like all such systems is fallible. However, you can check and correct it manually.

When you log into Blogger, click Comments, and you will see a list of published comments. If you look carefully, you will see a sub-tab called Spam. If you click that, you see a list of comments not published under suspicion of being spam. You can choose to publish or delete each suspect comment. I hope this helps.; 1:53 AM, September 05, 2010
David Friedman said...: Following Jonathan's very helpful comment, I'm marking three posts by RKN as "not spam." Hopefully they will then appear.; 9:46 AM, September 05, 2010
David Friedman said...: I've also rescued one early post by DR. With regard to which ...

Note that in my coin flip example, the prior on the null is much higher than 99%, since far fewer than one coin in a hundred is double-headed. The same would be true if you picked two people at random and tested to see if one was the father of the other.

So, in the Egyptian case, the argument depends on how likely the relationship was before the DNA testing was done—which isn't something established by the research in question.; 9:52 AM, September 05, 2010
RKN said...: In e-mail to me David asked:

"How do you believe they did the calculation?"

They may have used a Bayesian formulation which would allow them to assign a probability of correctness to the hypothesis, or they may have used a frequentist approach which, along with the combined paternity index, can be converted to a similar statement of probability. DNA paternity testing is not my area of expertise, but what I gathered from googling around is that both methods are used. I have an e-mail in to the author asking for clarification.

The point of my recent comments is that you made a rather supercilious statement referring to a popular science account of research, based on what appears to me to have been an assumption, one which so far as I can tell from reading the actual scientific paper and related material, was very likely wrong.

I find it a little difficult to believe that the statistical methods underlying DNA paternity testing, a method that has enjoyed wide application in a number of forensic areas including the paper we're discussing, are fundamentally unsound, or that the usual interpretations of the calculations are false.

Asking how they did the calculation is a question you might have asked before posting to your blog that it had to be wrong, and that the authors don't understand statistics.

Incidentally, a 50/50 prior seems to me to be very conservative, but without understanding the foundations of DNA paternity testing I'm less willing than you to conclude it's "bizarre".; 10:32 AM, September 05, 2010
RKN said...: "The same would be true if you picked two people at random and tested to see if one was the father of the other."

Except the relationship of the unknown mummy in KV55 was very likely not random with respect to the others. In other words, their hypothesis is not analogous to a coin flip.

"So, in the Egyptian case, the argument depends on how likely the relationship was before the DNA testing was done, which isn't something established by the research in question."

If they used a Bayesian approach to calculate the probability of paternity, and adhered to what appears to be a conventional prior in this type of test, it would have been 50/50, which, as DR agreed, is not a bad estimate, and one that, being so conservative, I suspect would only improve confidence in the result.; 10:50 AM, September 05, 2010
David Friedman said...: I wrote:

"How do you believe they did the calculation?"

RKN replied (his statements are in quotation marks, my responses are not):

"They may have used a Bayesian formulation which would allow them to assign a probability of correctness to the hypothesis, or they may have used a frequentist approach which, along with the combined paternity index, can be converted to a similar statement of probability. DNA paternity testing is not my area of expertise, but what I gathered from googling around is that both methods are used. I have an e-mail in to the author asking for clarification."

Both of those approaches require a prior probability, which their research as they described it doesn't produce. It's clear from the article that different scholars held different views about who was related to whom, and how.

"The point of my recent comments is that you made a rather supercilious statement referring to a popular science account of research, based on what appears to me to have been an assumption, one which so far as I can tell from reading the actual scientific paper and related material, was very likely wrong."

Or in other words, you don't know how they solved the problem that I claimed they didn't solve. The misinterpretation of confidence results that I described is quite common in scientific work using statistics done by non-statisticians.

"I find it a little difficult to believe that the statistical methods underlying DNA paternity testing, a method that has enjoyed wide application in a number of forensic areas including the paper we're discussing, are fundamentally unsound, or that the usual interpretations of the calculations are false."

So can you explain how a statement such as "the chance that the defendant is not the source of the blood stains is less than one in a million" can be deduced from DNA evidence alone? That's how it is routinely stated.

There is nothing wrong with using DNA testing in forensics; my complaint is about how the results are commonly misrepresented. It's easy to confuse "the chance that the DNA would match this closely if the defendant was not the source of the blood is less than one in a million" with "the chance that the defendant was not the source of the blood is less than one in a million." But the latter statement does not follow from the former.

If there were no other reason to think the defendant was the source (if he were simply a random person tested), the odds would be heavily against it, since the prior probability would be much less than one in a million.

"Asking how they did the calculation is a question you might have asked before posting to your blog that it had to be wrong, and that the authors don't understand statistics."

I don't agree. They described the form of the data they were using and claimed that that data gave their result. Since such data cannot give such a result, my conclusion followed.

"Incidentally, a 50/50 prior seems to me to be very conservative, but without understanding the foundations of DNA paternity testing I'm less willing than you to conclude it's 'bizarre'."

You think assuming that a random A has a fifty percent chance of being the father of a random B is not bizarre?

If they aren't random, the number has to depend on the particular circumstances that led to the test. If you know that either A or B is the father of C and you have no information on which, then 50/50 is appropriate. If not, mostly not.

So far as I can tell, the only reason to assume 50/50 is that, as a previous poster pointed out, that assumption makes the probability actually produced by the statistics (the probability of the evidence conditional on the null assumption) equal to the probability that you want (the probability of the null assumption given the evidence). So it lets the person making the assumption, on no grounds at all, obscure the fact that he is confusing the two probabilities.; 11:48 AM, September 05, 2010
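The blood-stain example in the comment above can be made concrete with Bayes' theorem. A minimal sketch, assuming the true source always matches and treating one in a million as the chance that a non-source matches; the priors are hypothetical (1e-7 corresponds to picking one person at random from ten million candidates):

```python
def posterior_source(prior: float, p_match_if_not_source: float) -> float:
    """P(defendant is the source | DNA match) by Bayes' theorem.

    Assumes the true source always matches, so P(match | source) = 1.
    """
    return prior / (prior + (1.0 - prior) * p_match_if_not_source)

RANDOM_MATCH = 1e-6   # "less than one in a million", treated here as exactly 1e-6
for prior in (1e-7, 1e-3, 0.5):   # hypothetical priors
    print(f"prior={prior:g}  P(source | match)={posterior_source(prior, RANDOM_MATCH):.4f}")
```

With a prior of 1e-7 the posterior is only about 9%, even though a non-source matches just one time in a million; with a prior of 0.5 it exceeds 99.99%.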
RKN said...: Posting in 2 parts b/c the comment sw complained my html was too long?

"Both of those approaches require a prior probability, which their research as they described it doesn't produce."

"Produce"? The Bayesian formulation for the probability of paternity that I've found on the web is:

W = CPI*p/[CPI*p + (1-p)] * 100

where p is the prior, CPI is the genetic evidence (the combined paternity index), and W is the probability of paternity, expressed as a percentage. The supplement doesn't say whether the analytical software they used (GenoProof) implements this approach, but assuming it does, then in fact they did use a prior. They would have had to.
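A minimal Python sketch of the formula quoted above, to show how W responds to the prior p. The CPI of one million is a hypothetical value chosen for illustration, not the figure reported in the paper:

```python
def probability_of_paternity(cpi: float, prior: float) -> float:
    """W = CPI*p / (CPI*p + (1 - p)) * 100, i.e. the posterior probability of paternity in percent."""
    return cpi * prior / (cpi * prior + (1.0 - prior)) * 100.0

CPI = 1e6   # hypothetical combined paternity index
for p in (0.5, 0.01, 1e-6):   # hypothetical priors
    print(f"p={p:g}  W={probability_of_paternity(CPI, p):.4f}%")
```

With this hypothetical CPI, W stays above 99.9% for a prior as low as 1%, but falls to about 50% when the prior is one in a million, roughly 1/CPI. Whether a smaller prior matters therefore depends on how small it is relative to 1/CPI.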

"Or in other words, you don't know how they solved the problem that I claimed they didn't solve."

I don't see how what I wrote could be reworded that way.

"The misinterpretation of confidence results that I described is quite common in scientific work using statistics done by non-statisticians."

So? It's also quite common that statistics are correctly interpreted in papers published by non-statisticians. Hopefully mine, for instance.

In this paper, of the more than a dozen authors listed, four were listed as responsible for statistical analysis. Are you claiming that none of them is a statistician? And how would you know that?

"There is nothing wrong with using DNA testing in forensics; my complaint is about how the results are commonly misrepresented. It's easy to confuse 'the chance that the DNA would match this closely if the defendant was not the source of the blood is less than one in a million' with 'the chance that the defendant was not the source of the blood is less than one in a million.' But the latter statement does not follow from the former."

The former is a statement about the quality of the data (as I've already pointed out), and the latter a statement about the culpability of the defendant. I'm not confused about the distinction, and there's no evidence any of the authors of the paper are either.; 8:42 PM, September 05, 2010
RKN said...: Part 2:

"I don't agree. They described the form of the data they were using and claimed that that data gave their result. Since such data cannot give such a result, my conclusion followed."

What "form of the data" are you referring to? Data don't "give results", analytical methods implemented on the data produce results. So your complaint must be that the analytical method used in this paper cannot conclude a probability of paternity. Is that correct?

"You think assuming that a random A has a fifty percent chance of being the father of a random B is not bizarre?"

Offhand, I agree with you, that prior doesn't seem reasonable. However, reading around a bit, assuming a much smaller (more reasonable) prior evidently doesn't change the significance of paternity probability.

But the valuation of the prior isn't the real basis of your objection to the research, so why dwell on it?

"If they aren't random, the number has to depend on the particular circumstances that led to the test. If you know that either A or B is the father of C and you have no information on which, then 50/50 is appropriate. If not, mostly not."

If you have anecdotal evidence that C is the child of B, as I would argue this study did, then p is a way of admitting this evidence into the prior. Note that when p=0.5 then,

W=CPI/(CPI+1).

So even implementations that don't value p explicitly implicitly value it at 0.5 if they use the Bayesian formulation I gave above.
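The reduction can be checked directly, assuming the paternity formula quoted later in this thread (written here as a fraction rather than a percentage). In odds form the prior odds simply multiply the CPI, and setting p = 1/2 makes the prior odds equal to 1, so the posterior odds equal the CPI.

```latex
W = \frac{\mathrm{CPI}\,p}{\mathrm{CPI}\,p + (1-p)}
\quad\Longrightarrow\quad
\frac{W}{1-W} = \mathrm{CPI}\cdot\frac{p}{1-p},
\qquad
p = \tfrac12:\quad
W = \frac{\tfrac12\,\mathrm{CPI}}{\tfrac12\,\mathrm{CPI} + \tfrac12}
  = \frac{\mathrm{CPI}}{\mathrm{CPI} + 1}.
```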

So far as I can tell, the only reason to assume 50/50 is that, as a previous poster pointed out, that assumption makes the probability actually produced by the statistics (the probability of the evidence conditional on the null assumption) equal to the probability that you want (the probability of the null assumption given the evidence). So it lets the person making the assumption, on no grounds at all, obscure the fact that he is confusing the two probabilities.

I suppose that's one interpretation, but I don't think it's warranted here. You don't need to "know" that KV55 was Tutankhamun's father (or the son of Amenhotep) to justify using 0.5. There was other anecdotal evidence presented in the paper (inscriptions, morphology, etc.) indicating the possibility of a paternal relationship that could justify a prior that high.

In any case, your original objection wasn't that the prior they used was wrong, it was that the "sort of analysis" they did would not allow them to state a probability of paternity.; 8:43 PM, September 05, 2010
RKN said...: My replies to your latest can be found here:

http://www.rknibbe.com/KingTut2.html

You in italics, me not.

Got annoyed with the comment software: "Your HTML cannot be accepted, >4096 chars." So I posted in two parts, and both posts disappeared, again.

Sheesh.

My personal pet peeve, trying to conduct discussion via blog comments instead of USENET.

I probably won't have much else to say until/if I hear back from the paper's author.; 9:05 PM, September 05, 2010
David Friedman said...: The data they analyzed was DNA data. DNA data can't give a probability of relationship without your also having a prior probability, as I think I have already explained.

I wrote:

"Both of those approaches require a prior probability, which their research as they described it doesn't produce."

RKN Replied:

""Produce"? The Bayesian formulation for the probability of paternity that I've found on the web is:

W = CPI*p/[CPI*p + (1-p)] * 100

where p is the prior, CPI is the genetic evidence (combined paternity index), and W is the probability of paternity. The supplementary material doesn't say whether the analytical software they used (GenoProof) implements this approach, but assuming it does, they did in fact use a prior. They would have had to."

Yes. That's my point. In order to get the result they would have to use a prior, but the research they did doesn't produce a prior (the prior being the probability before the research was done); hence they couldn't derive the result they reported from their research.

In the formula you cite, where does p come from? Without it you can't solve for W. But you aren't getting it from the genetic evidence, because that's CPI.

"In any case, your original objection wasn't that the prior they used was wrong, it was that the "sort of analysis" they did would not allow them to state a probability of paternity."

That is still my objection. They can't derive a probability of paternity without having a prior, and pulling a prior out of the air (announcing for no good reason that it is 50/50) doesn't qualify.

Enough.; 10:55 PM, September 05, 2010
RKN said...: Yes. That's my point. In order to get the result they would have to use a prior, but the research they did doesn't produce a prior (the prior being the probability before the research was done); hence they couldn't derive the result they reported from their research.

p is the prior. The value of p used (the authors didn't say specifically, so your claim that they pulled it from a hat is without merit) will depend on certain *prior* anecdotal evidence relevant to this case, of which there was plenty, as I already mentioned. The presumed relationship among the mummies was anything but random.

In the formula you cite, where does p come from? Without it you can't solve for W. But you aren't getting it from the genetic evidence, because that's CPI.

Again, p is the prior.

Your original accusation implied they used the "sort" of conventional frequentist analysis that does not allow (and I agreed with you) a conclusion about the probability of paternity, because in that sort of approach conclusions are statements about the quality of the data against the null, *not*, as you have reminded us, probabilistic statements about the correctness of the hypothesis.

But the evidence indicates (it's still a bit unclear; I'm waiting on a reply from the author) that they did not use that sort of approach. In fact they used the sort of approach that *does* allow the calculation of probability of paternity, indeed the very sort you indicated to your readers at the end of your post they should have used!

I occasionally stop by your blog b/c I often enjoy what you have to say about matters I also find interesting. This won't change that, but it will give me pause when I read you criticizing other things outside your area of expertise.; 10:30 AM, September 06, 2010
Andrew said...: This could have ended so much more peacefully had David simply admitted that this was a poor example of a common mistake.

National Geographic FTW!; 2:40 PM, September 08, 2010
Cristian Vimer said...: I just finished reading The Machinery of Freedom, and I found most of the arguments in favor of anarcho-capitalism nothing short of a splendid display of logic. I agree with you that a society like the one described in the book can exist (I don't think it will exist, but I will do my best to promote the idea, using the same logical arguments, of course). I was wondering how the arguments in the book have changed after more than 20 years. Are you planning to publish a new edition? I think it would be a great idea; I believe that more and more people are open to the idea nowadays.; 8:38 PM, September 12, 2010
Cristian Vimer said...: And I apologize for posting my previous comment on what is, obviously, a post with no connection to the idea in my comment.; 8:42 PM, September 12, 2010
David Friedman said...: To Cristian:

I am planning a third edition of Machinery. I discussed the project on this blog some time back:

http://daviddfriedman.blogspot.com/2010/06/possible-new-edition-of-my-machinery-of.html; 2:05 PM, September 13, 2010
Cristian Vimer said...: Sorry for bombarding you with comments, but I noticed you posted a Romanian cookbook from the 17th century a while ago. If you are still interested and need it translated, I could help; I'm Romanian.; 10:19 AM, September 17, 2010