I recently came across a talk by Roderick Long in which he criticizes my father's methodological position, in particular the argument of his essay "The Methodology of Positive Economics." That essay defends the use of unrealistic models in economics, such as perfect competition, on the grounds that the ultimate test of a model is not its descriptive accuracy but its ability to make correct predictions. The talk struck me as an attempt to make sense of the position without understanding it. Hence this post, in which I will attempt to explain the Chicago school methodology practiced by (among many others) both my father and myself.

Roderick starts his argument by imagining a theory of Harry Potter movies according to which some invisible force builds up over time to produce a Harry Potter movie every year. That theory predicts that each year there will be a new movie. For some years the prediction is correct, but eventually it fails. Someone with a more realistic theory would have produced a more correct prediction—that the series of movies would end, probably at the point when it had covered all of the books.

The first theory successfully predicts a reasonably likely event, a successful movie having a sequel, several times. That is evidence that it is a good theory, but not very much evidence. A theory built on a more realistic model of the process, in which successful movies are likely to have sequels but a series of movies based on a popular series of books is likely to end when it runs out of books, successfully predicts more facts, so is a superior theory by the criterion of prediction. Roderick's own example is one where the criterion of prediction and the criterion of realism lead to the same result—the more realistic theory is also the better predictor, so is to be preferred on either criterion.

His fundamental mistake, if I understand it correctly, is to imagine that all that is going on in the Chicago approach is blind curve fitting, looking for patterns in the observed data and assuming that those patterns will continue. The problem with that approach is that a body of data can be fitted with an infinite number of different curves. In selecting among the possible patterns that could explain the data, one uses whatever information is available to form a theory. The theory cannot be entirely realistic, since that would require including every feature of the situation that could conceivably be relevant. The test of whether one has done a good job of figuring out what simplified model includes the important factors and excludes the unimportant ones is the ability of the model to make correct predictions.
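To make the point about curve fitting concrete, here is a minimal sketch in Python. The three data points and both functions are invented for illustration; nothing here comes from the talk or the essay. Two different curves explain the observed points equally well and then disagree about a point that did not go into constructing them.

```python
# Hypothetical observations: three points that happen to lie on y = x.
observed = [(1, 1), (2, 2), (3, 3)]

def line(x):
    """Simple model: y = x."""
    return x

def wiggle(x, c=1):
    """Alternative model: equals x at x = 1, 2, 3 for ANY value of c,
    because the added term vanishes at each observed x."""
    return x + c * (x - 1) * (x - 2) * (x - 3)

# Both models explain every observed point exactly.
assert all(line(x) == y for x, y in observed)
assert all(wiggle(x) == y for x, y in observed)

# They disagree about a point that was not used to build them.
print(line(4), wiggle(4))  # 4 10
```

Since `c` can be any number, there are infinitely many such curves through the same data. Only a prediction about a point not yet observed can discriminate among them.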

Crucial to this view of the process is the distinction between explaining facts you already know and predicting facts you do not know, a point that is emphasized in my father's essay but, I think, entirely ignored in Roderick's lecture. Explanation of known facts can be blind curve fitting—but unless you have succeeded in choosing the right model, your predictions of facts that did not go into constructing it are unlikely to be correct. The crucial assumption that distinguishes prediction from explanation is that humans have some ability to correctly perceive patterns, making correct predictions evidence of something more than a lucky guess. You can find a more detailed explanation here.

Roderick offers an elaborate philosophical explanation of why my father rejects what Roderick views as the correct approach to doing economics, the *a priori* approach associated with Ludwig von Mises and some of his followers. There is a much simpler explanation. The problem with that approach, at least in its extreme version, is that pure *a priori* argument is unable to predict anything of economic interest. If one is completely agnostic about the facts, including both utility functions and production technology, any physically possible pattern of human behavior is consistent with the theory. As I put it long ago in my Price Theory, explaining why the assumption of rationality is empty unless combined with some knowledge of what humans value:

"Why did I stand on my head on the table while holding a burning $1,000 bill between my toes? I wanted to stand on my head on the table while holding a burning $1,000 bill between my toes."

I conclude that the correct way of doing economics combines *a priori* theory with evidence. You form plausible conjectures on the basis of theory and evidence, where part of forming them is deciding what simplifications, what unrealistic features of the model, assume away inessential complications while retaining the essential features of what you are trying to understand. You find out how good a job you have done by using the conjectures to make predictions and seeing whether the predictions are correct. An added benefit of that process, as I discovered in the course of writing my first published journal article in economics, is that finding real world predictions of your model may force you to think through the model itself more clearly.

That is the Chicago School methodology as I understand and practice it.

## 7 comments:

I believe that the concept that you're trying to explain is called the bias-variance tradeoff in statistics:

http://en.wikipedia.org/wiki/Supervised_learning#Bias-variance_tradeoff

When trying to explain some set of data you can either build models with fewer parameters that explain the data not as well or models with more parameters that explain the data better (i.e. higher R^2 or lower mis-classification rate).

In almost every problem there's some middle optimal tradeoff between more explanatory complexity or less. The very simple intuition behind this is that simpler models are a priori more likely to be right, so you'll accept some penalty of not fitting as well for increased simplicity.

In other words, Occam's razor plus Bayesian reasoning tells you to put a high prior weight on the simpler models. If you've only seen a few data points, keep the simpler, weaker model; if you've seen tons of data points that support the more complex model, then that might be the right one.

That's why for any group of X-Y points we usually fit a straight line or smooth convex curve. You could fit a curve that would cover every point exactly but it would be extremely jagged. Unless you have a ton of data the smooth approximated curve usually outperforms the perfect jagged curve on unseen data. (Jagged curves are more complex because they require more parameters to specify).
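A minimal sketch of the smooth-versus-jagged comparison, in plain Python. The data are hypothetical: the true relation is assumed to be y = 2x, the noise alternates +1 and -1 so the example is deterministic, and the unseen points lie just beyond the observed range. It is a contrived case, not a general proof.

```python
# Hypothetical data: true relation y = 2x, observed with noise that
# alternates +1, -1 (chosen so the example is deterministic).
train_x = [0, 1, 2, 3, 4, 5]
train_y = [2 * x + (1 if x % 2 == 0 else -1) for x in train_x]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (two parameters)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def fit_exact(xs, ys):
    """Lagrange polynomial through every point (one parameter per point)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

def mean_abs_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)

simple = fit_line(train_x, train_y)
exact = fit_exact(train_x, train_y)

# Unseen points taken from the assumed true relation y = 2x.
test_x = [6, 7]
test_y = [2 * x for x in test_x]

print("train error (line, exact):",
      mean_abs_error(simple, train_x, train_y),
      mean_abs_error(exact, train_x, train_y))
print("test error  (line, exact):",
      mean_abs_error(simple, test_x, test_y),
      mean_abs_error(exact, test_x, test_y))
```

On the observed points the exact-fit polynomial has zero error and the line does not. On the unseen points the ranking reverses: the line misses by less than one unit, while the polynomial misses by dozens of units or more.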

Roderick Long's position is the statistical equivalent of putting a certain or near-certain weight on the a priori deduced hypothesis. It is valid from a Bayesian perspective to reject mountains of evidence, but only if you're certain of the a priori belief to a very, very high degree.

For example I would never accept the hypothesis that demons sneak into my house undetected and move my car keys, no matter how many times I found my car keys somewhere that I didn't remember putting them.

Most economic hypotheses, in my view, fall below this level of a priori certainty. They certainly deserve more certainty than the behavior of VAR-wielding, data-mining MIT macroeconomists would suggest. For this reason the Austrians make a very good counterbalance.

Many common praxeological prior assumptions, like a universal preference for present consumption over future consumption, should get very heavy prior weights. But the threshold of evidence needed to reject them should fall short of the one needed to overturn my prior belief that demons are not moving my car keys.

Haven't listened to Long's talk yet, but I hope his fully elaborated a-priori movie theory can tell me where the 19 missing Master and Commander sequels went.

To Paul:

The fully elaborated theory was my suggestion for the implication of his argument, so you can't blame him for it. Possibly the theory should distinguish between a movie based on the first book of a series, such as the Lord of the Rings or Harry Potter movie, and one based on elements from multiple novels in a series, such as Master and Commander.

"His fundamental mistake, if I understand it correctly, is to imagine that all that is going on in the Chicago approach is blind curve fitting, looking for patterns in the observed data and assuming that those patterns will continue. The problem with that approach is that a body of data can be fitted with an infinite number of different curves. In selecting among the possible patterns that could explain the data, one uses whatever information is available to form a theory. The theory cannot be entirely realistic, since that would require including every feature of the situation that could conceivably be relevant. The test of whether one has done a good job of figuring out what simplified model includes the important factors and excludes the unimportant ones is the ability of the model to make correct predictions."

The problem with the Chicago School is that it tries to look at a complex system in which many things are interconnected in some way by making simplifying assumptions that cannot be justified in reality. The a priori methodology works much better because everything is based on logic, and as long as your assumptions are valid there isn't much opportunity to go wrong without discovering the error. The Chicago School cannot accept such a methodology because it leads to conclusions that its practitioners do not favour. As such, they ignore that approach in favour of a methodology that permits a greater role for narrative.

James Bond movies seem to keep coming at roughly one a year despite running out of books years ago!

The major point that Roderick made that I thought made a lot of sense is that there is a difference between modelling a horse that has no color and a horse whose color we don't care about. If your model rests on the assumption that the horse has no color, it is wrong. Saying "well, the horse's color is irrelevant, and we can't include every possible color in our model" is not a good justification.

In Neuroscience, I run into this problem all the time, when people model neurons, for example, as simple integrate-and-fire circles without dendrites, declaring that they cannot possibly model every single piece of neuronal morphology. The problem is that neurons do have dendrites, and those contribute (both passively and actively, as it turns out) to how neurons integrate incoming signals. So, most models in Computational Neuroscience are based on fallacious assumptions and are good fits only for those data which were used to create them. Their predictive power is rather bad.

Robert Murphy illustrated this fallacious approach in his recent talk "Why Capitalism Needs Losses" (available on YouTube) with an example of wrong models that led to the housing market collapse. Apparently, some models assumed that every city's housing market was independent of every other city's market. (Presumably because it's unreasonable to try to model every single realistic interaction of every city's market with every other city's market.)

But, based on this model, if the probability of housing prices going down in Atlanta and Boston were 1% each, then the probability of them doing this together was 0.01%, and the probability of the housing prices going down simultaneously across the US was infinitesimally small. Which was of course erroneous, because, in fact, all cities' housing markets are interconnected.
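The arithmetic can be sketched in a few lines of Python. The 1% figure is the one used above; the nationwide-shock probability `p_shock` and the common-shock model itself are my own assumptions for illustration, not anything taken from Murphy's talk or from the models he describes.

```python
p_city = 0.01    # chance of a price decline in any one city (figure used above)
p_shock = 0.005  # ASSUMPTION: chance of a nationwide shock that hits every city

def p_all_decline_independent(n_cities):
    """Joint probability if every city's market is independent."""
    return p_city ** n_cities

# Chance of a purely local decline, chosen so that each city's overall
# chance of decline is still exactly p_city once the shock is included.
p_local = (p_city - p_shock) / (1 - p_shock)

def p_all_decline_common_shock(n_cities):
    """Joint probability in a toy model where cities share one common shock
    and are otherwise independent."""
    return p_shock + (1 - p_shock) * p_local ** n_cities

for n in (2, 20):
    print(n, p_all_decline_independent(n), p_all_decline_common_shock(n))
```

Under independence the chance that two cities decline together is 0.01%, and for twenty cities it is vanishingly small. In the toy common-shock model each city still has a 1% chance of decline, but the chance that all of them decline together never falls below `p_shock`, however many cities are included.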

So, this is a good example of how easy it is to go from ignoring the horse's color in your model to assuming that the horse has no color and building that assumption into your calculations.

As you may well by now have discovered, there's a more formal version of the argument here:

http://mises.org/journals/qjae/pdf/qjae9_3_1.pdf

It doesn't, obviously, follow that your evaluation ought to be any different.
