Models? We don’t need no stinking models!
Tim Finin, 1:23pm 26 June 2008
Wired has an interesting article, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, that discusses the data-driven revolution that computers and the Web have unleashed. Science used to rely on developing models to explain and organize the world and make predictions. Now much of that can be done by correlating large amounts of data. The argument applies equally well to other disciplines (e.g., linguistics) and to businesses (think Google).
“All models are wrong, but some are useful.” So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.
Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.
Update: And then there is this counterpoint: Why the cloud cannot obscure the scientific method.


June 26th, 2008 at 3:25 pm
I think the issue isn’t “All or Nothing”, as is usual with articles such as these (where provocation is the prime driver), but the need for “model dexterity”. In a sense, we are revisiting “Closed World” and “Open World” models via the back door.
The whole premise of the Semantic Web vision comes down to infrastructure built around an inherently dexterous data model, i.e., RDF.
Kingsley Idehen
June 27th, 2008 at 3:52 am
Actually, to some extent, the idea that a model can only take you so far is not at all new. Historically, there have been many times when scientists have abandoned a physical model for a purely applied mathematical one. If you forget the Kuhnian view of what happened in what is called the “Copernican revolution”, it becomes quite clear that Ptolemy had already ignored the physics of his day, which to him was flawed, in favor of a mathematical model that was pragmatically decent but wrong. Copernicus, realising that the physics was wrong and wishing to do something about it, came up with a heliocentric model, which was also wrong but pointed in a new direction. Most of his contemporaries were perfectly fine with the mathematical model, which served them well, but a few proceeded to establish a new model, and it took a few centuries to get to the Newtonian universe, where the physical model was again clear.
Now, that too had shortcomings, and modern physics is, IMHO, in a situation where mathematical pragmatism, rather than deep physical understanding, dominates.
But history has shown that this is a situation of crisis, rather than a novel and useful way to look at the world. That’s the thing with Google too. It isn’t very good, actually. We have customers who work with helping people find stuff, and that’s the story they’re telling: Google isn’t always very helpful.
I believe that this is a time of crisis, in the scientific sense, and the solution isn’t to abandon models; it is to establish new theory and new models. And I believe that we have a pretty decent first iteration in the Semantic Web. Perhaps we’re just being Copernicus, but at least we’re better than Ptolemy (who is Google’s counterpart in this).