Data Citation, Peer Review and Provenance

February 8th, 2011

In today’s ebiquity meeting, Curt Tilmes showed an interesting figure comparing how often a particular dataset (MODIS snow cover data) was mentioned in papers with how often it was formally cited. It’s a good example of how far we still need to go in formally capturing the provenance of data and of information derived from it.

Data Citation and Peer Review

The figure is from:

Parsons, Mark A.; Duerr, Ruth; Minster, Jean-Bernard. Data Citation and Peer Review. Eos, Transactions American Geophysical Union, Volume 91, Issue 34, p. 297-298. 2010.

How to choose the right chart for your data

January 25th, 2009

There are lots of good systems, including Excel and other spreadsheet tools, that can visualize your data in various kinds of graphs. It can sometimes be a little daunting, however, to figure out which kind of chart to use. The version of Excel running on my laptop, for example, asks me to choose from more than 70 kinds of charts. Of course, many of the variations are obviously stylistic (2D vs. 3D bar charts, for instance), but there are still a lot of options.

A link to a great data visualization cheat sheet, How to choose a chart, is doing well on Hacker News today. The graphic was created by Andrew Abela, who posted it on his blog in Choosing a good chart more than three years ago.

“Here’s something we came up with to help you consider which chart to use. It was inspired by the table in Gene Zelazny’s classic work Saying It With Charts (p. 27 in the 4th. ed)”

How to choose the right chart for your data

Abela developed this aid as part of his Extreme Presentation method for “designing presentations that drive action”. On his Extreme Presentation blog you can find versions of this chart aid that have been translated into other languages.

Models? We don’t need no stinking models!

June 26th, 2008

Wired has an interesting article, The End of Theory: The Data Deluge Makes the Scientific Method Obsolete, that discusses the data-driven revolution that computers and the Web have unleashed. Science used to rely on developing models to explain and organize the world and make predictions. Now much of that can be done by correlating large amounts of data. The same applies to other disciplines (e.g., linguistics) and to businesses (think Google).

“All models are wrong, but some are useful.” So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.

Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.

Update: And then there is this counterpoint: Why the cloud cannot obscure the scientific method.