Semantic Web in Provenance Management Workshop

December 25th, 2011

The Third International Workshop on the Role of the Semantic Web in Provenance Management will be held in conjunction with the Ninth Extended Semantic Web Conference (ESWC-2012) on May 27 or 28 in Heraklion, Greece. The workshop’s objectives are to explore the opportunities that Semantic Web technologies offer for managing and exploiting provenance, and to document the role of provenance in real-world Semantic Web applications.

The one-day workshop will include presentations of full research papers, short position papers, a panel on the W3C Provenance Working Group proposals, and demonstrations of prototypes and working systems. Submit papers and demonstration proposals by 4 March 2012.

Data Citation, Peer Review and Provenance

February 8th, 2011

In today’s ebiquity meeting, Curt Tilmes showed an interesting figure comparing how often a particular dataset (MODIS snow cover data) was mentioned in papers with how often it was formally cited. It’s a good example of how far we still have to go in formally capturing the provenance of data and of the information derived from it.

Data Citation and Peer Review

The figure is from:

Parsons, Mark A.; Duerr, Ruth; Minster, Jean-Bernard. Data Citation and Peer Review. Eos, Transactions American Geophysical Union, Volume 91, Issue 34, p. 297-298. 2010.

Provenance Tracking in Science Data Processing Systems

May 28th, 2008

Maybe we should think of data provenance as being like a recipe. A recipe for preparing food is more than just a list of ingredients: it specifies, often in great detail, how the ingredients are combined, cooked and served, as well as the cooking implements and their settings.

Curt Tilmes presented his PhD dissertation proposal yesterday on “Provenance Tracking in Science Data Processing Systems”. Curt works at the NASA Goddard Space Flight Center and is responsible for managing the data processing of Earth science climate research data. Curt has some very good ideas about how to capture all of the relevant provenance data for sophisticated scientific data. He’s using, of course, the Semantic Web languages (RDF and OWL) to express and share the provenance data.

Part of the problem is that you have to capture not just the inputs to a dataset, but how the inputs were processed to produce the dataset, including (ideally) the algorithms, software and hardware. As an easily grasped example, he referred to a recent post by Raymond Pierrehumbert (who writes as raypierre) on the RealClimate blog, How to cook a graph in three easy lessons. This post demonstrates how Roy Spencer processes inputs from two common climate datasets (the Southern Oscillation and Pacific Decadal Oscillation indexes) to get results that support the conclusion that global warming is due to natural causes and not human activity.
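
To make the "recipe" idea a bit more concrete, here is a minimal Python sketch of a provenance record that keeps the inputs together with the processing step (algorithm, software, hardware) and writes them out as RDF-style triples. It is only an illustration and not Curt's actual design: the `http://example.org/prov#` namespace, the property names (`derivedFrom`, `usedAlgorithm`, `usedSoftware`, `ranOnHardware`) and all of the dataset names are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical namespace and property names, made up for this sketch.
EX = "http://example.org/prov#"


@dataclass
class ProcessingStep:
    """The 'how' of the recipe: what was run, and on what."""
    algorithm: str
    software: str
    hardware: str


@dataclass
class DatasetProvenance:
    """A dataset, the inputs it was derived from, and the step that produced it."""
    dataset: str
    inputs: List[str]
    step: ProcessingStep

    def to_triples(self) -> List[Tuple[str, str, str]]:
        """Return (subject, predicate, object) triples in N-Triples-like syntax."""
        subject = f"<{EX}{self.dataset}>"
        # One derivedFrom triple per input dataset (the 'ingredients').
        triples = [
            (subject, f"<{EX}derivedFrom>", f"<{EX}{name}>")
            for name in self.inputs
        ]
        # Plain string literals for the processing details; a real system
        # would escape the literals and likely use resources instead.
        triples.append((subject, f"<{EX}usedAlgorithm>", f'"{self.step.algorithm}"'))
        triples.append((subject, f"<{EX}usedSoftware>", f'"{self.step.software}"'))
        triples.append((subject, f"<{EX}ranOnHardware>", f'"{self.step.hardware}"'))
        return triples


if __name__ == "__main__":
    record = DatasetProvenance(
        dataset="derived_product",
        inputs=["input_a", "input_b"],
        step=ProcessingStep("example_algorithm v1", "example_software 2.0", "example_cluster"),
    )
    for s, p, o in record.to_triples():
        print(f"{s} {p} {o} .")
```

The point of the sketch is simply that a list of inputs alone (the `derivedFrom` triples) is not enough to reproduce a result; the processing step has to travel with the data as well.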