Data Provenance Management for Earth Science Reproducibility


Wednesday, March 24, 2010, 12:00pm - Wednesday, March 24, 2010, 1:30pm

325b ITE

A fundamental aspect of all science is reproducibility. In the past few decades, Earth Science has been increasingly based on remote sensing (aircraft, satellites, ocean buoy sensors, etc.), which has produced tremendous volumes of data. There is often a long chain of complex processing steps that ultimately lead to published science. Understanding the processing chain and maintaining scientific reproducibility of results are major challenges.

We are constructing a model of scientific data processing that captures and maintains the provenance of all of the artifacts of processing. These include the data transformation algorithms and all data in the system, both inputs from external sources and data produced within the system. Other artifacts include the hardware and software of the processing framework, the source instruments and satellites, scientific literature and documentation, and people and organizations. The origin of any data or algorithm is recorded, and the entire history of the processing chains is stored so that a researcher can understand the entire data flow. Provenance is captured in a form that lets the system provide basic scientific reproducibility of any data product it distributes, even in cases where the physical data products themselves have been deleted due to space constraints.
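
To make the idea of regenerating a deleted product from its recorded provenance concrete, here is a minimal Python sketch. It is not the speakers' system: the class names (Product, ProvenanceStore), the method names, and the toy "calibrate" and "average" algorithms are all hypothetical, and real provenance would also cover software versions, instruments, documents, and people. It assumes algorithms are deterministic functions of their inputs.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Product:
    """A data product plus the provenance needed to regenerate it (hypothetical model)."""
    name: str
    algorithm: Optional[str] = None            # None marks an external input
    inputs: List[str] = field(default_factory=list)
    data: Optional[list] = None                # None once the physical data is deleted


class ProvenanceStore:
    """Records which algorithm and inputs produced each product."""

    def __init__(self) -> None:
        self.algorithms: Dict[str, Callable[..., list]] = {}
        self.products: Dict[str, Product] = {}

    def register_algorithm(self, name: str, func: Callable[..., list]) -> None:
        self.algorithms[name] = func

    def ingest(self, name: str, data: list) -> None:
        # External inputs have no recorded algorithm, so they cannot be rebuilt.
        self.products[name] = Product(name=name, data=list(data))

    def derive(self, name: str, algorithm: str, inputs: List[str]) -> list:
        # Run the algorithm and record its name and inputs as provenance.
        args = [self.fetch(i) for i in inputs]
        data = self.algorithms[algorithm](*args)
        self.products[name] = Product(name, algorithm, list(inputs), data)
        return data

    def delete_data(self, name: str) -> None:
        # Drop the physical data but keep the provenance record.
        self.products[name].data = None

    def fetch(self, name: str) -> list:
        # Return the data, re-running the recorded processing chain if needed.
        product = self.products[name]
        if product.data is None:
            if product.algorithm is None:
                raise LookupError(f"external input {name!r} was deleted")
            args = [self.fetch(i) for i in product.inputs]
            product.data = self.algorithms[product.algorithm](*args)
        return product.data

    def lineage(self, name: str) -> List[str]:
        # List the processing steps that lead to a product, oldest first.
        product = self.products[name]
        steps: List[str] = []
        for i in product.inputs:
            steps.extend(self.lineage(i))
        if product.algorithm is not None:
            steps.append(f"{product.algorithm}({', '.join(product.inputs)}) -> {name}")
        return steps


store = ProvenanceStore()
store.register_algorithm("calibrate", lambda raw: [x * 2 for x in raw])
store.register_algorithm("average", lambda cal: [sum(cal) / len(cal)])
store.ingest("raw_radiance", [1, 2, 3])
store.derive("calibrated", "calibrate", ["raw_radiance"])
store.derive("daily_mean", "average", ["calibrated"])

# Delete both derived products, then recover one from provenance alone.
store.delete_data("calibrated")
store.delete_data("daily_mean")
print(store.fetch("daily_mean"))    # [4.0]
print(store.lineage("daily_mean"))
```

In this sketch only the external input has to be kept; every derived product can be dropped to save space and rebuilt on request, and lineage() gives a researcher the chain of steps behind it.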


UMBC ebiquity