Data Provenance Management for Earth Science Reproducibility
by Curt Tilmes
Wednesday, March 24, 2010, 12:00pm - 1:30pm
We are constructing a model of scientific data processing that captures and maintains the provenance of all artifacts of processing. These include the data transformation algorithms and all data in the system, both inputs from external sources and data produced within the system. Other artifacts include the hardware and software of the processing framework, the source instruments and satellites, scientific literature and documentation, and people and organizations. The origin of all data and algorithms is recorded, and the complete history of each processing chain is stored so that a researcher can understand the entire data flow. Provenance is captured in a form that lets the system provide basic scientific reproducibility for any data product it distributes, even when the physical data products themselves have been deleted due to space constraints.
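The abstract describes the model only at a high level. As a rough illustration, here is a minimal Python sketch that assumes each derived product records the versioned algorithm and the input products that produced it. The names `Algorithm`, `Artifact`, `ProvenanceStore`, and the `L0`/`L1`/`L2` product ids are hypothetical and do not come from the talk. Deleting a product's bytes keeps its provenance record, so the product can be regenerated on request by replaying its processing chain.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Algorithm:
    """A named, versioned data transformation (hypothetical structure)."""
    name: str
    version: str
    func: Callable[..., bytes]


@dataclass
class Artifact:
    """A data product together with the provenance needed to rebuild it."""
    artifact_id: str
    algorithm: Optional[Algorithm]  # None marks an external input
    inputs: List[str] = field(default_factory=list)  # ids of input artifacts
    data: Optional[bytes] = None  # None once the bytes have been deleted


class ProvenanceStore:
    """Hypothetical store that keeps provenance even after data is deleted."""

    def __init__(self) -> None:
        self.artifacts: Dict[str, Artifact] = {}

    def ingest(self, artifact_id: str, data: bytes) -> None:
        # External inputs have no producing algorithm inside the system.
        self.artifacts[artifact_id] = Artifact(artifact_id, None, [], data)

    def produce(self, artifact_id: str, algorithm: Algorithm,
                inputs: List[str]) -> bytes:
        # Run the algorithm and record exactly which inputs and version it used.
        data = algorithm.func(*(self.get(i) for i in inputs))
        self.artifacts[artifact_id] = Artifact(
            artifact_id, algorithm, list(inputs), data)
        return data

    def delete_data(self, artifact_id: str) -> None:
        # Free space by dropping the bytes; the provenance record survives.
        self.artifacts[artifact_id].data = None

    def get(self, artifact_id: str) -> bytes:
        art = self.artifacts[artifact_id]
        if art.data is None:
            if art.algorithm is None:
                raise LookupError(f"external input {artifact_id!r} is gone")
            # Rebuild inputs recursively, rerun the recorded algorithm,
            # and cache the regenerated bytes.
            art.data = art.algorithm.func(*(self.get(i) for i in art.inputs))
        return art.data

    def lineage(self, artifact_id: str) -> List[str]:
        # Depth-first walk: every input is listed before the product it feeds.
        order: List[str] = []

        def visit(aid: str) -> None:
            if aid in order:
                return
            for parent in self.artifacts[aid].inputs:
                visit(parent)
            order.append(aid)

        visit(artifact_id)
        return order


store = ProvenanceStore()
store.ingest("L0", b"raw radiances")
calibrate = Algorithm("calibrate", "1.0", lambda raw: raw.upper())
grid = Algorithm("grid", "2.1", lambda cal: cal + b" (gridded)")
store.produce("L1", calibrate, ["L0"])
store.produce("L2", grid, ["L1"])

store.delete_data("L1")  # both derived products removed for space
store.delete_data("L2")
print(store.get("L2"))       # b'RAW RADIANCES (gridded)'
print(store.lineage("L2"))   # ['L0', 'L1', 'L2']
```

Regeneration in this sketch is only as faithful as the recorded algorithms are deterministic. A fuller system along the lines the abstract describes would also need to record the software environment, hardware, instruments, documentation, and responsible people and organizations, which this example leaves out.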