Provenance Tracking in an Earth Science Data Processing System
by Curt Tilmes
Wednesday, September 30, 2009, 11:00am - 12:00pm
NASA GSFC, Building 3 Auditorium
Tremendous volumes of data have been captured, archived and analyzed. Sensors, algorithms and processing systems for transforming and analyzing the data are evolving over time. Web portals and services can create transient datasets on demand. Data are transferred from organization to organization with additional transformations at every stage. Provenance in this context refers to the source of data and a record of the process that led to its current state. Provenance is important for understanding and using scientific datasets, and critical for independent confirmation of scientific results. Managing provenance throughout scientific data processing has gained interest lately, and there are a variety of approaches. Large-scale scientific datasets consisting of thousands to millions of individual data files and processes pose particular challenges. This talk will introduce the general area of provenance tracking and describe its application to Earth science data processing.