UMBC ebiquity
Provenance Tracking in Climate Science Data Processing Systems

Speaker: Curt Tilmes
Start: Tuesday, March 04, 2008, 10:00AM
Location: 325 ITE

Abstract: NASA, NOAA, ESA and other organizations involved with climate research
have captured huge archives of earth observations. Over time, the
sensors, spacecraft, science algorithms for transforming and analyzing
the data and the processing frameworks have all evolved. Tracking
complete provenance information in concert with the science data used
in research and, ultimately, in policy decisions is a tremendously
complicated problem. Data are stored in multiple archives across
multiple agencies. Since the data volume is so large, previous
generations of the data are often discarded in favor of newer
versions. Systems often aren't capable of reproducing data that were
once provided to the public. Tracing the provenance of a product is
generally a very manual process, since provenance information is stored
in so many different ways (or not stored at all). It often involves
reading science papers or calling up the researchers. In next-generation
processing systems, data can be transformed by on-demand processing in
new ways, resulting in transient data sets that are returned to a user
or a layered application but never archived. Our goal is complete
scientific reproducibility of all data.
I will briefly present the general area and challenges of provenance
tracking for science data processing systems and the requirements for
scientific reproducibility. I will also discuss existing techniques
and proposals, including metadata standards and the representation of
provenance through standard ontologies on the semantic web.