UMBC ebiquity
Earth science

Archive for the 'Earth science' Category

new paper: Discovering Scientific Influence using Cross-Domain Dynamic Topic Modeling

November 17th, 2017, by Tim Finin, posted in Data Science, Earth science, KR, Machine Learning, NLP

Jennifer Sleeman, Milton Halem, Tim Finin and Mark Cane, Discovering Scientific Influence using Cross-Domain Dynamic Topic Modeling, International Conference on Big Data, IEEE, December 2017.

We describe an approach using dynamic topic modeling to model influence and predict future trends in a scientific discipline. Our study focuses on climate change and uses assessment reports of the Intergovernmental Panel on Climate Change (IPCC) and the papers they cite. Since 1990, an IPCC report has been published every five years that includes four separate volumes, each of which has many chapters. Each report cites tens of thousands of research papers, which comprise a correlated dataset of temporally grounded documents. We use a custom dynamic topic modeling algorithm to generate topics for both datasets and apply cross-domain analytics to identify the correlations between the IPCC chapters and their cited documents. The approach reveals both the influence of the cited research on the reports and how previous research citations have evolved over time. For the IPCC use case, the report topic model used 410 documents and a vocabulary of 5911 terms, while the citations topic model was based on 200K research papers and a vocabulary of more than 25K terms. We show that our approach can predict the importance of its extracted topics on future IPCC assessments through the use of cross-domain correlations, Jensen-Shannon divergences and cluster analytics.
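The abstract names Jensen-Shannon divergence as one of the measures used to relate topics across the two models. The following is a minimal sketch of that measure only, not the paper's custom algorithm. The function names and the three toy topic-word distributions are assumptions made for illustration, and the sketch assumes both topics are expressed over the same vocabulary.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms where p_i == 0 contribute nothing to the sum.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric and bounded in [0, 1]."""
    # The mixture m is nonzero wherever p or q is nonzero, so KL is defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical topic-word distributions over a shared 4-term vocabulary.
report_topic = [0.5, 0.3, 0.2, 0.0]
citation_topic_a = [0.4, 0.4, 0.1, 0.1]  # similar word usage
citation_topic_b = [0.0, 0.0, 0.1, 0.9]  # very different word usage

print(js_divergence(report_topic, citation_topic_a))
print(js_divergence(report_topic, citation_topic_b))
```

A smaller divergence means the two topics distribute probability over words more similarly, which is one plausible way to pair a report topic with its closest citation topic.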

Jennifer Sleeman dissertation defense: Dynamic Data Assimilation for Topic Modeling

June 27th, 2017, by Tim Finin, posted in Big data, Earth science, Machine Learning, NLP, Ontologies, Semantic Web

Ph.D. Dissertation Defense

Dynamic Data Assimilation for Topic Modeling

Jennifer Sleeman
9:00am Thursday, 29 June 2017, ITE 325b, UMBC

Understanding how a particular discipline such as climate science evolves over time has received renewed interest. By understanding this evolution, predicting the future direction of that discipline becomes more achievable. Dynamic Topic Modeling (DTM) has been applied to a number of disciplines to model topic evolution as a means to learn how a particular scientific discipline and its underlying concepts are changing. Understanding how a discipline evolves, and its internal and external influences, can be complicated by how the information retrieved over time is integrated. Different techniques are used to integrate sources of information; however, less research has been dedicated to understanding how to integrate these sources over time. The method of data assimilation is commonly used in a number of scientific disciplines to both understand and make predictions of various phenomena, using numerical models and assimilated observational data over time.

In this dissertation, I introduce a novel algorithm for scientific data assimilation, called Dynamic Data Assimilation for Topic Modeling (DDATM), which uses a new cross-domain divergence method (CDDM) and DTM. By using DDATM, observational data in the form of full-text research papers can be assimilated over time starting from an initial model. DDATM can be used as a way to integrate data from multiple sources and, due to its robustness, can exploit the assimilating observational information to better tolerate missing model information. When compared with a DTM model, the assimilated model is shown to have better performance using standard topic modeling measures, including perplexity and topic coherence. The DDATM method is suitable for prediction and results in higher likelihood for subsequent documents. DDATM is able to overcome missing information during the assimilation process when compared with a DTM model. CDDM generalizes as a method that can also bring together multiple disciplines into one cohesive model enabling the identification of related concepts and documents across disciplines and time periods. Finally, grounding the topic modeling process with an ontology improves the quality of the topics and enables a more granular understanding of concept relatedness and cross-domain influence.

The results of this dissertation are demonstrated and evaluated by applying DDATM to 30 years of reports from the Intergovernmental Panel on Climate Change (IPCC) along with more than 150,000 documents that they cite to show the evolution of the physical basis of climate change.
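Perplexity, one of the standard measures the abstract cites for comparing the assimilated model with a plain DTM model, can be illustrated with a short sketch. This is the generic definition (the exponential of the negative mean log-likelihood per token), not the dissertation's evaluation code. The function name and the per-token probabilities below are assumptions for illustration.

```python
import math


def perplexity(token_probs):
    """Perplexity of held-out tokens, given the probability the model
    assigned to each token. Lower values indicate a better fit."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    mean_log_likelihood = sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(-mean_log_likelihood)


# Hypothetical per-token probabilities from two models on the same held-out text.
baseline_probs = [0.01, 0.02, 0.01, 0.005]
assimilated_probs = [0.02, 0.04, 0.02, 0.01]

print(perplexity(baseline_probs))
print(perplexity(assimilated_probs))
```

A model that assigns every token probability 1/V has perplexity V, so the measure can be read as the effective number of equally likely word choices the model faces.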

Committee Members: Drs. Tim Finin (co-advisor), Milton Halem (co-advisor), Anupam Joshi, Tim Oates, Cynthia Matuszek, Mark Cane, Rafael Alonso

Persistent Identifiers for Earth Science Provenance

February 23rd, 2009, by Tim Finin, posted in Earth science, Semantic Web

In this week’s ebiquity meeting (10:00am EDT Wed 2/25, ITE 325), Curt Tilmes will talk on “Persistent Identifiers for Earth Science Provenance”.

Historically, published scientific research could include a description of an experiment that an independent party could use to reproduce the experiment with the same results, confirming the research. Modern research in the field of earth science often depends on terabytes of data captured from remote sensing instruments and on complex computer algorithms that undergo numerous changes over the years. A single result could be the product of the work of hundreds of individuals over decades. The representation of the measurements, algorithms and all the other artifacts of experimentation leading to that result becomes a daunting problem. A key to handling this representation is a good scheme for persistent identifiers.

Persistent identifiers seem like a simple problem. Just make a good URL and don’t change it [1]. This sounds good in theory, but is difficult to maintain forever. Many other schemes have been proposed to attack various aspects of the problem of identification, with various advantages and disadvantages. I will introduce this topic and briefly describe some of the concerns with using identifiers specifically in the context described above, and some of the characteristics of various identifier schemes.

The presentation will be streamed live via ustream.tv

References and some identifier schemes

[1] Cool URIs Don’t Change
[2] Naming and Addressing: URIs, URLs, …
[3] Object Identifier (OID)
[4] The Digital Object Identifier (DOI) System
[5] Persistent Uniform Resource Locator
[6] A Universally Unique IDentifier (UUID) URN Namespace
[7] XRI (Extensible Resource Identifier)
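Of the schemes listed above, the UUID URN namespace [6] is easy to show concretely with Python's standard `uuid` module. The sketch below contrasts a random identifier with a name-based one derived from a URL. The dataset URL and the helper function names are hypothetical and chosen only for illustration; nothing here is taken from the talk itself.

```python
import uuid

# Hypothetical dataset URL, used only for illustration.
DATASET_URL = "http://example.org/data/granule-0001"


def random_identifier():
    """Version 4 UUID: random, so every call yields a new identifier."""
    return uuid.uuid4().urn


def name_based_identifier(url):
    """Version 5 UUID: derived by hashing the URL within the URL namespace,
    so the same URL always maps to the same identifier."""
    return uuid.uuid5(uuid.NAMESPACE_URL, url).urn


print(random_identifier())
print(name_based_identifier(DATASET_URL))
```

The trade-off is visible even at this scale: a random UUID stays valid if the URL changes but carries no link back to the resource, while a name-based UUID can be recomputed by anyone who knows the name but changes whenever the name does.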

Chapman: Gridding Earth Sensing Scanning Instruments, 10am 5/5, ITE 325

May 3rd, 2008, by Tim Finin, posted in Earth science, High performance computing, MC2

David Chapman will defend his MS thesis, A General Algorithm for Gridding Earth Sensing Scanning Instruments, at 10:00am Monday May 5 in room 325 ITE. The abstract is below.

Gridding in remote sensing must re-project observations from their original coordinate system based on satellite orbit and attitude to a grid defined by Earth coordinates. Primitive methods assume that observations are located at points on Earth and typically average observations in grid cells, or interpolate geolocated observations. These approaches are inaccurate because they do not make use of the instrument’s footprint geometry and spatial response. Observation Coverage (Obscov) gridding techniques make use of the satellite optics and geometry to more accurately describe the coverage of a footprint within each grid cell. Obscov gridding provides significant accuracy improvements exceeding 1 Kelvin in brightness temperature over most regions on Earth for a 12 micron window channel on board the Atmospheric Infrared Sounder (AIRS). Existing Obscov algorithms are only applicable to specific instruments and depend heavily on implicitly defined spatial response functions. We make use of raycasting and adaptive grid numerical integration to compute Obscov for the spatial response function of any instrument while processing streaming satellite observation data faster than 400 Megabits/second on a 6-machine cluster. We discuss the quality benefits of our algorithm by analyzing the results of gridded AIRS infrared sensor data with 324 operational spectral channels. We also address parallel processing issues to integrate AIRS Obscov gridding with SOAR, an on-demand climate processing system built on a 122-processor blade server.
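The contrast between point-based gridding and coverage-weighted gridding can be seen in a deliberately simplified one-dimensional sketch. This is not the thesis algorithm, which uses raycasting and adaptive-grid numerical integration over an instrument's spatial response function. Here each footprint is assumed to be an interval with a uniform response, and the function names and observation values are made up for illustration.

```python
def point_grid(observations, n_cells, cell_size=1.0):
    """Average observations per cell using only each footprint's center point."""
    sums = [0.0] * n_cells
    counts = [0] * n_cells
    for center, _width, value in observations:
        idx = int(center // cell_size)
        if 0 <= idx < n_cells:
            sums[idx] += value
            counts[idx] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def coverage_grid(observations, n_cells, cell_size=1.0):
    """Average observations per cell, weighting each observation by the
    length of its footprint that overlaps the cell (uniform response assumed)."""
    sums = [0.0] * n_cells
    weights = [0.0] * n_cells
    for center, width, value in observations:
        lo, hi = center - width / 2, center + width / 2
        for idx in range(n_cells):
            cell_lo, cell_hi = idx * cell_size, (idx + 1) * cell_size
            overlap = max(0.0, min(hi, cell_hi) - max(lo, cell_lo))
            if overlap > 0:
                sums[idx] += overlap * value
                weights[idx] += overlap
    return [s / w if w else None for s, w in zip(sums, weights)]


# Hypothetical observations: (footprint center, footprint width, brightness temperature in K).
observations = [(0.9, 1.0, 280.0), (1.6, 1.0, 290.0)]

print(point_grid(observations, n_cells=3))
print(coverage_grid(observations, n_cells=3))
```

In this toy case the point method assigns each observation to a single cell and leaves the third cell empty, while the coverage method lets the first footprint contribute to the second cell and fills the third cell from the part of the second footprint that spills into it.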
