December 12th, 2017
Jennifer Sleeman receives AI for Earth grant from Microsoft
Visiting Assistant Professor Jennifer Sleeman (Ph.D. ’17) has been awarded a grant from Microsoft as part of its ‘AI for Earth’ program. Dr. Sleeman will use the grant to continue her research on developing algorithms to model how scientific disciplines such as climate change evolve and predict future trends by analyzing the text of articles and reports and the papers they cite.
AI for Earth is a Microsoft program aimed at empowering people and organizations to solve global environmental challenges by increasing access to AI tools and educational opportunities, while accelerating innovation. Via the Azure for Research AI for Earth award program, Microsoft provides selected researchers and organizations access to its cloud and AI computing resources to accelerate, improve and expand work on climate change, agriculture, biodiversity and/or water challenges.
UMBC is among the first grant recipients of AI for Earth, first launched in July 2017. The grant process was a competitive and selective process and was awarded in recognition of the potential of the work and power of AI to accelerate progress.
As part of her dissertation research, Dr. Sleeman developed algorithms using dynamic topic modeling to understand influence and predict future trends in a scientific discipline. She applied this to the field of climate change and used assessment reports of the Intergovernmental Panel on Climate Change (IPCC) and the papers they cite. Since 1990, an IPCC report has been published every five years that includes four separate volumes, each of which has many chapters. Each report cites tens of thousands of research papers, which comprise a correlated dataset of temporally grounded documents. Her custom dynamic topic modeling algorithm identified topics for both datasets and apply cross-domain analytics to identify the correlations between the IPCC chapters and their cited documents. The approach reveals both the influence of the cited research on the reports and how previous research citations have evolved over time.
Dr. Sleeman’s award is part of an inaugural set of 35 grants in more than ten countries for access to Microsoft Azure and AI technology platforms, services and training. In an post on Monday, AI for Earth can be a game-changer for our planet, Microsoft announced its intent to put $50 million over five years into the program, enabling grant-making and educational trainings possible at a much larger scale.
More information about AI for Earth can be found on the Microsoft AI for Earth website.
November 17th, 2017
Discovering Scientific Influence using Cross-Domain Dynamic Topic Modeling
We describe an approach using dynamic topic modeling to model influence and predict future trends in a scientific discipline. Our study focuses on climate change and uses assessment reports of the Intergovernmental Panel on Climate Change (IPCC) and the papers they cite. Since 1990, an IPCC report has been published every five years that includes four separate volumes, each of which has many chapters. Each report cites tens of thousands of research papers, which comprise a correlated dataset of temporally grounded documents. We use a custom dynamic topic modeling algorithm to generate topics for both datasets and apply crossdomain analytics to identify the correlations between the IPCC chapters and their cited documents. The approach reveals both the influence of the cited research on the reports and how previous research citations have evolved over time. For the IPCC use case, the report topic model used 410 documents and a vocabulary of 5911 terms while the citations topic model was based on 200K research papers and a vocabulary more than 25K terms. We show that our approach can predict the importance of its extracted topics on future IPCC assessments through the use of cross domain correlations, Jensen-Shannon divergences and cluster analytics.
June 27th, 2017
Ph.D. Dissertation Defense
Dynamic Data Assimilation for Topic Modeling
9:00am Thursday, 29 June 2017, ITE 325b, UMBC
Understanding how a particular discipline such as climate science evolves over time has received renewed interest. By understanding this evolution, predicting the future direction of that discipline becomes more achievable. Dynamic Topic Modeling (DTM) has been applied to a number of disciplines to model topic evolution as a means to learn how a particular scientific discipline and its underlying concepts are changing. Understanding how a discipline evolves, and its internal and external influences, can be complicated by how the information retrieved over time is integrated. There are different techniques used to integrate sources of information, however, less research has been dedicated to understanding how to integrate these sources over time. The method of data assimilation is commonly used in a number of scientific disciplines to both understand and make predictions of various phenomena, using numerical models and assimilated observational data over time.
In this dissertation, I introduce a novel algorithm for scientific data assimilation, called Dynamic Data Assimilation for Topic Modeling (DDATM), which uses a new cross-domain divergence method (CDDM) and DTM. By using DDATM, observational data in the form of full-text research papers can be assimilated over time starting from an initial model. DDATM can be used as a way to integrate data from multiple sources and, due to its robustness, can exploit the assimilating observational information to better tolerate missing model information. When compared with a DTM model, the assimilated model is shown to have better performance using standard topic modeling measures, including perplexity and topic coherence. The DDATM method is suitable for prediction and results in higher likelihood for subsequent documents. DDATM is able to overcome missing information during the assimilation process when compared with a DTM model. CDDM generalizes as a method that can also bring together multiple disciplines into one cohesive model enabling the identification of related concepts and documents across disciplines and time periods. Finally, grounding the topic modeling process with an ontology improves the quality of the topics and enables a more granular understanding of concept relatedness and cross-domain influence.
The results of this dissertation are demonstrated and evaluated by applying DDATM to 30 years of reports from the Intergovernmental Panel on Climate Change (IPCC) along with more than 150,000 documents that they cite to show the evolution of the physical basis of climate change.
Committee Members: Drs. Tim Finin (co-advisor), Milton Halem (co-advisor), Anupam Joshi, Tim Oates, Cynthia Matuszek, Mark Cane, Rafael Alonso
February 23rd, 2009
In this week’s ebiquity meeting (10:00am EDT Wed 2/25, ITE 325), Curt Tilmes will talk on “Persistent Identifiers for Earth Science Provenance“.
Historically, published scientific research could include a description of an experiment that an independent party could use to reproduce the experiment with the same results, confirming the research. Modern research in the field of earth science often depends on terrabytes of data captured from remote sensing instruments, complex computer algorithms that undergo numerous changes over the year. A single result could be the result of the work of hundreds of individuals over decades. The representation of the measurements, algorithms and all the other artifacts of experimentation leading to that result becomes a daunting problem. A key to handling this representation is a good scheme for persisent identifiers.
Persistent identifiers seem like a simple problem. Just make a good URL and don’t change it . This sounds good in theory, but is difficult to maintain forever. Many other schemes have been proposed to attack various aspects of the problem of identification, with various advantages and disadvantages. I will introduce this topic and briefly describe some of the concerns with using identifiers specifically in the context described above, and some of the characteristics of various identifier schemes.
The presentation will be streamed live via ustream.tv
References and some identifier schemes
 Cool URIs Don’t Change
 Naming and Addressing: URIs, URLs, …
 Object Identifer (OID)
 The Digital Object Identifier (DOI) System
 Persistent Uniform Resource Locator
 A Universally Unique IDentifier (UUID) URN Namespace
 XRI (Extensible Resource Identifier)
May 3rd, 2008
David Chapman will defend his MS thesis, A General Algorithm for Gridding Earth Sensing Scanning Instruments, at 10:00am Monday May 5 in room 325 ITE. The abstract is below.
Gridding in remote sensing must re-project observations from their original coordinate system based on satellite orbit and attitude to a grid defined by Earth coordinates. Primitive methods assume that observations are located at points on Earth and typically average observations in grid cells, or interpolate geolocated observations. These approaches are inaccurate, because they do not make use of the instrumentâ€™s footprint geometry, and spatial response. Observation Coverage (Obscov) gridding techniques make use of the satellite optics and geometry to more accurately describe coverage of a footprint on within each grid cell. Obscov gridding provides significant accuracy improvements exceeding 1 Kelvin Brightness Temperature over most regions on Earth for a 12 micron window channel on-board the Atmospheric Infrared Sounder (AIRS). Existing Obscov algorithms are only applicable to specific instruments and depend heavily on implicitly defined spatial response functions. We make use of raycasting and adaptive grid numerical integration to compute Obscov for the spatial response function of any instrument while processing streaming satellite observation data faster than 400 Megabits/second on a 6 machine cluster. We discuss the quality benefits of our algorithm by analyzing the results of gridded AIRS infrared sensor data with 324 operational spectral channels. We also address parallel processing issues to integrate AIRS Obscov gridding with SOAR, an on demand climate processing system built on a 122 processor blade server.