January 27th, 2018
Cleaning Noisy Knowledge Graphs
My dissertation research is developing an approach to identify and explain errors in a knowledge graph constructed by extracting entities and relations from text. Information extraction systems can automatically construct knowledge graphs from a large collection of documents, which might be drawn from news articles, Web pages, social media posts or discussion forums. The language understanding task is challenging and current extraction systems introduce many kinds of errors. Previous work on improving the quality of knowledge graphs uses additional evidence from background knowledge bases or Web searches. Such approaches are diffuclt to apply when emerging entities are present and/or only one knowledge graph is available. In order to address the problem I am using multiple complementary techniques including entitylinking, common sense reasoning, and linguistic analysis.
October 26th, 2017
W3C Recommendation: Time Ontology in OWL
The Spatial Data on the Web Working Group has published a W3C Recommendation of the Time Ontology in OWL specification. The ontology provides a vocabulary for expressing facts about relations among instants and intervals, together with information about durations, and about temporal position including date-time information. Time positions and durations may be expressed using either the conventional Gregorian calendar and clock, or using another temporal reference system such as Unix-time, geologic time, or different calendars.
September 12th, 2017
2018 Ontology Summit: Ontologies in Context
The OntologySummit is an annual series of online and in-person events that involves the ontology community and communities related to each year’s topic. The topic chosen for the 2018 Ontology Summit will be Ontologies in Context, which the summit describes as follows.
“In general, a context is defined to be the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood and assessed. Some examples of synonyms include circumstances, conditions, factors, state of affairs, situation, background, scene, setting, and frame of reference. There are many meanings of “context” in general, and also for ontologies in particular. The summit this year will survey these meanings and identify the research problems that must be solved so that contexts can succeed in achieving the full understanding and assessment of an ontology.”
Each year’s Summit comprises of a series of both online and face-to-face events that span about three months. These include a vigorous three-month online discourse on the theme, and online panel discussions, research activities which will culminate in a two-day face-to-face workshop and symposium.
Over the next two months, there will be a sequence of weekly online meetings to discuss, plan and develop the 2018 topic. The summit itself will start in January with weekly online sessions of invited speakers. Visit the the 2018 Ontology Summit site for more information and to see how you can participate in the planning sessions.
September 10th, 2017
Context-Dependent Privacy and Security Management on Mobile Devices
There are ongoing security and privacy concerns regarding mobile platforms that are being used by a growing number of citizens. Security and privacy models typically used by mobile platforms use one-time permission acquisition mechanisms. However, modifying access rights after initial authorization in mobile systems is often too tedious and complicated for users. User studies show that a typical user does not understand permissions requested by applications or are too eager to use the applications to care to understand the permission implications. For example, the Brightest Flashlight application was reported to have logged precise locations and unique user identifiers, which have nothing to do with a flashlight application’s intended functionality, but more than 50 million users used a version of this application which would have forced them to allow this permission. Given the penetration of mobile devices into our lives, a fine-grained context-dependent security and privacy control approach needs to be created.
We have created Mithril as an end-to-end mobile access control framework that allows us to capture access control needs for specific users, by observing violations of known policies. The framework studies mobile application executables to better inform users of the risks associated with using certain applications. The policy capture process involves an iterative user feedback process that captures policy modifications required to mediate observed violations. Precision of policy is used to determine convergence of the policy capture process. Policy rules in the system are written using Semantic Web technologies and the Platys ontology to define a hierarchical notion of context. Policy rule antecedents are comprised of context elements derived using the Platys ontology employing a query engine, an inference mechanism and mobile sensors. We performed a user study that proves the feasibility of using our violation driven policy capture process to gather user-specific policy modifications.
We contribute to the static and dynamic study of mobile applications by defining “application behavior” as a possible way of understanding mobile applications and creating access control policies for them. Our user study also shows that unlike our behavior-based policy, a “deny by default” mechanism hampers usability of access control systems. We also show that inclusion of crowd-sourced policies leads to further reduction in user burden and need for engagement while capturing context-based access control policy. We enrich knowledge about mobile “application behavior” and expose this knowledge through the Mobipedia knowledge-base. We also extend context synthesis for semantic presence detection on mobile devices by combining Bluetooth, low energy beacons and Nearby Messaging services from Google.
June 27th, 2017
Ph.D. Dissertation Defense
Dynamic Data Assimilation for Topic Modeling
9:00am Thursday, 29 June 2017, ITE 325b, UMBC
Understanding how a particular discipline such as climate science evolves over time has received renewed interest. By understanding this evolution, predicting the future direction of that discipline becomes more achievable. Dynamic Topic Modeling (DTM) has been applied to a number of disciplines to model topic evolution as a means to learn how a particular scientific discipline and its underlying concepts are changing. Understanding how a discipline evolves, and its internal and external influences, can be complicated by how the information retrieved over time is integrated. There are different techniques used to integrate sources of information, however, less research has been dedicated to understanding how to integrate these sources over time. The method of data assimilation is commonly used in a number of scientific disciplines to both understand and make predictions of various phenomena, using numerical models and assimilated observational data over time.
In this dissertation, I introduce a novel algorithm for scientific data assimilation, called Dynamic Data Assimilation for Topic Modeling (DDATM), which uses a new cross-domain divergence method (CDDM) and DTM. By using DDATM, observational data in the form of full-text research papers can be assimilated over time starting from an initial model. DDATM can be used as a way to integrate data from multiple sources and, due to its robustness, can exploit the assimilating observational information to better tolerate missing model information. When compared with a DTM model, the assimilated model is shown to have better performance using standard topic modeling measures, including perplexity and topic coherence. The DDATM method is suitable for prediction and results in higher likelihood for subsequent documents. DDATM is able to overcome missing information during the assimilation process when compared with a DTM model. CDDM generalizes as a method that can also bring together multiple disciplines into one cohesive model enabling the identification of related concepts and documents across disciplines and time periods. Finally, grounding the topic modeling process with an ontology improves the quality of the topics and enables a more granular understanding of concept relatedness and cross-domain influence.
The results of this dissertation are demonstrated and evaluated by applying DDATM to 30 years of reports from the Intergovernmental Panel on Climate Change (IPCC) along with more than 150,000 documents that they cite to show the evolution of the physical basis of climate change.
Committee Members: Drs. Tim Finin (co-advisor), Milton Halem (co-advisor), Anupam Joshi, Tim Oates, Cynthia Matuszek, Mark Cane, Rafael Alonso
March 17th, 2017
The Semantics Toolkit
Paul Cuddihy and Justin McHugh
GE Global Research Center, Niskayuna, NY
10:00-11:00 Tuesday, 4 April 2017, ITE 346, UMBC
Paul Cuddihy is a senior computer scientist and software systems architect in AI and Learning Systems at the GE Global Research Center in Niskayuna, NY. He earned an M.S. in Computer Science from Rochester Institute of Technology. The focus of his twenty-year career at GE Research has ranged from machine learning for medical imaging equipment diagnostics, monitoring and diagnostic techniques for commercial aircraft engines, modeling techniques for monitoring seniors living independently in their own homes, to parallel execution of simulation and prediction tasks, and big data ontologies. He is one of the creators of the open source software “Semantics Toolkit” (SemTk) which provides a simplified interface to the semantic tech stack, opening its use to a broader set of users by providing features such as drag-and-drop query generation and data ingestion. Paul has holds over twenty U.S. patents.
Justin McHugh is computer scientist and software systems architect working in the AI and Learning Systems group at GE Global Research in Niskayuna, NY. Justin attended the State University of New York at Albany where he earned an M.S in computer science. He has worked as a systems architect and programmer for large scale reporting, before moving into the research sector. In the six years since, he has worked on complex system integration, Big Data systems and knowledge representation/querying systems. Justin is one of the architects and creators of SemTK (the Semantics Toolkit), a toolkit aimed at making the power of the semantic web stack available to programmers, automation and subject matter experts without their having to be deeply invested in the workings of the Semantic Web.
March 4th, 2017
SADL – Semantic Application Design Language
Dr. Andrew W. Crapo
GE Global Research
10:00 Tuesday, 7 March 2017
The Web Ontology Language (OWL) has gained considerable acceptance over the past decade. Building on prior work in Description Logics, OWL has sufficient expressivity to be useful in many modeling applications. However, its various serializations do not seem intuitive to subject matter experts in many domains of interest to GE. Consequently, we have developed a controlled-English language and development environment that attempts to make OWL plus rules more accessible to those with knowledge to share but limited interest in studying formal representations. The result is the Semantic Application Design Language (SADL). This talk will review the foundational underpinnings of OWL and introduce the SADL constructs meant to capture, validate, and maintain semantic models over their lifecycle.
Dr. Crapo has been part of GE’s Global Research staff for over 35 years. As an Information Scientist he has built performance and diagnostic models of mechanical, chemical, and electrical systems, and has specialized in human-computer interfaces, decision support systems, machine reasoning and learning, and semantic representation and modeling. His work has included a graphical expert system language (GEN-X), a graphical environment for procedural programming (Fuselet Development Environment), and a semantic-model-driven user-interface for decision support systems (ACUITy). Most recently Andy has been active in developing the Semantic Application Design Language (SADL), enabling GE to leverage worldwide advances and emerging standards in semantic technology and bring them to bear on diverse problems from equipment maintenance optimization to information security.
December 9th, 2016
Understanding the Logical and Semantic
Structure of Large Documents
11:00-1:00 Monday, 12 December 2016, ITE325b, UMBC
Up-to-the-minute language understanding approaches are mostly focused on small documents such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents such as legal documents, reports, business opportunities, proposals and technical manuals is still a challenging task. The reason behind this challenge is that the documents may be multi-themed, complex and cover diverse topics.
We aim to automatically identify and classify a document’s sections and subsections, infer their structure and annotate them with semantic labels to understand the semantic structure of a document. This document’s structure understanding will significantly benefit and inform a variety of applications such as information extraction and retrieval, document categorization and clustering, document summarization, fact and relation extraction, text analysis and question answering.
Committee: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Cynthia Matuszek, James Mayfield (JHU)
May 22nd, 2016
With the increase in the number of cloud services and service providers, manual analysis of Service Level Agreements (SLA), comparison between different service offerings and conformance regulation has become a difficult task for customers. Cloud SLAs are policy documents describing the legal agreement between cloud providers and customers. SLA specifies the commitment of availability, performance of services, penalties associated with violations and procedure for customers to receive compensations in case of service disruptions. The aim of our research is to develop technology solutions for automated cloud service management using Semantic Web and Text Mining techniques. In this paper we discuss in detail the challenges in automating cloud services management and present our preliminary work in extraction of knowledge from SLAs of different cloud services. We extracted two types of information from the SLA documents which can be useful for end users. First, the relationship between the service commitment and financial credit. We represented this information by enhancing the existing Cloud service ontology proposed by us in our previous research. Second, we extracted rules in the form of obligations and permissions from SLAs using modal and deontic logic formalizations. For our analysis, we considered six publicly available SLA documents from different cloud computing service providers.
May 2nd, 2016
He’s dead, Jim.
Google recently shut down the query interface to Freebase. All that is left of this innovative service is the ability to download a few final data dumps.
Freebase was launched nine years ago by Metaweb as an online source of structured data collected from Wikipedia and many other sources, including individual, user-submitted uploads and edits. Metaweb was acquired by Google in July 2010 and Freebase subsequently grew to have more than 2.4 billion facts about 44 million subjects. In December 2014, Google announced that it was closing Freebase and four months later it became read-only. Sometime this week the query interface was shut down.
I’ve enjoyed using Freebase in various projects in the past two years and found that it complemented DBpedia in many ways. Although its native semantics differed from that of RDF and OWL, it was close enough to allow all of Freebase to be exported as RDF. Its schema was larger than DBpedia’s and the data tended to be a bit cleaner.
Google generously decided to donate the data to the Wikidata project, which began migrating Freebase’s data to Wikidata in 2015. The Freebase data also lives on as part of Google’s Knowledge Graph. Google recently allowed very limited querying of its knowledge graph and my limited experimenting with it suggests that has Freebase data at its core.
May 1st, 2016
Representing and Reasoning with Temporal
Properties/Relations in OWL/RDF
10:30-11:30 Monday, 2 May 2016, ITE346
OWL ontologies offer the means for modeling real-world domains by representing their high-level concepts, properties and interrelationships. These concepts and their properties are connected by means of binary relations. However, this assumes that the model of the domain is either a set of static objects and relationships that do not change over time, or a snapshot of these objects at a particular point in time. In general, relationships between objects that change over time (dynamic properties) are not binary relations, since they involve a temporal interval in addition to the object and the subject. Representing and querying information evolving in time requires careful consideration of how to use OWL constructs to model dynamic relationships and how the semantics and reasoning capabilities within that architecture are affected.
April 3rd, 2016
Policies For Oblivious Cloud Storage
Using Semantic Web Technologies
10:30am, Monday, 4 April 2016, ITE 346, UMBC
Consumers want to ensure that their enterprise data is stored securely and obliviously on the cloud, such that the data objects or their access patterns are not revealed to anyone, including the cloud provider, in the public cloud environment. We have created a detailed ontology describing the oblivious cloud storage models and role based access controls that should be in place to manage this risk. We have also implemented the ObliviCloudManager application that allows users to manage their cloud data using oblivious data structures. This application uses role based access control model and collection based document management to store and retrieve data efficiently. Cloud consumers can use our system to define policies for storing data obliviously and manage storage on untrusted cloud platforms, even if they are not familiar with the underlying technology and concepts of the oblivious data structure.