November 29th, 2016
Dealing with Dubious Facts
in Knowledge Graphs
1:00-3:00pm Wednesday, 30 November 2016, ITE 325b, UMBC
Knowledge graphs are structured representations of facts where nodes are real-world entities or events and edges are the associations among the pair of entities. Knowledge graphs can be constructed using automatic or manual techniques. Manual techniques construct high quality knowledge graphs but are expensive, time consuming and not scalable. Hence, automatic information extraction techniques are used to create scalable knowledge graphs but the extracted information can be of poor quality due to the presence of dubious facts.
An extracted fact is dubious if it is incorrect, inexact or correct but lacks evidence. A fact might be dubious because of the errors made by NLP extraction techniques, improper design consideration of the internal components of the system, choice of learning techniques (semi-supervised or unsupervised), relatively poor quality of heuristics or the syntactic complexity of underlying text. A preliminary analysis of several knowledge extraction systems (CMU’s NELL and JHU’s KELVIN) and observations from the literature suggest that dubious facts can be identified, diagnosed and managed. In this dissertation, I will explore approaches to identify and repair such dubious facts from a knowledge graph using several complementary approaches, including linguistic analysis, common sense reasoning, and entity linking.
Committee: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Paul McNamee (JHU), Partha Talukdar (IISc, India)
October 17th, 2016
In this weeks ebiquity meeting (11:30am 10/18, ITE346), Sudip Mittal will talk on Knowledge for Cybersecurity.
In the broad domain of security, analysts and policy makers need knowledge about the state of the world to make critical decisions, operational/tactical as well as strategic. This knowledge has to be extracted from different sources, and then represented in a form that will enable further analysis and decision making. Some of this data underlying this knowledge is in textual sources traditionally associated with Open Sources Intelligence (OSINT), others in data that is present in hidden sources like dark web vulnerability markets. Today, this is a mostly manual process. We wish to automate this problem by taking data from a variety of sources, extracting, representing and integrating the knowledge present, and then use the resulting knowledge graph to create various semantic agents that add value to the cybersecurity infrastructure.
May 22nd, 2016
With the increase in the number of cloud services and service providers, manual analysis of Service Level Agreements (SLA), comparison between different service offerings and conformance regulation has become a difficult task for customers. Cloud SLAs are policy documents describing the legal agreement between cloud providers and customers. SLA specifies the commitment of availability, performance of services, penalties associated with violations and procedure for customers to receive compensations in case of service disruptions. The aim of our research is to develop technology solutions for automated cloud service management using Semantic Web and Text Mining techniques. In this paper we discuss in detail the challenges in automating cloud services management and present our preliminary work in extraction of knowledge from SLAs of different cloud services. We extracted two types of information from the SLA documents which can be useful for end users. First, the relationship between the service commitment and financial credit. We represented this information by enhancing the existing Cloud service ontology proposed by us in our previous research. Second, we extracted rules in the form of obligations and permissions from SLAs using modal and deontic logic formalizations. For our analysis, we considered six publicly available SLA documents from different cloud computing service providers.
May 2nd, 2016
He’s dead, Jim.
Google recently shut down the query interface to Freebase. All that is left of this innovative service is the ability to download a few final data dumps.
Freebase was launched nine years ago by Metaweb as an online source of structured data collected from Wikipedia and many other sources, including individual, user-submitted uploads and edits. Metaweb was acquired by Google in July 2010 and Freebase subsequently grew to have more than 2.4 billion facts about 44 million subjects. In December 2014, Google announced that it was closing Freebase and four months later it became read-only. Sometime this week the query interface was shut down.
I’ve enjoyed using Freebase in various projects in the past two years and found that it complemented DBpedia in many ways. Although its native semantics differed from that of RDF and OWL, it was close enough to allow all of Freebase to be exported as RDF. Its schema was larger than DBpedia’s and the data tended to be a bit cleaner.
Google generously decided to donate the data to the Wikidata project, which began migrating Freebase’s data to Wikidata in 2015. The Freebase data also lives on as part of Google’s Knowledge Graph. Google recently allowed very limited querying of its knowledge graph and my limited experimenting with it suggests that has Freebase data at its core.
May 1st, 2016
Representing and Reasoning with Temporal
Properties/Relations in OWL/RDF
10:30-11:30 Monday, 2 May 2016, ITE346
OWL ontologies offer the means for modeling real-world domains by representing their high-level concepts, properties and interrelationships. These concepts and their properties are connected by means of binary relations. However, this assumes that the model of the domain is either a set of static objects and relationships that do not change over time, or a snapshot of these objects at a particular point in time. In general, relationships between objects that change over time (dynamic properties) are not binary relations, since they involve a temporal interval in addition to the object and the subject. Representing and querying information evolving in time requires careful consideration of how to use OWL constructs to model dynamic relationships and how the semantics and reasoning capabilities within that architecture are affected.
December 16th, 2015
Zareen Syed, Ankur Padia, Tim Finin, Lisa Mathews and Anupam Joshi, UCO: Unified Cybersecurity Ontology
, AAAI Workshop on Artificial Intelligence for Cyber Security (AICS), February 2016.
In this paper we describe the Unified Cybersecurity Ontology (UCO) that is intended to support information integration and cyber situational awareness in cybersecurity systems. The ontology incorporates and integrates heterogeneous data and knowledge schemas from different cybersecurity systems and most commonly used cybersecurity standards for information sharing and exchange. The UCO ontology has also been mapped to a number of existing cybersecurity ontologies as well as concepts in the Linked Open Data cloud. Similar to DBpedia which serves as the core for general knowledge in Linked Open Data cloud, we envision UCO to serve as the core for cybersecurity domain, which would evolve and grow with the passage of time with additional cybersecurity data sets as they become available. We also present a prototype system and concrete use cases supported by the UCO ontology. To the best of our knowledge, this is the first cybersecurity ontology that has been mapped to general world ontologies to support broader and diverse security use cases. We compare the resulting ontology with previous efforts, discuss its strengths and limitations, and describe potential future work directions.
June 8th, 2015
Coldstart is a task in the NIST Text Analysis Conference’s Knowledge Base Population suite that combines entity linking and slot filling to populate an empty knowledge base using a predefined ontology for the facts and relations. This paper describes a system developed by the Human Language Technology Center of Excellence at Johns Hopkins University for the 2014 Coldstart task.
Tim Finin, Paul McNamee, Dawn Lawrie, James Mayfield and Craig Harman, Hot Stuff at Cold Start: HLTCOE participation at TAC 2014, 7th Text Analysis Conference, National Institute of Standards and Technology, Nov. 2014.
The JHU HLTCOE participated in the Cold Start task in this year’s Text Analysis Conference Knowledge Base Population evaluation. This is our third year of participation in the task, and we continued our research with the KELVIN system. We submitted experimental variants that explore use of forward-chaining inference, slightly more aggressive entity clustering, refined multiple within-document conference, and prioritization of relations extracted from news sources.
June 8th, 2015
The NSF-sponsored Platys project explored the idea that places are more than just GPS coordinates. They are concepts rich with semantic information, including people, activities, roles, functions, time and purpose. Our mobile phones can learn to recognize the places we are in and use information about them to provide better services.
Laura Zavala, Pradeep K. Murukannaiah, Nithyananthan Poosamani, Tim Finin, Anupam Joshi, Injong Rhee and Munindar P. Singh, Platys: From Position to Place-Oriented Mobile Computing, AI Magazine, v36, n2, 2015.
The Platys project focuses on developing a high-level, semantic notion of location called place. A place, unlike a geospatial position, derives its meaning from a user’s actions and interactions in addition to the physical location where it occurs. Our aim is to enable the construction of a large variety of applications that take advantage of place to render relevant content and functionality and, thus, improve user experience. We consider elements of context that are particularly related to mobile computing. The main problems we have addressed to realize our place-oriented mobile computing vision are representing places, recognizing places, and engineering place-aware applications. We describe the approaches we have developed for addressing these problems and related subproblems. A key element of our work is the use of collaborative information sharing where users’ devices share and integrate knowledge about places. Our place ontology facilitates such collaboration. Declarative privacy policies allow users to specify contextual features under which they prefer to share or not share their information.
June 5th, 2015
New paper: Zareen Syed, Tim Finin, Muhammad Rahman, James Kukla and Jeehye Yun, Discovering and Querying Hybrid Linked Data, Third Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data, held in conjunction with the 12th Extended Semantic Web Conference, Portoroz Slovenia, June 2015.
In this paper, we present a unified framework for discovering and querying hybrid linked data. We describe our approach to developing a natural language query interface for a hybrid knowledge base Wikitology, and present that as a case study for accessing hybrid information sources with structured and unstructured data through natural language queries. We evaluate our system on a publicly available dataset and demonstrate improvements over a baseline system. We describe limitations of our approach and also discuss cases where our system can complement other structured data querying systems by retrieving additional answers not available in structured sources.
December 29th, 2014
TABEL — A Domain Independent and Extensible Framework
for Inferring the Semantics of Tables
8:00am Thursday, 8 January 2015, ITE325b
Tables are an integral part of documents, reports and Web pages in many scientific and technical domains, compactly encoding important information that can be difficult to express in text. Table-like structures outside documents, such as spreadsheets, CSV files, log files and databases, are widely used to represent and share information. However, tables remain beyond the scope of regular text processing systems which often treat them like free text.
This dissertation presents TABEL — a domain independent and extensible framework to infer the semantics of tables and represent them as RDF Linked Data. TABEL captures the intended meaning of a table by mapping header cells to classes, data cell values to existing entities and pair of columns to relations from an given ontology and knowledge base. The core of the framework consists of a module that represents a table as a graphical model to jointly infer the semantics of headers, data cells and relation between headers. We also introduce a novel Semantic Message Passing scheme, which incorporates semantics into message passing, to perform joint inference over the probabilistic graphical model. We also develop and explore a “human-in-the-loop” paradigm, presenting plausible models of user interaction with our framework and its impact on the quality of inferred semantics.
We present techniques that are both extensible and domain agnostic. Our framework supports the addition of preprocessing modules without affecting existing ones, making TABEL extensible. It also allows background knowledge bases to be adapted and changed based on the domains of the tables, thus making it domain independent. We demonstrate the extensibility and domain independence of our techniques by developing an application of TABEL in the healthcare domain. We develop a proof of concept for an application to generate meta-analysis reports automatically, which is built on top of the semantics inferred from tables found in medical literature.
A thorough evaluation with experiments over dataset of tables from the Web and medical research reports presents promising results.
Committee: Drs. Tim Finin (chair), Tim Oates, Anupam Joshi, Yun Peng, Indrajit Bhattacharya (IBM Research) and L. V. Subramaniam (IBM Research)
July 15th, 2014
In The trouble with DBpedia, Paul Houle talks about the problems he sees in DBpedia, Freebase and Wikidata and offers up :BaseKB as a better “generic database” that models concepts that are in people’s shared consciousness.
:BaseKB is a purified version of Freebase which is compatible with industry-standard RDF tools. By removing hundreds of millions of duplicate, invalid, or unnecessary facts, :BaseKB users speed up their development cycles dramatically when compared to the source Freebase dumps.
:BaseKB is available for commercial and academic use under a CC-BY license. Weekly versions (:BaseKB Now) can be downloaded from Amazon S3 on a “requester-paid basis”, estimated at $3.00US per download. There are also BaseKB Gold releases which are periodic :BaseKB Now snapshots. These can be downloaded free via Bittorrent or purchased as a Blu Ray disc.
It looks like it’s worth checking out!
January 30th, 2014
Today’s online meeting (Jan 30, 12:30-2:30 EST) in the 2014 Ontology Summit series is part of the Tools, Services, and Techniques track and features presentations by
- Dr. ChrisWelty (IBM Research) on “Inside the Mind of Watson – a Natural Language Question Answering Service Powered by the Web of Data and Ontologies”
- Prof. AlanRector (U. Manchester) on “Axioms & Templates: Distinctions and Transformations amongst Ontologies, Frames, & Information Models
- Professor TillMossakowski (U. Magdeburg) on “Challenges in Scaling Tools for Ontologies to the Semantic Web: Experiences with Hets and OntoHub”
Audio via phone (206-402-0100) or Skype. See the session page for details and access to slides.