May 11th, 2015
Information Extraction from Dirty Notes
for Clinical Decision Support
10:00am Tuesday, 12 May 2015, ITE346
The term clinical decision support refers broadly to providing clinicians or patients with computer-generated clinical knowledge and patient-related information, intelligently filtered or presented at appropriate times, to enhance patient care. It is estimated that at least 50% of the clinical information describing a patient’s current condition and stage of therapy resides in the free-form text portions of the Electronic Health Record (EHR). Both linguistic and statistical natural language processing (NLP) models assume the presence of a formal underlying grammar in the text. Yet clinical notes are often filled with overloaded and nonstandard abbreviations, sentence fragments, and creative punctuation that make it difficult for grammar-based NLP systems to work effectively. This research investigates scalable machine learning and semantic techniques that do not rely on an underlying grammar to extract medical concepts from the text, so that they can be applied in clinical decision support (CDS) on commodity hardware and software systems. Additionally, by packaging the extracted data within a semantic knowledge representation, the facts can be combined with other semantically encoded facts and reasoned over to help inform clinicians in their decision making.
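As a rough illustration of grammar-free extraction, the sketch below expands abbreviations and then matches a note against a small concept lexicon. The abbreviation table, lexicon and concept labels are invented for illustration and are not taken from the research described above, which relies on machine learning rather than a fixed lookup.

```python
import re

# Hypothetical abbreviation table and concept lexicon (illustrative only).
ABBREVIATIONS = {"pt": "patient", "sob": "shortness of breath", "htn": "hypertension"}
LEXICON = {
    "shortness of breath": "Symptom",
    "hypertension": "Disorder",
    "aspirin": "Medication",
}

def extract_concepts(note):
    """Return (phrase, label) pairs found in a note, ignoring grammar entirely."""
    # Lowercase and split on anything that is not a letter or digit.
    tokens = [t for t in re.split(r"[^a-z0-9]+", note.lower()) if t]
    # Expand known abbreviations; an expansion may contain several words.
    expanded = []
    for tok in tokens:
        expanded.extend(ABBREVIATIONS.get(tok, tok).split())
    text = " ".join(expanded)
    found = []
    for phrase, label in LEXICON.items():
        # Match whole words only, so a phrase never matches inside a longer token.
        if re.search(r"\b" + re.escape(phrase) + r"\b", text):
            found.append((phrase, label))
    return found

print(extract_concepts("Pt c/o SOB; hx HTN. asa held"))
```

Because nothing here depends on sentence structure, fragments and odd punctuation do not break the lookup, although unknown abbreviations such as "asa" are simply missed.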
April 27th, 2015
In this week’s ebiquity lab meeting, Ankur Padia will talk about ontology learning and the work he did for his MS thesis at 10:00am in ITE 346 at UMBC.
10:00am Tuesday, Apr. 28, 2015, ITE 346
Ontology Learning has been the subject of intensive study for the past decade. Researchers in this field have been motivated by the possibility of automatically building a knowledge base on top of text documents so as to support reasoning-based knowledge extraction. While most work in this field has been primarily statistical (known as light-weight Ontology Learning), little attempt has been made at axiomatic Ontology Learning (called Formal Ontology Learning) from natural language text documents. The presentation will focus on the relationship between Description Logic and natural language (limited to IS-A) for Formal Ontology Learning.
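To make the IS-A case concrete, the sketch below turns simple sentences such as "Every X is a Y" into a (subclass, superclass) pair, which corresponds to a Description Logic subsumption axiom X ⊑ Y. The regular expression and the sentence forms it accepts are assumptions made for illustration, not the method presented in the talk.

```python
import re

# Assumed pattern: "<Every|A|An> X is a/an Y", with single-word X and Y.
IS_A = re.compile(r"^(?:every|an|a)\s+(\w+)\s+is\s+an?\s+(\w+)\.?$", re.IGNORECASE)

def to_subsumption(sentence):
    """Return (subclass, superclass) for an IS-A sentence, or None if no match.

    The pair (Dog, Animal) stands for the axiom "Dog is subsumed by Animal".
    """
    match = IS_A.match(sentence.strip())
    if match is None:
        return None
    sub, sup = (word.capitalize() for word in match.groups())
    return (sub, sup)

for text in ["Every dog is an animal.", "A cat is a mammal", "Animals run fast"]:
    print(text, "->", to_subsumption(text))
```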
April 25th, 2015
Ph.D. Dissertation Defense
A Semantic Resolution Framework for Integrating
Manufacturing Service Capability Data
10:00am Monday 27 April 2015, ITE 217b
Building flexible manufacturing supply chains requires availability of interoperable and accurate manufacturing service capability (MSC) information of all supply chain participants. Today, MSC information, which is typically published either on the supplier’s web site or registered at an e-marketplace portal, has been shown to fall short of interoperability and accuracy requirements. The issue of interoperability can be addressed by annotating the MSC information using shared ontologies. However, this ontology-based approach faces three main challenges: (1) lack of an effective way to automatically extract a large volume of MSC instance data hidden in the web sites of manufacturers that need to be annotated; (2) difficulties in accurately identifying semantics of these extracted data and resolving semantic heterogeneities among individual sources of these data while integrating them under shared formal ontologies; (3) difficulties in the adoption of ontology-based approaches by the supply chain managers and users because of their unfamiliarity with the syntax and semantics of formal ontology languages such as the web ontology language (OWL).
The objective of our research is to address the main challenges of ontology-based approaches by developing an innovative approach that is able to extract MSC instances from a broad range of manufacturing web sites that may present MSC instances in various ways, accurately annotate MSC instances with formally defined semantics on a large scale, and integrate these annotated MSC instances into formal manufacturing domain ontologies to facilitate the formation of supply chains of manufacturers. To achieve this objective, we propose a semantic resolution framework (SRF) that consists of three main components: an MSC instance extractor, an MSC instance annotator and a semantic resolution knowledge base. The instance extractor builds a local semantic model that we call the instance description model (IDM) for each target manufacturer web site. The innovative aspect of the IDM is that it captures the intended structure of the target web site and associates each extracted MSC instance with a context that describes possible semantics of that instance. The instance annotator starts the semantic resolution by identifying the most appropriate class from one or more manufacturing domain ontologies (MDO) to annotate each instance, based on the mappings established between the context of that instance and the vocabularies (i.e., classes and properties) defined in the MDO. The primary goal of the semantic resolution knowledge base (SR-KB) is to resolve semantic heterogeneity that may occur in the instance annotation process and thus improve the accuracy of the annotated MSC instances. The experimental results demonstrate that the instance extractor and the instance annotator can effectively discover and annotate MSC instances, while the SR-KB is able to improve both precision and recall of annotated instances and to reduce human involvement as the knowledge base evolves.
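The annotation step, choosing the most appropriate ontology class for an instance from its context, can be sketched with a simple word-overlap score. The Jaccard measure, the function name and the toy ontology below are assumptions for illustration; the abstract does not say how the SRF actually scores its mappings.

```python
def best_class(context, ontology):
    """Pick the class whose vocabulary overlaps most with the instance context.

    context: iterable of words; ontology: dict of class name -> set of words.
    Returns (class name, Jaccard score), or (None, 0.0) if nothing overlaps.
    """
    ctx = {word.lower() for word in context}
    best, best_score = None, 0.0
    for cls, vocab in ontology.items():
        union = ctx | vocab
        score = len(ctx & vocab) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = cls, score
    return best, best_score

# Toy stand-in for a manufacturing domain ontology (hypothetical classes).
MDO = {
    "Milling": {"milling", "cnc", "machining"},
    "Casting": {"casting", "die", "sand"},
}

print(best_class(["CNC", "milling", "services"], MDO))
```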
Committee: Drs. Yun Peng (Chair), Tim Finin, Yaacov Yesha, Matthew Schmill and Boonserm Kulvatunyou
January 14th, 2015
The theme of the 2015 Ontology Summit is Internet of Things: Toward Smart Networked Systems and Societies. The Ontology Summit is an annual series of events (first started by Ontolog and NIST in 2006) that involve the ontology community and communities related to each year’s theme.
The 2015 Summit will hold a virtual discourse over the next three months via mailing lists and online panel sessions held as augmented conference calls. The Summit will culminate in a two-day face-to-face workshop on 13-14 April 2015 in Arlington, VA. The Summit’s goal is to explore how ontologies can play a significant role in the realization of smart networked systems and societies in the Internet of Things.
The Summit’s initial launch session will take place from 12:30pm to 2:00pm EST on Thursday, January 15th and will include overview presentations from each of the four technical tracks. See the 2015 Ontology Summit for more information, the schedule and details on how to participate in these free and open events.
December 29th, 2014
TABEL — A Domain Independent and Extensible Framework
for Inferring the Semantics of Tables
8:00am Thursday, 8 January 2015, ITE325b
Tables are an integral part of documents, reports and Web pages in many scientific and technical domains, compactly encoding important information that can be difficult to express in text. Table-like structures outside documents, such as spreadsheets, CSV files, log files and databases, are widely used to represent and share information. However, tables remain beyond the scope of regular text processing systems which often treat them like free text.
This dissertation presents TABEL — a domain independent and extensible framework to infer the semantics of tables and represent them as RDF Linked Data. TABEL captures the intended meaning of a table by mapping header cells to classes, data cell values to existing entities and pairs of columns to relations from a given ontology and knowledge base. The core of the framework consists of a module that represents a table as a graphical model to jointly infer the semantics of headers, data cells and relations between headers. We also introduce a novel Semantic Message Passing scheme, which incorporates semantics into message passing, to perform joint inference over the probabilistic graphical model. We also develop and explore a “human-in-the-loop” paradigm, presenting plausible models of user interaction with our framework and its impact on the quality of inferred semantics.
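The kind of output such a mapping produces can be sketched as follows: given a table and an already-inferred interpretation (column classes, cell entities and a column-pair relation), emit RDF triples in N-Triples syntax. The inference itself is not shown, and the interpretation dictionary, the example.org URIs and the function name are hypothetical rather than part of TABEL.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # placeholder namespace, not a real vocabulary

def table_to_ntriples(rows, interpretation):
    """Emit N-Triples lines for a table given a hypothetical interpretation.

    rows: list of row tuples; interpretation: dict with
      "classes":   {column index: class URI}
      "entities":  {cell string: entity URI}
      "relations": {(subject column, object column): property URI}
    """
    triples = []
    for row in rows:
        # Type each linked cell with the class inferred for its column.
        for col, cls in interpretation["classes"].items():
            entity = interpretation["entities"].get(row[col])
            if entity:
                triples.append(f"<{entity}> <{RDF_TYPE}> <{cls}> .")
        # Relate linked cells across annotated column pairs.
        for (s_col, o_col), prop in interpretation["relations"].items():
            subj = interpretation["entities"].get(row[s_col])
            obj = interpretation["entities"].get(row[o_col])
            if subj and obj:
                triples.append(f"<{subj}> <{prop}> <{obj}> .")
    return triples

rows = [("Baltimore", "Maryland")]
interpretation = {
    "classes": {0: EX + "City", 1: EX + "State"},
    "entities": {"Baltimore": EX + "Baltimore", "Maryland": EX + "Maryland"},
    "relations": {(0, 1): EX + "locatedIn"},
}
for line in table_to_ntriples(rows, interpretation):
    print(line)
```

Cells with no linked entity are skipped rather than emitted as literals, which keeps the sketch short; a fuller version would also handle literal values such as numbers and dates.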
We present techniques that are both extensible and domain agnostic. Our framework supports the addition of preprocessing modules without affecting existing ones, making TABEL extensible. It also allows background knowledge bases to be adapted and changed based on the domains of the tables, thus making it domain independent. We demonstrate the extensibility and domain independence of our techniques by developing an application of TABEL in the healthcare domain. We develop a proof of concept for an application to generate meta-analysis reports automatically, which is built on top of the semantics inferred from tables found in medical literature.
A thorough evaluation with experiments over a dataset of tables from the Web and medical research reports shows promising results.
Committee: Drs. Tim Finin (chair), Tim Oates, Anupam Joshi, Yun Peng, Indrajit Bhattacharya (IBM Research) and L. V. Subramaniam (IBM Research)
July 17th, 2014
Varish Mulwad, Tim Finin and Anupam Joshi, Interpreting Medical Tables as Linked Data to Generate Meta-Analysis Reports, 15th IEEE Int. Conf. on Information Reuse and Integration, Aug 2014.
Evidence-based medicine is the application of current medical evidence to patient care and typically uses quantitative data from research studies. It is increasingly driven by data on the efficacy of drug dosages and the correlations between various medical factors that are assembled and integrated through meta-analyses (i.e., systematic reviews) of data in tables from publications and clinical trial studies. We describe an important component of a system to automatically produce evidence reports that performs two key functions: (i) understanding the meaning of data in medical tables and (ii) identifying and retrieving relevant tables given an input query. We present modifications to our existing framework for inferring the semantics of tables and an ontology developed to model and represent medical tables in RDF. Representing medical tables as RDF facilitates the automatic extraction, integration and reuse of data from multiple studies, which is essential for generating meta-analysis reports. We show how relevant tables can be identified by querying over their RDF representations and describe two evaluation experiments: one on mapping medical tables to linked data and another on identifying tables relevant to a retrieval query.
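Retrieval over RDF representations can be pictured as pattern matching on triples. The sketch below stores triples as plain Python tuples and finds tables that have a column annotated with a queried concept. The property names (ex:hasColumn, ex:about) and the data are invented for illustration and are not the paper's ontology; a real system would more likely use an RDF store queried with SPARQL.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

def tables_about(triples, concept):
    """Find tables with at least one column annotated with the concept."""
    columns = {s for s, _, _ in match(triples, p="ex:about", o=concept)}
    return sorted({s for s, _, o in match(triples, p="ex:hasColumn") if o in columns})

# Hypothetical table annotations.
TRIPLES = [
    ("ex:table1", "ex:hasColumn", "ex:t1c1"),
    ("ex:t1c1", "ex:about", "ex:Aspirin"),
    ("ex:table2", "ex:hasColumn", "ex:t2c1"),
    ("ex:t2c1", "ex:about", "ex:Warfarin"),
]

print(tables_about(TRIPLES, "ex:Aspirin"))
```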
July 15th, 2014
In The trouble with DBpedia, Paul Houle talks about the problems he sees in DBpedia, Freebase and Wikidata and offers up :BaseKB as a better “generic database” that models concepts that are in people’s shared consciousness.
:BaseKB is a purified version of Freebase which is compatible with industry-standard RDF tools. By removing hundreds of millions of duplicate, invalid, or unnecessary facts, :BaseKB users speed up their development cycles dramatically when compared to the source Freebase dumps.
:BaseKB is available for commercial and academic use under a CC-BY license. Weekly versions (:BaseKB Now) can be downloaded from Amazon S3 on a “requester-paid basis”, estimated at US$3.00 per download. There are also :BaseKB Gold releases, which are periodic :BaseKB Now snapshots. These can be downloaded free via BitTorrent or purchased as a Blu-ray disc.
It looks like it’s worth checking out!
February 8th, 2014
In the first Ebiquity meeting of the semester, Vlad Korolev will talk about his work on using RDF to capture, represent and use provenance information for big data experiments.
PROB: A tool for Tracking Provenance and Reproducibility of Big Data Experiments
10-11:30am, ITE346, UMBC
Reproducibility of computations and data provenance are very important goals to achieve in order to improve the quality of one’s research. Unfortunately, despite some efforts made in the past, it is still very hard to reproduce computational experiments with a high degree of certainty. The Big Data phenomenon in recent years makes this goal even harder to achieve. In this work, we propose a tool that helps researchers improve the reproducibility of their experiments through automated keeping of provenance records.
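A minimal sketch of automated provenance capture, assuming a record that stores the command, a SHA-256 digest of each input and a timestamp. The record layout, file name and function names are hypothetical and do not reflect PROB's actual format, which according to the announcement is based on RDF.

```python
import hashlib
import json
from datetime import datetime, timezone

def digest(data: bytes) -> str:
    """SHA-256 hex digest used to fingerprint an input."""
    return hashlib.sha256(data).hexdigest()

def provenance_record(command, inputs):
    """Build a provenance record for one experiment step.

    command: list of argv strings; inputs: dict of name -> bytes content.
    """
    return {
        "command": command,
        "inputs": {name: digest(content) for name, content in sorted(inputs.items())},
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

record = provenance_record(["train.py", "--epochs", "3"], {"data.csv": b"a,b\n1,2\n"})
print(json.dumps(record, indent=2))
```

Comparing the input digests of two records gives a quick check of whether a rerun used the same data as the original experiment.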
January 30th, 2014
Today’s online meeting (Jan 30, 12:30-2:30 EST) in the 2014 Ontology Summit series is part of the Tools, Services, and Techniques track and features presentations by
- Dr. Chris Welty (IBM Research) on “Inside the Mind of Watson – a Natural Language Question Answering Service Powered by the Web of Data and Ontologies”
- Prof. Alan Rector (U. Manchester) on “Axioms & Templates: Distinctions and Transformations amongst Ontologies, Frames, & Information Models”
- Professor Till Mossakowski (U. Magdeburg) on “Challenges in Scaling Tools for Ontologies to the Semantic Web: Experiences with Hets and OntoHub”
Audio via phone (206-402-0100) or Skype. See the session page for details and access to slides.
January 23rd, 2014
The first online session of the 2014 Ontology Summit on “Big Data and Semantic Web Meet Applied Ontology” takes place today (Thursday, January 23) from 12:30pm to 2:30pm (EST, UTC-5) with the topic Common Reusable Semantic Content — The Problems and Efforts to Address Them. The session will include four presentations followed by discussion.
Audio connection is via phone (206-402-0100, 141184#) or Skype with a shared screen and participant chatroom. See the session page for more details.
January 14th, 2014
The ninth Ontology Summit starts on Thursday, January 16 with the theme “Big Data and Semantic Web Meet Applied Ontology.” The event kicks off a three-month series of weekly online meetings on Thursdays that feature presentations from expert panels and discussions with all of the participants. The series will culminate with a two-day symposium on April 28-29 in Arlington, VA. The sessions are free and open to all, including researchers, practitioners and students.
The first virtual meeting will be held 12:30-2:30 (EST) on Thursday, January 16 and will introduce the nine different topical tracks in the series, their goals and organizers. Audio connection is via phone (206-402-0100, 141184#) or Skype with a shared screen and participant chatroom. See the session page for more details.
This year’s Ontology Summit is an opportunity for building bridges between the Semantic Web, Linked Data, Big Data, and Applied Ontology communities. On the one hand, the Semantic Web, Linked Data, and Big Data communities can bring a wide array of real problems (such as performance and scalability challenges and the variety problem in Big Data) and technologies (automated reasoning tools) that can make use of ontologies. On the other hand, the Applied Ontology community can bring a large body of common reusable content (ontologies) and ontological analysis techniques. Identifying and overcoming ontology engineering bottlenecks is critical for all communities.
The 2014 Ontology Summit is chaired by Michael Gruninger and Leo Obrst.
January 9th, 2014
Computer Science and Electrical Engineering
University of Maryland, Baltimore County
Ph.D. Dissertation Proposal
Functional Reference Ontology Development:
a Design Pattern Approach
1:00pm Friday, January 10, 2014, ITE325b, UMBC
The next generation of smart manufacturing systems will be developed by composing advanced manufacturing components and IT services that introduce new technologies. These new technologies can lead to dramatic improvements in the ability to monitor, control, and optimize all aspects of manufacturing. The ability to compose advanced manufacturing components and IT services enhances the agility, resiliency, and productivity of a manufacturing system. To make the composition possible, functional knowledge of manufacturing components and IT services should be captured and shared explicitly. Recent research has shown that a semantically precise and rich reference functional ontology enables effective composition. However, since the domains of factories and production networks are large, evolving, and heterogeneous, developing a reference functional ontology is a challenging task. Specifically, conceptual functionality modeling that characterizes various features of manufacturing components and IT services at different levels of abstraction is difficult. Even if the reference functional ontology is developed successfully, there will certainly be interoperability issues between the reference functional ontology and local proprietary information models. First, conceptual conflicts may arise, primarily because the reference functional ontology does not reflect actual users’ or providers’ conceptualizations. Second, structural conflicts may arise from diverse modeling choices in local, proprietary information models.
The objective of our research is to assess the utility of design patterns, specifically OWL ontology design patterns (ODPs), in addressing the issues in reference functional ontology development. To achieve this objective, we will assess inductive approaches to identifying the ODPs and explore the development of a methodology for resolving structural differences between the reference functional ontology and local proprietary information models. The key potential contributions of this work include 1) a new method to identify information patterns of functionalities in manufacturing components and IT services, 2) a new inductive ODP development process that starts with the pattern definition of specific functionality concepts and subsequently groups these patterns into more general patterns, and 3) an ODP-based ontology transformation to resolve structural conflicts between the reference functional ontology and local proprietary information models.
Committee: Drs. Yun Peng (chair), Tim Finin, Yelena Yesha, Milton Halem, Nenad Ivezic (NIST) and Boonserm Kulvatunyou (NIST)