UMBC ebiquity

Archive for the 'Semantic Web' Category

UMBC Schema Free Query system on ESWC Schema-agnostic Queries over Linked Data

June 7th, 2015, by Tim Finin, posted in Machine Learning, NLP, RDF, Semantic Web

This year’s ESWC Semantic Web Evaluation Challenge track had a task on Schema-agnostic Queries over Linked Data: SAQ-2015. The idea is to support a SPARQL-like query language that requires knowing neither the underlying graph schema nor the URIs to use for terms and individuals, as in the following examples.

 SELECT ?y {BillClinton hasDaughter ?x. ?x marriedTo ?y.}

 SELECT ?x {?x isA book. ?x by William_Goldman.
            ?x has_pages ?p. FILTER (?p > 300)}

We adapted our Schema Free Querying system to the task as described in the following paper.


Zareen Syed, Lushan Han, Muhammad Mahbubur Rahman, Tim Finin, James Kukla and Jeehye Yun, UMBC_Ebiquity-SFQ: Schema Free Querying System, ESWC Semantic Web Evaluation Challenge, Extended Semantic Web Conference, June 2015.

Users need better ways to explore large, complex linked data resources. Using SPARQL requires not only mastering its syntax and semantics but also understanding the RDF data model, the ontology and the URIs for entities of interest. Natural language question answering systems address this problem, but they remain a subject of research. The schema-agnostic SPARQL query task defined in the SAQ-2015 challenge consists of schema-agnostic queries that follow the syntax of the SPARQL standard: the syntax and semantics of operators are maintained, while users are free to choose words, phrases and entity names irrespective of the underlying schema or ontology. This combination of a query skeleton with keywords helps to remove some of the ambiguity. We describe our framework for handling schema-agnostic or schema-free queries and discuss enhancements to handle the SAQ-2015 challenge queries. The key contributions are robust methods that combine statistical association and semantic similarity to map user terms to the most appropriate classes and properties used in the underlying ontology, and type inference for user input concepts based on concept linking.
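To make the idea of combining the two signals concrete, here is a minimal Python sketch that ranks candidate ontology properties for a user term by a weighted sum of a similarity score and an association score. It is not the SFQ system's algorithm: the token-overlap (Jaccard) similarity is a crude stand-in for a real semantic similarity measure, and the function names, the association values and the weight `alpha` are all hypothetical.

```python
from typing import Dict, List, Tuple


def token_similarity(a: str, b: str) -> float:
    """Jaccard overlap of lower-cased tokens. A crude stand-in for a real
    semantic similarity measure (assumption, not SFQ's method)."""
    ta = set(a.lower().replace("_", " ").split())
    tb = set(b.lower().replace("_", " ").split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def rank_candidates(user_term: str, candidates: List[str],
                    association: Dict[str, float],
                    alpha: float = 0.5) -> List[Tuple[str, float]]:
    """Score each candidate as alpha * similarity + (1 - alpha) * association.
    `association` holds hypothetical values in [0, 1], e.g. how strongly a
    property is associated with the types of the terms it would connect."""
    scored = [(c, alpha * token_similarity(user_term, c)
               + (1 - alpha) * association.get(c, 0.0))
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical association scores for the user term "has daughter".
ranked = rank_candidates("has daughter", ["child", "has_parent"],
                         {"child": 0.8, "has_parent": 0.1})
print(ranked[0][0])  # child
```

With these made-up numbers, `child` wins (0.40 vs. about 0.22) even though it shares no words with the user's term, which is the kind of case where statistical association helps plain string similarity.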

Querying RDF Data with Text Annotated Graphs

June 6th, 2015, by Tim Finin, posted in Big data, Database, Machine Learning, RDF, Semantic Web

New paper: Lushan Han, Tim Finin, Anupam Joshi and Doreen Cheng, Querying RDF Data with Text Annotated Graphs, 27th International Conference on Scientific and Statistical Database Management, San Diego, June 2015.

Scientists and casual users need better ways to query RDF databases or Linked Open Data. Using the SPARQL query language requires not only mastering its syntax and semantics but also understanding the RDF data model, the ontology used, and URIs for entities of interest. Natural language query systems are a powerful approach, but current techniques are brittle in addressing the ambiguity and complexity of natural language and require expensive labor to supply the extensive domain knowledge they need. We introduce a compromise in which users give a graphical “skeleton” for a query and annotate it with freely chosen words, phrases and entity names. We describe a framework for interpreting these “schema-agnostic queries” over open domain RDF data that automatically translates them to SPARQL queries. The framework uses semantic textual similarity to find mapping candidates and statistical approaches to learn domain knowledge for disambiguation, thus avoiding the expensive human effort required by natural language interface systems. We demonstrate the feasibility of the approach with an implementation that performs well in an evaluation on DBpedia data.
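The final translation step can be pictured with a small Python sketch, assuming the mapping step has already chosen a URI for each user term. The `to_sparql` function and the mapping below are hypothetical; the DBpedia-style names (`dbr:Bill_Clinton`, `dbo:child`, `dbo:spouse`) are illustrative choices for the first example query in the SAQ-2015 post, and PREFIX declarations are omitted.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def to_sparql(select_var: str, skeleton: List[Triple],
              mapping: Dict[str, str]) -> str:
    """Rewrite each skeleton term through `mapping`.
    Variables (terms starting with '?') are kept; unmapped terms are also
    passed through unchanged, so a real system would need to reject them."""
    def resolve(term: str) -> str:
        return term if term.startswith("?") else mapping.get(term, term)

    patterns = " ".join(f"{resolve(s)} {resolve(p)} {resolve(o)} ."
                        for s, p, o in skeleton)
    return f"SELECT {select_var} WHERE {{ {patterns} }}"


# Skeleton of: SELECT ?y {BillClinton hasDaughter ?x. ?x marriedTo ?y.}
skeleton = [("BillClinton", "hasDaughter", "?x"), ("?x", "marriedTo", "?y")]
mapping = {  # hypothetical output of the term-mapping step
    "BillClinton": "dbr:Bill_Clinton",
    "hasDaughter": "dbo:child",  # loses the "daughter" gender constraint
    "marriedTo": "dbo:spouse",
}
print(to_sparql("?y", skeleton, mapping))
# SELECT ?y WHERE { dbr:Bill_Clinton dbo:child ?x . ?x dbo:spouse ?y . }
```

A real system would also have to add a constraint on the child's gender to capture "daughter"; the sketch only shows how the skeleton's structure carries over unchanged while its words are replaced.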

Discovering and Querying Hybrid Linked Data

June 5th, 2015, by Tim Finin, posted in Big data, KR, Machine Learning, Semantic Web


New paper: Zareen Syed, Tim Finin, Muhammad Rahman, James Kukla and Jeehye Yun, Discovering and Querying Hybrid Linked Data, Third Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data, held in conjunction with the 12th Extended Semantic Web Conference, Portoroz, Slovenia, June 2015.

In this paper, we present a unified framework for discovering and querying hybrid linked data. We describe our approach to developing a natural language query interface for Wikitology, a hybrid knowledge base, and present it as a case study for accessing hybrid information sources containing structured and unstructured data through natural language queries. We evaluate our system on a publicly available dataset and demonstrate improvements over a baseline system. We describe the limitations of our approach and also discuss cases where our system can complement other structured data querying systems by retrieving additional answers not available in structured sources.

talk: Amit Sheth on Transforming Big data into Smart Data, 11a Tue 5/26

May 17th, 2015, by Tim Finin, posted in Big data, Semantic Web

Transforming big data into smart data:
deriving value via harnessing volume, variety
and velocity using semantics and semantic web

Professor Amit Sheth
Wright State University

11:00am Tuesday, 26 May 2015, ITE 325, UMBC

Big Data has captured a lot of interest in industry, with an emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. In this talk, I will describe Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If my child is an asthma patient, then for all the data relevant to my child with the four V-challenges, what I care about is simply, "How is her current health, and what is the risk of her having an asthma attack in her current situation (now and today), especially if that risk has changed?" As I will show, Smart Data that gives such personalized and actionable information will need to utilize multimodal data and their metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on Machine Learning and NLP. I will motivate the need for a synergistic combination of techniques similar to the close interworking of the top brain and the bottom brain in cognitive models. I will present a couple of Smart Data applications in development at Kno.e.sis from the domains of personalized health, health informatics, social data for social good, energy, disaster response, and smart cities.

Amit Sheth is an Educator, Researcher and Entrepreneur. He is the LexisNexis Ohio Eminent Scholar, an IEEE Fellow, and the executive director of Kno.e.sis, the Ohio Center of Excellence in Knowledge-enabled Computing at Wright State University. In World Wide Web (WWW) research, Kno.e.sis is placed among the top ten universities in the world based on 10-year impact. Prof. Sheth is a well-cited computer scientist (h-index = 87, >30,000 citations) and appears among the top three authors in World Wide Web research (Microsoft Academic Search). He has founded two companies, and several commercial products and deployed systems have resulted from his research. His students are exceptionally successful; ten of his 18 past PhD students have 1,000+ citations each.

Host: Yelena Yesha, yeyesha2umbc.edu

Clare Grasso: Information Extraction from Dirty Notes for Clinical Decision Support

May 11th, 2015, by Tim Finin, posted in Machine Learning, NLP, Ontologies, Semantic Web

Information Extraction from Dirty Notes
for Clinical Decision Support

Clare Grasso

10:00am Tuesday, 12 May 2015, ITE346

The term clinical decision support (CDS) refers broadly to providing clinicians or patients with computer-generated clinical knowledge and patient-related information, intelligently filtered or presented at appropriate times, to enhance patient care. It is estimated that at least 50% of the clinical information describing a patient’s current condition and stage of therapy resides in the free-form text portions of the Electronic Health Record (EHR). Both linguistic and statistical natural language processing (NLP) models assume the presence of a formal underlying grammar in the text. Yet clinical notes are often filled with overloaded and nonstandard abbreviations, sentence fragments, and creative punctuation that make it difficult for grammar-based NLP systems to work effectively. This research investigates scalable machine learning and semantic techniques that do not rely on an underlying grammar to extract medical concepts from the text, so that they can be applied in CDS on commodity hardware and software systems. Additionally, by packaging the extracted data within a semantic knowledge representation, the facts can be combined with other semantically encoded facts and reasoned over to help inform clinicians’ decision making.

Ankur Padia on Ontology Learning, 10am ITE346

April 27th, 2015, by Tim Finin, posted in NLP, Ontologies, OWL, RDF, Semantic Web

In this week’s ebiquity lab meeting, at 10:00am in ITE 346 at UMBC, Ankur Padia will talk about ontology learning and the work he did for his MS thesis.

Ontology Learning

Ankur Padia

10:00am Tuesday, Apr. 28, 2015, ITE 346

Ontology Learning has been the subject of intensive study for the past decade. Researchers in this field have been motivated by the possibility of automatically building a knowledge base on top of text documents so as to support reasoning-based knowledge extraction. While most work in this field has been primarily statistical (known as light-weight Ontology Learning), few attempts have been made at axiomatic Ontology Learning (called Formal Ontology Learning) from natural language text documents. The presentation will focus on the relationship between Description Logic and natural language (limited to IS-A) for Formal Ontology Learning.
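As a toy illustration of how an IS-A sentence can correspond to a Description Logic axiom, the Python sketch below turns sentences of the hypothetical form "A/An/Every X is a/an Y" into a subclass pair read as X ⊑ Y. The single regular expression is an assumption made for illustration; it is not the method presented in the talk and ignores plurals, multi-word terms and everything else real text contains.

```python
import re
from typing import Optional, Tuple

# Hypothetical surface pattern: "A/An/Every <X> is a/an <Y>" (single words only).
ISA_PATTERN = re.compile(
    r"^(?:a|an|every)\s+(\w+)\s+is\s+(?:a|an)\s+(\w+)\.?$", re.IGNORECASE
)


def isa_axiom(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (sub, sup), read as the DL axiom Sub ⊑ Sup, or None if the
    sentence does not fit the pattern."""
    match = ISA_PATTERN.match(sentence.strip())
    if match is None:
        return None
    sub, sup = (word.capitalize() for word in match.groups())
    return sub, sup


print(isa_axiom("Every dog is an animal."))  # ('Dog', 'Animal')
print(isa_axiom("Dogs bark."))               # None
```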

PhD defense: Semantic Resolution Framework for Integrating Manufacturing Service Capability Data

April 25th, 2015, by Tim Finin, posted in AI, Ontologies, OWL, Semantic Web

Ph.D. Dissertation Defense

A Semantic Resolution Framework for Integrating
Manufacturing Service Capability Data

Yan Kang

10:00am Monday 27 April 2015, ITE 217b

Building flexible manufacturing supply chains requires the availability of interoperable and accurate manufacturing service capability (MSC) information for all supply chain participants. Today, MSC information, which is typically published either on the supplier’s web site or registered at an e-marketplace portal, has been shown to fall short of interoperability and accuracy requirements. The issue of interoperability can be addressed by annotating the MSC information using shared ontologies. However, this ontology-based approach faces three main challenges: (1) lack of an effective way to automatically extract a large volume of MSC instance data hidden in the web sites of manufacturers that need to be annotated; (2) difficulties in accurately identifying the semantics of these extracted data and resolving semantic heterogeneities among individual sources of these data while integrating them under shared formal ontologies; (3) difficulties in the adoption of ontology-based approaches by supply chain managers and users because of their unfamiliarity with the syntax and semantics of formal ontology languages such as the web ontology language (OWL).

The objective of our research is to address the main challenges of ontology-based approaches by developing an innovative approach that can extract MSC instances from a broad range of manufacturing web sites that may present MSC instances in various ways, accurately annotate MSC instances with formally defined semantics on a large scale, and integrate these annotated MSC instances into formal manufacturing domain ontologies to facilitate the formation of supply chains of manufacturers. To achieve this objective, we propose a semantic resolution framework (SRF) that consists of three main components: an MSC instance extractor, an MSC instance annotator and a semantic resolution knowledge base. The instance extractor builds a local semantic model that we call an instance description model (IDM) for each target manufacturer web site. The innovative aspect of the IDM is that it captures the intended structure of the target web site and associates each extracted MSC instance with a context that describes the possible semantics of that instance. The instance annotator starts the semantic resolution by identifying the most appropriate class from one or more manufacturing domain ontologies (MDO) to annotate each instance, based on the mappings established between the context of that instance and the vocabularies (i.e., classes and properties) defined in the MDO. The primary goal of the semantic resolution knowledge base (SR-KB) is to resolve semantic heterogeneity that may occur in the instance annotation process and thus improve the accuracy of the annotated MSC instances. The experimental results demonstrate that the instance extractor and the instance annotator can effectively discover and annotate MSC instances, while the SR-KB improves both the precision and recall of annotated instances and reduces human involvement as the knowledge base evolves.

Committee: Drs. Yun Peng (Chair), Tim Finin, Yaacov Yesha, Matthew Schmill and Boonserm Kulvatunyou

Access control for a triplestore linked data fragments interface

April 19th, 2015, by Tim Finin, posted in OWL, Privacy, RDF, Security, Semantic Web

In this week’s meeting (10-11am Tue, April 21), Ankur Padia will present work in progress on providing access control to an RDF triple store.

Triple store access control for a linked data fragments interface
Ankur Padia, UMBC

The maturation of Semantic Web standards and associated web-based data representations such as schema.org has made RDF a popular model for representing graph data and semi-structured knowledge. Triple stores are used to store and query RDF datasets and often expose a SPARQL endpoint service on the Web for public access. Most existing SPARQL endpoints support only very simple access control mechanisms, if any at all, which prevents their use in many applications where fine-grained privacy or data security is important. We describe new work on access control for a linked data fragments interface, i.e., one that accepts queries consisting of one or more triple patterns and responds with all matching triples that the authenticated querier can access.
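The interface behavior described above can be sketched in a few lines of Python, assuming an in-memory list of triples and a deliberately simple, hypothetical policy in which a user may read a triple only if its subject is in that user's readable set. Patterns use `None` as a wildcard, and a query with several patterns is read here as returning triples that match any of them; the names `fragment` and `readable_subjects` are made up for the example and do not reflect the actual design or its policy granularity.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
# A triple pattern; None plays the role of a variable (matches anything).
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """True if every bound position of the pattern equals the triple's."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def fragment(store: List[Triple], patterns: List[Pattern], user: str,
             readable_subjects: Dict[str, Set[str]]) -> List[Triple]:
    """Return triples that match any pattern and that `user` may read.
    Hypothetical policy: a triple is readable only if its subject is in
    the user's set of readable subjects; unknown users see nothing."""
    allowed = readable_subjects.get(user, set())
    return [t for t in store
            if t[0] in allowed and any(matches(t, p) for p in patterns)]


store = [("ex:alice", "foaf:knows", "ex:bob"),
         ("ex:bob", "foaf:knows", "ex:carol"),
         ("ex:alice", "ex:salary", "90000")]
acl = {"analyst": {"ex:alice"}}
print(fragment(store, [(None, "foaf:knows", None)], "analyst", acl))
# [('ex:alice', 'foaf:knows', 'ex:bob')]
```

Filtering at the subject level keeps the sketch short; a policy keyed on predicates or on whole named graphs would slot into the same place in `fragment`.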

2015 Ontology Summit: Internet of Things: Toward Smart Networked Systems and Societies

January 14th, 2015, by Tim Finin, posted in Agents, AI, Big data, Ontologies, Semantic Web, Web

The Internet of Things (IoT) is the interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure.

The theme of the 2015 Ontology Summit is Internet of Things: Toward Smart Networked Systems and Societies. The Ontology Summit is an annual series of events (first started by Ontolog and NIST in 2006) that involve the ontology community and communities related to each year’s theme.

The 2015 Summit will hold a virtual discourse over the next three months via mailing lists and online panel sessions held as augmented conference calls. The Summit will culminate in a two-day face-to-face workshop on 13-14 April 2015 in Arlington, VA. The Summit’s goal is to explore how ontologies can play a significant role in the realization of smart networked systems and societies in the Internet of Things.

The Summit’s initial launch session will take place from 12:30pm to 2:00pm EDT on Thursday, January 15th and will include overview presentations from each of the four technical tracks. See the 2015 Ontology Summit for more information, the schedule and details on how to participate in these free and open events.

PhD defense: Varish Mulwad — Inferring the Semantics of Tables

December 29th, 2014, by Tim Finin, posted in KR, Machine Learning, NLP, Ontologies, Semantic Web


Dissertation Defense

TABEL — A Domain Independent and Extensible Framework
for Inferring the Semantics of Tables

Varish Vyankatesh Mulwad

8:00am Thursday, 8 January 2015, ITE325b

Tables are an integral part of documents, reports and Web pages in many scientific and technical domains, compactly encoding important information that can be difficult to express in text. Table-like structures outside documents, such as spreadsheets, CSV files, log files and databases, are widely used to represent and share information. However, tables remain beyond the scope of regular text processing systems, which often treat them as free text.

This dissertation presents TABEL, a domain-independent and extensible framework to infer the semantics of tables and represent them as RDF Linked Data. TABEL captures the intended meaning of a table by mapping header cells to classes, data cell values to existing entities and pairs of columns to relations from a given ontology and knowledge base. The core of the framework is a module that represents a table as a graphical model to jointly infer the semantics of headers, data cells and relations between headers. We also introduce a novel Semantic Message Passing scheme, which incorporates semantics into message passing, to perform joint inference over the probabilistic graphical model. Finally, we develop and explore a “human-in-the-loop” paradigm, presenting plausible models of user interaction with our framework and the impact of that interaction on the quality of the inferred semantics.
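TABEL's joint inference is beyond a short example, but the final step, turning inferred annotations into RDF-style triples, can be sketched in Python. The sketch assumes the header-to-class, cell-to-entity and column-pair-to-relation mappings have already been inferred; the `ex:` names, the function name and the sample table are hypothetical, and triples are plain tuples rather than objects from a real RDF library.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]


def table_to_triples(headers: List[str], rows: List[List[str]],
                     header_class: Dict[str, str],
                     cell_entity: Dict[str, str],
                     column_relation: Dict[Tuple[int, int], str]
                     ) -> List[Triple]:
    """Emit RDF-style triples from (already inferred) table annotations."""
    triples: List[Triple] = []
    for row in rows:
        entities: List[Optional[str]] = [cell_entity.get(cell) for cell in row]
        # Type each linked cell with the class inferred for its column header.
        for col, ent in enumerate(entities):
            cls = header_class.get(headers[col])
            if ent and cls:
                triples.append((ent, "rdf:type", cls))
        # Relate linked cells in column pairs that were mapped to a relation.
        for (i, j), rel in column_relation.items():
            if entities[i] and entities[j]:
                triples.append((entities[i], rel, entities[j]))
    return triples


headers = ["City", "State"]
rows = [["Baltimore", "Maryland"]]
triples = table_to_triples(
    headers, rows,
    header_class={"City": "ex:City", "State": "ex:State"},
    cell_entity={"Baltimore": "ex:Baltimore", "Maryland": "ex:Maryland"},
    column_relation={(0, 1): "ex:locatedIn"},
)
for t in triples:
    print(t)
```

Cells that could not be linked to an entity are simply skipped here; deciding what to do with them (for example, minting new entities) is one of the design questions a full framework has to answer.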

We present techniques that are both extensible and domain agnostic. Our framework supports the addition of preprocessing modules without affecting existing ones, making TABEL extensible. It also allows background knowledge bases to be adapted and changed based on the domains of the tables, thus making it domain independent. We demonstrate the extensibility and domain independence of our techniques by developing an application of TABEL in the healthcare domain. We develop a proof of concept for an application to generate meta-analysis reports automatically, which is built on top of the semantics inferred from tables found in medical literature.

A thorough evaluation with experiments over datasets of tables from the Web and from medical research reports shows promising results.

Committee: Drs. Tim Finin (chair), Tim Oates, Anupam Joshi, Yun Peng, Indrajit Bhattacharya (IBM Research) and L. V. Subramaniam (IBM Research)

Semantics for Privacy and Shared Context

December 15th, 2014, by Tim Finin, posted in Mobile Computing, OWL, Policy, RDF, Semantic Web

Roberto Yus, Primal Pappachan, Prajit Das, Tim Finin, Anupam Joshi, and Eduardo Mena, Semantics for Privacy and Shared Context, Workshop on Society, Privacy and the Semantic Web-Policy and Technology, held at Int. Semantic Web Conf., Oct. 2014.

Capturing, maintaining, and using context information helps mobile applications provide better services and generates data useful in specifying information sharing policies. Obtaining the full benefit of context information requires a rich and expressive representation that is grounded in shared semantic models. We summarize some of our past work on representing and using context models and briefly describe Triveni, a system for cross-device context discovery and enrichment. Triveni represents context in RDF and OWL and reasons over context models to infer additional information and detect and resolve ambiguities and inconsistencies. A unique feature, its ability to create and manage “contextual groups” of users in an environment, enables their members to share context information using wireless ad-hoc networks. Thus, it enriches the information about a user’s context by creating mobile ad hoc knowledge networks.

Wikidata article in CACM

October 12th, 2014, by Tim Finin, posted in Semantic Web, Web, Wikipedia


I just noticed that Denny Vrandecic and Markus Krötzsch have an article on Wikidata in the latest CACM. Good work! Even better, it’s available without subscription.

Wikidata: a free collaborative knowledgebase, Denny Vrandecic and Markus Krötzsch, Communications of the ACM, v57, n10 (2014), pp 78-85.

“This collaboratively edited knowledgebase provides a common source of data for Wikipedia, and everyone else.

Unnoticed by most of its readers, Wikipedia continues to undergo dramatic changes, as its sister project Wikidata introduces a new multilingual “Wikipedia for data” (http://www.wikidata.org) to manage the factual information of the popular online encyclopedia. With Wikipedia’s data becoming cleaned and integrated in a single location, opportunities arise for many new applications.”
