UMBC ebiquity
Machine Learning

Archive for the 'Machine Learning' Category

Platys: From Position to Place-Oriented Mobile Computing

June 8th, 2015, by Tim Finin, posted in AI, KR, Machine Learning, Mobile Computing, Ontologies

The NSF-sponsored Platys project explored the idea that places are more than just GPS coordinates. They are concepts rich with semantic information, including people, activities, roles, functions, time and purpose. Our mobile phones can learn to recognize the places we are in and use information about them to provide better services.

Laura Zavala, Pradeep K. Murukannaiah, Nithyananthan Poosamani, Tim Finin, Anupam Joshi, Injong Rhee and Munindar P. Singh, Platys: From Position to Place-Oriented Mobile Computing, AI Magazine, v36, n2, 2015.

The Platys project focuses on developing a high-level, semantic notion of location called place. A place, unlike a geospatial position, derives its meaning from a user’s actions and interactions in addition to the physical location where it occurs. Our aim is to enable the construction of a large variety of applications that take advantage of place to render relevant content and functionality and, thus, improve user experience. We consider elements of context that are particularly related to mobile computing. The main problems we have addressed to realize our place-oriented mobile computing vision are representing places, recognizing places, and engineering place-aware applications. We describe the approaches we have developed for addressing these problems and related subproblems. A key element of our work is the use of collaborative information sharing where users’ devices share and integrate knowledge about places. Our place ontology facilitates such collaboration. Declarative privacy policies allow users to specify contextual features under which they prefer to share or not share their information.
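The abstract says that declarative privacy policies let users state the contextual features under which they will or will not share their information. The sketch below shows one way such a policy could be evaluated. It assumes a hypothetical rule format: each rule lists required context values and a share or withhold decision, the first matching rule wins, and the default is to withhold. It is not the Platys policy language, which the abstract does not spell out.

```python
from dataclasses import dataclass, field

@dataclass
class Rule:
    """Hypothetical sharing rule: if every condition matches the context, apply `share`."""
    conditions: dict  # e.g. {"place": "office", "requester_role": "colleague"}
    share: bool

@dataclass
class PrivacyPolicy:
    rules: list = field(default_factory=list)
    default_share: bool = False  # assumption: withhold unless some rule allows sharing

    def decide(self, context: dict) -> bool:
        # Rules are checked in order; the first rule whose conditions all hold wins.
        for rule in self.rules:
            if all(context.get(k) == v for k, v in rule.conditions.items()):
                return rule.share
        return self.default_share

policy = PrivacyPolicy(rules=[
    Rule({"place": "home"}, share=False),
    Rule({"place": "office", "requester_role": "colleague"}, share=True),
])

print(policy.decide({"place": "office", "requester_role": "colleague"}))  # True
print(policy.decide({"place": "home", "requester_role": "family"}))       # False
print(policy.decide({"place": "gym", "requester_role": "colleague"}))     # False
```

A first-match rule list keeps policies easy to read. A richer design could let places be recognized semantically (for example, "any workplace") rather than compared as literal strings.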

UMBC Schema Free Query system on ESWC Schema-agnostic Queries over Linked Data

June 7th, 2015, by Tim Finin, posted in Machine Learning, NLP, RDF, Semantic Web

This year’s ESWC Semantic Web Evaluation Challenge track had a task on Schema-agnostic Queries over Linked Data: SAQ-2015. The idea is to support a SPARQL-like query language that requires knowing neither the underlying graph schema nor the URIs to use for terms and individuals, as in the following examples.

 SELECT ?y {BillClinton hasDaughter ?x. ?x marriedTo ?y.}

 SELECT ?x {?x isA book. ?x by William_Goldman.
            ?x has_pages ?p. FILTER (?p > 300)}

We adapted our Schema Free Querying system to the task as described in the following paper.

Zareen Syed, Lushan Han, Muhammad Mahbubur Rahman, Tim Finin, James Kukla and Jeehye Yun, UMBC_Ebiquity-SFQ: Schema Free Querying System, ESWC Semantic Web Evaluation Challenge, Extended Semantic Web Conference, June 2015.

Users need better ways to explore large, complex linked data resources. Using SPARQL requires not only mastering its syntax and semantics but also understanding the RDF data model, the ontology, and the URIs for entities of interest. Natural language question answering systems address this problem, but they are still a subject of research. The schema-agnostic SPARQL query task defined in the SAQ-2015 challenge consists of schema-agnostic queries that follow the syntax of the SPARQL standard: the syntax and semantics of operators are maintained, while users are free to choose words, phrases and entity names irrespective of the underlying schema or ontology. This combination of a query skeleton with keywords helps to remove some of the ambiguity. We describe our framework for handling schema-agnostic or schema-free queries and discuss enhancements to handle the SAQ-2015 challenge queries. The key contributions are robust methods that combine statistical association and semantic similarity to map user terms to the most appropriate classes and properties in the underlying ontology, and type inference for user input concepts based on concept linking.
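The abstract describes combining statistical association with semantic similarity to map a user's word to an ontology property. The sketch below illustrates that idea with a simple weighted sum. The candidate properties, all scores, and the mixing weight `alpha` are hypothetical. A real system would compute similarity with a textual-similarity model and association from the data, and the paper's actual scoring may differ.

```python
def rank_candidates(candidates, similarity, association, alpha=0.5):
    """Order candidate ontology terms for one user term, best first.

    similarity:  hypothetical semantic-similarity scores in [0, 1] between the
                 user's word and each candidate
    association: hypothetical scores in [0, 1] for how strongly each candidate
                 property co-occurs with the classes inferred for the query's
                 subject and object
    alpha:       assumed mixing weight between the two signals
    """
    def score(c):
        return alpha * similarity.get(c, 0.0) + (1 - alpha) * association.get(c, 0.0)
    return sorted(candidates, key=score, reverse=True)

# "?x by William_Goldman" with ?x typed as a book: the word "by" is about
# equally similar to every candidate, so association with Book decides.
candidates = ["director", "author", "publisher"]
similarity = {"director": 0.3, "author": 0.3, "publisher": 0.3}
association = {"director": 0.1, "author": 0.9, "publisher": 0.5}
print(rank_candidates(candidates, similarity, association))
# ['author', 'publisher', 'director']
```

The example shows why the two signals complement each other. A vague word like "by" gives similarity nothing to work with, but the inferred type of `?x` still singles out the most plausible property.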

Querying RDF Data with Text Annotated Graphs

June 6th, 2015, by Tim Finin, posted in Big data, Database, Machine Learning, RDF, Semantic Web

New paper: Lushan Han, Tim Finin, Anupam Joshi and Doreen Cheng, Querying RDF Data with Text Annotated Graphs, 27th International Conference on Scientific and Statistical Database Management, San Diego, June 2015.

Scientists and casual users need better ways to query RDF databases or Linked Open Data. Using the SPARQL query language requires not only mastering its syntax and semantics but also understanding the RDF data model, the ontology used, and URIs for entities of interest. Natural language query systems are a powerful approach, but current techniques are brittle in addressing the ambiguity and complexity of natural language and require expensive labor to supply the extensive domain knowledge they need. We introduce a compromise in which users give a graphical “skeleton” for a query and annotate it with freely chosen words, phrases and entity names. We describe a framework for interpreting these “schema-agnostic queries” over open domain RDF data that automatically translates them to SPARQL queries. The framework uses semantic textual similarity to find mapping candidates and uses statistical approaches to learn domain knowledge for disambiguation, thus avoiding the expensive human effort required by natural language interface systems. We demonstrate the feasibility of the approach with an implementation that performs well in an evaluation on DBpedia data.
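Once each word in a query skeleton has been mapped to an ontology term, producing the final SPARQL is mostly mechanical. The sketch below is a hypothetical helper, not the framework's code. It rewrites the book query from the SAQ-2015 examples on this page using an assumed mapping to DBpedia terms. `dbo:Book`, `dbo:author`, `dbo:numberOfPages` and `dbr:William_Goldman` are illustrative choices, not outputs reported in the paper.

```python
def to_sparql(select_vars, triples, mapping, filters=(), prefixes=None):
    """Rewrite a schema-agnostic query skeleton into a SPARQL string.

    triples:  (subject, predicate, object) tuples in the user's own words;
              variables start with '?' and are kept as-is
    mapping:  user word -> chosen ontology term (hard-coded here; in practice
              the output of a mapping and disambiguation step)
    filters:  FILTER expressions copied unchanged, since operator semantics are kept
    """
    prefixes = prefixes or {}

    def term(t):
        return t if t.startswith("?") else mapping[t]

    lines = [f"PREFIX {p}: <{uri}>" for p, uri in prefixes.items()]
    lines.append(f"SELECT {' '.join(select_vars)} WHERE {{")
    for s, p, o in triples:
        lines.append(f"  {term(s)} {term(p)} {term(o)} .")
    for f in filters:
        lines.append(f"  FILTER ({f})")
    lines.append("}")
    return "\n".join(lines)

query = to_sparql(
    ["?x"],
    [("?x", "isA", "book"), ("?x", "by", "William_Goldman"), ("?x", "has_pages", "?p")],
    {"isA": "a", "book": "dbo:Book", "by": "dbo:author",
     "William_Goldman": "dbr:William_Goldman", "has_pages": "dbo:numberOfPages"},
    filters=["?p > 300"],
    prefixes={"dbo": "http://dbpedia.org/ontology/", "dbr": "http://dbpedia.org/resource/"},
)
print(query)
```

All of the difficulty in this approach lies in choosing `mapping`. This final step only substitutes terms and preserves the user's FILTER unchanged.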

Discovering and Querying Hybrid Linked Data

June 5th, 2015, by Tim Finin, posted in Big data, KR, Machine Learning, Semantic Web


New paper: Zareen Syed, Tim Finin, Muhammad Rahman, James Kukla and Jeehye Yun, Discovering and Querying Hybrid Linked Data, Third Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data, held in conjunction with the 12th Extended Semantic Web Conference, Portorož, Slovenia, June 2015.

In this paper, we present a unified framework for discovering and querying hybrid linked data. We describe our approach to developing a natural language query interface for the hybrid knowledge base Wikitology, and present it as a case study for accessing hybrid information sources with structured and unstructured data through natural language queries. We evaluate our system on a publicly available dataset and demonstrate improvements over a baseline system. We describe limitations of our approach and also discuss cases where our system can complement other structured data querying systems by retrieving additional answers not available in structured sources.

Clare Grasso: Information Extraction from Dirty Notes for Clinical Decision Support

May 11th, 2015, by Tim Finin, posted in Machine Learning, NLP, Ontologies, Semantic Web

Information Extraction from Dirty Notes
for Clinical Decision Support

Clare Grasso

10:00am Tuesday, 12 May 2015, ITE346

The term clinical decision support (CDS) refers broadly to providing clinicians or patients with computer-generated clinical knowledge and patient-related information, intelligently filtered or presented at appropriate times, to enhance patient care. It is estimated that at least 50% of the clinical information describing a patient’s current condition and stage of therapy resides in the free-form text portions of the Electronic Health Record (EHR). Both linguistic and statistical natural language processing (NLP) models assume the presence of a formal underlying grammar in the text. Yet clinical notes are often filled with overloaded and nonstandard abbreviations, sentence fragments, and creative punctuation that make it difficult for grammar-based NLP systems to work effectively. This research investigates scalable machine learning and semantic techniques that do not rely on an underlying grammar to extract medical concepts from the text, so that they can be applied in CDS on commodity hardware and software systems. Additionally, by packaging the extracted data within a semantic knowledge representation, the facts can be combined with other semantically encoded facts and reasoned over to help inform clinicians in their decision making.
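To show what "not relying on an underlying grammar" can mean in practice, the toy sketch below ignores sentence structure entirely. It expands abbreviations and then greedily matches token n-grams against a concept dictionary. This is a deliberately simple stand-in, not the machine learning and semantic techniques investigated in this research. The abbreviation table, lexicon, and concept labels are hypothetical. A real system would draw on curated clinical vocabularies.

```python
import re

# Hypothetical abbreviation table and concept lexicon.
ABBREVIATIONS = {"sob": "shortness of breath", "htn": "hypertension",
                 "cp": "chest pain", "hx": "history"}
LEXICON = {"shortness of breath": "Symptom", "chest pain": "Symptom",
           "hypertension": "Disorder"}

def extract_concepts(note, max_len=4):
    # Lowercase and keep word-like tokens; punctuation and fragments are simply dropped.
    tokens = []
    for tok in re.findall(r"[a-z0-9/]+", note.lower()):
        tokens.extend(ABBREVIATIONS.get(tok, tok).split())
    found, i = [], 0
    while i < len(tokens):
        # Greedy longest match of token n-grams against the lexicon.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in LEXICON:
                found.append((phrase, LEXICON[phrase]))
                i += n
                break
        else:
            i += 1  # no concept starts here; move on one token
    return found

print(extract_concepts("pt c/o SOB x3 days, hx of HTN; denies CP."))
# [('shortness of breath', 'Symptom'), ('hypertension', 'Disorder'), ('chest pain', 'Symptom')]
```

This sketch does not detect negation, so "denies CP" still yields chest pain. Handling negation and ambiguous abbreviations is where more robust, learned approaches become necessary.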

Mid-Atlantic Student Colloquium on Speech, Language & Learning, Fri. 1/30

January 25th, 2015, by Tim Finin, posted in Machine Learning, NLP

The fourth Mid-Atlantic Student Colloquium on Speech, Language and Learning (MASC-SLL) will be held at JHU this coming Friday, January 30. It’s a good opportunity to sample current research on language technology and machine learning, including the work of a number of UMBC students. The program for the one-day colloquium includes oral presentations, poster sessions, a panel and three breakout sessions.

The event is free and open to all, but registration is requested by Tuesday, January 27. Note that the location has been moved to the Glass Pavilion on the JHU Homewood Campus.

Facebook releases GPU-optimized deep learning tools

January 17th, 2015, by Tim Finin, posted in AI, High performance computing, Machine Learning

Facebook’s AI Research (FAIR) group has released open-source, optimized deep-learning modules for Torch, the open-source development environment for numerics, machine learning, and computer vision, with a particular emphasis on deep learning and convolutional nets.

The release includes GPU-optimized modules for large convolutional nets and networks with sparse activations that are commonly used in NLP applications.

See fbcunn for installation instructions, documentation and examples to train classifiers, and iTorch for an IPython kernel for Torch.

PhD defense: Varish Mulwad — Inferring the Semantics of Tables

December 29th, 2014, by Tim Finin, posted in KR, Machine Learning, NLP, Ontologies, Semantic Web


Dissertation Defense

TABEL — A Domain Independent and Extensible Framework
for Inferring the Semantics of Tables

Varish Vyankatesh Mulwad

8:00am Thursday, 8 January 2015, ITE325b

Tables are an integral part of documents, reports and Web pages in many scientific and technical domains, compactly encoding important information that can be difficult to express in text. Table-like structures outside documents, such as spreadsheets, CSV files, log files and databases, are widely used to represent and share information. However, tables remain beyond the scope of regular text processing systems which often treat them like free text.

This dissertation presents TABEL, a domain-independent and extensible framework to infer the semantics of tables and represent them as RDF Linked Data. TABEL captures the intended meaning of a table by mapping header cells to classes, data cell values to existing entities, and pairs of columns to relations from a given ontology and knowledge base. The core of the framework is a module that represents a table as a graphical model to jointly infer the semantics of headers, data cells and the relations between headers. We also introduce a novel Semantic Message Passing scheme, which incorporates semantics into message passing, to perform joint inference over the probabilistic graphical model. Finally, we develop and explore a “human-in-the-loop” paradigm, presenting plausible models of user interaction with our framework and their impact on the quality of the inferred semantics.
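The abstract does not detail how Semantic Message Passing works. The sketch below therefore substitutes a much simpler alternating scheme to convey the intuition behind joint inference over a table column: the current cell entities vote for a column class, and that class in turn re-ranks each cell's candidate entities. The entity names, types, prior scores, and fixed agreement `bonus` are all hypothetical, and this is not TABEL's algorithm.

```python
from collections import Counter

def infer_column(cell_candidates, entity_types, rounds=2, bonus=1.0):
    """Jointly pick a class for a column and an entity for each of its cells.

    cell_candidates: for each cell, a list of (entity, prior score) pairs
    entity_types:    entity -> list of classes it belongs to
    """
    # Start from each cell's highest-prior candidate entity.
    chosen = [max(cands, key=lambda ec: ec[1])[0] for cands in cell_candidates]
    column_class = None
    for _ in range(rounds):
        # Column step: the currently chosen entities vote for the column's class.
        votes = Counter(t for e in chosen for t in entity_types.get(e, []))
        if not votes:
            break
        column_class = votes.most_common(1)[0][0]

        # Cell step: re-rank each cell's candidates, rewarding agreement with the class.
        def score(ec):
            entity, prior = ec
            return prior + (bonus if column_class in entity_types.get(entity, []) else 0.0)

        chosen = [max(cands, key=score)[0] for cands in cell_candidates]
    return column_class, chosen

cells = [
    [("Baltimore", 0.9), ("Baltimore_Ravens", 0.4)],
    [("Columbia_University", 0.6), ("Columbia_Maryland", 0.5)],
    [("Paris", 0.8), ("Paris_Hilton", 0.3)],
]
types = {
    "Baltimore": ["City"], "Baltimore_Ravens": ["SportsTeam"],
    "Columbia_University": ["University"], "Columbia_Maryland": ["City"],
    "Paris": ["City"], "Paris_Hilton": ["Person"],
}
print(infer_column(cells, types))
# ('City', ['Baltimore', 'Columbia_Maryland', 'Paris'])
```

In this example, the inferred column class pulls the ambiguous "Columbia" cell away from its higher-prior candidate. Joint inference is meant to capture exactly this kind of effect, which per-cell linking would miss.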

We present techniques that are both extensible and domain agnostic. Our framework supports the addition of preprocessing modules without affecting existing ones, making TABEL extensible. It also allows background knowledge bases to be adapted and changed based on the domains of the tables, thus making it domain independent. We demonstrate the extensibility and domain independence of our techniques by developing an application of TABEL in the healthcare domain. We develop a proof of concept for an application to generate meta-analysis reports automatically, which is built on top of the semantics inferred from tables found in medical literature.

A thorough evaluation with experiments over datasets of tables from the Web and from medical research reports shows promising results.

Committee: Drs. Tim Finin (chair), Tim Oates, Anupam Joshi, Yun Peng, Indrajit Bhattacharya (IBM Research) and L. V. Subramaniam (IBM Research)

Taming Wild Big Data

September 17th, 2014, by Tim Finin, posted in Database, Datamining, Machine Learning, RDF, Semantic Web

Jennifer Sleeman and Tim Finin, Taming Wild Big Data, AAAI Fall Symposium on Natural Language Access to Big Data, Nov. 2014.

Wild Big Data is data that is hard to extract, understand, and use due to its heterogeneous nature and volume. It typically comes without a schema, is obtained from multiple sources and provides a challenge for information extraction and integration. We describe a way of subduing Wild Big Data that uses techniques and resources that are popular for processing natural language text. The approach is applicable to data that is presented as a graph of objects and relations between them and to tabular data that can be transformed into such a graph. We start by applying topic models to contextualize the data and then use the results to identify the potential types of the graph’s nodes by mapping them to known types found in large open ontologies such as Freebase and DBpedia. The results allow us to assemble coarse clusters of objects that can then be used to interpret the links and perform entity disambiguation and record linking.
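The paper uses topic models to contextualize the data before mapping graph nodes to types from ontologies such as Freebase and DBpedia. As a deliberately simpler stand-in for that pipeline, and not the authors' method, the sketch below scores a node's attribute words against per-type word profiles using bag-of-words cosine similarity and returns the best-matching type. The type profiles are hypothetical. In practice, they might be built from the descriptions of entities already typed in a large ontology.

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def guess_type(node_text, type_profiles):
    """Return the type whose word profile is most similar to the node's text."""
    words = Counter(node_text.lower().split())
    return max(type_profiles, key=lambda t: cosine(words, Counter(type_profiles[t].split())))

TYPE_PROFILES = {  # hypothetical per-type profiles
    "Person": "born died spouse occupation nationality",
    "City": "population mayor country area elevation",
    "Film": "director starring runtime released budget",
}
print(guess_type("mayor population 620000 country USA", TYPE_PROFILES))  # City
```

Topic models typically replace these raw word counts with a lower-dimensional topic mixture, which helps when a node and a type share meaning but few exact words.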

Rapalytics! Where Rap Meets Data Science

September 14th, 2014, by Tim Finin, posted in Machine Learning, NLP

UMBC Ebiquity Research Meeting

Rapalytics! Where Rap Meets Data Science

Abhay Kashyap

10:00am Wednesday, Sept. 17, 2014, ITE 346

For the Hip-Hop Fans: Remember the times when you had those long arguments with your friends about who the better rapper is? Remember how it always ended up in a stalemate because there was no evidence to back your argument? Well, look no further! Rapalytics is a one-stop site dedicated to extracting and presenting all the important analytics from Rap lyrics that separate a good rapper from a great one!

For the Data Science Nerds: Remember how indestructible your trained NLP tools were? Want to see how they act under pressure from text they have never seen before? Come take a look at how traditional NLP tools fare against text as complex as Rap and explore opportunities to design and build systems that handle much more than well-formed English text.

Free copy of Mining of Massive Datasets

January 18th, 2014, by Tim Finin, posted in Big data, Datamining, Machine Learning, Semantic Web

A free PDF version of the new second edition of Mining of Massive Datasets by Anand Rajaraman, Jure Leskovec and Jeffrey Ullman is available. New chapters on mining large graphs, dimensionality reduction, and machine learning have been added. Related material from Professor Leskovec’s recent Stanford course on Mining Massive Data Sets is also available.

Google knowledge data releases

December 4th, 2013, by Tim Finin, posted in Google, Machine Learning, NLP

A post on Google’s research blog lists the major datasets for NLP and KB processing that Google has released in the past year. They include datasets to help in entity linking, relation extraction, concept spotting and syntactic analysis. Subscribe to the Knowledge Data Releases mailing list for updates.

You are currently browsing the archives for the Machine Learning category.
