UMBC ebiquity

Archive for the 'NLP' Category

SemTk: The Semantics Toolkit from GE Global Research, 4/4

March 17th, 2017, by Tim Finin, posted in AI, KR, NLP, Ontologies, OWL, RDF, Semantic Web

The Semantics Toolkit

Paul Cuddihy and Justin McHugh
GE Global Research Center, Niskayuna, NY

10:00-11:00 Tuesday, 4 April 2017, ITE 346, UMBC

SemTk (Semantics Toolkit) is an open source technology stack built by GE scientists on top of W3C Semantic Web standards. It was originally conceived for data exploration and simplified query generation, and was later expanded into a more general semantics abstraction platform. SemTk comprises a Java API and microservices, along with JavaScript front ends that cover drag-and-drop query generation, path finding, data ingestion and the beginnings of stored procedure support. In this talk we will give a tour of SemTk, discussing its architecture and direction, and demonstrate its features using the SPARQLGraph front end hosted at http://semtk.research.ge.com.
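For a concrete feel for what such a stack abstracts over, here is a minimal Python sketch (not SemTk code; the SPARQLWrapper library and the public DBpedia endpoint stand in for SemTk's services and a GE triple store) of executing the kind of SPARQL query a drag-and-drop front end like SPARQLGraph generates:

    # Minimal sketch: run a generated SPARQL query against an endpoint.
    # DBpedia stands in here for a private triple store; SemTk's own
    # services and APIs are not shown.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery("""
        PREFIX dbo: <http://dbpedia.org/ontology/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?airport ?name WHERE {
            ?airport a dbo:Airport ;
                     rdfs:label ?name .
            FILTER (lang(?name) = "en")
        } LIMIT 5
    """)
    sparql.setReturnFormat(JSON)

    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["airport"]["value"], row["name"]["value"])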

Paul Cuddihy is a senior computer scientist and software systems architect in AI and Learning Systems at the GE Global Research Center in Niskayuna, NY. He earned an M.S. in Computer Science from the Rochester Institute of Technology. Over his twenty-year career at GE Research he has worked on machine learning for medical imaging equipment diagnostics, monitoring and diagnostic techniques for commercial aircraft engines, modeling techniques for monitoring seniors living independently in their own homes, parallel execution of simulation and prediction tasks, and big data ontologies. He is one of the creators of the open source Semantics Toolkit (SemTk), which provides a simplified interface to the semantic tech stack, opening its use to a broader set of users through features such as drag-and-drop query generation and data ingestion. Paul holds over twenty U.S. patents.

Justin McHugh is a computer scientist and software systems architect in the AI and Learning Systems group at GE Global Research in Niskayuna, NY. He earned an M.S. in computer science from the State University of New York at Albany. He worked as a systems architect and programmer for large-scale reporting systems before moving into the research sector. In the six years since, he has worked on complex system integration, Big Data systems and knowledge representation and querying systems. Justin is one of the architects and creators of SemTk (the Semantics Toolkit), a toolkit aimed at making the power of the Semantic Web stack available to programmers, automation engineers and subject matter experts without their having to be deeply invested in the workings of the Semantic Web.

Large Scale Cross Domain Temporal Topic Modeling for Climate Change Research

December 23rd, 2016, by Tim Finin, posted in Big data, Machine Learning, NLP

Jennifer Sleeman, Milton Halem, Tim Finin, Mark Cane, Advanced Large Scale Cross Domain Temporal Topic Modeling Algorithms to Infer the Influence of Recent Research on IPCC Assessment Reports (poster), American Geophysical Union Fall Meeting 2016, American Geophysical Union, December 2016.

One way of understanding the evolution of science within a particular scientific discipline is to study the temporal influence that research publications have had on that discipline. We provide a methodology for conducting such an analysis by employing cross-domain topic modeling and local cluster mappings between those publications and the historical texts to understand exactly when and how they influenced the discipline. We apply our method to the Intergovernmental Panel on Climate Change (IPCC) Assessment Reports and the citations therein. The IPCC reports were compiled by thousands of Earth scientists, the assessments were issued approximately every five years over a 30-year span, and they cite over 200,000 research papers.
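As a rough illustration of the building block underlying such an analysis, the sketch below fits a topic model to one time slice of a corpus with gensim; the tiny documents are invented placeholders, and the paper's cross-domain topic mapping between reports and citations is not shown.

    # Sketch: fit an LDA topic model to one time slice of a corpus.
    # The documents are placeholder token lists, not IPCC data.
    from gensim import corpora, models

    time_slice = [
        ["sea", "level", "rise", "projection"],
        ["carbon", "emission", "scenario", "model"],
        ["sea", "ice", "extent", "model"],
    ]

    dictionary = corpora.Dictionary(time_slice)
    bow_corpus = [dictionary.doc2bow(doc) for doc in time_slice]

    lda = models.LdaModel(bow_corpus, num_topics=2, id2word=dictionary, passes=10)
    for topic_id, words in lda.print_topics():
        print(topic_id, words)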

PhD Proposal: Understanding the Logical and Semantic Structure of Large Documents

December 9th, 2016, by Tim Finin, posted in Machine Learning, NLP, Ontologies


Dissertation Proposal

Understanding the Logical and Semantic
Structure of Large Documents 

Muhammad Mahbubur Rahman

11:00-1:00 Monday, 12 December 2016, ITE325b, UMBC

Current language understanding approaches focus mostly on small documents, such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents, such as legal documents, reports, business opportunities, proposals and technical manuals, is still a challenging task, because such documents may be multi-themed, complex and cover diverse topics.

We aim to automatically identify and classify a document's sections and subsections, infer their structure, and annotate them with semantic labels in order to understand the semantic structure of the document. This structural understanding will significantly benefit and inform a variety of applications, such as information extraction and retrieval, document categorization and clustering, document summarization, fact and relation extraction, text analysis and question answering.
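As a toy illustration of one piece of this goal, the sketch below classifies section headings into semantic labels with a bag-of-words model; the headings, labels and features are invented for illustration and are not from the proposal.

    # Toy sketch: classify section headings into semantic labels.
    # Headings and labels are invented placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    headings = ["1. Introduction", "Statement of Work", "Appendix A: Pricing",
                "2 Background and Related Work", "Deliverables and Schedule"]
    labels = ["introduction", "scope", "pricing", "background", "deliverables"]

    clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    clf.fit(headings, labels)
    print(clf.predict(["Appendix B: Cost Summary"]))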

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Cynthia Matuszek, James Mayfield (JHU)

PhD Proposal: Ankur Padia, Dealing with Dubious Facts in Knowledge Graphs

November 29th, 2016, by Tim Finin, posted in KR, Machine Learning, NLP, Semantic Web


Dissertation Proposal

Dealing with Dubious Facts
in Knowledge Graphs

Ankur Padia

1:00-3:00pm Wednesday, 30 November 2016, ITE 325b, UMBC

Knowledge graphs are structured representations of facts whose nodes are real-world entities or events and whose edges are associations between pairs of entities. Knowledge graphs can be constructed using automatic or manual techniques. Manual techniques produce high-quality knowledge graphs but are expensive, time-consuming and not scalable. Hence, automatic information extraction techniques are used to create scalable knowledge graphs, but the extracted information can be of poor quality due to the presence of dubious facts.

An extracted fact is dubious if it is incorrect, inexact, or correct but lacking evidence. A fact might be dubious because of errors made by NLP extraction techniques, improper design of the system's internal components, the choice of learning techniques (semi-supervised or unsupervised), relatively poor-quality heuristics, or the syntactic complexity of the underlying text. A preliminary analysis of several knowledge extraction systems (CMU's NELL and JHU's KELVIN) and observations from the literature suggest that dubious facts can be identified, diagnosed and managed. In this dissertation, I will explore approaches to identify and repair such dubious facts in a knowledge graph using several complementary techniques, including linguistic analysis, common sense reasoning, and entity linking.
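As a toy sketch of the identification step (the fact format, thresholds and data below are invented, not Padia's method), one can flag extracted facts whose extractor confidence is low or whose supporting evidence is scarce:

    # Toy sketch: flag dubious facts using extractor confidence and
    # supporting-evidence counts. Format, thresholds and data invented.
    facts = [
        {"s": "BarackObama", "p": "bornIn", "o": "Hawaii", "conf": 0.95, "sources": 12},
        {"s": "BarackObama", "p": "bornIn", "o": "Kenya",  "conf": 0.40, "sources": 1},
    ]

    def is_dubious(fact, min_conf=0.6, min_sources=2):
        """A fact is dubious if confidence is low or evidence is scarce."""
        return fact["conf"] < min_conf or fact["sources"] < min_sources

    for f in facts:
        status = "dubious" if is_dubious(f) else "ok"
        print(status, f["s"], f["p"], f["o"])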

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Paul McNamee (JHU), Partha Talukdar (IISc, India)

Understanding Large Documents

November 28th, 2016, by Tim Finin, posted in Machine Learning, NLP


In this week’s ebiquity meeting, Muhammad Mahbubur Rahman will talk about his work on understanding large documents, such as business RFPs.

Large Document Understanding

Muhammad Mahbubur Rahman

Current language understanding approaches focus mostly on small documents, such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents, such as legal documents, reports, business opportunities, proposals and technical manuals, is still a challenging task, because such documents may be multi-themed, complex and cover diverse topics.

We aim to automatically identify and classify a document's sections and subsections, infer their structure, and annotate them with semantic labels in order to understand the semantic structure of the document. This structural understanding will significantly benefit and inform a variety of applications, such as information extraction and retrieval, document categorization and clustering, document summarization, fact and relation extraction, text analysis and question answering.

talk: Topic Modeling for Analyzing Document Collections, 11am Mon 5/16

May 12th, 2016, by Tim Finin, posted in Datamining, High performance computing, Machine Learning, NLP


Topic Modeling for Analyzing Document Collections

Mitsunori Ogihara
Computer Science, University of Miami

11:00am Monday, 16 May 2016, ITE 325b, UMBC

Topic modeling (in particular, Latent Dirichlet Allocation) is a technique for analyzing a large collection of documents. In topic modeling we view each document as a frequency vector over a vocabulary and each topic as a static distribution over the vocabulary. Given a desired number K of document classes, a topic modeling algorithm attempts to concurrently estimate K static distributions and, for each document, how much each of the K classes contributes. Mathematically, this is the problem of approximating the matrix generated by stacking the frequency vectors as the product of two non-negative matrices, where the column dimension of the first matrix and the row dimension of the second matrix are both equal to K. Topic modeling has recently been gaining popularity for analyzing large collections of documents.
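That matrix view maps directly onto off-the-shelf tools; below is a minimal scikit-learn sketch of the non-negative factorization described, with placeholder documents:

    # Sketch of the factorization above: stack term-frequency vectors
    # into a matrix and approximate it as the product of two non-negative
    # matrices, (documents x K) and (K x vocabulary).
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import NMF

    docs = ["the patient survey praised the staff",
            "the letter discusses the new library collection",
            "staff responses to the patient survey were positive"]

    X = CountVectorizer().fit_transform(docs)   # documents x vocabulary
    nmf = NMF(n_components=2, init="nndsvd")    # K = 2 document classes
    W = nmf.fit_transform(X)                    # documents x K: class contributions
    H = nmf.components_                         # K x vocabulary: topic distributions
    print(W.round(2))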

In this talk I will present some examples of applying topic modeling: (1) a small-scale sentiment analysis of a collection of short patient surveys, (2) exploratory content analysis of a large collection of letters, (3) document classification based upon topics and other linguistic features, and (4) exploratory analysis of a large collection of literary works. I will cover not only the topic modeling steps themselves but also the preprocessing steps that prepare the documents for topic modeling.

Mitsunori Ogihara is a Professor of Computer Science at the University of Miami, Coral Gables, Florida. There he directs the Data Mining Group in the Center for Computational Science, a university-wide organization for providing resources and consultation for large-scale computation. He has published three books and approximately 190 papers in conferences and journals. He is on the editorial board for Theory of Computing Systems and International Journal of Foundations of Computer Science. Ogihara received a Ph.D. in Information Sciences from Tokyo Institute of Technology in 1993 and was a tenure-track/tenured faculty member in the Department of Computer Science at the University of Rochester from 1994 to 2007.

Representing and Reasoning with Temporal Properties/Relations in OWL/RDF

May 1st, 2016, by Tim Finin, posted in KR, NLP, Ontologies, Semantic Web

Representing and Reasoning with Temporal
Properties/Relations in OWL/RDF

Clare Grasso

10:30-11:30 Monday, 2 May 2016, ITE 346

OWL ontologies offer the means for modeling real-world domains by representing their high-level concepts, properties and interrelationships. These concepts and their properties are connected by means of binary relations. However, this assumes that the model of the domain is either a set of static objects and relationships that do not change over time, or a snapshot of these objects at a particular point in time. In general, relationships between objects that change over time (dynamic properties) are not binary relations, since they involve a temporal interval in addition to the object and the subject. Representing and querying information evolving in time requires careful consideration of how to use OWL constructs to model dynamic relationships and how the semantics and reasoning capabilities within that architecture are affected.
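One widely used workaround for this limitation is the n-ary relation pattern, which reifies a dynamic relationship as a node that carries a time interval. Below is a minimal rdflib sketch of that pattern with invented URIs; the talk itself surveys the modeling options and their reasoning consequences more broadly.

    # Sketch of the n-ary relation pattern for a dynamic property:
    # instead of a binary triple (alice worksFor acme), introduce an
    # employment node that carries the time interval. URIs are invented.
    from rdflib import Graph, Namespace, Literal, RDF, XSD

    EX = Namespace("http://example.org/ns#")
    g = Graph()

    emp = EX.employment1                      # node reifying the relationship
    g.add((emp, RDF.type, EX.Employment))
    g.add((emp, EX.employee, EX.alice))
    g.add((emp, EX.employer, EX.acme))
    g.add((emp, EX.startDate, Literal("2014-01-01", datatype=XSD.date)))
    g.add((emp, EX.endDate, Literal("2016-05-02", datatype=XSD.date)))

    print(g.serialize(format="turtle"))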

Image description using deep neural networks

February 27th, 2016, by Tim Finin, posted in AI, Machine Learning, NLP

Image description using deep neural networks

Sunil Gandhi
10:30am Monday, 29 February 2016, ITE 346

With the explosion of image data on the internet, there has been a growing need for the automatic generation of image descriptions. In this project we use deep neural networks to extract vectors from images and use them to generate text that describes each image. The model we built uses the pre-trained VGGNet, a model for image classification, and a recurrent neural network (RNN) for language modeling. The combination of the two neural networks provides a multimodal embedding between image vectors and word vectors. We trained the model on 8,000 images from the Flickr8k dataset and present our results on test images downloaded from the Internet. We provide a web service for image description generation that takes an image URL as input and returns an image description and image categories as output. Through our service, a user can correct the description automatically generated by the system so that we can improve our model using the corrected descriptions.
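For readers unfamiliar with the architecture, here is a hedged PyTorch sketch of a CNN+RNN captioner of this general shape; the dimensions and layer names are placeholders, the VGG feature extraction is omitted, and this is not the project's actual code.

    # Sketch: project a pre-extracted CNN image vector into the RNN
    # embedding space and decode word scores. Dimensions are placeholders.
    import torch
    import torch.nn as nn

    class CaptionRNN(nn.Module):
        def __init__(self, vocab_size, img_dim=4096, embed_dim=256, hidden_dim=256):
            super().__init__()
            self.img_proj = nn.Linear(img_dim, embed_dim)  # image -> embedding space
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
            self.out = nn.Linear(hidden_dim, vocab_size)

        def forward(self, img_vec, captions):
            img = self.img_proj(img_vec).unsqueeze(1)   # (B, 1, E)
            words = self.embed(captions)                # (B, T, E)
            seq = torch.cat([img, words], dim=1)        # image vector starts the sequence
            hidden, _ = self.rnn(seq)
            return self.out(hidden)                     # scores over the vocabulary

    model = CaptionRNN(vocab_size=5000)
    scores = model(torch.randn(2, 4096), torch.randint(0, 5000, (2, 10)))
    print(scores.shape)  # torch.Size([2, 11, 5000])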

Sunil Gandhi is a computer science Ph.D. student at UMBC in the Cognition, Robotics and Learning (CORAL) research lab.

Alexa, get my coffee: Using the Amazon Echo in Research

December 3rd, 2015, by Tim Finin, posted in AI, NLP, Semantic Web

“Alexa, get my coffee”:
Using the Amazon Echo in Research

Megan Zimmerman

10:30am Monday, 7 December 2015, ITE 346

The Amazon Echo is a remarkable example of language-controlled, user-centric technology, but also a great example of how far such devices have to go before they fulfill the longstanding promise of intelligent assistance. In this talk, we will describe the Interactive Robotics and Language Lab's work with the Echo, with an emphasis on the practical aspects of getting it set up for development and adding new capabilities. We will demonstrate adding a simple new interaction, and then lead a brainstorming session on future research applications.
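To give a sense of what "adding a simple new interaction" involves: a custom Alexa skill is a web service or AWS Lambda function that maps intent requests to JSON responses. Below is a minimal handler sketch; the GetCoffeeIntent name is invented and would be defined in a separate interaction model.

    # Minimal sketch of an AWS Lambda handler for a custom Alexa skill.
    # It answers a hypothetical "GetCoffeeIntent" with spoken text.
    def lambda_handler(event, context):
        request = event["request"]
        if request["type"] == "IntentRequest" and \
                request["intent"]["name"] == "GetCoffeeIntent":  # hypothetical intent
            text = "Brewing your coffee now."
        else:
            text = "Sorry, I did not understand that."
        return {
            "version": "1.0",
            "response": {
                "outputSpeech": {"type": "PlainText", "text": text},
                "shouldEndSession": True,
            },
        }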

Megan Zimmerman is a UMBC undergraduate majoring in computer science. She works on interpreting language about tasks at varying levels of abstraction, with a focus on interpreting abstract statements as possible task instructions in assistive technology.

talk: Introduction to Deep Learning

November 20th, 2015, by Tim Finin, posted in Machine Learning, NLP

Introduction to Deep Learning

Zhiguang Wang and Hang Gao

10:00am Monday, 23 November 2015, ITE 346

Deep learning has been a hot topic and all over the news lately. It was introduced with the ambition of moving machine learning closer to one of its original goals, artificial intelligence. Since the introduction of the concept, various deep learning algorithms have been proposed and have achieved significant success in their corresponding areas. This talk provides a brief overview of the most common deep learning algorithms, along with their applications to different tasks.

In this talk, Steve (Zhiguang Wang) will give a brief introduction to applications of deep learning algorithms in computer vision and speech, along with some basic viewpoints on training methods, on attacking the non-convexity of deep neural nets, and on other miscellaneous aspects of deep learning.

Hang Gao will then discuss common applications of deep learning algorithms in natural language processing, covering semantic, syntactic and sentiment analysis. He will also discuss the limits of current applications of deep learning in NLP and offer some ideas on possible future trends.

Knowledge Extraction from Cloud Service Level Agreements

November 1st, 2015, by Tim Finin, posted in cloud computing, NLP, Policy

Sudip Mittal, Karuna Pande Joshi, Claudia Pearce, and Anupam Joshi, Parallelizing Natural Language Techniques for Knowledge Extraction from Cloud Service Level Agreements, IEEE International Conference on Big Data, October, 2015.

To efficiently utilize their cloud-based services, consumers have to continuously monitor and manage the Service Level Agreements (SLAs) that define the service performance measures. Currently this is a time- and labor-intensive process, since SLAs are primarily stored as text documents. We have significantly automated the process of extracting, managing and monitoring cloud SLAs using natural language processing techniques and Semantic Web technologies. In this paper we describe our prototype system, which uses a Hadoop cluster to extract knowledge from unstructured legal text documents. For this prototype we considered the publicly available SLA/terms-of-service documents of various cloud providers. We apply established natural language processing techniques in parallel to speed up the creation of a cloud legal knowledge base. Our system considerably speeds up knowledge base creation and can also be used in other domains that have unstructured data.
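As a single-machine stand-in for the paper's idea (the paper itself uses a Hadoop cluster, and the extraction step below is deliberately simplistic), NLP steps can be applied to SLA documents in parallel with a process pool:

    # Single-machine stand-in for running NLP steps over SLA documents
    # in parallel; the real system uses a Hadoop cluster.
    from multiprocessing import Pool

    def extract_terms(doc):
        """Placeholder NLP step: pull capitalized terms from one SLA document."""
        return [tok for tok in doc.split() if tok.istitle()]

    sla_docs = ["The Provider guarantees Monthly Uptime of 99.9 percent",
                "Service Credits apply when Availability drops below target"]

    if __name__ == "__main__":
        with Pool(processes=2) as pool:
            for terms in pool.map(extract_terms, sla_docs):
                print(terms)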

The KELVIN Information Extraction System

October 30th, 2015, by Tim Finin, posted in NLP, Semantic Web

In this week’s ebiquity lab meeting (10:30am Monday, 2 November), Tim Finin will describe recent work on the KELVIN information extraction system and its performance on two tasks in the 2015 NIST Text Analysis Conference. KELVIN has been under development at the JHU Human Language Technology Center of Excellence for several years. It reads documents in several languages and extracts entities and the relations between them. This year it was used for the Cold Start Knowledge Base Population and Trilingual Entity Discovery and Linking tasks. Key components are a system for cross-document coreference resolution and another that links entities to entries in the Freebase knowledge base.
