UMBC ebiquity
Machine Learning

Archive for the 'Machine Learning' Category

UMBC Data Science Graduate Program Starts Fall 2017

June 16th, 2017, by Tim Finin, posted in Big data, Data Science, Database, Datamining, KR, Machine Learning, NLP

UMBC Data Science Graduate Programs

UMBC’s Data Science Master’s program prepares students from a wide range of disciplinary backgrounds for careers in data science. In the core courses, students will gain a thorough understanding of data science through classes that highlight machine learning, data analysis, data management, ethical and legal considerations, and more.

Students will develop an in-depth understanding of the basic computing principles behind data science, including, but not limited to, data ingestion, curation and cleaning, and the 4Vs of data science (Volume, Variety, Velocity and Veracity), as well as the implicit fifth V, Value. By applying principles of data science to the analysis of problems within specific domains expressed through the program pathways, students will gain practical, real-world, industry-relevant experience.

The MPS in Data Science is an industry-recognized credential and the program prepares students with the technical and management skills that they need to succeed in the workplace.

For more information and to apply online, see the Data Science MPS site.

Data Science MD: Getting Started with NLP, Sentiment Analysis and OpenNLP

June 15th, 2017, by Tim Finin, posted in Machine Learning, NLP

The topic of this month’s Data Science MD meetup is Getting Started with NLP, Sentiment Analysis and OpenNLP. The meeting will be 6:30-9:00pm, Monday, June 19 in Building 200 Room E100 at the JHU Applied Physics Laboratory. The meeting starts with networking and food and features talks by two practitioners.

Brian Sacash (Deloitte & Touche): NLP and Sentiment Analysis

Natural Language Processing, the analysis of language, can be challenging if you don’t know where to start. Brian will walk through the Natural Language Tool Kit (NLTK), a Python library built for language analysis, and cover its core functionality. Through live coding he will demonstrate how to build a simple sentiment analysis engine from scratch.
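To give a feel for what "from scratch" can mean here, the following is a minimal lexicon-based sentiment scorer. It is a sketch under stated assumptions, not the code from the talk: it uses only the Python standard library rather than NLTK, and the tiny word lists and the names `score_sentiment` and `label` are hypothetical choices made for illustration.

```python
import re

# Hypothetical toy lexicon; a real system would use a much larger word list.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}
NEGATORS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_sentiment(text):
    """Count positive minus negative words, flipping the word right after a negator."""
    score = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATORS:
            negate = True
            continue
        # +1 for a positive word, -1 for a negative word, 0 otherwise.
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        if polarity:
            score += -polarity if negate else polarity
        # The negation only applies to the immediately following token.
        negate = False
    return score


def label(text):
    """Turn the numeric score into a coarse sentiment label."""
    score = score_sentiment(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Calling `label("This is not good")` exercises the negation handling; a library such as NLTK would replace the regular-expression tokenizer and the hand-written lexicon with more robust components.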

Daniel Russ (NIH): It Takes a Village To Solve A Problem in Data Science

The talk will discuss a scientific case study in data science: computer-based occupational coding of free-text job histories taken during epidemiological research studies. It begins with a rationale for occupational coding, then covers how the coding is performed and how SOCcer is built on top of Apache OpenNLP. Throughout the talk, I will try to emphasize the importance of working as an interdisciplinary team.

See the meetup announcement to RSVP and get directions and more information.

Modeling and Extracting information about Cybersecurity Events from Text

May 15th, 2017, by Tim Finin, posted in cybersecurity, Machine Learning, NLP, OWL, Semantic Web

Ph.D. Dissertation Proposal

Modeling and Extracting information about Cybersecurity Events from Text

Taneeya Satyapanich

Tuesday, 16 May 2017, ITE 325, UMBC

People rely on the Internet to carry out many of their daily activities, such as banking, ordering food and socializing with their family and friends. The technology facilitates our lives, but also comes with many problems, including cybercrimes, stolen data and identity theft. With the large and increasing number of transactions done every day, the frequency of cybercrime events is also increasing. Since the number of security-related events is too high for manual review and monitoring, we need to train machines to be able to detect and gather data about potential cybersecurity threats. To support machines that can identify and understand threats, we need standard models to store the cybersecurity information and information extraction systems that can collect information to populate the models with data from text.

This dissertation will make two major contributions. The first is to extend our current cybersecurity ontologies with better models for relevant events, from atomic events like a login attempt, to an extended but related series of events that make up a campaign, to generalized events, such as an increase in denial-of-service attacks originating from a particular region of the world targeted at U.S. financial institutions. The second is the design and implementation of an event extraction system that can extract information about cybersecurity events from text and populate a knowledge graph using our cybersecurity event ontology. We will extend our previous work on event extraction that detected human activity events from news and discussion forums. A new set of features and learning algorithms will be introduced to improve the performance and adapt the system to the cybersecurity domain. We believe that this dissertation will be useful for cybersecurity management in the future. It will quickly extract cybersecurity events from text and fill in the event ontology.

Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates and Karuna Joshi

new paper: Modeling the Evolution of Climate Change Assessment Research Using Dynamic Topic Models and Cross-Domain Divergence Maps

May 15th, 2017, by Tim Finin, posted in AI, Machine Learning, NLP, Paper, Semantic Web

Jennifer Sleeman, Milton Halem, Tim Finin, and Mark Cane, Modeling the Evolution of Climate Change Assessment Research Using Dynamic Topic Models and Cross-Domain Divergence Maps, AAAI Spring Symposium on AI for Social Good, AAAI Press, March, 2017.

Climate change is an important social issue and the subject of much research, both to understand the history of the Earth’s changing climate and to foresee what changes to expect in the future. Approximately every five years starting in 1990 the Intergovernmental Panel on Climate Change (IPCC) publishes a set of reports that cover the current state of climate change research, how this research will impact the world, risks, and approaches to mitigate the effects of climate change. Each report supports its findings with hundreds of thousands of citations to scientific journals and reviews by governmental policy makers. Analyzing trends in the cited documents over the past 30 years provides insights into both an evolving scientific field and the climate change phenomenon itself. Presented in this paper are results of dynamic topic modeling to model the evolution of these climate change reports and their supporting research citations over a 30 year time period. Using this technique shows how the research influences the assessment reports and how trends based on these influences can affect future assessment reports. This is done by calculating cross-domain divergences between the citation domain and the assessment report domain and by clustering documents between domains. This approach could be applied to other social problems with similar structure such as disaster recovery.
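The abstract does not say which divergence measure is used, so the sketch below assumes Jensen-Shannon divergence purely for illustration. It shows how a "divergence map" between topics from two domains could be tabulated once each topic is represented as a probability distribution over a shared vocabulary; the function names and the example distributions are hypothetical.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in bits; terms with p_i = 0 contribute 0. Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric and bounded in [0, 1] with log base 2."""
    # The midpoint distribution is positive wherever p or q is, so KL is well defined.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def divergence_map(report_topics, citation_topics):
    """Rows are report-domain topics, columns are citation-domain topics."""
    return [[js_divergence(r, c) for c in citation_topics] for r in report_topics]


# Hypothetical topic-word distributions over a four-word vocabulary.
report_topics = [[0.7, 0.2, 0.1, 0.0], [0.1, 0.1, 0.4, 0.4]]
citation_topics = [[0.6, 0.3, 0.1, 0.0], [0.0, 0.1, 0.5, 0.4]]
dmap = divergence_map(report_topics, citation_topics)
```

Small entries in `dmap` mark topic pairs that are close across the two domains; in a dynamic setting, one such map per time slice would let those pairings be tracked over time.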

A hands-on introduction to TensorFlow and machine learning, 10am 3/28

March 18th, 2017, by Tim Finin, posted in events, Machine Learning, meetings

A Hands-on Introduction to TensorFlow and Machine Learning

Abhay Kashyap, UMBC ebiquity Lab

10:00-11:00am Tuesday, 28 March 2017, ITE346 ITE325b

As many of you know, TensorFlow is an open source machine learning library by Google which simplifies building and training deep neural networks that can take advantage of computers with GPUs. In this meeting, I will introduce some basic concepts of TensorFlow and machine learning in general. This will be a hands-on tutorial where we will sit and code up some basic examples in TensorFlow. Specifically, we will use TensorFlow to implement linear regression, softmax classifiers and feed forward neural networks (MLP). You can find the Python notebooks here. If time permits, we will go over the implementation of the popular word2vec algorithm and introduce LSTMs to build language models.

What you need to know: Python and the basics of linear algebra and matrix operations. While it helps to know the basics of machine learning, no prior knowledge will be assumed and there will be a gentle, high-level introduction to the algorithms we will implement.

What you need to bring: A laptop that has Python and pip installed. Having virtual environments set up on your computer is also a plus. (Warning: Windows-only users might be publicly shamed)
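As a warm-up for the first tutorial topic, here is linear regression fitted by gradient descent. This is a sketch in plain Python rather than TensorFlow, so it runs with no extra packages; it is not taken from the tutorial notebooks, and the function name, learning rate and toy data are assumptions chosen for illustration. TensorFlow automates the gradient computation that is written out by hand here.

```python
def fit_linear_regression(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient to reduce the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data generated from y = 2x + 1, so w and b should approach 2 and 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_regression(xs, ys)
```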

new paper: App behavioral analysis using system calls

March 14th, 2017, by Tim Finin, posted in Datamining, Machine Learning, Mobile Computing, Security

Prajit Kumar Das, Anupam Joshi and Tim Finin, App behavioral analysis using system calls, MobiSec: Security, Privacy, and Digital Forensics of Mobile Systems and Networks, IEEE Conference on Computer Communications Workshops, May 2017.

System calls provide an interface to the services made available by an operating system. As a result, any functionality provided by a software application eventually reduces to a set of fixed system calls. Since system calls have been used in the literature to analyze program behavior, we assumed that analyzing the patterns in calls made by a mobile application would provide us insight into its behavior. In this paper, we present our preliminary study conducted with 534 mobile applications and the system calls made by them. Due to a rising trend of mobile applications providing multiple functionalities, our study concluded that mapping system calls to the functional behavior of a mobile application was not straightforward. We use the Weka tool with manually annotated application behavior classes and system call features in our experiments to show that using such features achieves a mediocre F1-measure at best for app behavior classification. This leads to the conclusion that system calls were not sufficient features for app behavior classification.
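The abstract does not describe the exact feature encoding, so the following is only a sketch of one common choice: turning a system call trace into relative frequencies over a fixed vocabulary, and scoring a classifier's predictions with the F1-measure. The function names, the example trace and the class labels are hypothetical, and the paper's experiments used Weka rather than hand-written code.

```python
from collections import Counter


def syscall_features(trace, vocabulary):
    """Map a system call trace to relative frequencies over a fixed vocabulary."""
    counts = Counter(call for call in trace if call in vocabulary)
    total = sum(counts.values())
    return [counts[call] / total if total else 0.0 for call in vocabulary]


def f1_measure(true_labels, predicted_labels, positive):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical trace and vocabulary.
vocabulary = ["open", "read", "write", "close"]
features = syscall_features(["open", "read", "read", "close"], vocabulary)
```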

Large Scale Cross Domain Temporal Topic Modeling for Climate Change Research

December 23rd, 2016, by Tim Finin, posted in Big data, Machine Learning, NLP

Jennifer Sleeman, Milton Halem, Tim Finin, Mark Cane, Advanced Large Scale Cross Domain Temporal Topic Modeling Algorithms to Infer the Influence of Recent Research on IPCC Assessment Reports (poster), American Geophysical Union Fall Meeting 2016, American Geophysical Union, December 2016.

One way of understanding the evolution of science within a particular scientific discipline is by studying the temporal influences that research publications had on that discipline. We provide a methodology for conducting such an analysis by employing cross-domain topic modeling and local cluster mappings of those publications with the historical texts to understand exactly when and how they influenced the discipline. We apply our method to the Intergovernmental Panel on Climate Change (IPCC) Assessment Reports and the citations therein. The IPCC reports were compiled by thousands of Earth scientists, the assessments were issued approximately every five years over a 30-year span, and they cite over 200,000 research papers.

PhD Proposal: Understanding the Logical and Semantic Structure of Large Documents

December 9th, 2016, by Tim Finin, posted in Machine Learning, NLP, Ontologies

Dissertation Proposal

Understanding the Logical and Semantic
Structure of Large Documents 

Muhammad Mahbubur Rahman

11:00-1:00 Monday, 12 December 2016, ITE325b, UMBC

Current language understanding approaches are mostly focused on small documents such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents such as legal documents, reports, business opportunities, proposals and technical manuals is still a challenging task. The reason behind this challenge is that the documents may be multi-themed, complex and cover diverse topics.

We aim to automatically identify and classify a document’s sections and subsections, infer their structure and annotate them with semantic labels to understand the semantic structure of a document. This understanding of document structure will significantly benefit and inform a variety of applications such as information extraction and retrieval, document categorization and clustering, document summarization, fact and relation extraction, text analysis and question answering.

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Cynthia Matuszek, James Mayfield (JHU)

PhD Proposal: Ankur Padia, Dealing with Dubious Facts in Knowledge Graphs

November 29th, 2016, by Tim Finin, posted in KR, Machine Learning, NLP, Semantic Web

Dissertation Proposal

Dealing with Dubious Facts
in Knowledge Graphs

Ankur Padia

1:00-3:00pm Wednesday, 30 November 2016, ITE 325b, UMBC

Knowledge graphs are structured representations of facts where nodes are real-world entities or events and edges are the associations between pairs of entities. Knowledge graphs can be constructed using automatic or manual techniques. Manual techniques construct high quality knowledge graphs but are expensive, time consuming and not scalable. Hence, automatic information extraction techniques are used to create scalable knowledge graphs, but the extracted information can be of poor quality due to the presence of dubious facts.

An extracted fact is dubious if it is incorrect, inexact or correct but lacks evidence. A fact might be dubious because of the errors made by NLP extraction techniques, improper design consideration of the internal components of the system, choice of learning techniques (semi-supervised or unsupervised), relatively poor quality of heuristics or the syntactic complexity of underlying text. A preliminary analysis of several knowledge extraction systems (CMU’s NELL and JHU’s KELVIN) and observations from the literature suggest that dubious facts can be identified, diagnosed and managed. In this dissertation, I will explore approaches to identify and repair such dubious facts from a knowledge graph using several complementary approaches, including linguistic analysis, common sense reasoning, and entity linking.

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Paul McNamee (JHU), Partha Talukdar (IISc, India)

Understanding Large Documents

November 28th, 2016, by Tim Finin, posted in Machine Learning, NLP

In this week’s ebiquity meeting, Muhammad Mahbubur Rahman will talk about his work on understanding large documents, such as business RFPs.

Large Document Understanding

Muhammad Mahbubur Rahman

Current language understanding approaches are mostly focused on small documents such as newswire articles, blog posts, product reviews and discussion forum entries. Understanding and extracting information from large documents such as legal documents, reports, business opportunities, proposals and technical manuals is still a challenging task. The reason behind this challenge is that the documents may be multi-themed, complex and cover diverse topics.

We aim to automatically identify and classify a document’s sections and subsections, infer their structure and annotate them with semantic labels to understand the semantic structure of a document. This understanding of document structure will significantly benefit and inform a variety of applications such as information extraction and retrieval, document categorization and clustering, document summarization, fact and relation extraction, text analysis and question answering.

PhD proposal: Sandeep Nair Narayanan, Cognitive Analytics Framework to Secure Internet of Things

November 26th, 2016, by Tim Finin, posted in cybersecurity, IoT, Machine Learning

Dissertation Proposal

Cognitive Analytics Framework to Secure Internet of Things

Sandeep Nair Narayanan

1:00-3:30pm, Monday, 28 November 2016, ITE 325b

Recent years have seen the rapid growth and widespread adoption of the Internet of Things in a wide range of domains including smart homes, healthcare, automotive, smart farming and smart grids. The IoT ecosystem consists of devices like sensors, actuators and control systems connected over heterogeneous networks. The connected devices can be from different vendors with different capabilities in terms of power requirements, processing capabilities, etc. As such, many security features aren’t implemented on devices with lesser processing capabilities. The level of security practices followed during their development can also be different. The lack of over-the-air firmware updates also poses a serious security threat considering their long-term deployment requirements. Device malfunctioning is yet another threat which should be considered. Hence, it is imperative to have an external entity which monitors the ecosystem and detects attacks and anomalies.

In this thesis, we propose a security framework for IoTs using cognitive techniques. While anomaly detection has been employed in various domains, some challenges, such as online operation, resource constraints, heterogeneity and distributed data collection, are unique to IoTs and their predecessors like wireless sensor networks. Our framework will have an underlying knowledge base which has the domain-specific information, a hybrid context generation module which generates complex contexts and a fast reasoning engine which does logical reasoning to detect anomalous activities. When raw sensor data arrives, the hybrid context generation module queries the knowledge base and generates different simple local contexts using various statistical and machine learning models. The inferencing engine will then infer global complex contexts and detect anomalous activities using knowledge from streaming facts and domain-specific rules encoded in the ontology we will create. We will evaluate our techniques by realizing and validating them in the vehicular domain.
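The two-stage idea described above (statistical models produce simple local contexts, then rules combine them into a global judgment) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the proposed framework: a running z-score stands in for the statistical models, a single hand-written rule stands in for the ontology-based reasoner, and the sensor names are hypothetical.

```python
import math


class RunningStats:
    """Online mean and variance using Welford's algorithm (constant memory per sensor)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0


def local_context(stats, value, threshold=3.0):
    """Label a reading 'unusual' if it lies more than `threshold` standard deviations from the running mean."""
    std = stats.std()
    unusual = std > 0 and abs(value - stats.mean) > threshold * std
    # Update after the check so the reading is judged against earlier data only.
    stats.update(value)
    return "unusual" if unusual else "normal"


def is_anomalous(contexts):
    """Hypothetical rule: flag the system when two or more sensors look unusual at once."""
    return sum(1 for c in contexts.values() if c == "unusual") >= 2
```

Feeding each incoming reading through `local_context` keeps the per-sensor state small, which matters on resource-constrained devices; the rule layer is where domain knowledge would enter in a fuller system.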

Committee: Drs. Anupam Joshi (Chair), Tim Finin, Nilanjan Banerjee, Yelena Yesha, Wenjia Li (NYIT), Filip Perich (Google)

talk: Topic Modeling for Analyzing Document Collection, 11am Mon 5/16

May 12th, 2016, by Tim Finin, posted in Datamining, High performance computing, Machine Learning, NLP

Topic Modeling for Analyzing Document Collection

Mitsunori Ogihara
Computer Science, University of Miami

11:00am Monday, 16 May 2016, ITE 325b, UMBC

Topic modeling (in particular, Latent Dirichlet Allocation) is a technique for analyzing a large collection of documents. In topic modeling we view each document as a frequency vector over a vocabulary and each topic as a static distribution over the vocabulary. Given a desired number, K, of document classes, a topic modeling algorithm attempts to estimate concurrently K static distributions and, for each document, how much each of the K classes contributes. Mathematically, this is the problem of approximating the matrix generated by stacking the frequency vectors as the product of two non-negative matrices, where both the column dimension of the first matrix and the row dimension of the second matrix are equal to K. Topic modeling has recently been gaining popularity for analyzing large collections of documents.
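The matrix view above can be made concrete with non-negative matrix factorization. The sketch below uses multiplicative updates in the style of Lee and Seung on a tiny document-term matrix; it is an illustration of the factorization V ≈ W H only, not Latent Dirichlet Allocation inference and not the speaker's method. The function names, the iteration count and the toy matrix are assumptions.

```python
import random


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, iterations=200, seed=0, eps=1e-9):
    """Approximate V (documents x vocabulary) as W (documents x K) times H (K x vocabulary)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise.
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise.
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return w, h


# Toy matrix: four documents over a four-word vocabulary, two obvious themes.
doc_term = [[2, 2, 0, 0], [4, 4, 0, 0], [0, 0, 3, 3], [0, 0, 1, 1]]
doc_topics, topic_words = nmf(doc_term, k=2)
```

Each row of `doc_topics` says how much each of the K topics contributes to a document, and each row of `topic_words` weights the vocabulary for one topic; normalizing the rows of `topic_words` to sum to one gives the distribution-over-vocabulary reading used in the text.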

In this talk I will present some examples of applying topic modeling: (1) a sentiment analysis of a small collection of short patient surveys, (2) exploratory content analysis of a large collection of letters, (3) document classification based upon topics and other linguistic features, and (4) exploratory analysis of a large collection of literary works. I will discuss not only the exact topic modeling steps but also all the preprocessing steps for preparing the documents for topic modeling.

Mitsunori Ogihara is a Professor of Computer Science at the University of Miami, Coral Gables, Florida. There he directs the Data Mining Group in the Center for Computational Science, a university-wide organization for providing resources and consultation for large-scale computation. He has published three books and approximately 190 papers in conferences and journals. He is on the editorial board for Theory of Computing Systems and International Journal of Foundations of Computer Science. Ogihara received a Ph.D. in Information Sciences from Tokyo Institute of Technology in 1993 and was a tenure-track/tenured faculty member in the Department of Computer Science at the University of Rochester from 1994 to 2007.
