UMBC ebiquity
Machine Learning

Archive for the 'Machine Learning' Category

Semantic Interpretation of Structured Log Files

November 21st, 2015, by Tim Finin, posted in Machine Learning, Semantic Web


Piyush Nimbalkar, Semantic Interpretation of Structured Log Files, M.S. thesis, University of Maryland, Baltimore County, August, 2015.

Log files comprise a record of events in applications, operating systems and even network devices. Originally they were used to record information for diagnostic and debugging purposes. Nowadays, logs are also used to track events for auditing and forensics in cases of malicious activity or system attacks. Software such as intrusion detection systems, web servers, anti-virus and anti-malware systems, firewalls and network devices generates logs with useful information that can be used to protect against such attacks. Analyzing log files can help in proactively avoiding attacks against systems. While existing tools do a good job when the format of a log file is known, the challenge lies in cases where log files come from unknown devices and are in unknown formats. We propose a framework that takes any log file and automatically produces a semantic interpretation as a set of RDF Linked Data triples. The framework splits a log file into columns using regular expression-based or dictionary-based classifiers. Leveraging and modifying our existing work on inferring the semantics of tables, we identify every column in a log file and map it to concepts either from a general-purpose knowledge base like DBpedia or from domain-specific ontologies such as an IDS ontology. We also identify relationships between the columns of such log files. Converting large and verbose log files into such semantic representations will enable better search, integration and rich reasoning over the data.
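The column-classification and triple-generation steps described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's actual system: the regex patterns, property names and base URI are invented examples of the regular expression-based classifiers and RDF output it describes.

```python
import re

# Hypothetical regex-based column classifiers; the patterns and labels
# are illustrative, not the thesis's actual dictionaries.
CLASSIFIERS = {
    "ip_address": re.compile(r"^\d{1,3}(?:\.\d{1,3}){3}$"),
    "timestamp":  re.compile(r"^\d{4}-\d{2}-\d{2}"),
    "status":     re.compile(r"^\d{3}$"),
}

def classify_column(values):
    """Label a column with the first pattern that matches every value."""
    for label, pattern in CLASSIFIERS.items():
        if all(pattern.match(v) for v in values):
            return label
    return "unknown"

def to_triples(rows, labels, base="http://example.org/log/"):
    """Emit simple N-Triples linking each log entry to its typed fields."""
    triples = []
    for i, row in enumerate(rows):
        for label, value in zip(labels, row):
            triples.append(f'<{base}entry{i}> <{base}{label}> "{value}" .')
    return triples

rows = [["2015-08-01", "10.0.0.5", "404"],
        ["2015-08-02", "10.0.0.9", "200"]]
columns = list(zip(*rows))
labels = [classify_column(list(c)) for c in columns]
print(labels)   # ['timestamp', 'ip_address', 'status']
print(to_triples(rows, labels)[0])
```

A real system would map the recovered labels to DBpedia or domain-ontology concepts rather than minting ad hoc properties as done here.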

talk: Introduction to Deep Learning

November 20th, 2015, by Tim Finin, posted in Machine Learning, NLP

Introduction to Deep Learning

Zhiguang Wang and Hang Gao

10:00am Monday, 23 November 2015, ITE 346

Deep learning has been a hot topic and all over the news lately. It was introduced with the ambition of moving machine learning closer to artificial intelligence, one of its original goals. Since the concept was introduced, various algorithms have been proposed and have achieved significant success in their respective areas. This talk aims to provide a brief overview of the most common deep learning algorithms, along with their applications to different tasks.

In this talk, Steve (Zhiguang Wang) will give a brief introduction to the application of deep learning algorithms in computer vision and speech, some basic viewpoints on training methods and on attacking the non-convexity of deep neural nets, along with miscellaneous topics in deep learning.

Hang Gao will then talk about common applications of deep learning algorithms in natural language processing, covering semantic, syntactic and sentiment analysis. He will also discuss the limits of current applications of deep learning in NLP and offer some ideas on possible future trends.

Lyrics Augmented Multi-modal Music Recommendation

October 29th, 2015, by Tim Finin, posted in Machine Learning, NLP, RDF, Semantic Web

Lyrics Augmented Multi-modal
Music Recommendation

Abhay Kashyap

1:00pm Friday 30 October, ITE 325b

In an increasingly mobile and connected world, digital music consumption has rapidly increased. More recently, faster and cheaper mobile bandwidth has given the average mobile user the potential to access large troves of music through streaming services like Spotify and Google Music that boast catalogs with tens of millions of songs. At this scale, effective music recommendation is critical for music discovery and personalized user experience.

Recommenders that rely on collaborative information suffer from two major problems: the long tail problem, which is induced by popularity bias, and the cold start problem caused by new items with no data. In such cases, they fall back on content to compute similarity. For music, content based features can be divided into acoustic and textual domains. Acoustic features are extracted from the audio signal while textual features come from song metadata, lyrical content, collaborative tags and associated web text.

Research in content-based music similarity has largely focused on the acoustic domain, while text-based features have been limited to metadata, tags and shallow methods for web text and lyrics. Song lyrics carry information about the sentiment and topic of a song that cannot easily be extracted from the audio. Past work has shown that even shallow lyrical features improve on audio-only features and, in some tasks like mood classification, outperform them. In addition, lyrics are easily available, which makes them a valuable resource and warrants deeper analysis.

The goal of this research is to fill the lyrical gap in existing music recommender systems. The first step is to build algorithms to extract and represent the meaning and emotion contained in the song’s lyrics. The next step is to effectively combine lyrical features with acoustic and collaborative information to build a multi-modal recommendation engine.
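One simple way to realize the combination step described above is late fusion: compute a similarity per modality and take a weighted sum. The sketch below assumes each song already has lyrical, acoustic and collaborative feature vectors; the vectors and weights are invented for illustration and are not values from the talk.

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def hybrid_similarity(a, b, weights=(0.4, 0.3, 0.3)):
    """Late fusion: weighted sum of per-modality similarities.
    Each song is a dict of 'lyrics', 'audio' and 'collab' vectors;
    the weights here are illustrative, not tuned values."""
    w_lyr, w_aud, w_col = weights
    return (w_lyr * cosine(a["lyrics"], b["lyrics"])
            + w_aud * cosine(a["audio"], b["audio"])
            + w_col * cosine(a["collab"], b["collab"]))

song1 = {"lyrics": [0.9, 0.1], "audio": [0.5, 0.5], "collab": [1.0, 0.0]}
song2 = {"lyrics": [0.8, 0.2], "audio": [0.4, 0.6], "collab": [0.9, 0.1]}
score = hybrid_similarity(song1, song2)
print(round(score, 3))
```

In practice the weights would be learned from user feedback rather than fixed, and each modality's vectors would come from the feature extractors described above.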

For this work, the genre is restricted to Rap because it is a lyrics-centric genre, and techniques built for Rap can be generalized to other genres. It was also the most-streamed genre in 2014, accounting for 28.5% of all music streamed. Rap lyrics are scraped from dedicated lyrics websites, while the semantic knowledge base comprising artist, album and song metadata comes from the MusicBrainz project. Acoustic features are used directly from EchoNest, while collaborative information such as tags, plays and co-plays comes from online music services.

Preliminary work involved extracting compositional style features such as rhyme patterns and density, vocabulary size, and simile and profanity usage from over 10,000 songs by over 150 artists. These features are available for users to browse and explore through interactive visualizations. Song semantics were represented using off-the-shelf neural language-model vectors (doc2vec). Future work will involve building novel language models for lyrics and latent representations of attributes driven by collaborative information for multi-modal recommendation.

Committee: Drs. Tim Finin (Chair), Anupam Joshi, Pranam Kolari (WalmartLabs), Cynthia Matuszek and Tim Oates

Demystifying Word2Vec: A Hands-on Tutorial

October 16th, 2015, by Tim Finin, posted in Big data, Machine Learning, NLP

Demystifying Word2Vec – A Hands-on Tutorial

Abhay Kashyap

10:30am Monday, 19 October 2015, ITE 456

In the world of NLP, Word2Vec is one of the coolest kids in town! But what exactly is it and how does it work? More importantly, how is it used/useful?

For the first 10-15 minutes, we will go over distributional and distributed representations of words and the neural language model behind Word2Vec. We will also briefly look at doc2vec, the extension of Word2Vec to longer pieces of text.

For the remainder of the time (45-60 minutes), we will get our feet wet by running Word2Vec on a dataset which will then be followed by discussions about potential ways it can be useful for your own work.

What to bring – any computing machine with Python installed, lots of curiosity, and maybe some delicious snacks for me? We will use the excellent gensim package for Python to run Word2Vec, along with Cython to speed things up. If you aren't familiar with Python or don't like it, no worries! It's really just 5-6 lines of code! The training dataset will be provided. If you wish to bring your own, that's cool too.
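The distributional representations the tutorial opens with can be illustrated without gensim at all: count each word's co-occurrences within a context window and compare words by cosine similarity. Word2Vec's learned (distributed) vectors play the same role far more compactly. The toy corpus and window size below are made up for illustration.

```python
from collections import defaultdict
from math import sqrt

def cooccurrence_vectors(sentences, window=2):
    """Count-based (distributional) word vectors: each word is represented
    by its co-occurrence counts with every vocabulary word."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    vecs = defaultdict(lambda: [0] * len(vocab))
    for s in sentences:
        for i, w in enumerate(s):
            for j in range(max(0, i - window), min(len(s), i + window + 1)):
                if i != j:
                    vecs[w][index[s[j]]] += 1
    return dict(vecs)

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

corpus = [["king", "rules", "the", "realm"],
          ["queen", "rules", "the", "realm"],
          ["dog", "chases", "the", "cat"]]
vecs = cooccurrence_vectors(corpus)
# Words appearing in similar contexts get similar vectors:
print(cosine(vecs["king"], vecs["queen"]) > cosine(vecs["king"], vecs["dog"]))
```

The gensim session in the tutorial replaces these sparse count vectors with dense vectors trained by a shallow neural network, but the underlying intuition — "you shall know a word by the company it keeps" — is the same.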

NOTE: We will hold this week’s Ebiquity meeting in ITE 456.

talk: Is your personal data at risk? App analytics to the rescue

September 26th, 2015, by Tim Finin, posted in cybersecurity, Machine Learning, Privacy, Security

Is your personal data at risk?
App analytics to the rescue

Prajit Kumar Das

10:30am Monday, 28 September 2015, ITE 346

According to VirusTotal, a prominent virus and malware analysis tool, the Google Play Store hosts a few thousand apps from major malware families. Given such a revelation, access control systems for mobile data management have reached a state of critical importance. We propose developing a system to help detect the pathways by which users' data is stolen from their mobile devices. We use a multi-layered approach that includes app metadata analysis, understanding code patterns, and detecting and eventually controlling dynamic data flow when such an app is installed on a mobile device. In this presentation we focus on the first part of our work and discuss the merits and flaws of our unsupervised learning mechanism for detecting possibly malicious behavior by apps in the Google Play Store.
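The metadata-analysis idea can be sketched with a simple unsupervised heuristic: build a "typical" permission profile per app category and score each app by how far its requested permissions deviate from it. This is an illustrative stand-in for the talk's actual mechanism; the app names, category, permissions and 0.5 threshold are all invented.

```python
from collections import Counter

def category_norm(apps, threshold=0.5):
    """Permissions requested by at least `threshold` of apps in a category."""
    counts = Counter(p for app in apps for p in app["perms"])
    n = len(apps)
    return {p for p, c in counts.items() if c / n >= threshold}

def anomaly_score(app, norm):
    """Fraction of an app's permissions outside its category norm.
    A flashlight app requesting SMS access would score high."""
    if not app["perms"]:
        return 0.0
    return len(app["perms"] - norm) / len(app["perms"])

# Invented example apps in a 'flashlight' category:
flashlights = [
    {"name": "LightA", "perms": {"CAMERA"}},
    {"name": "LightB", "perms": {"CAMERA"}},
    {"name": "ShadyLight", "perms": {"CAMERA", "READ_SMS", "READ_CONTACTS"}},
]
norm = category_norm(flashlights)
scores = {a["name"]: anomaly_score(a, norm) for a in flashlights}
print(scores)
```

An unsupervised detector like this needs no labeled malware, only the observation that most apps in a category behave alike, which is the appeal of the metadata-first layer described above.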

SVM-CASE: An SVM-based Context Aware Security Framework for Vehicular Ad-hoc Networks

September 1st, 2015, by Tim Finin, posted in cybersecurity, Machine Learning


Wenjia Li, Anupam Joshi and Tim Finin, SVM-CASE: An SVM-based Context Aware Security Framework for Vehicular Ad-hoc Networks, IEEE 82nd Vehicular Technology Conf., Boston, Sept. 2015.

Vehicular Ad-hoc Networks (VANETs) are known to be very susceptible to various malicious attacks. To detect and mitigate these attacks, many security mechanisms have been studied for VANETs. In this paper, we propose a context-aware security framework for VANETs that uses the Support Vector Machine (SVM) algorithm to automatically determine the boundary between malicious nodes and normal ones. Compared to existing security solutions for VANETs, the proposed framework is more resilient to context changes that are common in VANETs, such as those due to malicious nodes altering their attack patterns over time or rapid changes in environmental factors such as motion speed and transmission range. We compare our framework to existing approaches and present evaluation results obtained from simulation studies.
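The core classification step can be sketched with scikit-learn: fit an SVM on behavioral features of nodes and use the learned boundary to flag new nodes. This is a minimal illustration of the technique, not the paper's implementation; the feature names and toy values are invented, while the paper's actual features come from VANET context such as motion speed and observed packet behavior.

```python
from sklearn.svm import SVC

# Each row: [packet_drop_rate, message_alteration_rate] (invented features)
X = [[0.01, 0.00], [0.05, 0.02], [0.02, 0.01],   # normal nodes
     [0.80, 0.60], [0.90, 0.40], [0.70, 0.75]]   # malicious nodes
y = [0, 0, 0, 1, 1, 1]                            # 0 = normal, 1 = malicious

clf = SVC(kernel="linear")   # linear decision boundary between the classes
clf.fit(X, y)

# Classify two unseen nodes: one well-behaved, one misbehaving.
print(clf.predict([[0.03, 0.01], [0.85, 0.50]]))
```

Context awareness in the paper's sense would mean periodically refitting the boundary as conditions change, so that a shift in attack patterns does not leave the classifier stale.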

Platys: From Position to Place-Oriented Mobile Computing

June 8th, 2015, by Tim Finin, posted in AI, KR, Machine Learning, Mobile Computing, Ontologies

The NSF-sponsored Platys project explored the idea that places are more than just GPS coordinates. They are concepts rich with semantic information, including people, activities, roles, functions, time and purpose. Our mobile phones can learn to recognize the places we are in and use information about them to provide better services.

Laura Zavala, Pradeep K. Murukannaiah, Nithyananthan Poosamani, Tim Finin, Anupam Joshi, Injong Rhee and Munindar P. Singh, Platys: From Position to Place-Oriented Mobile Computing, AI Magazine, v36, n2, 2015.

The Platys project focuses on developing a high-level, semantic notion of location called place. A place, unlike a geospatial position, derives its meaning from a user’s actions and interactions in addition to the physical location where it occurs. Our aim is to enable the construction of a large variety of applications that take advantage of place to render relevant content and functionality and, thus, improve user experience. We consider elements of context that are particularly related to mobile computing. The main problems we have addressed to realize our place-oriented mobile computing vision are representing places, recognizing places, and engineering place-aware applications. We describe the approaches we have developed for addressing these problems and related subproblems. A key element of our work is the use of collaborative information sharing where users’ devices share and integrate knowledge about places. Our place ontology facilitates such collaboration. Declarative privacy policies allow users to specify contextual features under which they prefer to share or not share their information.

UMBC Schema Free Query system on ESWC Schema-agnostic Queries over Linked Data

June 7th, 2015, by Tim Finin, posted in Machine Learning, NLP, RDF, Semantic Web

This year’s ESWC Semantic Web Evaluation Challenge track had a task on Schema-agnostic Queries over Linked Data: SAQ-2015. The idea is to support a SPARQL-like query language that requires knowing neither the underlying graph schema nor the URIs to use for terms and individuals, as in the following examples.

 SELECT ?y {BillClinton hasDaughter ?x. ?x marriedTo ?y.}

 SELECT ?x {?x isA book. ?x by William_Goldman.
            ?x has_pages ?p. FILTER (?p > 300)}

We adapted our Schema Free Querying system to the task as described in the following paper.

Zareen Syed, Lushan Han, Muhammad Mahbubur Rahman, Tim Finin, James Kukla and Jeehye Yun, UMBC_Ebiquity-SFQ: Schema Free Querying System, ESWC Semantic Web Evaluation Challenge, Extended Semantic Web Conference, June 2015.

Users need better ways to explore large, complex linked data resources. Using SPARQL requires mastering not only its syntax and semantics but also the RDF data model and the ontology and URIs for entities of interest. Natural language question answering systems address the problem, but they are still a subject of research. The schema-agnostic SPARQL query task defined in the SAQ-2015 challenge consists of queries that follow the syntax of the SPARQL standard, maintaining the syntax and semantics of operators, while users are free to choose words, phrases and entity names irrespective of the underlying schema or ontology. This combination of a query skeleton with keywords helps remove some of the ambiguity. We describe our framework for handling schema-agnostic or schema-free queries and discuss enhancements to handle the SAQ-2015 challenge queries. The key contributions are robust methods that combine statistical association and semantic similarity to map user terms to the most appropriate classes and properties in the underlying ontology, and type inference for user input concepts based on concept linking.
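The term-mapping step at the heart of the abstract can be sketched with a much cruder similarity than the system actually uses: split a user's free-form property name into tokens and pick the ontology term with the highest token overlap. The candidate property list below is an invented DBpedia-style sample, and token Jaccard is only a stand-in for the semantic textual similarity and statistical association models the paper describes.

```python
import re

# Illustrative DBpedia-style ontology properties; a real system draws
# candidates from the underlying ontology.
ONTOLOGY_PROPERTIES = ["child", "spouse", "author", "numberOfPages", "birthPlace"]

def tokens(term):
    """Split camelCase and underscores into a set of lowercase tokens."""
    spaced = re.sub(r"([a-z])([A-Z])", r"\1 \2", term).replace("_", " ")
    return {p.lower() for p in spaced.split()}

def map_term(user_term, candidates=ONTOLOGY_PROPERTIES):
    """Map a user's free-form property onto the candidate with the
    largest token-set Jaccard overlap."""
    def score(candidate):
        a, b = tokens(user_term), tokens(candidate)
        return len(a & b) / len(a | b)
    return max(candidates, key=score)

# From the example query `?x has_pages ?p`:
print(map_term("has_pages"))
```

Pure string overlap fails on purely semantic matches like "marriedTo" → spouse, which is exactly why the paper combines similarity models learned from text statistics rather than surface matching.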

Querying RDF Data with Text Annotated Graphs

June 6th, 2015, by Tim Finin, posted in Big data, Database, Machine Learning, RDF, Semantic Web

New paper: Lushan Han, Tim Finin, Anupam Joshi and Doreen Cheng, Querying RDF Data with Text Annotated Graphs, 27th International Conference on Scientific and Statistical Database Management, San Diego, June 2015.

Scientists and casual users need better ways to query RDF databases or Linked Open Data. Using the SPARQL query language requires not only mastering its syntax and semantics but also understanding the RDF data model, the ontology used, and URIs for entities of interest. Natural language query systems are a powerful approach, but current techniques are brittle in addressing the ambiguity and complexity of natural language and require expensive labor to supply the extensive domain knowledge they need. We introduce a compromise in which users give a graphical “skeleton” for a query and annotate it with freely chosen words, phrases and entity names. We describe a framework for interpreting these “schema-agnostic queries” over open domain RDF data that automatically translates them to SPARQL queries. The framework uses semantic textual similarity to find mapping candidates and uses statistical approaches to learn domain knowledge for disambiguation, thus avoiding the expensive human effort required by natural language interface systems. We demonstrate the feasibility of the approach with an implementation that performs well in an evaluation on DBpedia data.

Discovering and Querying Hybrid Linked Data

June 5th, 2015, by Tim Finin, posted in Big data, KR, Machine Learning, Semantic Web


New paper: Zareen Syed, Tim Finin, Muhammad Rahman, James Kukla and Jeehye Yun, Discovering and Querying Hybrid Linked Data, Third Workshop on Knowledge Discovery and Data Mining Meets Linked Open Data, held in conjunction with the 12th Extended Semantic Web Conference, Portoroz Slovenia, June 2015.

In this paper, we present a unified framework for discovering and querying hybrid linked data. We describe our approach to developing a natural language query interface for a hybrid knowledge base, Wikitology, and present it as a case study for accessing hybrid information sources with structured and unstructured data through natural language queries. We evaluate our system on a publicly available dataset and demonstrate improvements over a baseline system. We describe the limitations of our approach and also discuss cases where our system can complement other structured data querying systems by retrieving additional answers not available in structured sources.

Clare Grasso: Information Extraction from Dirty Notes for Clinical Decision Support

May 11th, 2015, by Tim Finin, posted in Machine Learning, NLP, Ontologies, Semantic Web

Information Extraction from Dirty Notes
for Clinical Decision Support

Clare Grasso

10:00am Tuesday, 12 May 2015, ITE 346

The term clinical decision support (CDS) refers broadly to providing clinicians or patients with computer-generated clinical knowledge and patient-related information, intelligently filtered or presented at appropriate times, to enhance patient care. It is estimated that at least 50% of the clinical information describing a patient’s current condition and stage of therapy resides in the free-form text portions of the Electronic Health Record (EHR). Both linguistic and statistical natural language processing (NLP) models assume the presence of a formal underlying grammar in the text. Yet clinical notes are often filled with overloaded and nonstandard abbreviations, sentence fragments, and creative punctuation that make it difficult for grammar-based NLP systems to work effectively. This research investigates scalable machine learning and semantic techniques that do not rely on an underlying grammar to extract medical concepts from text, in order to apply them to CDS on commodity hardware and software systems. Additionally, by packaging the extracted data in a semantic knowledge representation, the facts can be combined with other semantically encoded facts and reasoned over to help inform clinicians in their decision making.
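A grammar-free extraction pipeline of the kind described can be sketched as abbreviation normalization followed by direct dictionary matching, ignoring sentence structure entirely. The abbreviation table, concept dictionary, identifiers and the sample note below are toy inventions; a real system would draw on clinical resources such as UMLS.

```python
import re

# Toy abbreviation table and concept dictionary (invented examples).
ABBREVIATIONS = {"htn": "hypertension", "sob": "shortness of breath",
                 "dm2": "type 2 diabetes"}
CONCEPTS = {"hypertension": "C0020538", "shortness of breath": "C0013404",
            "type 2 diabetes": "C0011860"}

def extract_concepts(note):
    """Grammar-free concept extraction: normalize abbreviations, then
    match dictionary phrases directly, ignoring sentence structure."""
    text = note.lower()
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(rf"\b{abbr}\b", full, text)
    return [(phrase, cui) for phrase, cui in CONCEPTS.items() if phrase in text]

# A fragmentary note that would defeat a grammar-based parser:
note = "pt c/o SOB; hx of HTN, dm2. meds reviewed"
print(extract_concepts(note))
```

Because matching never depends on well-formed sentences, the fragments and nonstandard abbreviations that break grammar-based systems do not break this step; the extracted concept identifiers can then be emitted as semantic triples for downstream reasoning.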

Mid-Atlantic Student Colloquium on Speech, Language & Learning, Fri. 1/30

January 25th, 2015, by Tim Finin, posted in Machine Learning, NLP

The fourth Mid-Atlantic Student Colloquium on Speech, Language and Learning (MASC-SLL) will be held at JHU this coming Friday, January 30. It’s a good opportunity to sample current research on language technology and machine learning, including the work of a number of UMBC students. The program for the one-day colloquium includes oral presentations, poster sessions, a panel and three breakout sessions.

The event is free and open to all, but registration is requested by Tuesday, January 27. Note that the location has been moved to the Glass Pavilion on the JHU Homewood Campus.
