July 24th, 2018
Ontology-Grounded Topic Modeling for Climate Science Research
Jennifer Sleeman, Milton Halem and Tim Finin, Ontology-Grounded Topic Modeling for Climate Science Research
, Semantic Web for Social Good Workshop, Int. Semantic Web Conf., Monterey, Oct. 2018. (Selected as best paper), to appear, Emerging Topics in Semantic Technologies, E. Demidova, A.J. Zaveri, E. Simperl (Eds.), AKA Verlag Berlin, 2018.
In scientific disciplines where research findings have a strong impact on society, reducing the amount of time it takes to understand, synthesize and exploit the research is invaluable. Topic modeling is an effective technique for summarizing a collection of documents to find the main themes among them and to classify other documents that have a similar mixture of co-occurring words. We show how grounding a topic model with an ontology, extracted from a glossary of important domain phrases, improves the topics generated and makes them easier to understand. We apply and evaluate this method to the climate science domain. The result improves the topics generated and supports faster research understanding, discovery of social networks among researchers, and automatic ontology generation.
November 18th, 2017
M.S. Thesis Defense
Internal Penetration Test of a Simulated Automotive Ethernet Environment
Kenneth Owen Truex
11:15 Tuesday, 21 November 2017, ITE325, UMBC
The capabilities of modern day automobiles have far exceeded what Robert Bosch GmbH could have imagined when it proposed the Controller Area Network (CAN) bus back in 1986. Over time, drivers wanted more functionality, comfort, and safety in their automobiles — creating a burden for automotive manufacturers. With these driver demands came many innovations to the in-vehicle network core protocol. Modern automobiles that have a video based infotainment system or any type of camera assisted functionality such as an Advanced Driver Assistance System (ADAS) use ethernet as their network backbone. This is because the original CAN specification only allowed for up to 8 bytes of data per message on a bus rated at 1 Mbps. This is far less than the requirements of more advanced video-based automotive systems. The ethernet protocol allows for 1500 bytes of data per packet on a network rated for up to 100 Mbps. This led the automotive industry to adopt ethernet as the core protocol, overcoming most of the limitations posed by the CAN protocol. By adopting ethernet as the protocol for automotive networks, certain attack vectors are now available for black hat hackers to exploit in order to put the vehicle in an unsafe condition. I will create a simulated automotive ethernet environment using the CANoe network simulation platform by Vector GmbH. Then, a penetration test will be conducted on the simulated environment in order to discover attacks that pose a threat to automotive ethernet networks. These attacks will strictly follow a comprehensive threat model in order to narrowly focus the attack surface. If exploited successfully, these attacks will cover all three sides of the Confidentiality, Integrity, Availability (CIA) triad.
I will then propose a new and innovative mitigation strategy that can be implemented on current industry standard ECUs and run successfully under strict time and resource limitations. This new strategy can help to limit the attack surface that exists on modern day automobiles and help to protect the vehicle and its occupants from malicious adversaries.
Committee: Drs. Anupam Joshi (chair), Richard Forno, Charles Nicholas, Nilanjan Banerjee
November 17th, 2017
Discovering Scientific Influence using Cross-Domain Dynamic Topic Modeling
We describe an approach using dynamic topic modeling to model influence and predict future trends in a scientific discipline. Our study focuses on climate change and uses assessment reports of the Intergovernmental Panel on Climate Change (IPCC) and the papers they cite. Since 1990, an IPCC report has been published every five years that includes four separate volumes, each of which has many chapters. Each report cites tens of thousands of research papers, which comprise a correlated dataset of temporally grounded documents. We use a custom dynamic topic modeling algorithm to generate topics for both datasets and apply crossdomain analytics to identify the correlations between the IPCC chapters and their cited documents. The approach reveals both the influence of the cited research on the reports and how previous research citations have evolved over time. For the IPCC use case, the report topic model used 410 documents and a vocabulary of 5911 terms while the citations topic model was based on 200K research papers and a vocabulary more than 25K terms. We show that our approach can predict the importance of its extracted topics on future IPCC assessments through the use of cross domain correlations, Jensen-Shannon divergences and cluster analytics.
November 14th, 2017
ACM Tech Talk Series
A Practitioner’s Introduction to Deep Learning
Ashwin Kumar Ganesan, PhD student
1:00-2:00pm Friday, 17 November 2017?, ITE325, UMBC
In recent years, Deep Neural Networks have been highly successful at performing a number of tasks in computer vision, natural language processing and artificial intelligence in general. The remarkable performance gains have led to universities and industries investing heavily in this space. This investment creates a thriving open source ecosystem of tools & libraries that aid the design of new architectures, algorithm research as well as data collection.
This talk (and hands-on session) introduce people to some of the basics of machine learning, neural networks and discusses some of the popular neural network architectures. We take a dive into one of the popular libraries, Tensorflow, and an associated abstraction library Keras.
To participate in the hands-on aspects of the workshop, bring a laptop computer with Python installed and install the following libraries using pip. For windows or (any other OS) consider doing an installation of anaconda that has all the necessary libraries.
- numpy, scipy & scikit-learn
- tensorflow / tensoflow-gpu (The first one is the GPU version)
- matplotlib for visualizations (if necessary)
- jupyter & ipython (We will use python2.7 in our experiments)
Following are helpful links:
Contact Nisha Pillai (NPillai1 at umbc.edu) with any questions regarding this event.
October 22nd, 2017
Multi-observable Session Reputation Scoring System
11:00-12:00 Monday, 23 October 2017, ITE 346
With increasing adoption of Cloud Computing, cyber attacks have become one of the most effective means for adversaries to inflict damage. To overcome limitations of existing blacklists and whitelists, our research focuses to develop a dynamic reputation scoring model for sessions based on a variety of observable and derived attributes of network traffic. Here we propose a technique to greylist sessions using observables like IP, Domain, URL and File Hash by scoring them numerically based on the events in the session. This enables automatic labeling of possible malicious hosts or users that can help in enriching the existing whitelists or blacklists.
July 16th, 2017
Deep Representation of Lyrical Style and Semantics for Music Recommendation
Abhay L. Kashyap
11:00-1:00 Thursday, 20 July 2017, ITE 346
In the age of music streaming, the need for effective recommendations is important for music discovery and a personalized user experience. Collaborative filtering based recommenders suffer from popularity bias and cold-start which is commonly mitigated by content features. For music, research in content based methods have mainly been focused in the acoustic domain while lyrical content has received little attention. Lyrics contain information about a song’s topic and sentiment that cannot be easily extracted from the audio. This is especially important for lyrics-centric genres like Rap, which was the most streamed genre in 2016. The goal of this dissertation is to explore and evaluate different lyrical content features that could be useful for content, context and emotion based models for music recommendation systems.
With Rap as the primary use case, this dissertation focuses on featurizing two main aspects of lyrics; its artistic style of composition and its semantic content. For lyrical style, a suite of high level rhyme density features are extracted in addition to literary features like the use of figurative language, profanity and vocabulary strength. In contrast to these engineered features, Convolutional Neural Networks (CNN) are used to automatically learn rhyme patterns and other relevant features. For semantics, lyrics are represented using both traditional IR techniques and the more recent neural embedding methods.
These lyrical features are evaluated for artist identification and compared with artist and song similarity measures from a real-world collaborative filtering based recommendation system from Last.fm. It is shown that both rhyme and literary features serve as strong indicators to characterize artists with feature learning methods like CNNs achieving comparable results. For artist and song similarity, a strong relationship was observed between these features and the way users consume music while neural embedding methods significantly outperformed LSA. Finally, this work is accompanied by a web-application, Rapalytics.com, that is dedicated to visualizing all these lyrical features and has been featured on a number of media outlets, most notably, Vox, attn: and Metro.
Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates, Cynthia Matuszek and Pranam Kolari (Walmart Labs)
June 16th, 2017
UMBC Data Science Graduate Programs
UMBC’s Data Science Master’s program prepares students from a wide range of disciplinary backgrounds for careers in data science. In the core courses, students will gain a thorough understanding of data science through classes that highlight machine learning, data analysis, data management, ethical and legal considerations, and more.
Students will develop an in-depth understanding of the basic computing principles behind data science, to include, but not limited to, data ingestion, curation and cleaning and the 4Vs of data science: Volume, Variety, Velocity, Veracity, as well as the implicit 5th V — Value. Through applying principles of data science to the analysis of problems within specific domains expressed through the program pathways, students will gain practical, real world industry relevant experience.
The MPS in Data Science is an industry-recognized credential and the program prepares students with the technical and management skills that they need to succeed in the workplace.
For more information and to apply online, see the Data Science MPS site.
June 7th, 2017
The University of Maryland, Baltimore County is looking to hire a Professor of the Practice to head a new graduate program in Data Science. See the job announcement for more information and apply online at Interfolio.
In addition to developing and teaching graduate data science courses, the new faculty member will serve as the Graduate Program Director of UMBC’s program leading to a master’s degree in Data Science. This cross-disciplinary program is offered to professional students through a partnership between the College of Engineering and Information Technology; the College of Arts, Humanities and Social Sciences; the College of Natural and Mathematical Sciences; the Department of Computer Science and Electrical Engineering; and UMBC’s Division of Professional Studies.