UMBC ebiquity
Data Science

Archive for the 'Data Science' Category

MS defense: Internal Penetration Test of a Simulated Automotive Ethernet Environment, 11/21

November 18th, 2017, by Tim Finin, posted in cybersecurity, Data Science, Security

M.S. Thesis Defense

Internal Penetration Test of a Simulated Automotive Ethernet Environment

Kenneth Owen Truex

11:15 Tuesday, 21 November 2017, ITE325, UMBC

The capabilities of modern day automobiles have far exceeded what Robert Bosch GmbH could have imagined when it proposed the Controller Area Network (CAN) bus back in 1986. Over time, drivers wanted more functionality, comfort, and safety in their automobiles — creating a burden for automotive manufacturers. With these driver demands came many innovations to the in-vehicle network core protocol. Modern automobiles that have a video based infotainment system or any type of camera assisted functionality such as an Advanced Driver Assistance System (ADAS) use ethernet as their network backbone. This is because the original CAN specification only allowed for up to 8 bytes of data per message on a bus rated at 1 Mbps. This is far less than the requirements of more advanced video-based automotive systems. The ethernet protocol allows for 1500 bytes of data per packet on a network rated for up to 100 Mbps. This led the automotive industry to adopt ethernet as the core protocol, overcoming most of the limitations posed by the CAN protocol. By adopting ethernet as the protocol for automotive networks, certain attack vectors are now available for black hat hackers to exploit in order to put the vehicle in an unsafe condition. I will create a simulated automotive ethernet environment using the CANoe network simulation platform by Vector GmbH. Then, a penetration test will be conducted on the simulated environment in order to discover attacks that pose a threat to automotive ethernet networks. These attacks will strictly follow a comprehensive threat model in order to narrowly focus the attack surface. If exploited successfully, these attacks will cover all three sides of the Confidentiality, Integrity, Availability (CIA) triad.

I will then propose a new and innovative mitigation strategy that can be implemented on current industry standard ECUs and run successfully under strict time and resource limitations. This new strategy can help to limit the attack surface that exists on modern day automobiles and help to protect the vehicle and its occupants from malicious adversaries.

Committee: Drs. Anupam Joshi (chair), Richard Forno, Charles Nicholas, Nilanjan Banerjee

new paper: Discovering Scientific Influence using Cross-Domain Dynamic Topic Modeling

November 17th, 2017, by Tim Finin, posted in Data Science, Earth science, KR, Machine Learning, NLP

Discovering Scientific Influence using Cross-Domain Dynamic Topic Modeling

Jennifer Sleeman, Milton Halem, Tim Finin and Mark Cane, Discovering Scientific Influence using Cross-Domain Dynamic Topic Modeling, International Conference on Big Data, IEEE, December 2017.

We describe an approach using dynamic topic modeling to model influence and predict future trends in a scientific discipline. Our study focuses on climate change and uses assessment reports of the Intergovernmental Panel on Climate Change (IPCC) and the papers they cite. Since 1990, an IPCC report has been published every five years that includes four separate volumes, each of which has many chapters. Each report cites tens of thousands of research papers, which comprise a correlated dataset of temporally grounded documents. We use a custom dynamic topic modeling algorithm to generate topics for both datasets and apply crossdomain analytics to identify the correlations between the IPCC chapters and their cited documents. The approach reveals both the influence of the cited research on the reports and how previous research citations have evolved over time. For the IPCC use case, the report topic model used 410 documents and a vocabulary of 5911 terms while the citations topic model was based on 200K research papers and a vocabulary more than 25K terms. We show that our approach can predict the importance of its extracted topics on future IPCC assessments through the use of cross domain correlations, Jensen-Shannon divergences and cluster analytics.

A Practitioners Introduction to Deep Learning, 1pm Fri 11/17

November 14th, 2017, by Tim Finin, posted in AI, Data Science, Machine Learning, talks

ACM Tech Talk Series

A Practitioner’s Introduction to Deep Learning

Ashwin Kumar Ganesan, PhD student

1:00-2:00pm Friday, 17 November 2017?, ITE325, UMBC

In recent years, Deep Neural Networks have been highly successful at performing a number of tasks in computer vision, natural language processing and artificial intelligence in general. The remarkable performance gains have led to universities and industries investing heavily in this space. This investment creates a thriving open source ecosystem of tools & libraries that aid the design of new architectures, algorithm research as well as data collection.

This talk (and hands-on session) introduce people to some of the basics of machine learning, neural networks and discusses some of the popular neural network architectures. We take a dive into one of the popular libraries, Tensorflow, and an associated abstraction library Keras.

To participate in the hands-on aspects of the workshop, bring a laptop computer with Python installed and install the following libraries using pip.  For windows or (any other OS) consider doing an installation of anaconda that has all the necessary libraries.

  • numpy, scipy & scikit-learn
  • tensorflow / tensoflow-gpu (The first one is the GPU version)
  • matplotlib for visualizations (if necessary)
  • jupyter & ipython (We will use python2.7 in our experiments)

Following are helpful links:

Contact Nisha Pillai (NPillai1 at with any questions regarding this event.

Arya Renjan: Multi-observable Session Reputation Scoring System

October 22nd, 2017, by Tim Finin, posted in AI, cybersecurity, Data Science, Ebiquity, Machine Learning, meetings, talks

Multi-observable Session Reputation Scoring System

Arya Renjan

11:00-12:00 Monday, 23 October 2017, ITE 346

With increasing adoption of Cloud Computing, cyber attacks have become one of the most effective means for adversaries to inflict damage. To overcome limitations of existing blacklists and whitelists, our research focuses to develop a dynamic reputation scoring model for sessions based on a variety of observable and derived attributes of network traffic. Here we propose a technique to greylist sessions using observables like IP, Domain, URL and File Hash by scoring them numerically based on the events in the session. This enables automatic labeling of possible malicious hosts or users that can help in enriching the existing whitelists or blacklists.

PhD defense: Deep Representation of Lyrical Style and Semantics for Music Recommendation

July 16th, 2017, by Tim Finin, posted in Data Science, Machine Learning, NLP, Semantic Web

Dissertation Defense

Deep Representation of Lyrical Style and Semantics for Music Recommendation

Abhay L. Kashyap

11:00-1:00 Thursday, 20 July 2017, ITE 346

In the age of music streaming, the need for effective recommendations is important for music discovery and a personalized user experience. Collaborative filtering based recommenders suffer from popularity bias and cold-start which is commonly mitigated by content features. For music, research in content based methods have mainly been focused in the acoustic domain while lyrical content has received little attention. Lyrics contain information about a song’s topic and sentiment that cannot be easily extracted from the audio. This is especially important for lyrics-centric genres like Rap, which was the most streamed genre in 2016. The goal of this dissertation is to explore and evaluate different lyrical content features that could be useful for content, context and emotion based models for music recommendation systems.

With Rap as the primary use case, this dissertation focuses on featurizing two main aspects of lyrics; its artistic style of composition and its semantic content. For lyrical style, a suite of high level rhyme density features are extracted in addition to literary features like the use of figurative language, profanity and vocabulary strength. In contrast to these engineered features, Convolutional Neural Networks (CNN) are used to automatically learn rhyme patterns and other relevant features. For semantics, lyrics are represented using both traditional IR techniques and the more recent neural embedding methods.

These lyrical features are evaluated for artist identification and compared with artist and song similarity measures from a real-world collaborative filtering based recommendation system from It is shown that both rhyme and literary features serve as strong indicators to characterize artists with feature learning methods like CNNs achieving comparable results. For artist and song similarity, a strong relationship was observed between these features and the way users consume music while neural embedding methods significantly outperformed LSA. Finally, this work is accompanied by a web-application,, that is dedicated to visualizing all these lyrical features and has been featured on a number of media outlets, most notably, Vox, attn: and Metro.

Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates, Cynthia Matuszek and Pranam Kolari (Walmart Labs)

UMBC Data Science Graduate Program Starts Fall 2017

June 16th, 2017, by Tim Finin, posted in Big data, Data Science, Database, Datamining, KR, Machine Learning, NLP


UMBC Data Science Graduate Programs

UMBC’s Data Science Master’s program prepares students from a wide range of disciplinary backgrounds for careers in data science. In the core courses, students will gain a thorough understanding of data science through classes that highlight machine learning, data analysis, data management, ethical and legal considerations, and more.

Students will develop an in-depth understanding of the basic computing principles behind data science, to include, but not limited to, data ingestion, curation and cleaning and the 4Vs of data science: Volume, Variety, Velocity, Veracity, as well as the implicit 5th V — Value. Through applying principles of data science to the analysis of problems within specific domains expressed through the program pathways, students will gain practical, real world industry relevant experience.

The MPS in Data Science is an industry-recognized credential and the program prepares students with the technical and management skills that they need to succeed in the workplace.

For more information and to apply online, see the Data Science MPS site.

UMBC Seeks Professor of the Practice to Head new Data Science Program

June 7th, 2017, by Tim Finin, posted in Data Science, Semantic Web, UMBC

The University of Maryland, Baltimore County is looking to hire a Professor of the Practice to head a new graduate program in Data Science. See the job announcement for more information and apply online at Interfolio.

In addition to developing and teaching graduate data science courses, the new faculty member will serve as the Graduate Program Director of UMBC’s program leading to a master’s degree in Data Science. This cross-disciplinary program is offered to professional students through a partnership between the College of Engineering and Information Technology; the College of Arts, Humanities and Social Sciences; the College of Natural and Mathematical Sciences; the Department of Computer Science and Electrical Engineering; and UMBC’s Division of Professional Studies.

You are currently browsing the archives for the Data Science category.

  Home | Archive | Login | Feed