Modeling and Extracting Information about Cybersecurity Events from Text
9:30-11:30 Monday, 18 November, 2019, ITE346?
People now rely on the Internet to carry out many of their daily activities, such as banking, ordering food, and socializing with family and friends. The technology makes our lives easier, but it also brings problems, including cybercrime, stolen data, and identity theft. As the number of transactions done every day grows, so does the frequency of cybercrime events. Because there are far too many security-related events for manual review and monitoring, we need to train machines to detect and gather data about potential cyber threats. To support machines that can identify and understand threats, we need standard models for storing cybersecurity information, as well as information extraction systems that can populate those models with data extracted from text.
This dissertation makes two significant contributions. First, we define a rich cybersecurity event schema and annotate a news corpus following that schema. The schema consists of event type definitions, semantic roles, and event arguments. Second, we present CASIE, a cybersecurity event extraction system. CASIE detects cybersecurity events, identifies event participants and their roles, and specifies realis values. It also groups events that are coreferent. CASIE produces its output in an easy-to-use JSON format.
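As a rough illustration of what such structured output could look like, the Python sketch below models an event with its type, trigger phrase, realis value, coreference group, and role-labeled arguments, then serializes a list of events to JSON. The field names, the event type `Attack.Databreach`, the role labels, and the realis value set (`Actual`, `Generic`, `Other`) are assumptions made for this example; they are not taken from CASIE's actual output format.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Argument:
    role: str  # semantic role label (hypothetical, e.g. "Attacker")
    text: str  # text span that fills the role


@dataclass
class Event:
    event_type: str   # hypothetical event type name
    nugget: str       # trigger phrase that signals the event
    realis: str       # assumed value set: "Actual", "Generic", "Other"
    coref_group: int  # events sharing a group id are treated as coreferent
    arguments: List[Argument] = field(default_factory=list)


def events_to_json(events: List[Event]) -> str:
    """Serialize events (including nested arguments) to a JSON string."""
    return json.dumps([asdict(e) for e in events], indent=2)


events = [
    Event("Attack.Databreach", "breached", "Actual", 0,
          [Argument("Attacker", "hackers"), Argument("Victim", "the retailer")]),
    # A second mention of the same incident, grouped by coref_group.
    Event("Attack.Databreach", "the breach", "Actual", 0),
]
print(events_to_json(events))
```

Keeping coreference as a shared group id, rather than nesting mentions inside one another, leaves each event mention self-contained while still letting a consumer merge mentions of the same incident.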
We believe this dissertation will be useful for future cybersecurity management. Such a system can quickly extract cybersecurity event information from unstructured text and fill in the event frame, helping us keep pace with the huge number of cybersecurity events that happen every day.
Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates, Karuna Pande Joshi, Francis Ferraro
Automated Data Augmentation via Wikidata Relationships
Oyesh Singh, UMBC
10:30-11:30 Monday, 21 October 2019, ITE 346
As machine learning models grow in complexity, there is more need for data than ever. To address the scarcity of annotated data, we look towards the ocean of free data in Wikipedia and other Wikimedia resources. Wikipedia has an enormous amount of data in many languages, along with the knowledge graph defined in Wikidata. In this presentation, I will explain how we used Wikipedia and Wikidata to boost the performance of BERT models for named entity recognition.
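One simple way a knowledge graph can supply extra NER training data is entity substitution: each tagged mention in a sentence is swapped for another entity of the same type, producing a new labeled sentence. The sketch below illustrates only that general idea and is not necessarily the technique used in this work. It assumes BIO-tagged tokens and replaces the Wikidata lookup with a small hard-coded dictionary, `SAME_TYPE`; that dictionary and the helper names `entity_spans` and `augment` are hypothetical.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical stand-in for entities retrieved from Wikidata, grouped by NER type.
# A real pipeline would fill this from Wikidata (e.g. instances of a class).
SAME_TYPE: Dict[str, List[str]] = {
    "LOC": ["Kathmandu", "Baltimore", "Lagos"],
    "PER": ["Ada Lovelace", "Alan Turing"],
}


def entity_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end_exclusive, type) for each B-/I- entity span."""
    spans = []
    i = 0
    while i < len(tags):
        if tags[i].startswith("B-"):
            etype = tags[i][2:]
            j = i + 1
            while j < len(tags) and tags[j] == "I-" + etype:
                j += 1
            spans.append((i, j, etype))
            i = j
        else:
            i += 1
    return spans


def augment(tokens: List[str], tags: List[str],
            rng: random.Random) -> Tuple[List[str], List[str]]:
    """Swap each entity that has candidates for a random same-type entity."""
    new_tokens: List[str] = []
    new_tags: List[str] = []
    prev = 0
    for start, end, etype in entity_spans(tags):
        # Copy the untouched tokens between the previous entity and this one.
        new_tokens += tokens[prev:start]
        new_tags += tags[prev:start]
        candidates = SAME_TYPE.get(etype)
        if candidates:
            words = rng.choice(candidates).split()
            new_tokens += words
            # Rebuild BIO tags, since the replacement may differ in length.
            new_tags += ["B-" + etype] + ["I-" + etype] * (len(words) - 1)
        else:
            # No candidates for this type: keep the original mention.
            new_tokens += tokens[start:end]
            new_tags += tags[start:end]
        prev = end
    new_tokens += tokens[prev:]
    new_tags += tags[prev:]
    return new_tokens, new_tags


print(augment(["She", "visited", "New", "York", "."],
              ["O", "O", "B-LOC", "I-LOC", "O"], random.Random(0)))
```

Because a replacement entity can span a different number of tokens than the original mention, `augment` regenerates the BIO tags for each substituted span instead of reusing the old ones.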
2018 Mid-Atlantic Student Colloquium on Speech, Language and Learning
The 2018 Mid-Atlantic Student Colloquium on Speech, Language and Learning (MASC-SLL) is a student-run, one-day event on speech, language & machine learning research to be held at the University of Maryland, Baltimore County (UMBC) from 10:00am to 6:00pm on Saturday, May 12. There is no registration charge, and lunch and refreshments will be provided. Students, postdocs, faculty and researchers from universities & industry are invited to participate and network with other researchers working in related fields.
Students and postdocs are encouraged to submit abstracts describing ongoing, planned, or completed research projects, including previously published results and negative results. Research in any field applying computational methods to any aspect of human language, including speech and learning, from all areas of computer science, linguistics, engineering, neuroscience, information science, and related fields is welcome. Submissions and presentations must be made by students or postdocs. Accepted submissions will be presented as either posters or talks.
1:00-2:00pm Friday, 17 November 2017?, ITE325, UMBC
In recent years, deep neural networks have been highly successful at a number of tasks in computer vision, natural language processing, and artificial intelligence in general. These remarkable performance gains have led universities and industry to invest heavily in this space. This investment has created a thriving open-source ecosystem of tools and libraries that aid the design of new architectures, algorithm research, and data collection.
This talk (and hands-on session) introduces the basics of machine learning and neural networks and discusses some popular neural network architectures. We will then take a dive into one of the most popular libraries, TensorFlow, and an associated abstraction library, Keras.
To participate in the hands-on part of the workshop, bring a laptop with Python installed and install the following libraries using pip. On Windows (or any other OS), consider installing Anaconda, which includes most of the necessary libraries.
numpy, scipy & scikit-learn
tensorflow / tensorflow-gpu (the second one is the GPU version)
matplotlib for visualizations (if necessary)
jupyter & ipython (we will use Python 2.7 in our experiments)
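Before the workshop, a short script like the one below can confirm that each package in the list above imports correctly. The mapping from pip names to module names is an assumption of this sketch (for example, `scikit-learn` is imported as `sklearn`, and `tensorflow-gpu` also installs the `tensorflow` module). The script avoids Python 3-only syntax so it should run under Python 2.7 as well as Python 3.

```python
# Hypothetical mapping from the pip package names in the list above
# to the module names used in an import statement.
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "scikit-learn": "sklearn",
    "tensorflow": "tensorflow",      # also provided by tensorflow-gpu
    "matplotlib": "matplotlib",
    "jupyter": "jupyter_core",       # assumed: core module pulled in by jupyter
    "ipython": "IPython",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules fail to import."""
    missing = []
    for pip_name, module_name in sorted(required.items()):
        try:
            __import__(module_name)
        except Exception:
            # Installed-but-broken packages are reported as missing too.
            missing.append(pip_name)
    return missing


if __name__ == "__main__":
    gone = missing_packages()
    if gone:
        print("Missing packages: " + ", ".join(gone))
    else:
        print("All workshop packages are installed.")
```

Run it with the same interpreter you plan to use during the session, so the check reflects that environment rather than another Python on the machine.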
With the increasing adoption of cloud computing, cyber attacks have become one of the most effective means for adversaries to inflict damage. To overcome the limitations of existing blacklists and whitelists, our research focuses on developing a dynamic reputation scoring model for sessions based on a variety of observable and derived attributes of network traffic. We propose a technique to greylist sessions by numerically scoring observables such as IP addresses, domains, URLs, and file hashes based on the events in each session. This enables automatic labeling of possibly malicious hosts or users, which can help enrich existing whitelists and blacklists.
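To make the scoring idea concrete, the sketch below computes a session score as a weighted average of per-observable suspicion values and maps it to a whitelist, greylist, or blacklist label. The weights in `WEIGHTS`, the `suspicion` inputs, and the 0.3/0.7 thresholds are hypothetical placeholders; the abstract does not specify the actual attributes, weights, or cut-offs used in the model.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights per observable type; not taken from the actual model.
WEIGHTS = {"ip": 0.2, "domain": 0.3, "url": 0.2, "file_hash": 0.3}


@dataclass
class SessionEvent:
    observable: str   # one of the keys in WEIGHTS
    value: str        # e.g. an IP address, domain, URL, or file hash
    suspicion: float  # 0.0 (benign) .. 1.0 (known bad), e.g. from threat intel


def session_score(events: List[SessionEvent]) -> float:
    """Weighted average of event suspicion values, in [0, 1]."""
    total_weight = sum(WEIGHTS[e.observable] for e in events)
    if total_weight == 0:
        return 0.0
    return sum(WEIGHTS[e.observable] * e.suspicion for e in events) / total_weight


def label(score: float, low: float = 0.3, high: float = 0.7) -> str:
    """Map a score to a list; the default thresholds are illustrative only."""
    if score < low:
        return "whitelist"
    if score >= high:
        return "blacklist"
    return "greylist"


events = [
    SessionEvent("ip", "203.0.113.7", 0.9),
    SessionEvent("domain", "example.org", 0.1),
    SessionEvent("file_hash", "d41d8cd98f00b204e9800998ecf8427e", 0.5),
]
print(label(session_score(events)))
```

A weighted average keeps the score in [0, 1] no matter how many events a session contains, so the same thresholds can be applied to short and long sessions alike.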
The Ontology Summit is an annual series of online and in-person events that involves the ontology community and communities related to each year’s topic. The topic chosen for the 2018 Ontology Summit is Ontologies in Context, which the summit describes as follows.
“In general, a context is defined to be the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood and assessed. Some examples of synonyms include circumstances, conditions, factors, state of affairs, situation, background, scene, setting, and frame of reference. There are many meanings of “context” in general, and also for ontologies in particular. The summit this year will survey these meanings and identify the research problems that must be solved so that contexts can succeed in achieving the full understanding and assessment of an ontology.”
Each year’s Summit comprises a series of online and face-to-face events spanning about three months. These include a vigorous three-month online discourse on the theme, online panel discussions, and research activities, culminating in a two-day face-to-face workshop and symposium.
Over the next two months, there will be a sequence of weekly online meetings to discuss, plan, and develop the 2018 topic. The summit itself will start in January with weekly online sessions of invited speakers. Visit the 2018 Ontology Summit site for more information and to see how you can participate in the planning sessions.
A Hands-on Introduction to TensorFlow and Machine Learning
Abhay Kashyap, UMBC ebiquity Lab
10:00-11:00am Tuesday, 28 March 2017, ITE325b (changed from ITE346)
As many of you know, TensorFlow is an open source machine learning library by Google that simplifies building and training deep neural networks that can take advantage of computers with GPUs. In this meeting, I will introduce some basic concepts of TensorFlow and of machine learning in general. This will be a hands-on tutorial where we will sit and code up some basic examples in TensorFlow. Specifically, we will use TensorFlow to implement linear regression, softmax classifiers, and feed-forward neural networks (MLPs). You can find the Python notebooks here. If time permits, we will go over the implementation of the popular word2vec algorithm and introduce LSTMs to build language models.
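The first example on the agenda, linear regression, comes down to minimizing mean squared error by gradient descent. TensorFlow computes the gradients automatically; the framework-free sketch below writes them out by hand so the update rule is visible. For a model $\hat{y} = wx + b$ and loss $L = \frac{1}{n}\sum_i (wx_i + b - y_i)^2$, the gradients are $\partial L/\partial w = \frac{2}{n}\sum_i (wx_i + b - y_i)\,x_i$ and $\partial L/\partial b = \frac{2}{n}\sum_i (wx_i + b - y_i)$. This is not the tutorial's notebook code, and the learning rate and step count are arbitrary choices for this toy data.

```python
from typing import List, Tuple


def fit_linear(xs: List[float], ys: List[float],
               lr: float = 0.05, steps: int = 2000) -> Tuple[float, float]:
    """Fit y = w*x + b by batch gradient descent on mean squared error."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Residuals of the current model on every training point.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Hand-derived gradients of the mean squared error.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1
w, b = fit_linear(xs, ys)
print(round(w, 3), round(b, 3))
```

In TensorFlow, the two gradient lines are what automatic differentiation replaces, and the same loop structure carries over to softmax classifiers and MLPs with a different model and loss.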
What you need to know: Python and the basics of linear algebra and matrix operations. While it helps to know the basics of machine learning, no prior knowledge will be assumed, and there will be a gentle high-level introduction to the algorithms we will implement.
What you need to bring: A laptop that has Python and pip installed. Having virtual environments set up on your computer is also a plus. (Warning: Windows-only users might be publicly shamed)