As cybersecurity-related threats continue to increase, understanding how the field is changing over time can give insight into combating new threats and understanding historical events. We show how to apply dynamic topic models to a set of cybersecurity documents to understand how the concepts found in them are changing over time. We correlate two different data sets, the first relates to specific exploits and the second relates to cybersecurity research. We use Wikipedia concepts to provide a basis for performing concept phrase extraction and show how using concepts to provide context improves the quality of the topic model. We represent the results of the dynamic topic model as a knowledge graph that could be used for inference or information discovery.
Modeling and Extracting Information about Cybersecurity Events from Text
9:30-11:30 Monday, 18 November, 2019, ITE346?
People now rely on the Internet to carry out much of their daily activities such as banking, ordering food, and socializing with their family and friends. The technology facilitates our lives, but also comes with many problems, including cybercrimes, stolen data, and identity theft. With the large and increasing number of transactions done every day, the frequency of cybercrime events is also growing. Since the number of security-related events is too high for manual review and monitoring, we need to train machines to be able to detect and gather data about potential cyber threats. To support machines that can identify and understand threats, we need standard models to store the cybersecurity information and information extraction systems that can collect information to populate the models with data from text.
This dissertation makes two significant contributions. First, we defined rich cybersecurity event schema and annotated the news corpus following the schema. Our schema consists of event type definitions, semantic roles, and event arguments. Second, we present CASIE, a cybersecurity event extraction system. CASIE can detect cybersecurity events, identify event participants and their roles, including specifying realis values. It also groups the events, which are coreference. CASIE produces output in easy to use format as a JSON object.
We believe that this dissertation will be useful for cybersecurity management in the future. It will quickly grasp cybersecurity event information out of the unstructured text and fill in the event frame. So we can compete with tons of cybersecurity events that happen every day.
Committee: Drs. Tim Finin (chair), Anupam Joshi, Tim Oates, Karuna Pande Joshi, Francis Ferraro