Modeling and Extracting Information about Cybersecurity Events from Text

People now rely on the Internet for many of their daily activities, such as banking, ordering food, and socializing with family and friends. This technology makes our lives easier, but it also brings many problems, including cybercrime, stolen data, and identity theft. As the number of online transactions grows every day, so does the frequency of cybercrime events. Because security-related events are too numerous for manual review and monitoring, we need to train machines to detect and gather data about potential cyber threats. To support machines that can identify and understand threats, we need standard models for storing cybersecurity information and information extraction systems that can populate those models with data from text.

This dissertation makes two significant contributions. First, we defined a rich cybersecurity event schema and annotated a news corpus following that schema. The schema consists of event type definitions, semantic roles, and event arguments. Second, we present CASIE, a cybersecurity event extraction system. CASIE detects cybersecurity events, identifies event participants and their roles, and specifies realis values. It also groups event mentions that are coreferent, that is, mentions that refer to the same event. CASIE produces its output in an easy-to-use format, as a JSON object.
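To make the idea of an event frame serialized as JSON more concrete, the following is a minimal sketch in Python. It is not CASIE's actual schema or output format: the class names, field names, the type label "DataBreach", the role labels, the realis value "Actual", and the example sentence fragments are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Argument:
    text: str  # text span that fills the role
    role: str  # semantic role label (hypothetical, e.g. "Attacker")


@dataclass
class EventMention:
    trigger: str      # word or phrase that signals the event
    event_type: str   # event type label (hypothetical)
    realis: str       # realis value (hypothetical, e.g. "Actual")
    arguments: list = field(default_factory=list)
    cluster: int = 0  # mentions that share a cluster id are coreferent


def to_json(mentions):
    """Group mentions by coreference cluster and serialize them as JSON."""
    clusters = {}
    for mention in mentions:
        clusters.setdefault(mention.cluster, []).append(asdict(mention))
    events = [
        {"cluster": cluster_id, "mentions": grouped}
        for cluster_id, grouped in sorted(clusters.items())
    ]
    return json.dumps({"events": events}, indent=2)


# Two invented mentions that refer to the same (invented) event.
mentions = [
    EventMention(
        trigger="stole",
        event_type="DataBreach",
        realis="Actual",
        arguments=[
            Argument(text="hackers", role="Attacker"),
            Argument(text="customer records", role="CompromisedData"),
        ],
        cluster=1,
    ),
    EventMention(
        trigger="the breach",
        event_type="DataBreach",
        realis="Actual",
        cluster=1,
    ),
]

output = to_json(mentions)
print(output)
```

In this sketch each event mention carries its trigger, type, realis value, and role-labeled arguments, and coreferent mentions are grouped under a shared cluster id before the whole structure is written out as one JSON object.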


We believe that this work will be useful for cybersecurity management in the future. It can quickly extract cybersecurity event information from unstructured text and fill in the event frame, so that we can keep up with the many cybersecurity events that happen every day.

cybersecurity, information extraction, natural language processing


University of Maryland, Baltimore County


