IEEE Access Journal
Creating Cybersecurity Knowledge Graphs from Malware After Action Reports
December 1, 2020
After Action Reports provide incisive analysis of cyber-incidents. Extracting cyber-knowledge from these sources would give security analysts credible information that they can use to detect a future cyber-attack, or to find patterns indicative of one. A security analyst cannot read and garner relevant information from the large number of After Action Reports and similar textual documents that detail attacks. An automated pipeline that extracts information from text sources, represents it in a knowledge graph, and reasons over it could help analysts analyze future cyber-attacks. In this paper, we describe a system that extracts information from After Action Reports published by established security corporations and represents it in a Cybersecurity Knowledge Graph (CKG). We also show how the CKG can incorporate information from semi-structured sources such as STIX. The CKG also lets security analysts execute queries that involve inferences and retrieve information required to detect a future attack. We extract entities by building a customized named entity recognizer called 'Malware Entity Extractor' (MEE). We then build a neural network to predict how pairs of 'malware entities' are related to each other. Once we have predicted entity pairs and the relationships between them, we assert the 'entity-relationship set' into a cybersecurity knowledge graph. In this process, each individual source of information (i.e., each After Action Report) leads to its own graph. Our next step is to fuse these graphs on common entities where possible, creating a single graph that represents knowledge from multiple documents. The CKG can be populated from one After Action Report and fused with another knowledge graph about a similar cyber-attack, or with one built from an After Action Report describing the attributes of a similar malware.
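The per-report graphs and the fusion step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes the (subject, relation, object) triples have already been produced by entity extraction and relation prediction. It stores each graph as a plain Python adjacency map rather than a dedicated graph store, and it uses a hypothetical `normalize` rule (case- and whitespace-insensitive name matching) as the criterion for a "common entity". The entity names are illustrative placeholders.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]                      # (subject, relation, object)
Graph = DefaultDict[str, Set[Tuple[str, str]]]     # subject -> {(relation, object)}

def normalize(entity: str) -> str:
    """Hypothetical matching rule: mentions differing only in case/spacing are one node."""
    return " ".join(entity.lower().split())

def build_graph(triples: Iterable[Triple]) -> Graph:
    """Assert one report's entity-relationship set as its own graph."""
    graph: Graph = defaultdict(set)
    for subj, rel, obj in triples:
        graph[normalize(subj)].add((rel, normalize(obj)))
    return graph

def fuse(*graphs: Graph) -> Graph:
    """Merge per-report graphs; nodes with the same normalized name collapse into one."""
    fused: Graph = defaultdict(set)
    for graph in graphs:
        for subj, edges in graph.items():
            fused[subj] |= edges
    return fused

# Hypothetical triples, as if extracted from two different reports.
report_a = [("MalwareX", "uses", "Spear Phishing"), ("MalwareX", "targets", "Windows")]
report_b = [("malwarex ", "exploits", "Vulnerability V1")]

fused = fuse(build_graph(report_a), build_graph(report_b))
for rel, obj in sorted(fused["malwarex"]):
    print("malwarex", rel, obj)
```

Because both reports mention the same malware under slightly different spellings, the fused graph holds a single `malwarex` node with all three edges. A real system would need a more robust entity-resolution rule than string normalization.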
We show how this knowledge can be used to answer analyst queries that cannot be answered from any single source.
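To illustrate why fusion enables such queries, the sketch below runs a two-hop query ("which platforms are targeted by malware that exploits a given vulnerability?") over plain sets of triples. It is an assumption-laden stand-in for a real graph query language. The function `platforms_targeted_via`, the relation names, and the entity names are all hypothetical, and the two triple sets are assumed to be already entity-aligned.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def platforms_targeted_via(vuln: str, kg: Set[Triple]) -> Set[str]:
    """Two-hop join: malware m exploits vuln, and m targets platform p -> return p."""
    malware = {s for s, r, o in kg if r == "exploits" and o == vuln}
    return {o for s, r, o in kg if r == "targets" and s in malware}

# Hypothetical triples from two different reports.
report_a = {("MalwareX", "targets", "Windows")}
report_b = {("MalwareX", "exploits", "Vulnerability V1")}

print(platforms_targeted_via("Vulnerability V1", report_a))             # set()
print(platforms_targeted_via("Vulnerability V1", report_b))             # set()
print(platforms_targeted_via("Vulnerability V1", report_a | report_b))  # {'Windows'}
```

Each report holds only one hop of the path, so neither can answer the query alone. The union of the two graphs completes the join.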