Mid-Atlantic Student Colloquium on Speech, Language and Learning
CyberEnt: Extracting Domain Specific Entities from Cybersecurity Text
April 30, 2022
We have created an initial large, unstructured cyber threat intelligence (CTI) corpus from a variety of open sources, such as cybersecurity vendor reports and blogs, Common Vulnerabilities and Exposures (CVE) records from vulnerability databases, and Advanced Persistent Threat (APT) reports. We are using the corpus to train and test cybersecurity entity models with the spaCy framework and, in particular, are exploring self-learning methods that automatically recognize cybersecurity entities from limited but high-quality training datasets.
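The self-learning idea can be illustrated with a minimal sketch, which is not the authors' implementation and does not use spaCy. It stands in a simple context-pattern bootstrapper for the statistical entity model: starting from a small seed list, it learns which surrounding tokens mark known entities, then promotes new tokens found in well-supported contexts and repeats. The seed names, the example sentences, the `min_support` threshold, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical seed entities: the small, high-quality labeled set.
SEED_MALWARE = {"emotet", "trickbot"}

# Hypothetical unlabeled sentences standing in for the CTI corpus.
CORPUS = [
    "The Emotet malware was delivered through phishing emails.",
    "Analysts observed the TrickBot malware was spreading laterally.",
    "The Ryuk malware was deployed after initial access.",
    "Researchers say the Qakbot malware was updated last week.",
    "The patch was released on Tuesday.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9-]+", text.lower())


def learn_contexts(sentences, entities):
    """Count (previous token, next token) pairs around known entities."""
    contexts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(1, len(tokens) - 1):
            if tokens[i] in entities:
                contexts[(tokens[i - 1], tokens[i + 1])] += 1
    return contexts


def propose_entities(sentences, contexts, entities, min_support):
    """Return unseen tokens that occur in contexts with enough support."""
    trusted = {ctx for ctx, count in contexts.items() if count >= min_support}
    found = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(1, len(tokens) - 1):
            context = (tokens[i - 1], tokens[i + 1])
            if context in trusted and tokens[i] not in entities:
                found.add(tokens[i])
    return found


def self_train(sentences, seeds, rounds=3, min_support=2):
    """Grow the entity set until no new candidates appear or rounds run out."""
    entities = set(seeds)
    for _ in range(rounds):
        contexts = learn_contexts(sentences, entities)
        new_entities = propose_entities(sentences, contexts, entities, min_support)
        if not new_entities:
            break
        entities |= new_entities
    return entities


learned = self_train(CORPUS, SEED_MALWARE)
print(sorted(learned))
```

The `min_support` threshold plays the role of a confidence cutoff: a context must be seen around at least two known entities before it is trusted, which limits the drift that self-training is prone to when low-confidence predictions are fed back into the training set.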
This material is based upon work supported by a grant from the NSA and by National Science Foundation Grant No. 2114892.