Mid-Atlantic Student Colloquium on Speech, Language and Learning

CyberEnt: Extracting Domain Specific Entities from Cybersecurity Text


We have created an initial large, unstructured cyber threat intelligence (CTI) corpus from a variety of open sources, such as cybersecurity vendor reports and blogs, vulnerability database records (Common Vulnerabilities and Exposures (CVE)), and Advanced Persistent Threat (APT) reports. We are using the corpus to train and test cybersecurity entity models with the spaCy framework and, in particular, are exploring self-learning methods that automatically recognize cybersecurity entities from limited but high-quality training datasets.
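The self-learning idea mentioned above can be illustrated with a small pseudo-labeling loop. The following is only a toy sketch, not the authors' method and not spaCy code: it swaps in a count-based tagger built on the Python standard library, and the function names (`features`, `train`, `predict`, `self_train`), the `MALWARE` label, the confidence threshold, and the example sentences are all assumptions made for illustration. A model trained on a small seed set labels unlabeled sentences, and only sentences tagged with high confidence are added back to the training data.

```python
from collections import Counter, defaultdict


def features(tokens, i):
    """Two simple features per token: the word itself and the previous word."""
    prev = tokens[i - 1].lower() if i > 0 else "<s>"
    return [("word", tokens[i].lower()), ("prev", prev)]


def train(labeled):
    """Count how often each feature co-occurs with each tag."""
    counts = defaultdict(Counter)
    for tokens, tags in labeled:
        for i, tag in enumerate(tags):
            for feat in features(tokens, i):
                counts[feat][tag] += 1
    return counts


def predict(counts, tokens):
    """Tag each token by feature votes; confidence is the winning vote share."""
    tags, confs = [], []
    for i in range(len(tokens)):
        votes = Counter()
        for feat in features(tokens, i):
            votes.update(counts.get(feat, {}))
        if not votes:  # no evidence at all: default to "O" with zero confidence
            tags.append("O")
            confs.append(0.0)
        else:
            tag, n = votes.most_common(1)[0]
            tags.append(tag)
            confs.append(n / sum(votes.values()))
    return tags, confs


def self_train(seed, unlabeled, threshold=0.8, rounds=3):
    """Grow the training set with confidently pseudo-labeled sentences."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(rounds):
        model = train(labeled)
        remaining, added = [], False
        for tokens in pool:
            tags, confs = predict(model, tokens)
            # Accept only sentences where every token is confident
            # and at least one entity was found.
            if min(confs) >= threshold and any(t != "O" for t in tags):
                labeled.append((tokens, tags))
                added = True
            else:
                remaining.append(tokens)
        pool = remaining
        if not added:
            break
    return train(labeled), labeled


# Hypothetical seed annotations and unlabeled text, for illustration only.
seed = [
    (["the", "emotet", "malware", "spread"], ["O", "MALWARE", "O", "O"]),
    (["the", "trickbot", "malware", "spread"], ["O", "MALWARE", "O", "O"]),
]
unlabeled = [["the", "ryuk", "malware", "spread"], ["attackers", "used", "ryuk"]]

model, labeled = self_train(seed, unlabeled)
print(predict(model, ["attackers", "used", "ryuk"])[0])
```

In this toy run, the seed-only model has no evidence for "ryuk" outside the "the ... malware" context, while the self-trained model has absorbed it from the pseudo-labeled sentence. A real pipeline would replace the count-based tagger with a trained NER model and would need a more careful confidence estimate.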

This material is based upon work supported by a grant from the NSA and by National Science Foundation Grant No. 2114892.



cybersecurity, information extraction, named entity recognition, natural language processing

InProceedings

(Poster presentation)


UMBC ebiquity