IEEE International Symposium on Technologies for Homeland Security

Understanding Multi-lingual Threat Intelligence for AI based Cyber-defense Systems

Information is widely communicated across political, cultural, and geographical boundaries over a global Internet. Today's Internet is multilingual: people converse in languages such as English, Mandarin, Russian, and Hindi. Cyber threats, in particular, originate from and are mitigated across a broad range of geographic regions. Although cybersecurity data is abundant on the web, it is scattered across major natural languages, which reduces interoperability at the multilingual level. The wide geographic distribution of cyber attacks makes it harder for organizations worldwide to employ strong cyber risk management. Cybersecurity actors, both attackers and defenders, converse in diverse languages over social media, blogs, dark web vulnerability markets, etc. These non-traditional sources are becoming an important asset for threat intelligence mining and are often the first to receive the latest intelligence about vulnerabilities, exploits, and threats. They are prime tools for disseminating integral threat intelligence data, ranging from political factors, such as the international origin of and intention behind attacks, to technical factors, such as sources of new software vulnerabilities and exploits.

UMBC ebiquity