Data Quality and Linguistic Cues for Domain-independent Deception Detection

Casey Hanks; Rakesh M. Verma

IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT)

December 6, 2022

Deception is pervasive in today’s connected society and spreads in many forms with diverse goals, which we refer to as domains of deception. The most crucial research task in the field of deception is identifying deception, which in most cases involves a machine learning model making the binary classification of Deceptive or Not Deceptive. These classification models are important because they can help protect an organization’s security by keeping phishing emails from being read, protect online retailers from being flooded with fictitious reviews, and perform many other tasks depending on the domain of deception they are trained to handle. There has been a fair amount of research on the classification of deception; however, most of it has focused exclusively on a single domain of deception. In this work, we examine the quality of multiple datasets across different domains of deception, investigate the traces that deception may leave across domains by performing multiple tests with machine learning models, and assess how well linguistic cues identify deception across multiple domains.

Tags: computational linguistics, data quality, deception, learning, natural language processing

Type: InProceedings

Publisher: IEEE

Note: doi: 10.1109/BDCAT56447.2022.00042
