IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT)

Data Quality and Linguistic Cues for Domain-independent Deception Detection


Deception is pervasive in today’s connected society and spreads in many forms with diverse goals, which we refer to as domains of deception. The most crucial research task in the field is identifying deception, which in most cases involves a machine learning model making a binary classification of Deceptive or Not Deceptive. These classification models are important because they can help protect an organization's security by preventing phishing emails from being read, protect online retailers from being flooded with fictitious reviews, and support many other tasks depending on the domain of deception they are trained to handle. A fair amount of research has focused on classifying deception; however, most of it has addressed a single domain exclusively. In this work, we examine the quality of multiple datasets across different domains of deception, investigate the traces that deception may leave across domains by running multiple tests with machine learning models, and ascertain how well linguistic cues identify deception across multiple domains.


computational linguistics, data quality, deception, learning, natural language processing



doi: 10.1109/BDCAT56447.2022.00042


UMBC ebiquity