Comparing and Evaluating Semantic Data Automatically Extracted from Text

Dawn Lawrie; Tim Finin; James Mayfield; Paul McNamee

AAAI 2013 Fall Symposium on Semantics for Big Data

November 15, 2013

One way to obtain large amounts of semantic data is to extract facts from the vast quantities of text now available online. The relatively low accuracy of current information extraction techniques creates a need to evaluate the quality of the knowledge bases (KBs) they generate. We frame the problem as comparing KBs generated by different systems from the same documents, and we show that exploiting provenance leads to more efficient techniques for aligning the KBs and identifying their differences. We describe two types of tools: entity-match focuses on differences in the entities found and linked; kbdiff focuses on differences in the relations among those entities. Together, these tools support assessment of relative KB accuracy by sampling the parts of two KBs that disagree. We explore the usefulness of the tools by constructing tens of different KBs from the same 26,000 Washington Post articles and identifying their differences.
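The idea of aligning two KBs through provenance and then comparing their relations can be illustrated with a minimal sketch. This is not the authors' entity-match or kbdiff implementation; the data layout (mentions as `(doc_id, start, end)` spans, relations as subject-predicate-object triples) and the function names `align_entities` and `diff_relations` are assumptions made purely for illustration.

```python
from collections import defaultdict


def align_entities(kb_a, kb_b):
    """Map each entity id in kb_a to the kb_b entity ids sharing a mention span.

    Assumed layout: each KB is a dict of entity_id -> set of
    (doc_id, start, end) provenance spans.
    """
    # Index kb_b by provenance span so alignment avoids an all-pairs comparison.
    span_to_b = {}
    for ent_b, spans in kb_b.items():
        for span in spans:
            span_to_b[span] = ent_b

    alignment = defaultdict(set)
    for ent_a, spans in kb_a.items():
        for span in spans:
            if span in span_to_b:
                alignment[ent_a].add(span_to_b[span])
    return dict(alignment)


def diff_relations(rels_a, rels_b, alignment):
    """Return (only_a, only_b): relation triples with no counterpart in the other KB.

    Only entities aligned to exactly one counterpart are translated; anything
    else is treated as a disagreement (a simplifying assumption).
    """
    one_to_one = {a: next(iter(bs)) for a, bs in alignment.items() if len(bs) == 1}

    only_a = set()
    matched_b = set()
    for subj, pred, obj in rels_a:
        translated = (one_to_one.get(subj), pred, one_to_one.get(obj))
        if translated in rels_b:
            matched_b.add(translated)
        else:
            only_a.add((subj, pred, obj))
    only_b = set(rels_b) - matched_b
    return only_a, only_b
```

Under these assumptions, the two returned sets are the disagreeing parts of the KBs, which is the material one would sample to judge relative accuracy.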

Tags: big data, information extraction, natural language processing, semantics

Type: InProceedings

Publisher: AAAI Press

Downloads: 1505