]>
In this dissertation, we demonstrate several joint models for refining a knowledge graph under different settings, ranging from validating inferred facts to extracting and justifying beliefs from text. We first consider verifying an existing knowledge graph without any additional or supporting provenance information. We develop unsupervised models that use knowledge-enriched tensor factorization to determine the validity of inferred facts by learning entity and relation embeddings. Unlike previous approaches, our models depend on neither an external schema nor a corpus to guide the learning of the embeddings. Rather, they constrain the embeddings using graph structures computed with data-driven approaches. We introduce four models, two quadratic and two linear in the number of entities, and study the effect of incorporating graph structures. We also provide a convergence proof for one of our models, demonstrating that the linear model with more than two variables converges. Compared with the baselines, we found that the models with prior information achieve better performance and generalization, especially when the graph is very sparse.
Second, we consider verifying an existing knowledge graph under the assumption that we may make use of text-based provenance. In the previous problem, we assumed that the underlying knowledge graph does not contain errors. However, it is rare to obtain extractions of such high quality from text. Hence, we explore how consistently a machine reads when it extracts beliefs from given provenance sentences to construct a knowledge graph. We describe an approach that jointly determines whether an existing knowledge graph belief was read consistently and, when it was not, suggests a potential fix. Unlike previous approaches, ours does not depend on opaque web search engines, does not make use of a schema, and does not assume an ensemble of IE systems. By conducting experiments on different IE and human-generated datasets, we found that most of the errors made by information extraction systems are due to choosing an incorrect relation given the provenance information, and that a simple model can perform comparably to a complex or more expressive model. As the errors made by IE systems are mostly lexical or syntactic in nature, the word order (or composability) can be ignored for the task.
Finally, we consider how to verify beliefs represented in natural language. This deviates from the assumptions of our previous contributions: here we are working not with tuples but with the text used to generate tuples. Most current information extraction systems do not question the quality of their input sentences and simply extract facts from them. In the case of misinformed articles, incorrect facts could be learned, which might conflict with existing knowledge graph facts. To fill this gap, we propose a novel model that determines the validity of an input sentence and provides interpretable, evidence-based justifications to explain the classifier's prediction. Compared to previous work, which focuses on specific datasets and uses dataset-specific heuristics, we study the effectiveness of frame-based semantics in narrowing the search space of evidence sentences to provide better explanations, the effect of utilizing discrete inference for the claim validity task, and the benefit of jointly learning the claim classification and explanation tasks. We found that joint modeling performs better than modeling a single task. In addition, better evidence sentences are retrieved when semantic frames from FrameNet are considered, yielding a significant gain that nearly doubles performance on the retrieval and classification tasks.
]]>