Learning by Reading: Automatic Knowledge Extraction Through Semantic Analysis

by

Friday, July 2, 2010, 10:00am - 12:00pm

ITE 351, UMBC

information extraction, natural language processing

Ph.D. Dissertation Defense

To support rich semantic analysis of text, traditional natural language processing tools require access to a cache of static knowledge with both broad coverage and deep meaning. Acquiring this knowledge by hand is so expensive and error-prone that it has been dubbed the "knowledge acquisition bottleneck". In this work, we present a method for reducing the impact of this bottleneck by automating the knowledge acquisition task. Our novel approach bootstraps a machine learner with a fully realized semantic analysis engine, creating a lifelong learner.

We present an overview of our learner: a system that automatically produces lexical and ontological knowledge resources by building a corpus of raw texts from the web, semantically analyzing them to the best ability of the existing engine, and extracting the word meanings. We expand on this overview by presenting a series of experiments in chronological order, each building on the previous one as we explore the possibilities presented by our methodology.

Finally, we explore a series of improvements to our system: we discuss a variety of changes to individual components, as well as complete methodological shifts. These discussions will set the stage for the next round of interesting experiments in pursuit of a fully automatic language learner.

Committee:
  • Marjorie McShane
  • Sergei Nirenburg (chair)
  • Tim Oates
  • Yelena Yesha
  • Nicholas Cassimatis, RPI

Sergei Nirenburg

UMBC ebiquity