Learning by Reading: Automatic Knowledge Extraction Through Semantic Analysis
Friday, July 2, 2010, 10:00am - 12:00pm
ITE 351, UMBC
To support rich semantic analysis of text, traditional natural language processing tools require access to a cache of static knowledge with both broad coverage and deep meaning. Acquiring this knowledge by hand is so expensive and error-prone that it has been dubbed the "knowledge acquisition bottleneck". In this work, we present a method for reducing the impact of this bottleneck by automating the knowledge acquisition task. Our novel approach bootstraps a machine learner with a fully realized semantic analysis engine, creating a lifelong learner.
We present an overview of our learner: a system that automatically produces lexical and ontological knowledge resources by building a corpus of raw texts from the web, semantically analyzing them to the best of the existing engine's ability, and extracting the word meanings. We expand on this overview by presenting a series of experiments in chronological order, each building on the previous one as we explore the possibilities presented by our methodology.
Finally, we explore a series of improvements to our system: we discuss a variety of changes to individual components, as well as complete methodological shifts. These discussions will set the stage for the next round of interesting experiments in pursuit of a fully automatic language learner.
Committee:
- Marjorie McShane
- Sergei Nirenburg (chair)
- Tim Oates
- Yelena Yesha
- Nicholas Cassimatis, RPI