We KnowItAll: Lessons from a Quarter Century of Web Extraction Research
by Oren Etzioni
Tuesday, November 10, 2009, 16:30 - 17:30
For the last quarter century (measured in person-years), the KnowItAll project has investigated information extraction at Web scale. If successful, this effort will begin to address the long-standing "Knowledge Acquisition Bottleneck" in Artificial Intelligence, and will enable a new generation of search engines that extract and synthesize information from text to answer complex user queries. To date, we have generalized information extraction methods to process arbitrary Web text, to handle unanticipated concepts, and to leverage the redundancy inherent in the Web corpus, but many challenges remain. One of the most formidable challenges is moving from extracting isolated nuggets of information to capturing a coherent body of knowledge that can support automatic inference. My talk will describe the lessons we have learned and identify directions for future work.