| Title: Information Extraction via Automatic Generation of Semantic Classifiers
Speaker: Zareen Syed
Start Date: Tuesday, September 16, 2008, 10:30AM
End Date: Tuesday, September 16, 2008, 12:00PM
Location: ITE 346
Abstract: Information extraction is an important unsolved problem of natural
language processing (NLP). It is the problem of extracting entities
(such as people, organizations or locations) and named relations
between entities (such as "People born-in Country") from text
documents. An important challenge in information extraction is the
labeling of training data, which is usually done manually and is
therefore very expensive. This talk introduces a new "model" to generate training data with minimal manual intervention. Our approach uses structured data available in the Encarta encyclopedia to generate the training data. Encarta articles are categorized and linked to related articles by experts. We harvest this structured data and use it in an intuitive way to generate classifiers automatically. The classifiers were employed on the following information extraction tasks:
The talk will also cover the challenges faced in using the Encarta and MindNet resources and give an overview of promising directions for future work.
Web Site: http://ebiquity.umbc.edu/
Tags: information extraction, natural language processing, encarta |