UMBC ebiquity

A Hybrid Approach to Unsupervised Relation Discovery via Linguistic Analysis, Entropy-based Label Ranking and Semantic Typing

Speaker: Zareen Syed

Start: Tuesday, October 27, 2009, 10:15AM

End: Tuesday, October 27, 2009, 11:30AM

Location: ITE 325 B

ABSTRACT:
Information Extraction systems today use two main approaches to extract entities, and the relations between them, from text. The knowledge engineering approach requires hand-crafted grammars that express the system's rules, which is a laborious process. The automatic training approach requires a domain expert to hand-annotate training data, and a sufficient volume of that data is needed to reach reasonable accuracy. In both approaches, the set of relations must be identified beforehand, and each time a new relation is required, the whole process has to be repeated. Research has been carried out on "unrestricted relation discovery" [2], which aims to automatically identify the different relations present in text without specifying a relation or set of relations in advance. Recently, linguistic analysis has been described as an effective technique for relation extraction [3].

We propose an approach for unsupervised and unrestricted relation discovery between entities using factz (encoded as triples) produced by Powerset [4], which were extracted through a linguistic analysis process. We first evaluate the Powerset factz against ground truth and compare them with a text-based label ranking approach [1]. Our evaluation results show that Powerset labels have higher accuracy than text-based label ranking. We have developed an unsupervised approach, based on relational clustering, for finding relation labels in Powerset factz that are synonyms. When a semantic relation between a pair of entities is expressed through multiple Powerset factz, a single label must be selected to represent it; for this we have developed a hybrid label selection approach based on relation label synonyms and text-based label ranking. Initial experiments show that our approach is suitable for finding domain-specific synonyms that may not be present in WordNet synsets. Further improvements in relation labeling can be made by leveraging the semantics of a knowledge base.

[1] Chen, J. et al. 2005. Unsupervised Feature Selection for Relation Extraction. In Proceedings of IJCNLP-2005.
[2] Shinyama, Y. and Sekine, S. 2006. Preemptive Information Extraction Using Unrestricted Relation Discovery. In Proceedings of the Human Language Technology Conference. ACL, Morristown, NJ.
[3] Yan, Y. et al. 2009. Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web. In Proceedings of the 47th Annual Meeting of the ACL and the 4th IJCNLP of the AFNLP.
[4] www.powerset.com

Host: Tim Finin