UMBC ebiquity
Improving Word Similarity by Augmenting PMI with Estimates of Word Polysemy

Authors: Lushan Han, Tim Finin, Paul McNamee, Anupam Joshi, and Yelena Yesha
Date: June 01, 2011

Abstract: Although pointwise mutual information (PMI) has become a commonly used word similarity measure, a clear understanding of how it works has been lacking. In this paper we explore how PMI differs from distributional similarity, and we introduce a novel metric, PMImax, that augments PMI with information about a word's number of senses. The coefficients of PMImax are determined empirically by maximizing a utility function based on the performance of automatic thesaurus generation. We show that PMImax outperforms traditional PMI in the application of automatic thesaurus generation and in word similarity benchmark datasets: human similarity ratings and TOEFL synonym questions. PMImax achieves a correlation coefficient comparable to the best knowledge-based approaches on the Miller-Charles similarity rating dataset.

See: Improving Word Similarity by Augmenting PMI with Estimates of Word Polysemy, TKDE, 2012.

Type: TechReport
Organization: Computer Science and Electrical Engineering
Institution: University of Maryland, Baltimore County
Tags: semantic similarity, pointwise mutual information, automatic thesaurus generation, corpus statistics