Improving Word Similarity by Augmenting PMI with Estimates of Word Polysemy
Date: June 01, 2011
Abstract: Although pointwise mutual information (PMI) has become a commonly used word similarity measure, a clear understanding of how it works has been lacking. In this paper we explore how PMI differs from distributional similarity, and we introduce a novel metric, PMImax, that augments PMI with information about a word's number of senses. The coefficients of PMImax are determined empirically by maximizing a utility function based on the performance of automatic thesaurus generation. We show that PMImax outperforms traditional PMI in automatic thesaurus generation and on two kinds of word similarity benchmark datasets: human similarity ratings and TOEFL synonym questions. On the Miller-Charles similarity rating dataset, PMImax achieves a correlation coefficient comparable to that of the best knowledge-based approaches.
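For reference, traditional PMI compares how often two words co-occur with how often they would co-occur if they were independent: PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ). The minimal sketch below computes this baseline from co-occurrence windows. It is not PMImax: the abstract does not give PMImax's formula or its fitted coefficients, so the sense-count adjustment is not shown. Treating each window (for example, a sentence) as one observation, and the function name `pmi_from_windows`, are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_from_windows(windows):
    """Return a function computing standard PMI (base 2) from co-occurrence windows.

    Assumption (hypothetical setup): each window is one observation, and a word or
    word pair is counted at most once per window.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for window in windows:
        words = set(window)  # presence per window, not raw frequency
        word_counts.update(words)
        pair_counts.update(frozenset(p) for p in combinations(sorted(words), 2))
    n = len(windows)

    def pmi(x, y):
        joint = pair_counts[frozenset((x, y))]
        if joint == 0:
            return float("-inf")  # never co-occur: PMI is undefined / minus infinity
        p_xy = joint / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        return math.log2(p_xy / (p_x * p_y))

    return pmi


# Usage on a toy corpus of four windows (made-up data for illustration only).
windows = [
    ["car", "automobile"],
    ["car", "automobile"],
    ["bank", "river"],
    ["bank", "money"],
]
pmi = pmi_from_windows(windows)
print(pmi("car", "automobile"))  # 1.0: co-occur twice as often as chance predicts
```

Note that "bank" co-occurs with both "river" and "money", reflecting two different senses; plain PMI scores each pair without regard to how many senses a word has, which is the gap the paper's sense-count term is meant to address.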
Organization: Computer Science and Electrical Engineering
Institution: University of Maryland, Baltimore County