Improved Information Retrieval through Set-Based Preference
Friday, April 25, 2008, 9:00am
228 ITE
I present a new information retrieval framework based on set-based
preference learning that provides users with individually customized
search results. Advances in information storage and retrieval have
enabled computer users to search immense data repositories for very
specific content. While newer information retrieval algorithms can
provide a much richer result set than traditional term-weighting
methods, they adopt a one-size-fits-all approach. By analyzing a
user's prior search data, I estimate their preference for result sets
in terms of two criteria: the relevance of individual documents to
the user's query and the overall diversity of content throughout the
result set. I then select the subset of search results that best
balances the two criteria according to that user's preferences. I
demonstrate the effectiveness of my approach on toy domains, data
from the Reuters news service, and the 20 Newsgroups data set.
I also present a heuristic that automatically estimates how much a user values document relevance relative to document diversity. Many existing methods for diversifying search results include parameters that control how much relevance and diversity each contribute to selecting results, but they do not tailor those parameter values to individual users. My heuristic estimates each user's personal valuation of relevance and diversity without manual supervision.
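
As a rough illustration of the kind of controlling parameter mentioned above, the sketch below greedily selects a result set while trading relevance against redundancy. It is not the set-based preference method presented in this talk; it is a generic greedy diversification scheme in the style of maximal marginal relevance. The names select_results, jaccard, and trade_off are hypothetical, as is the assumption that each document is a set of terms with a precomputed relevance score.

```python
from typing import List, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two term sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_results(
    relevance: Sequence[float],
    docs: Sequence[Set[str]],
    k: int,
    trade_off: float,
) -> List[int]:
    """Greedily pick up to k document indices.

    trade_off is an assumed weight in [0, 1]: 1.0 ranks by relevance
    alone, and lower values penalize similarity to documents that
    have already been selected.
    """
    selected: List[int] = []
    candidates = list(range(len(docs)))

    while candidates and len(selected) < k:

        def score(i: int) -> float:
            # Redundancy is the highest similarity to any selected document.
            redundancy = max(
                (jaccard(docs[i], docs[j]) for j in selected), default=0.0
            )
            return trade_off * relevance[i] - (1.0 - trade_off) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)

    return selected


# Hypothetical toy data: documents 0 and 1 are duplicates of each other.
docs = [{"a", "b"}, {"a", "b"}, {"c", "d"}]
relevance = [0.9, 0.8, 0.5]
print(select_results(relevance, docs, k=2, trade_off=1.0))
print(select_results(relevance, docs, k=2, trade_off=0.5))
```

With trade_off set to 1.0 the two most relevant documents are returned even though they are duplicates; at 0.5 the duplicate is passed over for the less relevant but distinct document. In a scheme like this, trade_off is the value that would be fit per user rather than fixed by hand.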