Identifying objects and people on the semantic web
By Tim Finin on Saturday, December 18th, 2004 at 1:21 pm.
One of the motivations for the semantic web is information integration. The use of URIs as a shared way to reference a resource supports this. But it’s easy to forget, or at least not focus on, how messy the web and the world really are. We don’t all share the same set of concepts. We have no mechanism (yet) to forge a consensus about which URIs to use to denote concepts or individuals in the non-web world. References often depend on context. People typically describe things by their attributes. And so forth.
Here is a pair of interesting papers from the TAP project, a collaboration between Stanford, IBM and the W3C, that deal with some of the practical problems in this area.
Object Coidentification on the Semantic Web, R. Guha. “The Semantic Web seeks to integrate data from many different sources. Since different sources often use different names for the same object, we need to map between these names. We first consider the use of keys to do this mapping and discuss some of the associated problems. We introduce the concept of bootstrapping from some shared names to more shared names and discuss some conditions under which this process is guaranteed to be correct. We describe a probabilistic approach to matching and propose approximations to address the issue of requiring a combinatorially large number of joint probabilities. We report on empirical studies for validating this approach in two interesting domains. Finally, we discuss the implications of better matching techniques for privacy.”
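To make the idea of keys and bootstrapping concrete, here is a minimal sketch in Python. It is not the algorithm from the paper, only an illustration of the general idea under some simplifying assumptions: each source is a dictionary from local names to property values, a "key" is a tuple of properties assumed to identify an object uniquely, and a match is accepted only when it is unambiguous. The function name `coidentify`, the `a:` and `b:` names, and the sample data are all hypothetical.

```python
def coidentify(source_a, source_b, keys, seed=None):
    """Map local names in source_a to names in source_b using keys.

    source_a, source_b: dict of local name -> dict of property -> value
    keys: list of property tuples, each assumed to identify an object
    seed: optional dict of names already known to be shared (a -> b)
    """
    mapping = dict(seed or {})

    # Index source_b records by the values of each key they fully define.
    index = {}
    for name_b, props in source_b.items():
        for key in keys:
            if all(p in props for p in key):
                values = tuple(props[p] for p in key)
                index.setdefault((key, values), []).append(name_b)

    # Bootstrapping: every new match can enable further matches, so
    # repeat until a full pass adds nothing.
    changed = True
    while changed:
        changed = False
        for name_a, props in source_a.items():
            if name_a in mapping:
                continue
            for key in keys:
                if not all(p in props for p in key):
                    continue
                # Translate already-matched names into source_b's names.
                values = tuple(mapping.get(props[p], props[p]) for p in key)
                candidates = index.get((key, values), [])
                if len(candidates) == 1:  # accept only unambiguous matches
                    mapping[name_a] = candidates[0]
                    changed = True
                    break
    return mapping


# Hypothetical data: two sources describing the same film and director.
source_a = {
    "a:movie1": {"title": "Vertigo", "director": "a:person7"},
    "a:person7": {"name": "Alfred Hitchcock", "born": 1899},
}
source_b = {
    "b:m42": {"title": "Vertigo", "director": "b:p3"},
    "b:p3": {"name": "Alfred Hitchcock", "born": 1899},
}
keys = [("name", "born"), ("title", "director")]

matches = coidentify(source_a, source_b, keys)
```

In this sketch the film cannot be matched on the first pass, because the two sources name its director differently. Once the director is matched through the (name, born) key, the second pass can match the film through (title, director). A wrong key assumption would propagate errors in the same way, which is one reason a probabilistic treatment is attractive.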
Disambiguating People in Search, R. Guha and A. Garg. “Searching for information about people is a common activity on web search engines. For most names, there are multiple people in the world with that name, forcing users to add keywords to narrow down the results to pages that refer to the particular person they are looking for. In this paper, we present a solution to this problem. We start with a simple user interface for indicating which person the user intended. We then focus on how a search engine can rank more highly the pages that refer to this person. We propose an evaluation criterion for this feature and present results from a vector similarity based approach to this problem. To improve on these results, we describe a formal model of reference and provide a general framework for using knowledge from many different sources to solve this problem. We present results from an empirical study that validates our framework.”
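As a rough illustration of what a vector similarity based approach might look like, the sketch below ranks pages by the cosine similarity between their bag-of-words vectors and a short description of the intended person. This is an assumption about the general technique, not the method evaluated in the paper. The helper names (`term_vector`, `cosine`, `rank_pages`), the person, and the page texts are all invented for the example.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Return a bag-of-words vector (term -> count) for a piece of text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_pages(reference_text, pages):
    """Order page ids by similarity to a description of the intended person."""
    ref = term_vector(reference_text)
    return sorted(
        pages,
        key=lambda page_id: cosine(ref, term_vector(pages[page_id])),
        reverse=True,
    )


# Hypothetical pages that both mention a person named "Jane Smith".
pages = {
    "chef": "Jane Smith is a chef who runs a bakery and writes cookbooks",
    "prof": "Jane Smith is a professor of chemistry who studies polymers",
}
# Hypothetical description of the person the user picked in the interface.
intended = "Jane Smith chemistry professor polymers research"

ranked = rank_pages(intended, pages)
```

Here the page about the professor shares more terms with the description than the page about the chef, so it ranks first. Raw term counts are the simplest possible weighting, and a real system would likely use something richer.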
