UMBC ebiquity

Text Based Similarity Metrics and Delta for Semantic Web Graphs

Authors: Krishnamurthy Viswanathan

Date: June 28, 2010

Abstract: Recognizing that two Semantic Web documents or graphs are similar and characterizing their differences is useful in many tasks, including retrieval, updating, version control and knowledge base editing. We describe a number of text-based similarity metrics that characterize the relation between Semantic Web graphs, and we evaluate these metrics for three specific cases of similarity that we have identified: graphs that use the same classes and properties but differ only in literal content, graphs that differ only in base URI, and graphs that stand in a versioning relationship. In addition to determining the similarity between two Semantic Web graphs, we generate a 'delta' between graphs that have been identified as having a versioning relationship. The delta consists of the triples that must be added to or removed from one graph to make the two equivalent. This method takes into account the text of the RDF graph's serialization as a document, rather than relying solely on the document URI. We have prototyped these techniques in a system that we call similis, and we have evaluated it on several tasks using a collection of graphs from the archive of the Swoogle Semantic Web search engine.
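
The abstract does not spell out which metrics similis uses, so the following is only a minimal Python sketch of the two ideas it names, under stated assumptions. It assumes both graphs are serialized as N-Triples with one triple per line, canonical spacing and no blank nodes (blank-node labels are not stable across serializations). The token-level Jaccard score is a generic stand-in for "a text-based similarity metric", not necessarily one of the thesis's metrics, and all function names and sample data are hypothetical.

```python
import re


def triples(ntriples_text: str) -> set[str]:
    """Set of triple lines in an N-Triples document.

    Assumption: one triple per line, canonical spacing, no blank nodes.
    Blank-node labels can change between serializations, which would make
    identical graphs look different under this line-based comparison.
    """
    return {
        line.strip()
        for line in ntriples_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }


def token_jaccard(doc_a: str, doc_b: str) -> float:
    """Jaccard similarity over the word tokens of two serializations.

    A generic text-based measure (hypothetical stand-in, not the thesis's).
    """
    tokens_a = set(re.findall(r"\w+", doc_a.lower()))
    tokens_b = set(re.findall(r"\w+", doc_b.lower()))
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def triple_jaccard(doc_a: str, doc_b: str) -> float:
    """Jaccard similarity over whole triples rather than word tokens."""
    a, b = triples(doc_a), triples(doc_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def delta(old: str, new: str) -> tuple[set[str], set[str]]:
    """Triples to add to and remove from `old` so it matches `new`."""
    old_t, new_t = triples(old), triples(new)
    to_add = new_t - old_t
    to_remove = old_t - new_t
    return to_add, to_remove


# Hypothetical sample data: two versions differing only in one literal.
OLD = """<http://ex.org/a> <http://ex.org/name> "Alice" .
<http://ex.org/a> <http://ex.org/age> "30" .
"""
NEW = """<http://ex.org/a> <http://ex.org/name> "Alice" .
<http://ex.org/a> <http://ex.org/age> "31" .
"""

if __name__ == "__main__":
    to_add, to_remove = delta(OLD, NEW)
    print("add:", to_add)
    print("remove:", to_remove)
    print("token Jaccard:", token_jaccard(OLD, NEW))
    print("triple Jaccard:", triple_jaccard(OLD, NEW))
```

On this toy pair the token score stays high (7/9) because only one literal changed, while the triple score drops to 1/3. This mirrors the "differing only in literal content" case the abstract describes. Applying the delta, `(triples(OLD) - to_remove) | to_add`, reproduces the triple set of `NEW`.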

Type: Master's Thesis

Address: 1000 Hilltop Circle

Organization: University of Maryland, Baltimore County

Publisher: UMBC

Pages: 89

Tags: information retrieval, semantic web, near duplicate detection, delta, version relation, diff, similarity metric
