RDF molecules and lossless decompositions of RDF graphs

July 31st, 2005

Some RDF graphs can be viewed as making assertions about the world. Suppose you were given a graph, G, and asked to find supporting evidence on the web.

One approach is to search for documents with RDF graphs containing G as a sub-graph, adhering to RDF’s semantics for blank nodes and maybe applying some RDFS and OWL semantics. Even after doing that, few or maybe no RDF documents may contain *all* of G as a subgraph.

Another approach is to decompose G into its constituent triples and for each, use a Swoogle-like system to find documents containing it. But then what? The presence of blank nodes makes it difficult or impossible to assemble the support for G.

We’ve been exploring a third way using the notion of an RDF molecule. We start by computing a lossless decomposition of G into a set of subgraphs M. The decomposition is lossless in that combining the M‘s elements produces the original graph G, even if their blank nodes have been renamed apart. We can then use a Swoogle-like system to search for documents supporting each molecule in M. Find support for all, we have support for G.

We suspect that the RDF molecule concept has other potential uses. For details, see

Tracking RDF Graph Provenance using RDF Molecules, Li Ding, Tim Finin, Yun Peng, Paulo Pinheiro da Silva, and Deborah McGuinness, report TR-CS-05-06, Computer Science and Electrical Engineering, University of Maryland, Baltimore County, April 30, 2005.

The Semantic Web facilitates integrating partial knowledge and finding evidence for hypothesis from web knowledge sources. However, the appropriate level of granularity for tracking provenance of RDF graph remains in debate. RDF document is too coarse since it could contain irrelevant information. RDF triple will fail when two triples share the same blank node. Therefore, this paper investigates lossless decomposition of RDF graph and tracking the provenance of RDF graph using RDF molecule, which is the finest and lossless component of an RDF graph. A sub-graph is lossless if it can be used to restore the original graph without introducing new triples. A sub-graph is finest if it cannot be further decomposed into lossless sub-graphs. The lossless decomposition algorithms and RDF molecule have been formalized and implemented by a prototype RDF graph provenance service in Swoogle project.