On the Semantic Web, universities do ontologies, companies do data

April 24th, 2006

Here’s an interesting figure from Li Ding’s dissertation on Semantic Web Search. It shows the distribution across various Internet top-level domains of (1) the sites that Swoogle has crawled, (2) the ontology documents that Swoogle has discovered, and (3) all the Semantic Web documents it has discovered.

Distribution of Semantic Web files by TLD

The “pure SWDs” are RDF documents in some serialization (e.g., RDF/XML or N3), excluding XHTML documents with embedded RDF. Swoogle considers a Semantic Web document to be an ontology (a SWO in Swoogle-speak) if a significant fraction of its triples are involved in defining terms as opposed to making assertions about individuals. What counts as a “significant fraction” has changed over time, and I’m not sure what the current value is. Either way, Swoogle considers only about 1% of the Semantic Web documents it has found to be ontologies.
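To make the “fraction of triples defining terms” idea concrete, here is a rough Python sketch. It is not Swoogle’s actual code: the function names, the list of types that mark a term, the rule that a triple “defines a term” when its subject is declared as a class or property, and the 0.8 cutoff are all assumptions made for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
RDF_TYPE = RDF + "type"

# Assumption: a resource counts as a "term" if it is typed as one of these.
TERM_TYPES = {
    RDFS + "Class",
    RDF + "Property",
    OWL + "Class",
    OWL + "ObjectProperty",
    OWL + "DatatypeProperty",
}


def ontology_ratio(triples):
    """Return the fraction of triples whose subject is a declared term."""
    triples = list(triples)
    if not triples:
        return 0.0
    # Subjects explicitly declared as classes or properties.
    terms = {s for s, p, o in triples if p == RDF_TYPE and o in TERM_TYPES}
    # Every triple about such a subject is treated as term-defining.
    defining = sum(1 for s, _, _ in triples if s in terms)
    return defining / len(triples)


def is_ontology(triples, threshold=0.8):
    """Hypothetical cutoff; the real "significant fraction" is not given here."""
    return ontology_ratio(triples) >= threshold


# Toy document: two triples define a class, two describe an individual.
EX = "http://example.org/"
doc = [
    (EX + "Person", RDF_TYPE, RDFS + "Class"),
    (EX + "Person", RDFS + "label", "Person"),
    (EX + "alice", RDF_TYPE, EX + "Person"),
    (EX + "alice", EX + "name", "Alice"),
]
print(ontology_ratio(doc))  # 0.5
print(is_ontology(doc))     # False
```

With a ratio like this, a FOAF profile full of statements about people scores low, while a schema file that mostly declares classes and properties scores high, which is roughly the distinction the SWO label is meant to capture.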

Note that .edu sites publish 40% of the ontologies, .org sites 20% and .com sites 10%. Of course, many of those .edu ontologies are probably from student projects of one kind or another. When we look at all Semantic Web documents (pure SWDs), the .com sites dominate, publishing over 40% of the files.