UMBC ebiquity
Semantic Web terms: defined and used

Tim Finin, 1:00pm 17 July 2006

The 1.6M Semantic Web documents that Swoogle has discovered on the Web include about 10,000 ‘ontologies’ that define one or more terms. These Semantic Web ontologies define a total of 1,576,927 named terms, each an RDF class or property. Most of these have never been directly used to encode data. We consider a class to be directly used if it has at least one immediate instance, and a property to be directly used (or populated) if it has been used in a triple to assert a value for an RDF instance. We consider a term to be directly defined if it is the subject of a triple that asserts a definitional property, such as subclass for a class or range for a property.
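The definitions above can be sketched in a few lines of Python. This is only an illustration, not Swoogle's actual procedure: triples are plain string tuples, the sets of definitional predicates and types are assumed for the example, and the `ex:` terms are made up. As a further simplification, rdf:type and the definitional predicates themselves are not counted as used properties.

```python
# Illustrative sketch only; the vocabulary sets below are assumptions.
RDF_TYPE = "rdf:type"
CLASS_DEFINING = {"rdfs:subClassOf"}
PROPERTY_DEFINING = {"rdfs:domain", "rdfs:range", "rdfs:subPropertyOf"}
CLASS_TYPES = {"rdfs:Class", "owl:Class"}
PROPERTY_TYPES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def classify_terms(triples):
    """Map each term to a set of flags: class_defined, class_used,
    property_defined, property_used."""
    status = {}

    def mark(term, flag=None):
        flags = status.setdefault(term, set())
        if flag is not None:
            flags.add(flag)

    for s, p, o in triples:
        if p == RDF_TYPE:
            if o in CLASS_TYPES:
                mark(s, "class_defined")
            elif o in PROPERTY_TYPES:
                mark(s, "property_defined")
            # The object of rdf:type has an immediate instance.
            mark(o, "class_used")
        elif p in CLASS_DEFINING:
            mark(s, "class_defined")
            mark(o)  # record the referenced term without any flag
        elif p in PROPERTY_DEFINING:
            mark(s, "property_defined")
            mark(o)  # e.g. an XML datatype named as a range
        else:
            # Any other predicate asserts a value for some instance.
            mark(p, "property_used")
    return status


# Hypothetical example data.
EXAMPLE = [
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:alice", "ex:nickname", "Al"),
    ("ex:age", "rdfs:range", "xsd:integer"),
]

result = classify_terms(EXAMPLE)
for term in sorted(result):
    print(term, sorted(result[term]))
```

In this toy data, ex:Student is both defined and used as a class, ex:nickname is used but never defined, ex:age is defined but never used, and xsd:integer is recorded as a term that is neither defined nor used.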

Analyzing Swoogle’s metadata on terms reveals two things. First, more than a few terms appear on the Web as both a class and a property. Second, we can look at how terms are distributed across the four categories formed by whether or not they’ve been defined and whether or not they’ve been used. The following table summarizes the results.

Based on data from Swoogle on 7/16/06

                  properties
  classes         -defined -used  -defined +used  +defined +used  +defined -used
  -defined -used         109,690          26,652         160,671         132,112
  -defined +used           9,684             292              99              17
  +defined +used          24,299              58              20              18
  +defined -used       1,257,563              33              87             236

Here are some categories we can identify:

  • The green cells are what we might consider ideal, including classes and properties that are both defined and have been directly used to encode data.
  • The pink cells are mostly errors: terms that have been defined/used as both a class and a property.
  • The yellow cells are terms that have been defined but not directly used to encode any data. The majority are terms that we believe were intended to be used to describe instances and data, but never were, for one reason or another. Many ontologies have been created but never really used. Some ontologies have been extensively used to describe data, but not all of their terms have turned out to be useful. WordNet may deserve special mention: several of its encodings represent each lexical entry as a class, and most of these classes have not been used to create instances.
  • The blue cells are terms that have not been explicitly defined but have been used. Note that this is much more common for properties than for classes. It’s common to attach a class somewhere in some taxonomy, but many people will invent and use new properties without providing any definitional features (e.g., domain or range).
  • The gray cell represents terms that have neither been defined nor used to encode data. While this sounds strange and may reflect problematic terms, the group includes some ordinary terms that represent XML datatypes. These are often used as the value of a domain or range assertion and show up in Swoogle as terms.
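As a rough check on the relative sizes of these groups, the cell counts can be summed by category. The sketch below assumes the table is read with class status in rows and property status in columns, both in the order -defined -used, -defined +used, +defined +used, +defined -used, and it infers which cells belong to each category from the descriptions above; the category names are ours, introduced for the example.

```python
# Assumed reading of the table: rows are class status, columns are
# property status, both in the order listed in STATES.
STATES = ["-defined -used", "-defined +used", "+defined +used", "+defined -used"]

COUNTS = [
    [109_690, 26_652, 160_671, 132_112],
    [9_684, 292, 99, 17],
    [24_299, 58, 20, 18],
    [1_257_563, 33, 87, 236],
]

NONE = "-defined -used"
LABELS = {
    "-defined -used": "neither defined nor used",
    "-defined +used": "used but not defined",
    "+defined +used": "defined and used",
    "+defined -used": "defined but not used",
}


def category(class_state, property_state):
    """Name the category of a cell from its class and property status."""
    if class_state != NONE and property_state != NONE:
        # The term shows up as both a class and a property.
        return "both class and property"
    state = class_state if class_state != NONE else property_state
    return LABELS[state]


totals = {}
for i, class_state in enumerate(STATES):
    for j, property_state in enumerate(STATES):
        name = category(class_state, property_state)
        totals[name] = totals.get(name, 0) + COUNTS[i][j]

for name, count in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{name:26} {count:>10,}")
```

Under that reading, terms that are defined but never used (1,389,675) far outnumber those that are both defined and used (184,970), while terms appearing as both a class and a property total only 860.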
