folksonomy is the new black

December 26th, 2004

Interest in ontologies has gone down and up over the past 20 years, and it’s been very strong in the last five. Designing a good ontology for a complex real-world topic is hard, and made especially so by the usual goal that it be relatively independent of any small set of driving tasks. There are so many ways you can go wrong: too simple, too complex, too philosophical, too pragmatic, non-extensible, too big, too small, too brittle, too loose. And how do you evaluate the one you come up with? Sometimes it seems that ontological engineering requires graduate-level training in way too many advanced topics: knowledge representation, logic, databases, philosophy.

While the semantic web movement hasn’t changed any of these problems, it has opened up new avenues by making this a problem by and for the web — an open, distributed, heterogeneous environment in which people and software agents create, publish, search for, combine, exchange and use information.

One interesting phenomenon is the number of sites using what some call folksonomies: informal tagging systems developed bottom-up by their users. Examples of sites that use folksonomies include flickr, furl, and Google’s gmail. As a way to build an ontology, you can’t get much simpler than this: the tags form a flat one-level taxonomy of classes. You can attach a set of tags to an object (URL, picture or email message) and find objects indexed by a set of tags. What you can’t do are things like (i) define relations between tags (e.g., declare that rdf is a subtag of semanticWeb or that NYC and newYork are equivalent); (ii) form combinations of tags other than intersection (e.g., find pictures tagged as domesticatedAnimals OR pets but NOT cats); and (iii) define and use properties (e.g., this picture depicts an animal whose owner is a person with lastname="finin").
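To make the flat model concrete, here is a minimal Python sketch of such a tag store. The class name TagIndex and its methods are hypothetical, invented for illustration, and not taken from any of the sites mentioned. It supports only the two operations described above: attaching tags to an object and finding the objects that carry every tag in a set.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical flat folksonomy: tags are independent labels, no hierarchy."""

    def __init__(self):
        # tag -> set of objects carrying that tag
        self._objects_by_tag = defaultdict(set)

    def tag(self, obj, *tags):
        """Attach one or more tags to an object (URL, picture, message...)."""
        for t in tags:
            self._objects_by_tag[t].add(obj)

    def find(self, *tags):
        """Return the objects that carry ALL of the given tags (intersection)."""
        if not tags:
            return set()
        matches = [self._objects_by_tag.get(t, set()) for t in tags]
        return set.intersection(*matches)


index = TagIndex()
index.tag("photo1.jpg", "pets", "cats")
index.tag("photo2.jpg", "pets", "dogs")
index.tag("http://example.org/rdf-primer", "rdf")

pets_and_cats = index.find("pets", "cats")   # only photo1.jpg has both tags
semantic_web = index.find("semanticWeb")     # empty: nothing says rdf is a subtag
```

The last query comes back empty even though an rdf-tagged page exists, because nothing in the structure can say that rdf falls under semanticWeb. That is limitation (i) in miniature.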

This is not a great leap forward for classification theory, and the basic approach is quite common (e.g., see the use of faceted classifications in library science or polyclave classification systems in biology). What is interesting is letting a community of people develop and share folksonomies in a natural way, with the hope that consensus vocabularies will naturally emerge.

Flickr and furl allow you to make your tags and tagged objects public and to search over those of others, introducing an interesting social dimension. In the natural course of things, users will tend to converge around a set of tags to denote a common shared concept. This is accelerated by the fact that, for furl, users are tagging objects from a common universe of URLs. Simple statistical techniques can reveal tags that are related or similar in that they’ve been used by different people to classify a common object. If shared consensus sets of tags do emerge in these communities as they grow, it will be significant.
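As one possible reading of "simple statistical techniques", the Python sketch below scores each pair of tags by the Jaccard overlap of the objects they were applied to. The choice of Jaccard similarity, the function name related_tags, and the sample users and example.org URLs are all assumptions made for illustration; none of them comes from the sites discussed here.

```python
from collections import defaultdict
from itertools import combinations


def related_tags(taggings):
    """Score tag pairs by the overlap of the objects they classify.

    taggings: iterable of (user, obj, tag) triples.
    Returns {(tag_a, tag_b): jaccard} for pairs sharing at least one object.
    """
    objects_by_tag = defaultdict(set)
    for _user, obj, tag in taggings:
        objects_by_tag[tag].add(obj)

    scores = {}
    for a, b in combinations(sorted(objects_by_tag), 2):
        shared = objects_by_tag[a] & objects_by_tag[b]
        if shared:
            union = objects_by_tag[a] | objects_by_tag[b]
            scores[(a, b)] = len(shared) / len(union)
    return scores


# Made-up sample data: three users tagging a shared universe of URLs.
taggings = [
    ("alice", "http://example.org/nyc-guide", "NYC"),
    ("bob", "http://example.org/nyc-guide", "newYork"),
    ("alice", "http://example.org/subway-map", "NYC"),
    ("carol", "http://example.org/subway-map", "newYork"),
    ("bob", "http://example.org/rdf-primer", "rdf"),
    ("carol", "http://example.org/rdf-primer", "semanticWeb"),
    ("carol", "http://example.org/owl-guide", "semanticWeb"),
]

scores = related_tags(taggings)
```

With this data, NYC and newYork always land on the same URLs and score 1.0, which hints at equivalence, while rdf covers only half of what semanticWeb covers and scores 0.5, which is at least compatible with a subtag relation. A real system would need far more data and some thresholding before trusting either signal.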

Can we extend the expressive power of these systems, say by using RDF and introducing some of the features of RDFS and (even) OWL, resulting in folksologies? It’s a good question. We can do it, of course, but will the result be as easy for people to learn and use? That’s an even better question.