Rise of the machine tags

January 28th, 2007

Machine tags. It’s a bit awkward, but I love the name.

One problem with the name Semantic Web is that it hides the fact that RDF is content generated by machines and for machines. Or at least it should be. Ultimately, of course, most of us don’t want to know what’s going on under the hood if we don’t have to. But for now, those of us trying to work with the ideas and potential applications can’t avoid it.

Last week Flickr announced its new machine tag feature, which supports a style of tagging that moves much closer to RDF. The idea is to take a tag like

    dc:title="Computing Machinery and Intelligence"

and parse it as

    <namespace> : <predicate> = <value>
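The split into three parts can be sketched in a few lines of Python. This is only an illustration, not Flickr’s parser: the exact grammar assumed here (namespace and predicate start with a letter and contain only letters, digits, or underscores, and one pair of surrounding double quotes is dropped from the value) is a guess, and the names MachineTag and parse_machine_tag are made up for the example.

```python
import re
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


# Assumed grammar: namespace and predicate start with a letter and contain
# only letters, digits, or underscores; the value is everything after "=".
_TAG_RE = re.compile(
    r'^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.*)$', re.DOTALL
)


def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Split a tag into (namespace, predicate, value), or return None
    when the tag does not have the namespace:predicate=value shape."""
    match = _TAG_RE.match(raw.strip())
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    # Drop one pair of surrounding double quotes, if present.
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        value = value[1:-1]
    return MachineTag(namespace, predicate, value)
```

An ordinary tag such as sunset has no namespace or predicate, so the function returns None for it and the tag can be handled as plain text.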

This syntax was already recognized in at least two special cases: geo:lat=… and geo:lon=… tags were picked up by Flickr’s mapping components, and upcoming:event=81334 was treated as a reference to an event registered on upcoming.org. Now Flickr has updated its systems to recognize any tag of the form symbol:symbol=value and has changed its internal databases to record the separate components. What’s more, Flickr has extended its API so that you can query on machine tags with wildcards for any of the three elements.
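To make the wildcard idea concrete, here is a small in-memory sketch of matching stored triples against a pattern in which any of the three elements may be left open. It does not call or imitate the Flickr API; the function name query, the use of None as the wildcard, and the sample coordinate values are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (namespace, predicate, value)


def query(
    tags: Iterable[Triple],
    namespace: Optional[str] = None,
    predicate: Optional[str] = None,
    value: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that match the pattern; None acts as a wildcard."""
    pattern = (namespace, predicate, value)
    return [
        tag
        for tag in tags
        if all(want is None or want == got for want, got in zip(pattern, tag))
    ]


# Sample data; the coordinate values are invented for the example.
SAMPLE_TAGS: List[Triple] = [
    ("geo", "lat", "39.25"),
    ("geo", "lon", "-76.71"),
    ("upcoming", "event", "81334"),
    ("dc", "title", "Computing Machinery and Intelligence"),
]
```

With this sketch, query(SAMPLE_TAGS, namespace="geo") returns both geo triples, and query(SAMPLE_TAGS, predicate="event") finds the upcoming.org reference without naming its namespace.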

Flickr’s Dan Catt calls it Not Quite RDF (NQRDF) in an interesting post. Among the things that are missing, of course, is a way to map a prefix like dc: to a full namespace URI, which is normally done in the header of an RDF document:

    <rdf:RDF
    … >

Here’s where Swoogle can help. When we were looking at popular RDF namespaces we noticed that there was not much ambiguity in mapping between prefixes and namespaces, except for some pathological cases such as namespace prefixes like n1, n2, …

Take the MusicBrainz ontology, for example. Swoogle knows of just over twelve thousand documents that use it, in either version 2.0 or 2.1. Of these, all but one use the prefix mm:. The sole oddball chose mbmeta: as its prefix for the MusicBrainz ontology. Of the documents that declare the prefix mm: for a namespace, every one uses it to refer to a version of MusicBrainz.

    documents declaring an mm prefix

    documents   URI target
    9340        http://musicbrainz.org/mm/mm-2.1#
    2670        http://musicbrainz.org/mm/mm-2.0#
    2           http://musicbrainz.org/mm#
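Counts like these are enough to guess a namespace from a bare prefix. The sketch below picks the most frequently declared URI for a prefix and reports what share of the declarations it accounts for. It is not a Swoogle API: the counts are copied by hand from the table above, and PREFIX_COUNTS and resolve_prefix are hypothetical names.

```python
from collections import Counter
from typing import Dict, Optional, Tuple

# Declaration counts per prefix, copied by hand from the table above.
PREFIX_COUNTS: Dict[str, Counter] = {
    "mm": Counter({
        "http://musicbrainz.org/mm/mm-2.1#": 9340,
        "http://musicbrainz.org/mm/mm-2.0#": 2670,
        "http://musicbrainz.org/mm#": 2,
    }),
}


def resolve_prefix(
    prefix: str, counts: Dict[str, Counter] = PREFIX_COUNTS
) -> Optional[Tuple[str, float]]:
    """Return (most common namespace URI, its share of declarations),
    or None when the prefix has never been seen."""
    usage = counts.get(prefix.lower())
    if not usage:
        return None
    uri, hits = usage.most_common(1)[0]
    return uri, hits / sum(usage.values())
```

For mm the winner is the 2.1 namespace, with roughly 78 percent of the declarations. All three candidates belong to MusicBrainz, so the remaining uncertainty is about the version rather than the vocabulary.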

What ambiguity remains could be reduced further by considering the prefix and the predicate together. So, if people started using mm: to refer to other vocabularies (e.g., a MickeyMouse vocabulary), chances are good that we could still identify the intended predicate from the prefix and the predicate’s local name. For example, of the ontologies ever declared using mm:, only MusicBrainz has a releaseStatus predicate and only MickeyMouse has an appearedIn predicate. So these can be easily disambiguated:

  • mm:releaseStatus=mm:Remix
  • mm:appearedIn="How to Play Baseball"
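The two tags above can be told apart with a lookup keyed on both the prefix and the predicate’s local name. In this sketch the MickeyMouse namespace URI is invented (the vocabulary itself is hypothetical), attaching releaseStatus to the 2.1 namespace is an assumption made for the example, and each vocabulary lists only the one predicate mentioned in the text.

```python
from typing import Dict, List, Set

# prefix -> namespace URI -> local names of predicates known in that namespace.
VOCABULARIES: Dict[str, Dict[str, Set[str]]] = {
    "mm": {
        # Assumed for illustration: releaseStatus listed under version 2.1.
        "http://musicbrainz.org/mm/mm-2.1#": {"releaseStatus"},
        # Hypothetical vocabulary with an invented URI.
        "http://example.org/mickeyMouse#": {"appearedIn"},
    },
}


def candidate_namespaces(
    prefix: str,
    predicate: str,
    vocabularies: Dict[str, Dict[str, Set[str]]] = VOCABULARIES,
) -> List[str]:
    """List the namespaces declared under this prefix that define the predicate."""
    by_uri = vocabularies.get(prefix, {})
    return sorted(uri for uri, names in by_uri.items() if predicate in names)
```

A result with exactly one entry means the tag is unambiguous. An empty list or several entries would mean falling back to something else, such as how often each namespace is declared for the prefix.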

We’ve been working on a scheme to make it easy for people to tag photos of organisms with their scientific names and have those tags map to our Ethan ontology. I think that we’ll have to take another look to see how machine tags might help.