Wikipedia infobox template coherence

November 15th, 2009

Wikipedia has an interesting RFC on approaches to achieving and maintaining better coherence in its infobox templates. This is significant because Wikipedia is becoming the new CYC: a broad, practical KB filled with general-purpose background knowledge. The RFC was kicked off by discussions on DBpedia template annotations. The RFC defines the problem as:

“Wikipedia uses hundreds of infobox templates for describing various entity types like NFL teams, schools in Canada, train stations etc. These infoboxes are separated and do not use a common vocabulary. Several different spellings of attributes are used for them, which all stand for the same meaning (e.g. birth_place, birthPlace, origin). This poses limitations to checking consistency within Wikipedia infoboxes, amongst different language editions, and it makes it hard for external tools to reuse the information in infoboxes.”

The goals mentioned in the RFC include (1) establishing the currently missing links between synonymous template attributes, (2) enabling authors to use template annotations to check for factual inconsistencies (e.g., outdated population figures), and (3) providing consensus about which properties should be used in templates and what data they should contain.
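
To make the first goal concrete, here is a minimal Python sketch of linking synonymous attribute spellings to one shared name. The three spellings come from the RFC's own example; the `SYNONYMS` table, the choice of `birth_place` as the canonical name, and the function names are assumptions made for illustration, not anything the RFC specifies.

```python
import re

# Hypothetical synonym table: canonical property -> spellings seen in infoboxes.
# The spellings are the RFC's example; the canonical name is an assumption.
SYNONYMS = {
    "birth_place": {"birth_place", "birthPlace", "origin"},
}


def _key(name):
    """Fold case and drop spaces, underscores and hyphens for comparison."""
    return re.sub(r"[\s_\-]+", "", name).lower()


# Reverse index from a folded spelling to its canonical property name.
LOOKUP = {
    _key(spelling): canonical
    for canonical, spellings in SYNONYMS.items()
    for spelling in spellings
}


def canonical_attribute(name):
    """Return the canonical name for an attribute, or the name unchanged."""
    return LOOKUP.get(_key(name), name)


def normalize_infobox(infobox):
    """Rewrite the keys of an infobox (a dict) to their canonical names."""
    return {canonical_attribute(k): v for k, v in infobox.items()}
```

With a table like this, infoboxes that use `birthPlace` or `origin` could be compared directly with ones that use `birth_place`. A real mapping would have to be curated by editors, since a spelling such as `origin` may mean something else in another template.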


Faviki uses Wikipedia and DBpedia for semantic tagging

May 26th, 2008

Faviki is a new social bookmarking system that uses Wikipedia articles for tags. It actually uses URLs in the DBpedia namespace that correspond to Wikipedia pages. The immediate benefits of this approach are several:

  • Users select tags from a large, common tag space. The ‘meaning’ of each tag can be understood by reading the associated Wikipedia page. This makes it more likely that resources that share a tag, even if assigned by different people, are actually related.
  • Since the universe of tags is derived from Wikipedia, it is generated, kept current and maintained by a large and diverse set of people.
  • The tags have structured information associated with them and are part of a broader-than, narrower-than lattice. It is not clear to me how much reasoning Faviki does with the linked data or when. But there is clearly a lot of potential here.
  • There is an opportunity to make the tagging system multi-lingual, since Wikipedia has articles in multiple languages and supports a way to link equivalent articles expressed in different languages.
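
The correspondence between Wikipedia pages and DBpedia URLs mentioned above can be sketched in a few lines of Python. This is only an illustration of the naming convention for the English DBpedia, where a page title is reused as the last segment of a resource URI. The function name is made up here, and nothing in it reflects how Faviki itself is implemented.

```python
from urllib.parse import urlparse

# Namespace used by DBpedia for resources derived from English Wikipedia.
DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url):
    """Map a Wikipedia article URL to the corresponding DBpedia resource URI.

    Simplification: the page title is copied as-is, so differences in how
    special characters are encoded are not handled.
    """
    path = urlparse(url).path
    prefix = "/wiki/"
    if not path.startswith(prefix) or len(path) == len(prefix):
        raise ValueError("not a Wikipedia article URL: " + url)
    return DBPEDIA_RESOURCE + path[len(prefix):]


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Semantic_Web"))
```

Two bookmarks tagged this way share a tag exactly when they point at the same resource URI, which is what makes the tag space common across users.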

The downside, of course, is that you lose the freedom and ease of most open tagging approaches — using the words and phrases that come immediately to mind.

The Faviki system is related to our own Wikitology project, which is exploring the use of Wikipedia terms as an ontology, and also to Harry Chen’s Gnizer tagging system, which is an RDF-based social tagging system. Our current Wikitology work is focused on mapping text and entities from text into a set of terms derived from Wikipedia and salted with additional data from DBpedia and Freebase.

One interesting research question is whether it’s possible to combine the ease of using user-generated tags with the power of mapping them into tags in a structured or semi-structured knowledge base.

Deriving knowledge bases from Wikipedia and using them in innovative ways is a very exciting topic that is sure to receive a lot of work in the coming years.

(spotted on ReadWriteWeb)