Wikipedia infobox template coherence

November 15th, 2009

Wikipedia has an interesting RFC on approaches for achieving and maintaining better coherence in its infobox templates. This is significant because Wikipedia is becoming the new Cyc: a broad, practical knowledge base filled with general-purpose background knowledge. The RFC grew out of discussions about DBpedia template annotations, and it defines the problem as:

“Wikipedia uses hundreds of infobox templates for describing various entity types like NFL teams, schools in Canada, train stations etc. These infoboxes are separated and do not use a common vocabulary. Several different spellings of attributes are used for them, which all stand for the same meaning (e.g. birth_place, birthPlace, origin). This poses limitations to checking consistency within Wikipedia infoboxes, amongst different language editions, and it makes it hard for external tools to reuse the information in infoboxes.”

The goals mentioned in the RFC include (1) establishing the currently missing links between synonymous template attributes, (2) enabling authors to use template annotations to check for factual inconsistencies (e.g., outdated population figures), and (3) reaching consensus about which properties should be used in templates and what data they should contain.
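To make goals (1) and (2) a little more concrete, here is a minimal Python sketch. It assumes a hypothetical alias table that maps each raw infobox attribute name to one canonical property name. It normalizes an infobox through that table, then compares two infoboxes (say, the same article in two language editions) and reports properties whose values disagree. The `ALIASES` table, the canonical names (`birthPlace`, `populationTotal`), and the functions are illustrative assumptions, not part of the RFC or of any Wikipedia or DBpedia API.

```python
# Hypothetical alias table: each raw infobox attribute name maps to one
# canonical property name. Entries are illustrative, not from the RFC.
ALIASES = {
    "birth_place": "birthPlace",
    "birthPlace": "birthPlace",
    "origin": "birthPlace",
    "population": "populationTotal",
    "population_total": "populationTotal",
}


def normalize(infobox: dict[str, str]) -> dict[str, str]:
    """Rename known synonymous attributes to their canonical property.

    Attributes with no entry in ALIASES keep their raw name. If two raw
    attributes map to the same canonical name, the later one wins.
    """
    normalized: dict[str, str] = {}
    for attr, value in infobox.items():
        normalized[ALIASES.get(attr, attr)] = value.strip()
    return normalized


def conflicts(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return canonical properties present in both infoboxes whose values differ."""
    na, nb = normalize(a), normalize(b)
    return {
        prop: (na[prop], nb[prop])
        for prop in na.keys() & nb.keys()
        if na[prop] != nb[prop]
    }


# Made-up example data: the same town described with different attribute spellings.
en = {"name": "Springfield", "population": "100000"}
other = {"name": "Springfield", "population_total": "105000"}
print(conflicts(en, other))  # {'populationTotal': ('100000', '105000')}
```

Without the alias table, `population` and `population_total` would never be compared at all, which is exactly the missing link the RFC describes; once they share a canonical name, a stale figure in one edition shows up as a conflict.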