“In that release we’ll go beyond the ability to extract semantic data from your content. We will link that extracted semantic data to datasets from dozens of other information sources, from Wikipedia to Freebase to the CIA World Fact Book. In short – instead of being limited to the contents of the document you’re processing, you’ll be able to develop solutions that leverage a large and rapidly growing information asset: the Linked Data Cloud.”
The new capabilities will be available in release 4, which is expected out on 09 January 2009.
The change is based on Calais returning dereferenceable URIs for the entities it finds. Dereferencing one of those URIs will produce RDF with links to the corresponding entities in DBpedia, Freebase and other sources of “Semantic Web” data. It will be very interesting to see how well their system maps document entities (e.g., “secretary Rice”) to entities in the LOD cloud such as http://dbpedia.org/resource/Condoleezza_Rice. Requesting that URI with an HTTP Accept header of application/rdf+xml returns the RDF at http://dbpedia.org/data/Condoleezza_Rice, which holds the assertions DBpedia extracted from Wikipedia.
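For readers who want to try this kind of dereferencing themselves, here is a minimal sketch using only Python's standard library. It assumes the server honors the Accept header and, if it redirects (for example from a /resource/ URI to a /data/ document), that urlopen's default redirect handling is enough to follow it. The helper names (build_rdf_request, fetch_rdf, dbpedia_data_uri) are hypothetical, and the resource-to-data mapping simply mirrors the Condoleezza_Rice example above rather than any documented DBpedia rule.

```python
import urllib.request

RDF_XML = "application/rdf+xml"


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Build a GET request that asks the server for RDF/XML via content negotiation."""
    return urllib.request.Request(uri, headers={"Accept": RDF_XML})


def fetch_rdf(uri: str, timeout: float = 10.0) -> bytes:
    """Dereference a Linked Data URI and return the response body.

    urlopen follows HTTP redirects (301/302/303/307) automatically and keeps
    the Accept header on the redirected request.
    """
    with urllib.request.urlopen(build_rdf_request(uri), timeout=timeout) as resp:
        return resp.read()


def dbpedia_data_uri(resource_uri: str) -> str:
    """Map a DBpedia resource URI to its data document URI.

    Hypothetical helper: assumes the /resource/ -> /data/ naming seen in
    the Condoleezza_Rice example holds in general.
    """
    prefix = "http://dbpedia.org/resource/"
    if not resource_uri.startswith(prefix):
        raise ValueError("not a DBpedia resource URI: " + resource_uri)
    return "http://dbpedia.org/data/" + resource_uri[len(prefix):]
```

Calling fetch_rdf("http://dbpedia.org/resource/Condoleezza_Rice") would perform a live network request, so the result depends on what DBpedia serves at the time; the request-building and URI-mapping helpers are kept separate so they can be checked offline.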