From tables to 5 star linked data

December 25th, 2010

The goal and vision of the Semantic Web is to create a Web of connected and interlinked data (items) that can be shared and reused by all. Sharing and opening up “raw data” is great, but the Semantic Web isn’t just about sharing data. To create a Web of data, one needs links between data. In 2006, Sir Tim Berners-Lee introduced the notion of linked data, outlining best practices for creating and sharing data on the Web. To encourage people and governments to share data, he recently developed a five-star rating system.

The highest rating is for data that links to other people’s data to provide context. While the Semantic Web has been growing steadily, there is a lot of data that is still in raw format. A study by Google researchers shows that there are 154 million tables with high-quality relational data on the World Wide Web. The US government, along with 7 other nations, has started sharing data publicly. Not all of this data is RDF or conforms to the best practices for publishing and sharing linked data.

Here in the Ebiquity Research Lab, we have been focusing on converting data in tables and spreadsheets into RDF. Our focus is not on generating just RDF, but on generating high-quality linked data (or, as Berners-Lee now calls it, “5 star data”). Our goal is to build a completely automated framework for interpreting tables and generating linked data from them.

As part of our preliminary research, we have already developed a baseline framework that can (1) link table column headers to classes from ontologies in the linked data cloud, (2) link table cells to entities in the linked data cloud, and (3) identify relations between table columns and map them to properties in the linked data cloud. You can read papers related to our preliminary research at [1]. We will use this blog to publish updates on our pursuit of creating “5-star” data for the Semantic Web.
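
To make the three linking steps concrete, here is a minimal Python sketch of what the final output stage might look like once a table has been interpreted. It is not our framework: the mappings from headers to classes, cells to entities, and column pairs to properties are hand-written assumptions standing in for the results of automated interpretation, and the function name table_to_ntriples and the example table are hypothetical. The DBpedia URIs are used only for illustration.

```python
# Illustrative only: the table and all mappings below are hand-written
# assumptions, not the output of the framework described in this post.
DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# A tiny table: a header row plus data rows.
header = ["City", "Country"]
rows = [
    ["Baltimore", "United States"],
    ["Paris", "France"],
]

# Step 1 (assumed result): column header -> ontology class.
column_classes = {"City": DBO + "City", "Country": DBO + "Country"}

# Step 2 (assumed result): cell string -> entity URI.
cell_entities = {
    "Baltimore": DBR + "Baltimore",
    "United States": DBR + "United_States",
    "Paris": DBR + "Paris",
    "France": DBR + "France",
}

# Step 3 (assumed result): (subject column, object column) -> property.
column_relations = {("City", "Country"): DBO + "country"}


def table_to_ntriples(header, rows, column_classes, cell_entities, column_relations):
    """Return a list of N-Triples lines for the table, skipping unmapped cells."""
    triples = []
    for row in rows:
        # Resolve each cell in this row to an entity URI, if one is known.
        entities = {col: cell_entities.get(cell) for col, cell in zip(header, row)}
        # Type each linked entity with the class mapped to its column.
        for col, uri in entities.items():
            if uri and col in column_classes:
                triples.append(f"<{uri}> <{RDF_TYPE}> <{column_classes[col]}> .")
        # Relate entities in the same row using the column-pair property.
        for (subj_col, obj_col), prop in column_relations.items():
            subj, obj = entities.get(subj_col), entities.get(obj_col)
            if subj and obj:
                triples.append(f"<{subj}> <{prop}> <{obj}> .")
    return triples


for line in table_to_ntriples(header, rows, column_classes, cell_entities, column_relations):
    print(line)
```

Each row yields two typing triples and one relation triple. Cells with no known entity are simply skipped, which is one possible design choice; a real system would also need to handle ambiguity and literal values.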

If you are a data publisher, go grab some Linked Data star badges at [2]. You can show your support for the open data movement by getting T-shirts, mugs and bumper stickers from [3]! (All profits go to W3C.)

Happy Holidays! Let 2011 be yet another step forward in the open data movement!

Parallax: a better interface for Freebase

August 14th, 2008

David Huynh completed his PhD at MIT CSAIL last year and joined MetaWeb a few months ago, where he has been working on new and better interfaces to explore the data encoded in their Freebase system. He recently released Parallax as a prototype browsing interface for Freebase. Here is a video that shows the interface in action.

Freebase Parallax: A new way to browse and explore data from David Huynh on Vimeo.

Freebase is “an open database of the world’s information” that is constructed by a wiki-like collaborative community. In many ways it is like the Semantic Web model, with two big differences: (1) the data is stored centrally rather than distributed across the Web and (2) the representation system is not based on RDF but rather uses a custom-built object-oriented data representation language.

Freebase is a great resource. Much of the data is extracted from Wikipedia, so its content has a large overlap with DBpedia. But it is also relatively easy to upload additional information in various structured forms, and many have done so, resulting in extended coverage.

This is clearly a system in the Web of Data space, alongside the Linking Open Data effort, and having it available should give us all a way to explore the consequences of some of the underlying design decisions.

rdf:about is a concise collection of RDF resources

June 7th, 2008

Joshua Tauberer, a UPenn linguistics graduate student, maintains rdf:about as a resource of information on the Semantic Web language RDF. It's a concise collection of information that manages not to overwhelm and includes good Quick Intro and RDF in Depth pages.

