UMBC ebiquity
Semantic Web

From tables to 5 star linked data

December 25th, 2010, by Varish Mulwad, posted in RDF, Semantic Web

The goal and vision of the Semantic Web is to create a Web of connected and interlinked data (items) which can be shared and reused by all. Sharing and opening up “raw data” is great, but the Semantic Web isn’t just about sharing data. To create a Web of data, one needs interlinking between data. In 2006, Sir Tim Berners-Lee introduced the notion of linked data, in which he outlined the best practices for creating and sharing data on the Web. To encourage people and governments to share data, he recently developed a five-star rating system for open data.

The highest rating is for data that links to other people’s data to provide context. While the Semantic Web has been growing steadily, there is a lot of data that is still in raw format. A study by Google researchers shows that there are 154 million tables with high quality relational data on the World Wide Web. The US government, along with 7 other nations, has started sharing data publicly. Not all of this data is in RDF or conforms to the best practices for publishing and sharing linked data.

Here in the Ebiquity Research Lab, we have been focusing on converting data in tables and spreadsheets into RDF. Our focus is not on generating just any RDF, but on generating high quality linked data (or, as Berners-Lee now calls it, “5 star data”). Our goal is to build a completely automated framework for interpreting tables and generating linked data from them.

As part of our preliminary research, we have already developed a baseline framework which can link the table column headers to classes from ontologies in the linked data cloud datasets, link the table cells to entities in the linked data cloud and identify relations between table columns and map them to properties in the linked data cloud. You can read papers related to our preliminary research at [1]. We will use this blog as a medium to publish updates in our pursuit of creating “5-star” data for the Semantic Web.
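To make the three kinds of links concrete, here is a minimal sketch in Python. It is not our framework: the table, the hand-written mappings and the function names are all assumptions made up for illustration, and the DBpedia terms are only examples of what a column, cell or relation might map to. An automated system would have to infer these mappings; the sketch only shows how, once they are known, a table row turns into linked data triples.

```python
# Hypothetical example table: column headers plus rows of cell strings.
headers = ["City", "Country"]
rows = [["Baltimore", "United States"], ["Paris", "France"]]

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Assumed, hand-written interpretation of the table.
# An automated framework would infer these instead.
column_classes = {"City": DBO + "City", "Country": DBO + "Country"}
cell_entities = {
    "Baltimore": DBR + "Baltimore",
    "United States": DBR + "United_States",
    "Paris": DBR + "Paris",
    "France": DBR + "France",
}
column_relations = {("City", "Country"): DBO + "country"}


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose object is a URI."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def table_to_ntriples(headers, rows, column_classes, cell_entities, column_relations):
    """Emit type triples for linked cells and relation triples between columns."""
    lines = []
    for row in rows:
        record = dict(zip(headers, row))
        # Column header -> class, cell -> entity: one rdf:type triple per linked cell.
        for header, cell in record.items():
            entity = cell_entities.get(cell)
            cls = column_classes.get(header)
            if entity and cls:
                lines.append(triple(entity, RDF_TYPE, cls))
        # Relation between two columns -> property linking the two cell entities.
        for (subj_col, obj_col), prop in column_relations.items():
            subj = cell_entities.get(record.get(subj_col))
            obj = cell_entities.get(record.get(obj_col))
            if subj and obj:
                lines.append(triple(subj, prop, obj))
    return lines


ntriples = table_to_ntriples(headers, rows, column_classes, cell_entities, column_relations)
print("\n".join(ntriples))
```

Each row yields two type triples and one relation triple. Cells or columns without a mapping are simply skipped, which is one plausible way to handle parts of a table that cannot be interpreted.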

If you are a data publisher, go grab some Linked Data star badges at [2]. You can show your support for the open data movement by getting t-shirts, mugs and bumper stickers from [3]! (All profits go to W3C.)

Happy Holidays! Let 2011 be yet another step forward in the open data movement!

[1] – http://ebiquity.umbc.edu/person/html/Varish/Mulwad/?pub=on#pub

[2] – http://lab.linkeddata.deri.ie/2010/lod-badges/

[3] – http://www.cafepress.co.uk/w3c_shop

Provenance Tracking in Science Data Processing Systems

May 28th, 2008, by Tim Finin, posted in Semantic Web

Maybe we should think of data provenance as being like a recipe. Recipes for preparing food are more than just a list of ingredients: they specify, often in great detail, how the ingredients are combined, cooked and served, as well as the cooking implements and their settings.

Curt Tilmes presented his PhD dissertation proposal yesterday on “Provenance Tracking in Science Data Processing Systems”. Curt works at the NASA Goddard Space Flight Center and is responsible for managing the data processing of earth science climate research data. Curt has some very good ideas about how to capture all of the relevant provenance data for sophisticated scientific data. He’s using, of course, the Semantic Web languages (RDF and OWL) to express and share the provenance data.

Part of the problem is that you have to capture not just the inputs to a dataset, but how the inputs were processed to produce the dataset, including (ideally) the algorithms, software and hardware. As an easily grasped example to illustrate this, he referred to a recent post by Ray Pierrehumbert on the RealClimate blog, “How to cook a graph in three easy lessons”. This post demonstrates how Roy Spencer processes inputs from two common climate datasets (the Southern Oscillation and Pacific Decadal Oscillation indexes) to get results that support the conclusion that global warming is due to natural causes and not human activity.
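As a rough illustration of the “recipe” idea, here is a minimal Python sketch of a provenance record that keeps the processing steps alongside the inputs. It is not Curt’s design, which uses RDF and OWL; the class names, field names and all values below are hypothetical placeholders, and a plain data structure serialized as JSON is swapped in only to keep the example short.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProcessingStep:
    """One step of the recipe: what was run, with what, and on what."""
    algorithm: str
    software: str
    hardware: str
    parameters: dict = field(default_factory=dict)


@dataclass
class ProvenanceRecord:
    """The ingredients (inputs) plus the ordered steps that produced a dataset."""
    dataset: str
    inputs: list
    steps: list = field(default_factory=list)


# All names and values here are made-up placeholders.
record = ProvenanceRecord(
    dataset="example-derived-dataset",
    inputs=["southern-oscillation-index", "pacific-decadal-oscillation-index"],
    steps=[
        ProcessingStep(
            algorithm="example-smoothing",
            software="example-tool 0.1",
            hardware="example-workstation",
            parameters={"window_months": 12},
        )
    ],
)

# asdict() recurses into the nested steps, so the whole recipe is serialized.
provenance_json = json.dumps(asdict(record), indent=2, sort_keys=True)
print(provenance_json)
```

The point of the structure is that a reader of the derived dataset can see every choice that shaped it, such as the smoothing window, and not just which source datasets went in.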

Faviki uses Wikipedia and DBpedia for semantic tagging

May 26th, 2008, by Tim Finin, posted in AI, Semantic Web, Social media

Faviki is a new social bookmarking system that uses Wikipedia articles for tags. It actually uses URLs in the DBpedia namespace that correspond to Wikipedia pages. The immediate benefits of this approach are several:

  • Users select tags from a large, common tag space. The ‘meaning’ of each tag can be understood by reading the associated Wikipedia page. This makes it more likely that resources that share a tag, even if assigned by different people, are actually related.
  • Since the universe of tags is derived from Wikipedia, it is generated, kept current and maintained by a large and diverse set of people.
  • The tags have structured information associated with them and are part of a broader-than, narrower-than lattice. It is not clear to me how much reasoning Faviki does with the linked data or when. But there is clearly a lot of potential here.
  • There is an opportunity to make the tagging system multi-lingual, since Wikipedia has articles in multiple languages and supports a way to link equivalent articles expressed in different languages.

The downside, of course, is that you lose the freedom and ease of most open tagging approaches — using the words and phrases that come immediately to mind.
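The correspondence between Wikipedia pages and DBpedia URLs is simple enough to sketch in a few lines of Python. This is not Faviki’s code: the function name and the sample bookmarks are assumptions for illustration. It relies only on the fact that DBpedia resource URLs reuse the title part of the English Wikipedia URL.

```python
from collections import defaultdict
from urllib.parse import urlparse

DBPEDIA_RESOURCE = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(url):
    """Map an English Wikipedia article URL to the matching DBpedia resource URL."""
    parsed = urlparse(url)
    if parsed.netloc != "en.wikipedia.org" or not parsed.path.startswith("/wiki/"):
        raise ValueError(f"not an English Wikipedia article URL: {url}")
    title = parsed.path[len("/wiki/"):]
    return DBPEDIA_RESOURCE + title


# Hypothetical bookmarks: (user, bookmarked page, Wikipedia article chosen as the tag).
bookmarks = [
    ("alice", "http://example.org/a", "http://en.wikipedia.org/wiki/Semantic_Web"),
    ("bob", "http://example.org/b", "https://en.wikipedia.org/wiki/Semantic_Web"),
]

# Group bookmarked pages by tag. Both users end up under the same DBpedia URL.
by_tag = defaultdict(list)
for user, page, wiki_url in bookmarks:
    by_tag[wikipedia_to_dbpedia(wiki_url)].append(page)

print(dict(by_tag))
```

Because both users pick the same article, their bookmarks land under one shared tag, which is the first benefit listed above. With free-form tags, one might have typed “semweb” and the other “semantic-web”.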

The Faviki system is related to our own Wikitology project, which is exploring the use of Wikipedia terms as an ontology, and also to Harry Chen’s Gnizr tagging system, which is an RDF-based social tagging system. Our current Wikitology work is focused on mapping text and entities from text into a set of terms derived from Wikipedia and salted with additional data from DBpedia and Freebase.

One interesting research question is whether it’s possible to combine the ease of using user-generated tags with the power of mapping them into tags in a structured or semi-structured knowledge base.

Deriving knowledge bases from Wikipedia and using them in innovative ways is a very exciting topic that is sure to receive a lot of work in the coming years.

(spotted on ReadWriteWeb)

Int. Semantic Web Conf. workshop details

May 23rd, 2008, by Tim Finin, posted in iswc, Semantic Web

The 7th International Semantic Web Conference (ISWC) has an exciting program of thirteen one-day workshops that will be held on October 26 and 27. The deadlines for submitting papers vary. See the individual workshop pages for detailed information on their scope and structure and for information on submitting papers and participating.

The final scheduling of the workshops, assigning them to the 26th or 27th, has not yet been done.

PhD proposal: Context and Policies in Declarative Networked Systems

May 19th, 2008, by Tim Finin, posted in Semantic Web

UMBC PhD student Palanivel Kodeswaran will present his dissertation proposal on Use of Context and Policies in Declarative Networked Systems at 3:30 on Tuesday May 20 in ITE 325. Dissertation proposals are public and visitors are welcome. If you are a PhD student and are (or should be!) working on your own proposal, going to these is a good way to prepare. You can see what’s involved, what works and what doesn’t, and what kind of questions you can expect. See the link above for the full abstract, but here is a teaser.

“In this thesis, we propose to build a declarative framework that can reason over the requirements of applications, the current network context, operator policies, and appropriately configure the network to provide better network support for applications. … In particular, the contributions of this thesis are (i) Developing a framework for using context and policies in declarative networked systems (ii) Runtime adaptation of network configuration based on application requirements and node/operator policy (iii) Formalize cross layer interactions as opposed to ad hoc optimizations (iv) Simulation and test bed implementations to validate and evaluate proposed approach.”