T2LD – An automatic framework for extracting, interpreting and representing tables as linked data

by

Tuesday, June 29, 2010, 9:30am to 11:30am

ITE 325 - B

linked data, semantic web

MS Thesis Defense

We present an automatic framework for extracting, interpreting and generating linked data from tables. To represent a table as linked data, we assign each column header a class label from an appropriate ontology, link table cells (where appropriate) to entities in the Linked Open Data cloud, and identify relations between columns. Together these steps build an overall interpretation of the table. A table offers only limited evidence, in the form of its column headers and the data in its rows and columns. We therefore adopt a novel approach of querying existing knowledge bases such as Wikitology and DBpedia to determine the class labels for the column headers. For entity linking, besides querying the knowledge bases, we use machine learning algorithms such as SVM and SVM-rank, which learn to rank the entities in a candidate set, to link each table cell to an entity. We then use the class labels, the linked entities and information from the knowledge bases to identify relations between columns. We prototyped a system and evaluated our approach on tables obtained from Google Squared, from Wikipedia, and from a dataset that Google shared with us.
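The abstract does not say how the per-cell knowledge-base results are combined into a single label for a column header. The following is a minimal sketch of one simple aggregation, a rank-weighted vote. It assumes that each cell in a column has already been sent to a knowledge base such as DBpedia, and that each query returned a ranked list of candidate classes. The function name `label_column`, the 1/(rank+1) weighting and the class names in the example are illustrative assumptions. They are not the method used in the thesis.

```python
from collections import Counter


def label_column(cell_candidates):
    """Pick a class label for a column by rank-weighted voting.

    cell_candidates: one list per table cell, each holding the candidate
    classes a (hypothetical) knowledge-base query returned for that cell,
    best candidate first.
    Returns the class with the highest total score, or None if no cell
    produced any candidates.
    """
    votes = Counter()
    for ranked in cell_candidates:
        # Assumed weighting: a class at rank r (0-based) earns 1 / (r + 1),
        # so a cell's top candidate counts more than its lower-ranked ones.
        for rank, cls in enumerate(ranked):
            votes[cls] += 1.0 / (rank + 1)
    if not votes:
        return None
    # most_common(1) returns [(class, score)] for the highest-scoring class.
    return votes.most_common(1)[0][0]
```

In the following example, "City" scores 2.0, "Place" scores 1.5 and "Settlement" scores 0.5, so `label_column([["City", "Place"], ["City", "Settlement"], ["Place"]])` returns `"City"`. Weighting by rank lets a class that is the second choice for many cells still compete with one that is the top choice for only a few.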


Committee Members:
  • Dr. Tim Finin (Chair)
  • Dr. Anupam Joshi
  • Dr. Tim Oates
  • Dr. Evelyne Viegas (Microsoft Research)

Tim Finin
