Generating Linked Data by inferring the semantics of tables
May 6, 2011
A vast amount of information is encoded in tables on the Web, in spreadsheets, and in databases. Considerable work has focused on exploiting unstructured free text; however, techniques that are effective for documents and free text do not work well with tables. Early work on table interpretation, first in the field of document analysis and later on the Web, focused mainly on understanding and extracting tables from scanned documents and HTML web pages. Relatively little work has addressed the interpretation of the semantics associated with tables. In this work, we present a framework for understanding and interpreting the “semantics” of tables. The meaning of a table is often encoded in its column headers, the implicit relations between its columns, the table’s caption, as well as the text surrounding the table. Using this evidence, augmented with a background knowledge base such as the Linked Open Data cloud, our framework will map every column header to a class from an appropriate ontology, link the data values to existing entities in the linked data cloud (or map them as values of a property wherever appropriate), and discover and identify relations between the columns. The interpreted semantics will be represented as linked RDF assertions which can be used for further reasoning.
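To make the intended output concrete, the following is a minimal sketch of how an already interpreted table might be serialized as RDF assertions in N-Triples form. It does not show the framework's inference step. The column-to-class mapping, the `locatedIn` relation, the `example.org` URIs, and the helper names (`entity_uri`, `triple`, `table_to_ntriples`) are all hypothetical and written by hand here; a real system would infer the mappings and use URIs from the Linked Open Data cloud.

```python
# Standard RDF predicate for stating that an entity is an instance of a class.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Placeholder namespace (assumption); stands in for a real linked data vocabulary.
EX = "http://example.org/"

# Toy table: a header row plus data rows.
header = ["City", "State"]
rows = [["Baltimore", "Maryland"], ["Boston", "Massachusetts"]]

# Hypothetical interpretation results, hand-written for illustration.
column_class = {"City": EX + "ontology/City", "State": EX + "ontology/State"}
column_relation = {("City", "State"): EX + "ontology/locatedIn"}


def entity_uri(value):
    """Build a placeholder entity URI from a cell value (no real entity linking)."""
    return EX + "resource/" + value.replace(" ", "_")


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def table_to_ntriples(header, rows, column_class, column_relation):
    """Emit type assertions for every cell and relation assertions between columns."""
    triples = []
    for row in rows:
        cells = dict(zip(header, row))
        # Each cell becomes an instance of the class assigned to its column.
        for column, value in cells.items():
            triples.append(triple(entity_uri(value), RDF_TYPE, column_class[column]))
        # Each discovered column relation links the two cells in the same row.
        for (subj_col, obj_col), prop in column_relation.items():
            triples.append(
                triple(entity_uri(cells[subj_col]), prop, entity_uri(cells[obj_col]))
            )
    return triples


ntriples = table_to_ntriples(header, rows, column_class, column_relation)
print("\n".join(ntriples))
```

Each row yields one type assertion per cell and one assertion per column relation, so the two-row table above produces six statements. Literal-valued columns (the "values of a property" case mentioned above) are omitted from this sketch.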
Presented at the CSEE Research Review 2011