Generating Linked Data by inferring the semantics of tables
Wednesday, May 25, 2011, 9:30am - 11:30am
ITE 325b
Ph.D. Preliminary Examination
A vast amount of information is encoded in tables on the web, in spreadsheets, and in databases. Considerable work has focused on exploiting unstructured free text; however, techniques that are effective for documents and free text do not work well with tables. In this research we present techniques to generate high-quality linked data from tables by jointly inferring the semantics of column headers, table cell values (e.g., strings and numbers), and relations between columns, augmented with background knowledge from open data sources such as the Linked Open Data cloud. We represent a table's meaning by mapping columns to classes from an appropriate ontology, linking cell values to literal constants or to entities in the linked data cloud (existing or new), and discovering or identifying relations between columns. The interpreted meaning is represented as linked RDF assertions. An initial evaluation of our preliminary baseline system demonstrates the feasibility of tackling the problem. Based on this work and its evaluation, we are further developing our framework, grounded in the theory of graphical models and probabilistic reasoning.
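As a rough illustration of the output representation described above, the following minimal sketch serializes an already-interpreted table as RDF assertions in N-Triples syntax. It does not perform the joint inference itself; the column-to-class mapping, the cell-to-entity links, and the column relation are supplied by hand. The function `table_to_triples`, the `http://example.org/` URIs, and the sample table are all hypothetical and are not taken from the system presented in the talk.

```python
# Minimal sketch: emit N-Triples for a table whose interpretation is given.
# All URIs under http://example.org/ and all helper names are assumptions
# made for illustration; they are not the output of the actual system.

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EX = "http://example.org/"


def escape_literal(value):
    """Escape backslashes and double quotes for an N-Triples string literal."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"')


def table_to_triples(rows, column_classes, cell_links, column_relations):
    """Serialize an interpreted table as a list of N-Triples lines.

    rows             -- list of dicts mapping column header to cell value
    column_classes   -- column header -> ontology class URI
    cell_links       -- cell value -> entity URI (unlinked cells are skipped)
    column_relations -- (subject column, object column) -> property URI
    """
    triples = []
    for row in rows:
        # Type each linked cell with its column's class and keep its label.
        for column, value in row.items():
            entity = cell_links.get(value)
            if entity is None:
                continue
            if column in column_classes:
                triples.append(f"<{entity}> <{RDF_TYPE}> <{column_classes[column]}> .")
            triples.append(f'<{entity}> <{RDFS_LABEL}> "{escape_literal(value)}" .')
        # Assert each column-to-column relation between the row's entities.
        for (subject_col, object_col), prop in column_relations.items():
            subject = cell_links.get(row.get(subject_col))
            obj = cell_links.get(row.get(object_col))
            if subject and obj:
                triples.append(f"<{subject}> <{prop}> <{obj}> .")
    return triples


# Hypothetical one-row table with a hand-written interpretation.
rows = [{"City": "Baltimore", "State": "Maryland"}]
column_classes = {"City": EX + "City", "State": EX + "State"}
cell_links = {"Baltimore": EX + "Baltimore", "Maryland": EX + "Maryland"}
column_relations = {("City", "State"): EX + "locatedIn"}

for line in table_to_triples(rows, column_classes, cell_links, column_relations):
    print(line)
```

In this sketch each linked cell yields a type assertion and a label, and each related column pair yields one assertion per row. Producing the three input mappings automatically is the inference problem the abstract addresses.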
Committee members:
- Dr. Tim Finin (chair)
- Dr. Anupam Joshi
- Dr. Tim Oates
- Dr. Yun Peng
- Dr. L V Subramaniam (IBM Research India)
- Dr. Indrajit Bhattacharya (Indian Institute of Science)