Knowledge Graph-driven Tabular Data Discovery from Scientific Documents

Vijay S. Kumar; Varish Mulwad; Jenny Weisenberg Williams; Tim Finin; Sharad Dixit; Anupam Joshi

VLDB 2023 Workshop on Tabular Data Analysis

September 1, 2023

Synthesizing information from collections of tables embedded within scientific and technical documents is increasingly critical to emerging knowledge-driven applications. Given their structural heterogeneity, highly domain-specific content, and diffuse context, inferring a precise semantic understanding of such tables is traditionally better accomplished through linking tabular content to concepts and entities in reference knowledge graphs. However, existing tabular data discovery systems are not designed to adequately exploit these explicit, human-interpretable semantic linkages. Moreover, given the prevalence of misinformation, the level of confidence in the reliability of tabular information has become an important, often overlooked, factor in the discovery of open datasets. We describe a preliminary implementation of a discovery engine that enables table-based semantic search and retrieval of tabular information from a linked knowledge graph of scientific tables. We discuss the viability of semantics-guided tabular data analysis operations, including on-the-fly table generation under reliability constraints, within discovery scenarios motivated by intelligence production from documents.

See paper, slides, poster, and presentation video.