TALK: Real-time knowledge extraction from short semi-structured documents

November 3rd, 2019

A semantically rich framework to enable real-time knowledge extraction from short length semi-structured documents

Lavanya Elluri

10:30-11:30 Monday, 4 November 2019, ITE346

Knowledge is currently maintained as large volumes of unstructured text in books, laws, regulations and policies, news and social media, academic and scientific reports, conversations and correspondence, etc. Most of these documents are not machine-processable, so it is hard to find relevant information in them quickly. Extracting and categorizing knowledge from these numerous text stores requires significant manual effort and time. A critical open challenge that we propose to address is automated incremental text classification and context identification for small documents. Our aim is to develop a semantically rich framework, including algorithms that extract and classify the context of text in real time, to help users who update their policies regularly and organizations that submit proposals. We will use techniques from deep learning, the semantic web, and natural language processing to build this framework. Our objectives include representing the knowledge in cloud compliance and legal texts in order to create and populate a knowledge graph based on data protection regulations. Additionally, we will correlate the rules implemented in a referencing document with the rules in the original policies to determine context similarity.