Understanding the Logical and Semantic Structure of Large Documents
Muhammad Mahbubur Rahman
11:00am Wednesday, 30 May 2018, ITE 325b
Understanding and extracting of information from large documents, such as business opportunities, academic articles, medical documents and technical reports poses challenges not present in short documents. The reasons behind this challenge are that large documents may be multi-themed, complex, noisy and cover diverse topics. This dissertation describes a framework that can analyze large documents, and help people and computer systems locate desired information in them. It aims to automatically identify and classify different sections of documents and understand their purpose within the document. A key contribution of this research is modeling and extracting the logical and semantic structure of electronic documents using deep learning techniques. The effectiveness and robustness of ?the framework is evaluated through extensive experiments on arXiv and requests for proposals datasets.
Committee Members: Drs. Tim Finin (Chair), Anupam Joshi, Tim Oates, Cynthia Matuszek, James Mayfield (JHU)