Using Information Extraction to Automatically Generate Probabilistic Ontologies
by Tom Briggs
Tuesday, April 25, 2006, 12:00pm - Tuesday, April 25, 2006, 2:00pm
325b
Publishing data for the Semantic Web is a time-consuming process requiring individuals who possess both domain-specific knowledge and expertise with Description Logic (DL) languages. This is becoming the single greatest challenge to future development of the Semantic Web. There is a strong need for an autonomous agent capable of translating the vast amount of loosely organized data currently available on the web and in databases into a formal ontological representation. Recently, several key innovations in the fields of Text Mining, Information Extraction, and Concept Learning have increased the accuracy of these methods.
Previous approaches to ontology generation using information extraction techniques rely on crisp ontology languages. However, uncertainty, generally the result of noise in the inputs, pervades the process from beginning to end and is a challenge for crisp DLs. BayesOWL is a probabilistic ontology language that allows concepts and role relations to be asserted with a degree of belief in each assertion. We propose that a framework can be developed that automatically creates taxonomic ontologies from an existing corpus of relevant documents, using techniques from Information Extraction and Text Mining to extract concepts from these documents. Relevant concepts can be placed in a hierarchy using a semantic dictionary (such as WordNet), and a final BayesOWL ontology can be marked up using probabilities derived from frequency counts observed in the corpus.
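As a rough illustration of the proposed pipeline, the following minimal Python sketch counts concept mentions in a toy corpus, places them in a hierarchy, and estimates a conditional probability for each subclass link from the frequency counts. Everything in it is an assumption made for illustration: the HYPERNYMS table is a hypothetical stand-in for a semantic dictionary such as WordNet, the function names are invented here, and the estimate P(child | parent) = count(child subtree) / count(parent subtree) is only one plausible way to derive probabilities from counts. It does not produce BayesOWL markup and is not the method presented in the talk.

```python
from collections import Counter
import re

# Hypothetical stand-in for a semantic dictionary such as WordNet:
# maps each concept to its direct hypernym (parent). Not real WordNet data.
HYPERNYMS = {
    "dog": "mammal",
    "cat": "mammal",
    "mammal": "animal",
    "sparrow": "bird",
    "bird": "animal",
}


def extract_concepts(documents):
    """Count occurrences of known concept terms (naive exact-token match)."""
    known = set(HYPERNYMS) | set(HYPERNYMS.values())
    counts = Counter()
    for doc in documents:
        for token in re.findall(r"[a-z]+", doc.lower()):
            if token in known:
                counts[token] += 1
    return counts


def subtree_counts(counts):
    """Add each concept's count to itself and to every one of its ancestors."""
    totals = Counter()
    for concept, n in counts.items():
        node = concept
        while node is not None:
            totals[node] += n
            node = HYPERNYMS.get(node)  # None once the root is passed
    return totals


def conditional_probabilities(totals):
    """Estimate P(child | parent) as a ratio of subtree frequency counts."""
    probs = {}
    for child, parent in HYPERNYMS.items():
        if totals[parent] > 0:
            probs[(child, parent)] = totals[child] / totals[parent]
    return probs


corpus = [
    "The dog chased the cat.",
    "A dog is a mammal.",
    "The sparrow is a small bird.",
]
counts = extract_concepts(corpus)
totals = subtree_counts(counts)
probs = conditional_probabilities(totals)

for (child, parent), p in sorted(probs.items()):
    print(f"P({child} | {parent}) = {p:.2f}")
```

In a fuller system, the exact-token match would be replaced by a real information extraction step, the hand-written table by lookups in the semantic dictionary, and the resulting probabilities would be attached to the subclass assertions of the generated ontology.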