Wikipedia as an ontology for describing documents
by Zareen Syed
Monday, October 29, 2007, 11:30am - 1:00pm
325b ITE
We are exploring the idea of using Wikipedia's articles and associated pages as a topic ontology. The benefits of this approach are that the terms in the derived ontology are kept current, represent the consensus of a large community, and can be understood by ordinary people by reading the associated Web pages.
We have investigated the use of the text of Wikipedia articles, the category links graph, and the article links graph for predicting common concepts related to a set of documents. We describe several heuristics and algorithms that we implemented and evaluated to aggregate and refine results, including a spreading activation approach on the graphs.
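As a rough illustration of what spreading activation on a link graph can look like, here is a minimal sketch. It is not the implementation evaluated in this work: the function name, the decay factor, the number of pulses, and the toy graph are all assumptions made for the example. Seed nodes stand for Wikipedia articles matched to the input documents, and activation flows along outgoing links so that nodes reached from several seeds accumulate higher scores.

```python
from collections import defaultdict


def spread_activation(graph, seeds, pulses=1, decay=0.5):
    """Propagate activation from seed nodes along outgoing links.

    graph  -- dict mapping a node to the list of nodes it links to
    seeds  -- dict mapping seed nodes to their initial activation
    pulses -- number of propagation rounds (assumed parameter)
    decay  -- fraction of a node's activation passed on per round (assumed)
    """
    activation = defaultdict(float, seeds)
    frontier = dict(seeds)
    for _ in range(pulses):
        next_frontier = defaultdict(float)
        for node, value in frontier.items():
            neighbors = graph.get(node, [])
            if not neighbors:
                continue
            # Split the decayed activation evenly among the outgoing links.
            share = decay * value / len(neighbors)
            for neighbor in neighbors:
                next_frontier[neighbor] += share
        for node, value in next_frontier.items():
            activation[node] += value
        frontier = next_frontier
    return dict(activation)


# Hypothetical toy graph: articles link to categories, categories to a parent.
toy_graph = {
    "Dog": ["Pets", "Mammals"],
    "Cat": ["Pets", "Mammals"],
    "Goldfish": ["Pets", "Fish"],
    "Pets": ["Animals"],
    "Mammals": ["Animals"],
    "Fish": ["Animals"],
}
seed_activation = {"Dog": 1.0, "Cat": 1.0, "Goldfish": 1.0}

scores = spread_activation(toy_graph, seed_activation, pulses=1)
# Rank only the nodes that were not seeds themselves.
ranked = sorted(
    (node for node in scores if node not in seed_activation),
    key=lambda node: scores[node],
    reverse=True,
)
print(ranked)
```

In this toy setup a single pulse favors the node shared by all three seeds. Raising `pulses` lets activation reach more general nodes further up the graph, which is one way such a scheme could trade specificity for generality.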
The Wikipedia category graph can be used to predict generalized concepts, whereas the article links graph can help predict more specific concepts or concepts that do not exist in the category hierarchy. Our experiments on Wikipedia show that it is possible to predict common concepts that do not exist as Wikipedia categories by using the page links graph. Such predicted concepts could in turn be used to define new categories or sub-categories within Wikipedia. The results of our preliminary experiments are encouraging and give us a direction for future research and experimentation along these lines.