PowerRelations: A Question Answering System for DBpedia
by Lushan Han
Tuesday, April 26, 2011, 11:00am - 12:15pm
ITE 325 - B
Large amounts of structured and semi-structured semantic data are available on the Web. A well-known example is DBpedia, which extracts data from Wikipedia, encodes it in the Semantic Web language RDF, and stores it in a triplestore. Although a formal query language, SPARQL, is available for accessing such data, querying it remains difficult for users who are not familiar with SPARQL and the particular ontologies used. We have developed an intuitive system that lets users express queries by describing entities and relations with natural language terms in a simple graphical interface. The system automatically translates the user's description into a corresponding SPARQL query that produces an answer. Our key contribution is a set of robust techniques for mapping user terms, which may be expressed in many different ways, to the most appropriate concepts and relations in DBpedia, even though its ontologies are diverse and relatively informal owing to the nature of Wikipedia and the noisy information extraction process. Our approach combines two sources of evidence: a statistical analysis of the DBpedia knowledge base, which captures correlations among concepts and relations, and lexical semantic similarity metrics learned from WordNet and a large text corpus. We disambiguate user input terms by exploring all possible interpretations and selecting the best one based on correlation and similarity. To improve recall, we then harvest additional properties similar to the best choices, taking into account the context just disambiguated. Initial experiments show that the system works very well on a collection of test questions from the 2011 Workshop on Question-Answering over Linked Data.
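The disambiguation and translation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the PowerRelations implementation: the candidate tables, the similarity and correlation numbers, the multiplicative scoring rule, and the function names `best_interpretation` and `to_sparql` are all assumptions made up for illustration. Only the general idea, enumerating every interpretation of the user's terms and keeping the one that scores best on similarity and correlation, comes from the abstract.

```python
from itertools import product

# Hypothetical lexical similarity between user terms and DBpedia ontology terms.
# The numbers are invented; the abstract says such metrics are learned from
# WordNet and a large text corpus.
CLASS_CANDIDATES = {
    "book": {"dbo:Book": 1.0, "dbo:WrittenWork": 0.7},
    "author": {"dbo:Writer": 0.9, "dbo:Person": 0.6},
}
PROPERTY_CANDIDATES = {
    "written by": {"dbo:author": 0.8, "dbo:writer": 0.6},
}

# Hypothetical correlation of (subject class, property, object class) triples,
# standing in for the statistical analysis of the knowledge base.
CORRELATION = {
    ("dbo:Book", "dbo:author", "dbo:Writer"): 0.8,
    ("dbo:Book", "dbo:author", "dbo:Person"): 0.9,
    ("dbo:WrittenWork", "dbo:author", "dbo:Person"): 0.7,
}


def best_interpretation(subject_term, relation_term, object_term):
    """Try every combination of candidates and return the best-scoring one."""
    best, best_score = None, float("-inf")
    for (s, s_sim), (p, p_sim), (o, o_sim) in product(
        CLASS_CANDIDATES[subject_term].items(),
        PROPERTY_CANDIDATES[relation_term].items(),
        CLASS_CANDIDATES[object_term].items(),
    ):
        # Assumed scoring rule: product of similarities and correlation.
        # Combinations never observed together get a correlation of 0.
        score = s_sim * p_sim * o_sim * CORRELATION.get((s, p, o), 0.0)
        if score > best_score:
            best, best_score = (s, p, o), score
    return best, best_score


def to_sparql(interpretation):
    """Render a (subject class, property, object class) triple as SPARQL."""
    s, p, o = interpretation
    return (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?x ?y WHERE {\n"
        f"  ?x a {s} .\n"
        f"  ?y a {o} .\n"
        f"  ?x {p} ?y .\n"
        "}"
    )


if __name__ == "__main__":
    interpretation, score = best_interpretation("book", "written by", "author")
    print(interpretation, round(score, 3))
    print(to_sparql(interpretation))
```

Exhaustive enumeration is practical here only because each term has a handful of candidates. The recall step mentioned in the abstract, harvesting further properties similar to the chosen one, is not covered by this sketch.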