SemNews: news text to Semantic Web

January 12, 2006

SemNews Understands the News

Prototype UMBC system interprets online news stories
and publishes text meaning on the Semantic Web

SemNews is a prototype application being developed by UMBC Ph.D. student Akshay Java. It uses a sophisticated text understanding system to interpret summaries of news stories, publishes the results on the Semantic Web, and provides browsing and query services over them. The project is a collaboration between UMBC's Institute for Language and Information Technologies and the Ebiquity Laboratory, with partial support from the Lockheed Martin Corporation.

SemNews monitors a number of news source RSS feeds and processes new stories as they are published. After a story's metadata is extracted, its news summary is interpreted by the OntoSem text analyzer, which performs a syntactic, semantic, and pragmatic analysis of the text and produces its text meaning representation, or TMR. A TMR is a language-neutral description (an interlingua) of the meaning conveyed in a natural language text. In addition to providing information about the lexical-semantic dependencies in the text, the TMR represents stylistic factors, discourse relations, speaker attitudes, and other pragmatic factors present in the discourse structure. The TMR thus captures not only the meaning of individual elements in the text but also the relations between those elements, covering both propositional and non-propositional components of textual meaning. OntoSem's TMRs are expressed in a custom frame-based representation language and grounded in the Mikrokosmos ontology, an extensive ontology with over 30K concepts and nearly 400K entities.
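The feed-monitoring step described above can be illustrated with a minimal Python sketch. It is not the SemNews implementation: the sample feed, the new_stories function, and the choice of fields kept as metadata are all assumptions made for illustration. It parses an RSS 2.0 document, skips items it has already seen, and returns each new story's metadata together with the summary that would be handed to a text analyzer.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real monitor would fetch this over HTTP.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Bombing reported in Kabul</title>
      <link>http://news.example.org/story/1</link>
      <guid>story-1</guid>
      <pubDate>Thu, 12 Jan 2006 09:00:00 GMT</pubDate>
      <description>A bomb exploded in Kabul on Thursday.</description>
    </item>
    <item>
      <title>Leaders meet in Baghdad</title>
      <link>http://news.example.org/story/2</link>
      <guid>story-2</guid>
      <pubDate>Thu, 12 Jan 2006 10:30:00 GMT</pubDate>
      <description>Regional leaders met in Baghdad.</description>
    </item>
  </channel>
</rss>"""


def new_stories(feed_xml, seen_ids):
    """Return metadata and summary for unseen items; records their ids in seen_ids."""
    root = ET.fromstring(feed_xml)
    stories = []
    for item in root.iter("item"):
        # Fall back to the link when an item carries no guid.
        story_id = item.findtext("guid") or item.findtext("link")
        if story_id is None or story_id in seen_ids:
            continue
        seen_ids.add(story_id)
        stories.append({
            "id": story_id,
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            # The summary is the text a downstream analyzer would interpret.
            "summary": item.findtext("description", default=""),
        })
    return stories


seen = set()
for story in new_stories(SAMPLE_FEED, seen):
    print(story["published"], "|", story["title"])
```

Calling new_stories a second time with the same seen set returns an empty list, which is the behavior a polling loop needs in order to process each story only once.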

Each story's metadata and TMR are translated into the Semantic Web language OWL via the OntoSem2OWL translator developed for this project. The results are then added to a special collection indexed by the Swoogle search engine and also stored in an RDF triple store. These support several services that let people and agents semantically browse, query, and visualize the stories in the collection, giving access to information that would be hard to find using simple keyword-based search.
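To give a rough idea of what translating a frame-based representation into RDF involves, here is a small Python sketch. It does not reproduce OntoSem2OWL, whose actual mapping is not described here. The frame layout, the slot names, the example.org namespace, and the functions frames_to_triples and to_ntriples are all hypothetical. Each frame instance becomes a subject, its concept becomes an rdf:type statement, and each slot becomes a property whose value is either another instance or a literal.

```python
BASE = "http://example.org/semnews#"  # hypothetical namespace, not the project's
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical, highly simplified TMR fragment: instance name -> concept and slots.
tmr = {
    "BOMB-1": {"instance-of": "BOMB", "slots": {"location": "NATION-1"}},
    "NATION-1": {"instance-of": "NATION", "slots": {"has-name": "Afghanistan"}},
}


def frames_to_triples(frames):
    """Turn frame instances into (subject, predicate, object) terms in N-Triples syntax."""
    triples = []
    for name, frame in frames.items():
        subject = f"<{BASE}{name}>"
        concept = f"<{BASE}{frame['instance-of']}>"
        triples.append((subject, f"<{RDF_TYPE}>", concept))
        for slot, filler in frame["slots"].items():
            if filler in frames:
                # The filler names another frame instance: emit a resource reference.
                obj = f"<{BASE}{filler}>"
            else:
                # Anything else is treated as a plain string literal.
                escaped = str(filler).replace("\\", "\\\\").replace('"', '\\"')
                obj = f'"{escaped}"'
            triples.append((subject, f"<{BASE}{slot}>", obj))
    return triples


def to_ntriples(triples):
    """Serialize the triples one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_ntriples(frames_to_triples(tmr)))
```

A real translator would also have to map the ontology itself (concepts, properties, and their hierarchy) into OWL classes and properties; this sketch covers only instance data.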

For example, one can browse through the story collection via the ontology to find stories that involve certain concepts, such as a terrorist organization; find all stories that involve entities in OntoSem's onomasticon, such as al Qaeda or Karbala; visualize the stories on a map based on the locations they reference; or construct an arbitrary query, such as finding "stories in which the nation named Afghanistan was the location of a bombing event." Users can also define semantic "alerts" as queries over the RDF triple store and/or the Swoogle collection. For each alert, SemNews will generate an RSS feed of the results.
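The Afghanistan query and the idea of an alert that yields an RSS feed can be sketched in Python as follows. This is an illustration under stated assumptions rather than the SemNews query service: the triples, the property names (describes, type, location, has-name), and the functions match, run_alert, and alert_feed are all hypothetical, and a real triple store would presumably be queried through its own query language. The sketch matches a list of triple patterns against an in-memory set of triples and wraps the matching story identifiers in a minimal RSS document.

```python
import xml.etree.ElementTree as ET

# Hypothetical triples standing in for the contents of the triple store.
TRIPLES = {
    ("story1", "describes", "bomb1"), ("bomb1", "type", "BOMB"),
    ("bomb1", "location", "nation1"), ("nation1", "type", "NATION"),
    ("nation1", "has-name", "Afghanistan"),
    ("story2", "describes", "bomb2"), ("bomb2", "type", "BOMB"),
    ("bomb2", "location", "nation2"), ("nation2", "type", "NATION"),
    ("nation2", "has-name", "Iraq"),
    ("story3", "describes", "meet1"), ("meet1", "type", "MEETING"),
    ("meet1", "location", "nation1"),
}

# "Stories in which the nation named Afghanistan was the location of a bombing event."
QUERY = [
    ("?story", "describes", "?event"),
    ("?event", "type", "BOMB"),
    ("?event", "location", "?place"),
    ("?place", "type", "NATION"),
    ("?place", "has-name", "Afghanistan"),
]


def match(patterns, triples, binding=None):
    """Yield variable bindings that satisfy every pattern; variables start with '?'."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or reject if it is already bound to something else.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match(rest, triples, candidate)


def run_alert(patterns, triples):
    """Return the sorted story identifiers that satisfy the alert's query."""
    return sorted({b["?story"] for b in match(patterns, triples)})


def alert_feed(title, story_ids):
    """Wrap the alert results in a minimal RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for story_id in story_ids:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "guid").text = story_id
    return ET.tostring(rss, encoding="unicode")


print(alert_feed("Bombings in Afghanistan", run_alert(QUERY, TRIPLES)))
```

Only story1 satisfies the query: story2 describes a bombing located in a different nation, and story3 is located in Afghanistan but describes a meeting. A keyword search for "Afghanistan" would not make that distinction.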

The SemNews system is currently a research prototype that is being used to refine the underlying technologies and to explore how the sophisticated automatic linguistic processing of text can be integrated into the Semantic Web and conventional web applications. Ongoing work on SemNews includes an evaluation of its semantic recall and precision as well as a service that can group and cluster stories based on their semantic representations.

For more information

For more information, please contact UMBC ebiquity.