Streaming Knowledge Bases
August 31, 2007
A knowledge base can be thought of as a special kind of database for knowledge management. It provides the means for the computerized collection, organization, and retrieval of knowledge. With the growing deployment of sensors, we encounter many scenarios in which data flows constantly between sensors and applications, and both the volume and the rate of that data are high. In such scenarios, knowledge extraction boils down to finding useful information, i.e., detecting events of interest. Typical use cases where event detection is of paramount importance are surveillance, tracking, telecommunications data management, disease outbreak detection, and environmental monitoring. Many streaming database applications have been built to deal with these dynamic environments; examples of query processors based on adaptive data flow include TelegraphCQ and the Aurora project.
With the emergence of the Semantic Web, we now have a universal medium for data, information, and knowledge exchange. RDF graphs are used to denote relations and interactions between different entities or resources, and several popular, uniform data interchange formats have been developed to support them. Knowledge extraction in the Semantic Web amounts to carrying out inference on such RDF graphs. Existing tools such as Jena and Sesame are used for this task.
As the Semantic Web continues to grow, more and more data will be expressed in the uniform formats it recommends, such as RDF/XML or N-Triples. In a pervasive environment, reasoning over this streaming data becomes a challenging task. Existing reasoners load the whole RDF graph into main memory and then run queries against it. This approach takes a considerable amount of time and is therefore of little use for real-time reasoning in streaming scenarios.
We combine continuous query processors with Semantic Web techniques to build an "rdfs:subClassOf" reasoner that can deal with streaming data. Given an ontology, we pre-compute the transitive closure of the "rdfs:subClassOf" relationship over all classes and store the resulting class-subclass pairs in a database table. At run time, we only need to query this table to identify events whose class is a subclass of the event of concern. Many applications already describe their data in RDF-compatible formats. We feed streams of such RDF data to our query processor and carry out real-time rdfs:subClassOf reasoning on them.
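The pre-compute-then-look-up idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: it uses Python's built-in sqlite3 module in place of a continuous query processor, and the table name subclass_closure, the function names, and the toy ontology and event stream are all hypothetical, chosen only for illustration. Each class is also stored as a subclass of itself so that an exact match on the event of concern is found by the same query.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy ontology: direct (subclass, superclass) edges.
ONTOLOGY = [
    ("Burglary", "Intrusion"),
    ("Intrusion", "SecurityEvent"),
    ("FireAlarm", "SafetyEvent"),
]

# Hypothetical stream of (event instance, asserted class) pairs.
STREAM = [
    ("e1", "Burglary"),
    ("e2", "FireAlarm"),
    ("e3", "Intrusion"),
    ("e4", "SecurityEvent"),
]


def transitive_closure(direct_edges):
    """Return every (subclass, superclass) pair implied by the direct edges."""
    parents = defaultdict(set)
    for sub, sup in direct_edges:
        parents[sub].add(sup)

    closure = set()
    for start in list(parents):
        stack = list(parents[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:  # also guards against cycles in the ontology
                continue
            seen.add(node)
            closure.add((start, node))
            stack.extend(parents.get(node, ()))
    return closure


def build_table(conn, direct_edges):
    """Pre-compute the closure once and store it in a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS subclass_closure ("
        "subclass TEXT, superclass TEXT, PRIMARY KEY (subclass, superclass))"
    )
    rows = transitive_closure(direct_edges)
    # Add reflexive pairs so a class matches itself at query time.
    classes = {cls for edge in direct_edges for cls in edge}
    rows |= {(cls, cls) for cls in classes}
    conn.executemany(
        "INSERT OR IGNORE INTO subclass_closure VALUES (?, ?)", sorted(rows)
    )
    conn.commit()


def matching_events(conn, stream, event_of_concern):
    """Yield stream items whose class is the event of concern or a subclass."""
    query = (
        "SELECT 1 FROM subclass_closure WHERE subclass = ? AND superclass = ?"
    )
    for instance, cls in stream:
        if conn.execute(query, (cls, event_of_concern)).fetchone():
            yield instance, cls


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_table(connection, ONTOLOGY)
    for hit in matching_events(connection, STREAM, "SecurityEvent"):
        print(hit)
```

Because the closure is materialized ahead of time, each incoming item costs a single indexed lookup rather than a graph traversal, which is the property that makes this approach attractive for streams. The trade-off is that the table must be rebuilt or updated whenever the ontology changes.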
University of Maryland, Baltimore County
Department of Computer Science and Electrical Engineering
Google Scholar Citations: 4 citations