Faceted search for DBLP bibliographic data

June 29th, 2007

Michael Ley started DBLP in 1993 as an experiment in providing Web access to bibliographic information on database systems and logic programming using the then-new Web infrastructure. Over the past 14 years it has grown into an important resource for Computer Science, with high-quality information on more than 900K articles from selected journals and conferences across the discipline. To reflect the broader scope, its acronym is now taken to stand for Digital Bibliography and Library Project. (Btw, it has always seemed ironic to me that the DBLP data is not stored in a database system, but rather in a large collection of files glued together by a set of scripts, sort of like the Universe.)

DBLP has also been a great dataset for many research projects since its information can be freely downloaded as an XML document (a big one!). For example, our research group has used it for data mining, visualization, social networking, and some semantic web work.
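Since the dump is one very large XML file, a streaming parser is usually a better fit than loading it all into memory. Here is a minimal sketch in Python, assuming the usual record layout of elements such as `article` and `inproceedings` with `author`, `title`, and `year` children. The sample data and the `iter_records` helper are made up for illustration. The real file also references a DTD with character entities, which this sketch does not handle.

```python
import io
import xml.etree.ElementTree as ET

# Tiny made-up sample in the general shape of DBLP records (not real data).
SAMPLE = b"""<dblp>
<article key="journals/example/Doe07">
<author>Jane Doe</author>
<author>John Roe</author>
<title>An Example Article.</title>
<year>2007</year>
<journal>Example Journal</journal>
</article>
<inproceedings key="conf/example/Roe06">
<author>John Roe</author>
<title>An Example Paper.</title>
<year>2006</year>
<booktitle>Example Conf</booktitle>
</inproceedings>
</dblp>"""

# Record types this sketch cares about (an assumption; the dump has more).
RECORD_TAGS = {"article", "inproceedings"}


def iter_records(source):
    """Yield one dict per publication record, streaming through the XML."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag in RECORD_TAGS:
            yield {
                "key": elem.get("key"),
                "authors": [a.text for a in elem.findall("author")],
                "title": elem.findtext("title"),
                "year": elem.findtext("year"),
            }
            elem.clear()  # drop the finished record's children to save memory


records = list(iter_records(io.BytesIO(SAMPLE)))
for record in records:
    print(record["year"], record["title"], record["authors"])
```

To try this on the real download, pass an open file instead of the `io.BytesIO` sample, and expect to deal with the DTD entities first.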

The L3S Research Center at the University of Hannover has a new DBLP faceted search service that lets users run keyword searches over all of the metadata and also supports more elaborate navigational access to the collection. Searchers can restrict queries by topic, where paper topic keywords are automatically generated using “higher-order co-occurrences of author keywords”. The system relies on a novel Semantic GrowBag algorithm that exploits the “structure of the semantic network induced by the usage of keywords over the document corpus”.
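To give a rough feel for the raw ingredients, here is a small Python sketch that counts plain first-order keyword co-occurrences and restricts a result set by a topic facet. This is not the Semantic GrowBag algorithm, which works with higher-order co-occurrences and is not reproduced here. The papers, keywords, and function names are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical papers with author-supplied keywords (made-up data).
PAPERS = [
    {"title": "Paper A", "keywords": {"xml", "databases", "indexing"}},
    {"title": "Paper B", "keywords": {"xml", "databases"}},
    {"title": "Paper C", "keywords": {"semantic web", "xml"}},
]


def cooccurrence_counts(papers):
    """Count how often each unordered keyword pair appears on the same paper."""
    counts = Counter()
    for paper in papers:
        # Sorting gives each pair one canonical (alphabetical) order.
        for a, b in combinations(sorted(paper["keywords"]), 2):
            counts[(a, b)] += 1
    return counts


def restrict_by_topic(papers, topic):
    """Facet filter: keep only the papers tagged with the given topic keyword."""
    return [p for p in papers if topic in p["keywords"]]


counts = cooccurrence_counts(PAPERS)
databases_papers = restrict_by_topic(PAPERS, "databases")

print(counts.most_common(3))
print([p["title"] for p in databases_papers])
```

Pairs that show up together often are a crude signal that two keywords belong to the same topic. A higher-order approach would also link keywords that never appear on the same paper but share common neighbors.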

This looks quite useful.