IBM’s UIMA and RDF — extracting knowledge from text

August 9th, 2005

IBM is getting noticed for its UIMA project. See recent articles from Reuters, CNN and ZDNet.

UIMA stands for Unstructured Information Management Architecture, which is described as “An Open, Industrial-Strength Platform for Unstructured Information Analysis and Search”. I’d characterize it as a framework for integrating various natural language processing components in a way that supports search and other applications.

One of UIMA’s features, in fact the glue that holds it all together, is the Common Analysis System (CAS), the subsystem that handles data exchange between the different components and unstructured information management (UIM) applications. Alas, CAS’s common representation scheme is not based on RDF. It might be compatible, though, both as a KR framework and at a technology level, since CAS has an XML serialization. We’ll have to look into this; if anyone has an opinion, please comment. UIMA could certainly be a useful framework for systems that add RDF annotations to text.
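To make the compatibility question a bit more concrete, here is a minimal sketch of how annotations over text might be written out as RDF in N-Triples form. It assumes that an annotation boils down to a type name, begin and end offsets into the document, and a few feature values. The `Annotation` class, the `http://example.org/annot#` vocabulary, and the property names are all made up for illustration; none of this is the UIMA API or the CAS XML format.

```python
from dataclasses import dataclass, field

# Hypothetical vocabulary for illustration only; not defined by UIMA or any standard.
EX = "http://example.org/annot#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass
class Annotation:
    """Simplified stand-in for a typed annotation over a span of text."""
    type_name: str                                 # e.g. "Person"
    begin: int                                     # start offset into the text
    end: int                                       # end offset (exclusive)
    features: dict = field(default_factory=dict)   # extra name/value pairs


def escape_literal(value: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def to_ntriples(doc_uri: str, text: str, annotations: list) -> list:
    """Return one N-Triples line per statement about each annotation."""
    lines = []
    for i, a in enumerate(annotations):
        subj = f"<{doc_uri}#annot{i}>"
        covered = escape_literal(text[a.begin:a.end])
        lines.append(f"{subj} <{RDF_TYPE}> <{EX}{a.type_name}> .")
        lines.append(f"{subj} <{EX}annotates> <{doc_uri}> .")
        lines.append(f'{subj} <{EX}begin> "{a.begin}"^^<{XSD_INTEGER}> .')
        lines.append(f'{subj} <{EX}end> "{a.end}"^^<{XSD_INTEGER}> .')
        lines.append(f'{subj} <{EX}coveredText> "{covered}" .')
        # Features become plain string literals, sorted for stable output.
        for name, value in sorted(a.features.items()):
            lines.append(f'{subj} <{EX}{name}> "{escape_literal(str(value))}" .')
    return lines


if __name__ == "__main__":
    text = "Alfred Spector spoke at WWW2002."
    annots = [Annotation("Person", 0, 14, {"role": "speaker"})]
    for line in to_ntriples("http://example.org/doc1", text, annots):
        print(line)
```

Each annotation gets its own URI under the document, so other RDF data can refer to it. A real mapping would also have to handle whatever richer structures the CAS allows, which this sketch ignores.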

UIMA has been under way at IBM for some years (Alfred Spector spoke about it at WWW2002, and see this CNN article from January 2003), so maybe it’s not surprising that it has developed its own knowledge representation language. But I think using a real standard like RDF could only strengthen this ambitious project. The reasons are many: it has a sound semantics, a significant community of researchers and developers is helping it evolve, a large and growing amount of information is already published in RDF, and some key companies have committed to using it for metadata.

The UIMA Software Development Kit is currently available from IBM’s alphaWorks site, and IBM has announced that it will make the UIMA core framework open source. The UIMA SDK is a Java implementation of the framework and comes with an Eclipse-based development environment that includes a set of tools and utilities for using UIMA.

A good introduction to UIMA is

T. Götz and O. Suhre, Design and implementation of the UIMA Common Analysis System, IBM Systems Journal, Volume 43, Number 3, 2004.

That same issue has several other articles on UIMA.