UMBC ebiquity
IBM’s UIMA and RDF — extracting knowledge from text

Tim Finin, 10:25am 9 August 2005

IBM is getting noticed for its UIMA project. See recent articles from Reuters, CCN and ZDNET.

UIMA stands for Unstructured Information Management Architecture, which is described as “An Open, Industrial-Strength Platform for Unstructured Information Analysis and Search”. I’d characterize it as a framework for integrating various natural language processing components in a way that supports search and other applications.

One of UIMA’s features, in fact the glue that holds it all together, is the Common Analysis System (CAS), the subsystem that handles data exchange between the different components and unstructured information management (UIM) applications. Alas, CAS’s common representation scheme is not based on RDF. It might be compatible, though, both as a KR framework and at a technology level, since CAS has an XML serialization. We’ll have to look into this; if anyone has an opinion, please comment. UIMA certainly could be a useful framework for systems that add RDF annotations to text.
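
To make the compatibility question a bit more concrete, here is a minimal sketch in Python of how standoff annotations over a text (typed spans with begin and end offsets, which is roughly the kind of data an analysis component produces) could be written out as RDF in N-Triples form. This is not the UIMA SDK or the CAS API: the Annotation class, the example.org namespace, and the begin, end, and coveredText property names are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical vocabulary namespace, for illustration only (not a UIMA or W3C vocabulary).
EX = "http://example.org/annot#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


@dataclass
class Annotation:
    """A standoff annotation: a typed span over the document text (assumed model)."""
    type_name: str  # e.g. "Person"
    begin: int      # character offset where the span starts
    end: int        # character offset just past the end of the span


def escape_literal(s):
    """Escape backslashes, quotes, and line breaks for an N-Triples string literal."""
    return (s.replace("\\", "\\\\")
             .replace('"', '\\"')
             .replace("\n", "\\n")
             .replace("\r", "\\r"))


def to_ntriples(text, annotations, doc_uri):
    """Emit four triples per annotation: its type, its offsets, and the text it covers."""
    lines = []
    for i, a in enumerate(annotations):
        subj = f"<{doc_uri}#a{i}>"  # one resource per annotation
        covered = escape_literal(text[a.begin:a.end])
        lines.append(f"{subj} <{RDF_TYPE}> <{EX}{a.type_name}> .")
        lines.append(f'{subj} <{EX}begin> "{a.begin}"^^<{XSD_INT}> .')
        lines.append(f'{subj} <{EX}end> "{a.end}"^^<{XSD_INT}> .')
        lines.append(f'{subj} <{EX}coveredText> "{covered}" .')
    return "\n".join(lines)


text = "Alfred Spector spoke about UIMA at WWW2002."
annotations = [Annotation("Person", 0, 14), Annotation("Event", 35, 42)]
ntriples = to_ntriples(text, annotations, "http://example.org/doc1")
print(ntriples)
```

Each annotation becomes its own RDF resource, so other statements (provenance, confidence, links to an ontology) could be attached to it later. Whether a real CAS type system maps onto RDF this cleanly is exactly the open question.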

UIMA has been under way at IBM for some years (Alfred Spector spoke about it at WWW2002, and see this CNN article from January 2003), so maybe it’s not surprising that it has developed its own knowledge representation language. But I think using a real standard like RDF could only strengthen this ambitious project. The reasons are many: RDF has a sound semantics, a significant community of researchers and developers is helping it evolve, a large and growing amount of information is already published in RDF, and some key companies have committed to using it for metadata.

The UIMA Software Development Kit is currently available from IBM’s alphaWorks site, and IBM has announced that it will make the UIMA core framework open-source. The UIMA SDK is a Java implementation of the framework and comes with an Eclipse-based development environment that includes a set of tools and utilities for using UIMA.

A good introduction to UIMA is

T. Götz and O. Suhre, Design and implementation of the UIMA Common Analysis System, IBM Systems Journal, Volume 43, Number 3, 2004.

That same issue has several other articles on UIMA.

3 Responses to “IBM’s UIMA and RDF — extracting knowledge from text”

  1. Elias Torres Says:

    Tim,

    You mention two things in this post, one is whether CAS should use RDF internally as opposed to whatever it uses today and the other is whether it could consume/produce RDF data. I’ll try to get some answers for you on the subject.

  2. david ferrucci Says:

    The CAS is not RDF; however, RDF can be generated from CASes. There are utilities under development that map from RDF into UIMA’s internal representations.

    On this subject, see “Component Services for UIMA Knowledge Integration (SUKI)” at http://www.research.ibm/UIMA. It discusses a project that addresses the use of UIMA to transform the results of UIMA analysis into formal knowledge representations.

  3. Amit Kumar Says:

    I’ve just completed my graduation in Computer Science. I’ve been allotted a project, i.e., to design a semantic search engine using one of two frameworks, RDF or UIMA. I would like to know which one is the better option and how many resources are available on the web for each. I have about two months to submit the project, with little or no external help. Please let me know at thecoffeeshop@rediffmail.com. Thank you.