Parsing Blonde Speak

September 5th, 2007

[Post by Jesse English and Akshay Java]

Understanding blonde speak ain’t that easy! Barney Pell, Powerset CEO, recently put Powerset’s NLP technology to the task. Human language is complicated enough already, and any NLP system that processes unstructured, ungrammatical and noisy text needs to be robust. At UMBC, Dr. Sergei Nirenburg and his team at ILIT have been working on OntoSem, an Ontological Semantics-based NLP system. We have used OntoSem to process news data (SemNews) and to export the resulting Text Meaning Representation (TMR) into OWL. You can read more about this system in our recent publication.
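OWL ontologies are commonly serialized as RDF, so exporting a TMR essentially means turning each frame into triples. Below is a minimal sketch of one plausible mapping, using slot values from the HELP-1373 frame in the excerpt further down. The frame type becomes an rdf:type triple, literal attributes become datatype-property triples, and relations to other frames become object-property triples. The namespace http://example.org/tmr# and the helper names are hypothetical; this is not the actual SemNews export schema.

```python
# Hypothetical namespace: the real SemNews ontology IRIs are not given in this post.
NS = "http://example.org/tmr#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(local):
    """Wrap a local name from the TMR in the hypothetical namespace."""
    return f"<{NS}{local}>"


def literal(text):
    """Quote a plain string literal, escaping characters as N-Triples requires."""
    escaped = text.replace("\\", "\\\\").replace("\n", "\\n").replace('"', '\\"')
    return f'"{escaped}"'


def frame_to_ntriples(name, frame_type, attributes, relations):
    """Turn one TMR frame into N-Triples lines: a type triple, one triple per
    literal-valued attribute, and one triple per frame-to-frame relation."""
    subject = iri(name)
    lines = [f"{subject} <{RDF_TYPE}> {iri(frame_type)} ."]
    for prop, value in attributes.items():
        lines.append(f"{subject} {iri(prop)} {literal(value)} .")
    for prop, target in relations.items():
        lines.append(f"{subject} {iri(prop)} {iri(target)} .")
    return lines


triples = frame_to_ntriples(
    "HELP-1373", "EVENT",
    {"FROM-SENSE": "HELP-V1", "TIME": "(FIND-ANCHOR-TIME)"},
    {"BENEFICIARY": "NATION-1376", "THEME": "EVENT-1374"},
)
print("\n".join(triples))
```

A real export would also declare the classes and properties (for example as owl:Class, owl:ObjectProperty and owl:DatatypeProperty) and would normally go through an RDF library; plain string formatting just keeps the sketch dependency-free.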

We decided to use OntoSem to process Miss Carolina’s response. Here is an excerpt of the TMR it generated. The complete TMR is available here (Miss Carolina’s answer processed by OntoSem).

<concept name="MODALITY-1095" type="MODALITY">
<attribute type="textpointer" value="BELIEVE"/>
<attribute type="word-num" value="2"/>
<attribute type="TYPE" value="BELIEF"/>
<attribute type="VALUE" value="1"/>
<relation type="SCOPE" target="LARGE-GEOPOLITICAL-ENTITY-1097"/>
<attribute type="FROM-SENSE" value="BELIEVE-V2"/>
<attribute type="ILLOCUTIONARY-FORCE" value="IMPERATIVE"/>
<attribute type="TRANSFORMATION-USED" value="NP_V_NP 1"/>
<attribute type="TIME" value="(FIND-ANCHOR-TIME)"/>
<attribute type="SAME-SCORE" value="(MODALITY 0.0050000004 BELIEVE-V5)"/>
<attribute type="HEAD" value="YES"/>
<attribute type="TEXT" value=""/>
</concept>

<concept name="HELP-1373" type="EVENT">
<attribute type="textpointer" value="HELP"/>
<attribute type="word-num" value="65"/>
<relation type="BENEFICIARY" target="NATION-1376"/>
<relation type="THEME" target="EVENT-1374"/>
<attribute type="FROM-SENSE" value="HELP-V1"/>
<attribute type="TRANSFORMATION-USED" value="NP_V_NP 3"/>
<attribute type="TIME" value="(FIND-ANCHOR-TIME)"/>
<attribute type="HEAD" value="YES"/>
<attribute type="TEXT" value=""/>
</concept>
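The excerpt is ordinary XML, so its frames can be pulled out with a standard parser. Here is a minimal Python sketch using xml.etree.ElementTree over a trimmed copy of the two frames. The wrapping <tmr> root element and the parse_tmr helper are our own assumptions for illustration, not part of OntoSem's output format.

```python
import xml.etree.ElementTree as ET

# Two frames trimmed from the excerpt, wrapped in a hypothetical <tmr> root
# element so the snippet is well-formed XML on its own.
TMR_XML = """
<tmr>
  <concept name="MODALITY-1095" type="MODALITY">
    <attribute type="textpointer" value="BELIEVE"/>
    <attribute type="TYPE" value="BELIEF"/>
    <attribute type="VALUE" value="1"/>
    <relation type="SCOPE" target="LARGE-GEOPOLITICAL-ENTITY-1097"/>
  </concept>
  <concept name="HELP-1373" type="EVENT">
    <attribute type="textpointer" value="HELP"/>
    <relation type="BENEFICIARY" target="NATION-1376"/>
    <relation type="THEME" target="EVENT-1374"/>
  </concept>
</tmr>
"""


def parse_tmr(xml_text):
    """Return {frame name: {"type", "attributes", "relations"}} for each <concept>."""
    frames = {}
    root = ET.fromstring(xml_text.strip())
    for concept in root.iter("concept"):
        frames[concept.get("name")] = {
            "type": concept.get("type"),
            # Attribute slots hold literal values, e.g. TYPE -> BELIEF.
            "attributes": {a.get("type"): a.get("value")
                           for a in concept.findall("attribute")},
            # Relation slots point at other frames, e.g. BENEFICIARY -> NATION-1376.
            "relations": {r.get("type"): r.get("target")
                          for r in concept.findall("relation")},
        }
    return frames


frames = parse_tmr(TMR_XML)
for name, frame in frames.items():
    print(name, frame["type"], frame["relations"])
```

Keying each slot by its type assumes a slot appears at most once per frame. That holds for this excerpt, but it is an assumption about TMRs in general; a list of (type, value) pairs would be the safer choice otherwise.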

One interesting part of the text processing is understanding modalities. For example, the word “believe” expresses the speaker’s attitude toward what is being said, and OntoSem represents it as a MODALITY frame of type BELIEF (MODALITY-1095 in the excerpt). OntoSem can process such complicated linguistic constructs because it relies on a large ontology to support its text processing. The word “help”, for example, is mapped to a concept of type EVENT (HELP-1373), along with a BENEFICIARY relation (pointing to NATION-1376) which indicates that the beneficiary of the help event is actually the U.S.
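To make that chain concrete: a surface word points to a lexicon sense (FROM-SENSE), which is instantiated as a frame, whose relation slots link to other frames. The sketch below walks those links over the two frames from the excerpt, hand-transcribed into Python dictionaries. The dictionary layout and the helpers frames_for_word, follow and describe_modality are our own illustration, not an OntoSem API.

```python
# Frames transcribed by hand from the TMR excerpt; only the slots used below.
TMR = {
    "MODALITY-1095": {
        "type": "MODALITY",
        "attributes": {"textpointer": "BELIEVE", "FROM-SENSE": "BELIEVE-V2",
                       "TYPE": "BELIEF", "VALUE": "1"},
        "relations": {"SCOPE": "LARGE-GEOPOLITICAL-ENTITY-1097"},
    },
    "HELP-1373": {
        "type": "EVENT",
        "attributes": {"textpointer": "HELP", "FROM-SENSE": "HELP-V1"},
        "relations": {"BENEFICIARY": "NATION-1376", "THEME": "EVENT-1374"},
    },
}


def frames_for_word(tmr, word):
    """Names of the frames whose textpointer matches the surface word."""
    return [name for name, frame in tmr.items()
            if frame["attributes"].get("textpointer") == word.upper()]


def follow(tmr, frame_name, relation):
    """Target frame of a relation slot, or None if the frame lacks that slot."""
    return tmr[frame_name]["relations"].get(relation)


def describe_modality(tmr, frame_name):
    """Summarize a MODALITY frame: its type, value and what it scopes over."""
    frame = tmr[frame_name]
    attrs = frame["attributes"]
    return f"{attrs['TYPE']} (value {attrs['VALUE']}) scoping over {frame['relations']['SCOPE']}"


help_frame = frames_for_word(TMR, "help")[0]
print(help_frame, TMR[help_frame]["type"], "->", follow(TMR, help_frame, "BENEFICIARY"))
print(describe_modality(TMR, frames_for_word(TMR, "believe")[0]))
```

Resolving NATION-1376 to the U.S. needs the NATION-1376 frame itself, which is in the complete TMR but not in the excerpt.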

So… in Miss Carolina’s words, I hope “education here in the U.S. help the U.S. or or”! Till then, I guess we will have to rely on machines to understand blonde speak!