Microsoft Word add-in annotates text with ontology terms

March 16th, 2009

Microsoft has announced an add-in for Word 2007 that lets authors annotate a word or phrase with terms defined in external ontologies.

Addressing this critical challenge for researchers, Microsoft Corp. and Creative Commons announced today, before an industry panel at the O’Reilly Emerging Technology Conference (ETech 2009), the release of the Ontology Add-in for Microsoft Office Word 2007 that will enable authors to easily add scientific hyperlinks as semantic annotations, drawn from ontologies, to their documents and research papers. Ontologies are shared vocabularies created and maintained by different academic domains to model their fields of study. This Add-in will make it easier for scientists to link their documents to the Web in a meaningful way. Deployed on a wide scale, ontology-enabled scientific publishing will provide a Web boost to scientific discovery.

The add-in is available for download from CodePlex, Microsoft’s open source project hosting website. It supports a number of features, including syntax coloring of informative words, automatic detection of identifiers, and built-in access to ontologies and controlled vocabularies maintained by NCBO, as well as biological databases such as Protein Data Bank, UniProtKB, and NCBI GenBank/RefSeq.

The add-in was produced by the UCSD BioLit group, hence the initial connections to bioinformatics ontologies. It would be great if future versions had built-in awareness of the more popular linked data vocabularies.

The annotation is stored using a custom XML schema, which can be extracted and mapped to RDF. This example, from the CodePlex site, shows the word “disease” being tagged with a term from the Human Disease ontology.

<w:customXml w:uri="">
        <w:attr w:name="id" w:val="DOID:4" /> 
        <w:attr w:name="type" w:val="Human disease" /> 
        <w:attr w:name="status" w:val="true" /> 
        <w:attr w:name="OntName" w:val="Human disease" /> 
        <w:attr w:name="url" 
          w:val="" /> 
    <w:smartTag w:uri="BioLitTags" w:element="tag1">
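
To make the "extracted and mapped to RDF" step concrete, here is a minimal Python sketch of one way it could be done. It is not the add-in's own exporter. The sample is a well-formed stand-in for the fragment above: the w:customXmlPr wrapper, the w:element value, and every example.org URL (including the vocabulary used for predicates) are assumptions, since the snippet leaves those values blank. The function names are hypothetical too.

```python
import xml.etree.ElementTree as ET

# Standard WordprocessingML namespace bound to the "w:" prefix.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Well-formed stand-in for the fragment above. The wrapper layout and the
# example.org URLs are assumptions; the original snippet leaves them blank.
SAMPLE = f"""
<w:customXml xmlns:w="{W}" w:uri="http://example.org/biolit" w:element="term">
  <w:customXmlPr>
    <w:attr w:name="id" w:val="DOID:4" />
    <w:attr w:name="type" w:val="Human disease" />
    <w:attr w:name="status" w:val="true" />
    <w:attr w:name="OntName" w:val="Human disease" />
    <w:attr w:name="url" w:val="http://example.org/ontology/DOID_4" />
  </w:customXmlPr>
  <w:r><w:t>disease</w:t></w:r>
</w:customXml>
""".strip()


def annotation_attrs(xml_text):
    """Collect the w:attr name/value pairs found anywhere in the fragment."""
    root = ET.fromstring(xml_text)
    return {
        el.get(f"{{{W}}}name"): el.get(f"{{{W}}}val")
        for el in root.iter(f"{{{W}}}attr")
    }


def annotated_text(xml_text):
    """Join the text runs (w:t) that the annotation wraps."""
    root = ET.fromstring(xml_text)
    return "".join(t.text or "" for t in root.iter(f"{{{W}}}t"))


def literal(value):
    """Quote a string as an N-Triples literal (backslash and quote escapes only)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_ntriples(attrs, text, vocab="http://example.org/annotation#"):
    """Map one annotation to N-Triples lines using a hypothetical vocabulary."""
    subject = "_:annotation1"  # blank node standing for this annotation
    lines = [f"{subject} <{vocab}text> {literal(text)} ."]
    if attrs.get("url"):
        lines.append(f'{subject} <{vocab}term> <{attrs["url"]}> .')
    if attrs.get("id"):
        lines.append(f'{subject} <{vocab}termId> {literal(attrs["id"])} .')
    if attrs.get("OntName"):
        lines.append(f'{subject} <{vocab}ontology> {literal(attrs["OntName"])} .')
    return lines


if __name__ == "__main__":
    for line in to_ntriples(annotation_attrs(SAMPLE), annotated_text(SAMPLE)):
        print(line)
```

Because the attributes are found with a namespace-qualified search rather than a fixed path, the sketch does not depend on exactly where the w:attr elements sit inside the annotation. A real mapping would replace the example.org vocabulary with whatever terms the consumer of the RDF expects.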

It’s not pretty, and it’s more verbose than RDFa, but it gets the job done. There are many interesting add-ins for Microsoft Office components, but most seem to be available only for Office 2007 and not for the Mac version, Office 2008. 🙁

(h/t Frank van Harmelen)