UMBC ebiquity

Archive for the 'Ontologies' Category

Entity Disambiguation in Google Auto-complete

September 23rd, 2012, by Varish Mulwad, posted in AI, Google, KR, Ontologies, Semantic Web

Google has added an “entity disambiguation” feature along with auto-complete when you type in your search query. For example, when I search for George Bush, I get the following additional information in auto-complete.

As you can see, Google is able to identify that there are two George Bushes, the 41st and the 43rd Presidents, and accordingly suggests that the user select the appropriate one. Similarly, if you search for Johns Hopkins, you get suggestions for Johns Hopkins the University, the Entrepreneur and the Hospital. In the case of the Hopkins query, it's the same entity name with different types, so Google appends the entity type to the entity name.

However, searching for Michael Jordan produces no entity disambiguation. If you are looking for Michael Jordan the UC Berkeley professor, you will have to search for “Michael I Jordan”. Other examples that Google does not yet handle include queries such as apple (the fruit vs. the company) and jaguar (the animal vs. the car). It seems that Google only disambiguates between popular entities in its auto-complete: while there are six different George Bushes and ten different Michael Jordans on Wikipedia, Google offers only two of the former and none of the latter.

Google has talked about using its Knowledge Graph to produce this information. One can envision the knowledge graph maintaining a unique identity for each entity in its collection, which allows it to disambiguate entities that share a name (in the Semantic Web world, we call this assigning a unique URI to each unique thing or entity). With the Hopkins query, we can also see that the knowledge graph maintains entity type information along with each entity (e.g., Person, City, University, Sports Team). While folks at Google have tried to steer clear of the Semantic Web, one can draw parallels between the underlying principles of the Semantic Web and those used in constructing the Google Knowledge Graph.
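
As a toy sketch of the underlying idea (the identifiers and type labels below are invented for illustration, not Google's actual data structures), pairing each entity with a unique identifier and a type is enough to let auto-complete tell apart entities that share a surface name:

# Toy sketch: disambiguating entities that share a surface name.
# The IDs and type labels are made up; a real knowledge graph would use
# its own unique identifiers (or, on the Semantic Web, URIs).

ENTITIES = [
    {"id": "ex:george_bush_41", "name": "George Bush", "type": "41st U.S. President"},
    {"id": "ex:george_bush_43", "name": "George Bush", "type": "43rd U.S. President"},
    {"id": "ex:jh_university", "name": "Johns Hopkins", "type": "University"},
    {"id": "ex:jh_entrepreneur", "name": "Johns Hopkins", "type": "Entrepreneur"},
    {"id": "ex:jh_hospital", "name": "Johns Hopkins", "type": "Hospital"},
]

def suggest(query):
    """Return auto-complete suggestions, appending the type when a name is ambiguous."""
    matches = [e for e in ENTITIES if e["name"].lower().startswith(query.lower())]
    if len(matches) <= 1:
        return [e["name"] for e in matches]
    return ["%s (%s)" % (e["name"], e["type"]) for e in matches]

print(suggest("Johns Hopkins"))
# ['Johns Hopkins (University)', 'Johns Hopkins (Entrepreneur)', 'Johns Hopkins (Hospital)']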

Google releases dataset linking strings and concepts

May 19th, 2012, by Tim Finin, posted in AI, Google, KR, NLP, Ontologies, Semantic Web, Wikipedia

Yesterday Google announced a very interesting resource with 175M short, unique text strings that were used to refer to one of 7.6M Wikipedia articles. This should be very useful for research on information extraction from text.

“We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia’s groupings of articles into hierarchical categories.

The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article’s canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept’s url. Our database thus includes weights that measure degrees of association.”

The details of the data and how it was constructed are in an LREC 2012 paper by Valentin Spitkovsky and Angel Chang, A Cross-Lingual Dictionary for English Wikipedia Concepts. Get the data here.
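
As a rough sketch of how such a dictionary could be used (assuming, for illustration, a tab-separated file of text / URL / count triples; check the release for the actual layout), one could rank candidate Wikipedia concepts for a mention by their association counts:

import csv
from collections import defaultdict

def load_dictionary(path):
    """Build an index from mention strings to (Wikipedia URL, count) pairs.
    The tab-separated layout is an assumption; adjust to the released format."""
    index = defaultdict(list)
    with open(path, newline="", encoding="utf-8") as f:
        for text, url, count in csv.reader(f, delimiter="\t"):
            index[text.lower()].append((url, int(count)))
    return index

def candidates(index, mention, top_k=3):
    """Return the Wikipedia concepts most often linked from this mention."""
    return sorted(index.get(mention.lower(), []), key=lambda c: -c[1])[:top_k]

# Hypothetical usage, with a made-up file name:
# index = load_dictionary("wikipedia_anchors.tsv")
# print(candidates(index, "jaguar"))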

Google Knowledge Graph: first impressions

May 19th, 2012, by Tim Finin, posted in AI, Google, KR, NLP, Ontologies, Semantic Web

Google's Knowledge Graph showed up for me this morning; it has been slowly rolling out since the announcement on Wednesday. It builds on a lot of research from human language technology (e.g., entity recognition and linking) and the Semantic Web (graphs of linked data). The slogan, “things not strings”, is brilliant and easily understood.

My first impression is that it’s fast, useful and a great accomplishment but leaves lots of room for improvement and expansion. That last bit is a good thing, at least for those of us in the R&D community. Here are some comments based on some initial experimentation.

GKG only works on searches that are simple entity mentions, like people, places and organizations. It doesn't handle products (Toyota Camry), events (World War II) or diseases (diabetes), but it does recognize that 'Mercury' could be a planet or an element.

It's a bit aggressive about linking: a search for “John Smith” zeros in on the 17th-century English explorer. Poor Professor Michael Jordan never gets a chance, and providing context by adding Berkeley just suppresses the GKG sidebar. “Mitt” goes right to you know who. “George Bush” does lead to a disambiguation sidebar, though. Given that the GKG doesn't seem to allow for context information, the only disambiguating evidence it has is popularity (i.e., PageRank).

Speaking of context, the GKG results seem not to draw on user-specific information, like my location or past search history. When I search for “Columbia” from my location here in Maryland, it suggests “Columbia University” and “Columbia, South Carolina” but not “Columbia, Maryland”, which is just five miles away from me.

Places include not just GPEs (geo-political entities) but also locations (Mars, the Patapsco River) and facilities (MoMA, the Empire State Building). To the GKG, the White House is just a place.

Organizations seem like a weak spot. It recognizes schools (UCLA), but company mentions seem not to be handled directly, not even for “Google”. A search for “NBA” suggests three “people associated with NBA”, and “National Basketball Association” is not recognized. Forget finding out about the Cult of the Dead Cow.

Mike Bergman has some insights based on his exploration of the GKG in Deconstructing the Google Knowledge Graph.

The use of structured and semi-structured knowledge in search is an exciting area. I expect we will see much more of it showing up in search engines, including Bing.

Got a problem? There’s a code for that

September 15th, 2011, by Tim Finin, posted in Google, KR, Ontologies, OWL, Semantic Web, Social media

The Wall Street Journal article Walked Into a Lamppost? Hurt While Crocheting? Help Is on the Way describes the International Classification of Diseases, 10th Revision (ICD-10), the coding system used to describe medical problems.

“Today, hospitals and doctors use a system of about 18,000 codes to describe medical services in bills they send to insurers. Apparently, that doesn’t allow for quite enough nuance. A new federally mandated version will expand the number to around 140,000—adding codes that describe precisely what bone was broken, or which artery is receiving a stent. It will also have a code for recording that a patient’s injury occurred in a chicken coop.”

We would like to see the search engine companies develop and support a Microdata vocabulary for ICD-10. An ICD-10 OWL DL ontology has already been done, but a Microdata version might add a lot of value. We could use it on our blogs and Facebook posts to catalog those annoying problems we encounter each day, like W59.22XD (struck by turtle, initial encounter) or Y07.53 (teacher or instructor, perpetrator of maltreatment and neglect).

Humor aside, a description logic representation (e.g., in OWL) makes the coding system seem less ridiculous. Instead of appearing as a catalog of 140K ground tags, it would emphasize that the system is really a much smaller collection of classes that can be combined in productive ways to generate those tags, or used to create more general descriptions (e.g., bitten by an animal).
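
To make the combinatorial point concrete, here is a small, purely illustrative sketch (not the actual ICD-10 axes or the OWL machinery) of how a handful of component classes multiply out into a much larger set of ground codes:

from itertools import product

# Purely illustrative component classes; these are not real ICD-10 axes or codes.
causes = ["struck by turtle", "bitten by dog", "walked into lamppost"]
contexts = ["in a chicken coop", "while crocheting", "at a swimming pool"]
encounters = ["initial encounter", "subsequent encounter"]

# A handful of reusable classes compose into many ground codes, which is why a
# description-logic view (a small class hierarchy plus composition) is far more
# compact than a flat list of 140K tags.
codes = ["%s, %s, %s" % combo for combo in product(causes, contexts, encounters)]
print(len(codes))  # 3 * 3 * 2 = 18 ground codes from 8 component classes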

JWS special issues: Semantic Sensing and Social Semantic Web

July 27th, 2011, by Tim Finin, posted in AI, Ontologies, Semantic Web, Social media

The Journal of Web Semantics has announced two new special issues, one on semantic sensing and another on the semantic and social web. Both will be published in 2012, with preprints made freely available online as papers are accepted.

The special issue on semantic sensing will be edited by Harith Alani, Oscar Corcho and Manfred Hauswirth. Papers will be reviewed on a rolling basis and authors are encouraged to submit before the final deadline of 20 December 2011.

The issue on the semantic and social web will be edited by John Breslin and Meena Nagarajan. Papers will be reviewed on a rolling basis and authors are encouraged to submit before the final deadline of 21 January 2012.

See the JWS Guide for Authors for details on the submission process.

Follow the Journal of Web Semantics on Facebook and Twitter

November 9th, 2009, by Tim Finin, posted in Ontologies, Semantic Web, Social media, Web

The Journal of Web Semantics now has a Facebook page and a Twitter account to augment its blog. All three will be used for news and announcements: calls for papers, special issues, the availability of new papers, and so on. As you might expect, the tweets will be terse items, the Facebook updates longer notes, and the blog posts full of detail. Those who are interested can follow @journalWebSem on Twitter, become a fan of the JWS on Facebook, and subscribe to the blog's feed.

New York Times publishes Linked Open Data

October 30th, 2009, by Tim Finin, posted in Ontologies, RDF, Semantic Web

Like many newspapers, the New York Times links the first mention of well-known entities in its articles to a reference page. For example, a mention of Barack Obama links to a page that collects basic information on President Obama along with links to relevant stories and other resources the Times has created.

Now the Times is also using RDF to publish some of this information as linked open data. Yesterday the Times announced the publication of an LOD collection covering about 5,000 people at http://data.nytimes.com/ under a Creative Commons 3.0 Attribution License, and it plans to put its full collection of roughly 30,000 topics online soon.

“Over the last several months we have manually mapped more than 5,000 person name subject headings onto Freebase and DBPedia. And today we are pleased to announce the launch of http://data.nytimes.com and the release of these 5,000 person name subject headings as Linked Open Data.

Over the next several months, we plan to expand http://data.nytimes.com to include each of the nearly 30,000 subject headings we use to power Times Topics pages, a collection that includes locations, organizations and descriptors in addition to person names.”
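
For anyone who wants to poke at the data, a minimal sketch using the Python rdflib library might look like the following; the resource URL is a placeholder, since the actual person identifiers are listed at data.nytimes.com.

from rdflib import Graph

g = Graph()
# Placeholder URL; the real identifiers are enumerated at data.nytimes.com.
g.parse("http://data.nytimes.com/SOME_PERSON_ID.rdf")

# Print every triple about the resource, e.g. its label and any owl:sameAs
# links to DBpedia and Freebase.
for s, p, o in g:
    print(s, p, o)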

OWL 2 becomes a W3C recommendation

October 27th, 2009, by Tim Finin, posted in AI, KR, Ontologies, OWL, Semantic Web

OWL 2, the new version of the Web Ontology Language, officially became a W3C standard yesterday. From the W3C press release:

“Today W3C announces a new version of a standard for representing knowledge on the Web. OWL 2, part of W3C’s Semantic Web toolkit, allows people to capture their knowledge about a particular domain (say, energy or medicine) and then use tools to manage information, search through it, and learn more from it. Furthermore, as an open standard based on Web technology, it lowers the cost of merging knowledge from multiple domains.”

WolframAlpha releases API

October 16th, 2009, by Tim Finin, posted in AI, KR, NLP, Ontologies, Semantic Web

Wolfram|Alpha is an interesting query answering system developed by Wolfram Research, a blend of a question answering system and a Semantic Web alternative. It tries to interpret and answer queries expressed as a sequence of words, drawing on a large collection of interlinked tables. Oh, and Mathematica is thrown in for free. A free Web version was released last spring.

The news today is that Wolfram|Alpha has released an API, as noted in their blog:

“The API allows your application to interact with Wolfram|Alpha much like you do on the web—you send a web request with the same query string you would type into Wolfram|Alpha’s query box and you get back the same computed results. It’s just that both are in a form your application can understand. There are plenty of ways to tweak and control the results, as well.”
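
Here is a minimal sketch of such a request from Python; the endpoint URL and appid parameter below are based on the published query API, but check Wolfram|Alpha's API documentation for the exact form and options.

from urllib.parse import urlencode
from urllib.request import urlopen

APPID = "YOUR-APPID"  # issued when you sign up for API access

def query_alpha(query):
    """Send a plain query string to Wolfram|Alpha and return the raw XML reply.
    The endpoint is an assumption drawn from the query API docs."""
    params = urlencode({"input": query, "appid": APPID})
    with urlopen("http://api.wolframalpha.com/v2/query?" + params) as resp:
        return resp.read().decode("utf-8")

# print(query_alpha("population of Baltimore"))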

The pricing plan runs from $60/month for 1,000 queries (6 cents a query) up to $220K/month for 10M queries (2.2 cents a query). Programming language bindings are available for Java, PHP, Perl, Python, Ruby and .NET.

Their original web interface remains free, but the TOS specifies that it “may be used only by a human being using a conventional web browser to manually enter queries one at a time.”

Microsoft Word add-in annotates text with ontology terms

March 16th, 2009, by Tim Finin, posted in Ontologies, RDF, Semantic Web

Microsoft has announced an add-in for Word 2007 that lets authors annotate a word or phrase with terms defined in external ontologies.

Addressing this critical challenge for researchers, Microsoft Corp. and Creative Commons announced today, before an industry panel at the O’Reilly Emerging Technology Conference (ETech 2009), the release of the Ontology Add-in for Microsoft Office Word 2007 that will enable authors to easily add scientific hyperlinks as semantic annotations, drawn from ontologies, to their documents and research papers. Ontologies are shared vocabularies created and maintained by different academic domains to model their fields of study. This Add-in will make it easier for scientists to link their documents to the Web in a meaningful way. Deployed on a wide scale, ontology-enabled scientific publishing will provide a Web boost to scientific discovery.

The add-in is available for download from CodePlex, Microsoft's open source project hosting website. It supports a number of features, including syntax coloring of informative words, automatic detection of identifiers, and built-in access to ontologies and controlled vocabularies maintained by NCBO, as well as biological databases such as the Protein Data Bank, UniProtKB and NCBI GenBank/RefSeq.

The add-in was produced by the UCSD BioLit group, hence the initial connections to bioinformatics ontologies. It would be great if future versions had built-in awareness of the more popular linked data vocabularies.

The annotation is done using a custom XML schema, which can be extracted and mapped to RDF. This example, from the CodePlex site, shows the word “disease” being tagged with a term from the Human Disease Ontology.

<w:customXml w:uri="http://biolit.ucsd.edu/biolitschema"
  w:element="biolit-term">
    <w:customXmlPr>
        <w:attr w:name="id" w:val="DOID:4" /> 
        <w:attr w:name="type" w:val="Human disease" /> 
        <w:attr w:name="status" w:val="true" /> 
        <w:attr w:name="OntName" w:val="Human disease" /> 
        <w:attr w:name="url" 
          w:val="http://purl.org/obo/owl/DOID#DOID_4" /> 
    </w:customXmlPr>
    <w:smartTag w:uri="BioLitTags" w:element="tag1">
        <w:r>
            <w:t>disease</w:t> 
        </w:r>
    </w:smartTag>
</w:customXml>
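
As a rough sketch of what "extracted and mapped to RDF" could look like in practice (the namespace handling and predicate names here are simplifying assumptions, not the add-in's actual export), one might pull the ontology attributes out of such a fragment and emit triples:

import xml.etree.ElementTree as ET

# Word 2007's wordprocessingml namespace, abbreviated "w:" in the fragment above.
# A real document.xml declares the prefix on its root element; here we wrap the
# fragment so it parses on its own.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

def custom_xml_to_triples(fragment):
    """Turn a BioLit customXml fragment into simple (subject, predicate, object)
    triples.  The predicate names are illustrative, not a standard vocabulary."""
    root = ET.fromstring('<root xmlns:w="%s">%s</root>' % (W, fragment))
    attrs = {a.get("{%s}name" % W): a.get("{%s}val" % W)
             for a in root.iter("{%s}attr" % W)}
    term = "".join(t.text or "" for t in root.iter("{%s}t" % W))
    subject = attrs.get("url", "")
    return [(subject, "rdfs:label", term),
            (subject, "ex:termId", attrs.get("id", "")),
            (subject, "ex:fromOntology", attrs.get("OntName", ""))]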

It's not pretty and it's more verbose than RDFa, but it gets the job done. There are many interesting add-ins for Microsoft Office components, but most seem to be available only for Office 2007 and not for the Mac version, Office 2008. :-(

(h/t Frank van Harmelen)

Ontology Summit 2009: Toward Ontology-based Standards

March 15th, 2009, by Tim Finin, posted in KR, Ontologies, Semantic Web

A two-day event, Ontology Summit 2009: Toward Ontology-based Standards, will be held 6-7 April 2009 at NIST in Gaithersburg, MD. The Summit is co-organized by NIST and a number of other organizations and is part of NIST's Interoperability Week.

“This summit will address the intersection of two active communities, namely the technical standards world, and the community of ontology and semantic technologies. This intersection is long overdue because each has much to offer the other. Ontologies represent the best efforts of the technical community to unambiguously capture the definitions and interrelationships of concepts in a variety of domains. Standards — specifically information standards — are intended to provide unambiguous specifications of information, for the purpose of error-free access and exchange. If the standards community is indeed serious about specifying such information unambiguously to the best of its ability, then the use of ontologies as the vehicle for such specifications is the logical choice. Conversely, the standards world can provide a large market for the industrial use of ontologies, since ontologies are explicitly focused on the precise representation of information. This will be a boost to worldwide recognition of the utility and power of ontological models. The goal of this Ontology Summit 2009 is to articulate the power of synergizing these two communities in the form of a communique in which a number of concrete challenges can be laid out. These challenges could serve as a roadmap that will galvanize both communities and bring this promising technical area to the attention of others.”

The meeting is free, but advance registration by March 31 is required. You can also register to participate remotely.

Videos of Semantic Web talks and tutorials from ISWC 2008 now online

December 22nd, 2008, by Tim Finin, posted in AI, iswc, KR, Ontologies, Semantic Web

High quality videos of tutorials and talks from the Seventh International Semantic Web Conference are now available on the excellent VideoLectures.net site. It's a great opportunity to benefit from the conference if you were not able to attend or, even if you were, to catch presentations you missed.

VideoLectures captured the slides for most of the presentations (which are available for downloading), and their site shows the speaker's video and slides in synchronization. VideoLectures used three camera crews in parallel and so was able to capture almost all of the presentations. Here are some highlights from the ~90 videos to whet your appetite.
