Archive for the 'Ontologies' Category
May 19th, 2012, by Tim Finin, posted in AI, Google, KR, NLP, Ontologies, Semantic Web, Wikipedia
Yesterday Google announced a very interesting resource with 175M short, unique text strings that were used to refer to one of 7.6M Wikipedia articles. This should be very useful for research on information extraction from text.
“We consider each individual Wikipedia article as representing a concept (an entity or an idea), identified by its URL. Text strings that refer to concepts were collected using the publicly available hypertext of anchors (the text you click on in a web link) that point to each Wikipedia page, thus drawing on the vast link structure of the web. For every English article we harvested the strings associated with its incoming hyperlinks from the rest of Wikipedia, the greater web, and also anchors of parallel, non-English Wikipedia pages. Our dictionaries are cross-lingual, and any concept deemed too fine can be broadened to a desired level of generality using Wikipedia’s groupings of articles into hierarchical categories.
The data set contains triples, each consisting of (i) text, a short, raw natural language string; (ii) url, a related concept, represented by an English Wikipedia article’s canonical location; and (iii) count, an integer indicating the number of times text has been observed connected with the concept’s url. Our database thus includes weights that measure degrees of association.”
The details of the data and how it was constructed are in an LREC 2012 paper by Valentin Spitkovsky and Angel Chang, A Cross-Lingual Dictionary for English Wikipedia Concepts. Get the data here.
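To make the "weights that measure degrees of association" concrete, here is a minimal Python sketch that normalizes the counts of (text, url, count) triples into a conditional weight for each concept given a string. The function names and the toy counts are made up for illustration and are not taken from the released data, whose actual file format is described in the paper.

```python
from collections import defaultdict


def association_weights(triples):
    """Turn (text, url, count) triples into per-string weights that sum to 1."""
    counts = defaultdict(lambda: defaultdict(int))
    for text, url, count in triples:
        counts[text][url] += count  # merge repeated (text, url) pairs
    weights = {}
    for text, by_url in counts.items():
        total = sum(by_url.values())
        weights[text] = {url: n / total for url, n in by_url.items()}
    return weights


def best_concept(weights, text):
    """Return the most strongly associated URL for a string, or None if unseen."""
    candidates = weights.get(text)
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy triples: the counts are invented, only the shape follows the description.
TOY_TRIPLES = [
    ("Mercury", "http://en.wikipedia.org/wiki/Mercury_(planet)", 60),
    ("Mercury", "http://en.wikipedia.org/wiki/Mercury_(element)", 40),
    ("quicksilver", "http://en.wikipedia.org/wiki/Mercury_(element)", 5),
]

if __name__ == "__main__":
    w = association_weights(TOY_TRIPLES)
    print(w["Mercury"])
    print(best_concept(w, "quicksilver"))
```

Picking the highest-weight concept ignores context entirely, so this is only a baseline for entity linking.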
May 19th, 2012, by Tim Finin, posted in AI, Google, KR, NLP, Ontologies, Semantic Web
Google’s Knowledge Graph showed up for me this morning — it’s been slowly rolling out since the announcement on Wednesday. It builds on lots of research from human language technology (e.g., entity recognition and linking) and the semantic web (graphs of linked data). The slogan, “things not strings”, is brilliant and easily understood.
My first impression is that it’s fast, useful and a great accomplishment but leaves lots of room for improvement and expansion. That last bit is a good thing, at least for those of us in the R&D community. Here are some comments based on some initial experimentation.
GKG only works on searches that are simple mentions of entities such as people, places and organizations. It doesn’t do products (Toyota Camry), events (World War II), or diseases (diabetes) but does recognize that ‘Mercury’ could be a planet or an element.
It’s a bit aggressive about linking: when searching for “John Smith” it zeros in on the 17th century English explorer. Poor Professor Michael Jordan never gets a chance, and providing context by adding Berkeley just suppresses the GKG sidebar. “Mitt” goes right to you know who. “George Bush” does lead to a disambiguation sidebar, though. Given that GKG doesn’t seem to allow for context information, the only disambiguating evidence it has is popularity (i.e., pagerank).
Speaking of context, the GKG results seem not to draw on user-specific information, like my location or past search history. When I search for “Columbia” from my location here in Maryland, it suggests “Columbia University” and “Columbia, South Carolina” and not “Columbia, Maryland” which is just five miles away from me.
Places include not just GPEs (geo-political entities) but also locations (Mars, Patapsco River) and facilities (MoMA, Empire State Building). To the GKG, the White House is just a place.
Organizations seem like a weak spot. It recognizes schools (UCLA) but company mentions seem not to be directly handled, not even for “Google”. A search for “NBA” suggests three “people associated with NBA” and “National Basketball Association” is not recognized. Forget finding out about the Cult of the Dead Cow.
Mike Bergman has some insights based on his exploration of the GKG in Deconstructing the Google Knowledge Graph.
The use of structured and semi-structured knowledge in search is an exciting area. I expect we will see much more of this showing up in search engines, including Bing.
September 15th, 2011, by Tim Finin, posted in Google, KR, Ontologies, OWL, Semantic Web, Social media
The Wall Street Journal article Walked Into a Lamppost? Hurt While Crocheting? Help Is on the Way describes the International Classification of Diseases, 10th Revision (ICD-10), which is used to describe medical problems.
“Today, hospitals and doctors use a system of about 18,000 codes to describe medical services in bills they send to insurers. Apparently, that doesn’t allow for quite enough nuance. A new federally mandated version will expand the number to around 140,000—adding codes that describe precisely what bone was broken, or which artery is receiving a stent. It will also have a code for recording that a patient’s injury occurred in a chicken coop.”
We want to see the search engine companies develop and support a Microdata vocabulary for ICD-10. An ICD-10 OWL DL ontology has already been done, but a Microdata version might add a lot of value. We could use it on our blogs and Facebook posts to catalog those annoying problems we encounter each day, like W59.22XA (Struck by turtle, initial encounter), or Y07.53 (Teacher or instructor, perpetrator of maltreatment and neglect).
Humor aside, a description logic representation (e.g., in OWL) makes the coding system seem less ridiculous. Instead of appearing as a catalog of 140K ground tags, it would emphasize that it is a collection of a much smaller number of classes that can be combined in productive ways to produce them or used to create general descriptions (e.g., bitten by an animal).
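As a rough illustration of that point, here is a small Python sketch, not an OWL ontology, in which descriptions are composed from a mechanism and an agent, and a general description such as "bitten by an animal" subsumes more specific ones. The hierarchy, the class name Description and the terms are all hypothetical and are not taken from ICD-10 itself.

```python
from dataclasses import dataclass

# Toy "is-a" hierarchy; every term here is illustrative, not taken from ICD-10.
PARENT = {
    "turtle": "reptile",
    "reptile": "animal",
    "dog": "mammal",
    "mammal": "animal",
    "struck by": "contact with",
    "bitten by": "contact with",
}


def is_a(term, ancestor):
    """True if term equals ancestor or reaches it by following PARENT links."""
    while term is not None:
        if term == ancestor:
            return True
        term = PARENT.get(term)
    return False


@dataclass(frozen=True)
class Description:
    """A composed description: a mechanism of injury plus the agent involved."""
    mechanism: str
    agent: str

    def subsumes(self, other):
        # A description covers another if both of its parts are at least as general.
        return is_a(other.mechanism, self.mechanism) and is_a(other.agent, self.agent)


bitten_by_animal = Description("bitten by", "animal")
struck_by_turtle = Description("struck by", "turtle")
bitten_by_dog = Description("bitten by", "dog")

if __name__ == "__main__":
    print(bitten_by_animal.subsumes(bitten_by_dog))
    print(bitten_by_animal.subsumes(struck_by_turtle))
```

Two small hierarchies combine into many specific descriptions, which is the same economy a description logic gives over a flat list of ground tags.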
July 27th, 2011, by Tim Finin, posted in AI, Ontologies, Semantic Web, Social media
The Journal of Web Semantics announced two new special issues, one on semantic sensing and another on the semantic and social web. Both will be published in 2012 with preprints made freely available online as papers are accepted.
The special issue on semantic sensing will be edited by Harith Alani, Oscar Corcho and Manfred Hauswirth. Papers will be reviewed on a rolling basis and authors are encouraged to submit before the final deadline of 20 December 2011.
The issue on the semantic and social web will be edited by John Breslin and Meena Nagarajan. Papers will be reviewed on a rolling basis and authors are encouraged to submit before the final deadline of 21 January 2012.
See the JWS Guide for Authors for details on the submission process.
November 9th, 2009, by Tim Finin, posted in Ontologies, Semantic Web, Social media, Web
The Journal of Web Semantics now has a Facebook page and a Twitter account to augment its blog. All three will be used for news and announcements of calls for papers, special issues, availability of new papers, etc. As you might expect, the tweets will be terse items, the Facebook updates longer notes and the blog posts full of details. Those who are interested can follow @journalWebSem on Twitter, become a fan of the JWS on Facebook, and subscribe to the blog’s feed.
October 30th, 2009, by Tim Finin, posted in Ontologies, RDF, Semantic Web
Like many newspapers, the New York Times links the first mention of well-known entities in its articles to a reference page. For example, a mention of Barack Obama links to a page which is a collection of basic information on President Obama and links to relevant stories and other resources that the Times has created.
Now the Times is also using RDF to publish some of this information as linked open data. Yesterday the Times announced the publication of an LOD collection covering about 5,000 people at http://data.nytimes.com/ under a Creative Commons 3.0 Attribution License, and it plans to put its full collection of 30K topics online soon.
“Over the last several months we have manually mapped more than 5,000 person name subject headings onto Freebase and DBPedia. And today we are pleased to announce the launch of http://data.nytimes.com and the release of these 5,000 person name subject headings as Linked Open Data.
Over the next several months, we plan to expand http://data.nytimes.com to include each of the nearly 30,000 subject headings we use to power Times Topics pages, a collection that includes locations, organizations and descriptors in addition to person names.”
October 27th, 2009, by Tim Finin, posted in AI, KR, Ontologies, OWL, Semantic Web
OWL 2, the new version of the Web Ontology Language, officially became a W3C standard yesterday. From the W3C press release:
“Today W3C announces a new version of a standard for representing knowledge on the Web. OWL 2, part of W3C’s Semantic Web toolkit, allows people to capture their knowledge about a particular domain (say, energy or medicine) and then use tools to manage information, search through it, and learn more from it. Furthermore, as an open standard based on Web technology, it lowers the cost of merging knowledge from multiple domains.”
October 16th, 2009, by Tim Finin, posted in AI, KR, NLP, Ontologies, Semantic Web
Wolfram|Alpha is an interesting query answering system developed by Wolfram Research that is a blend of a question answering system and a Semantic Web alternative. It tries to interpret and answer queries expressed as a sequence of words from a large collection of interlinked tables. Oh, and Mathematica is thrown in for free. A free Web version was released last spring.
The news today is that Wolfram|Alpha has released an API, as noted in their blog:
“The API allows your application to interact with Wolfram|Alpha much like you do on the web—you send a web request with the same query string you would type into Wolfram|Alpha’s query box and you get back the same computed results. It’s just that both are in a form your application can understand. There are plenty of ways to tweak and control the results, as well.”
The pricing plan runs from $60/month for 1000 queries (6 cents a query) to $220K for up to 10M queries/month (2.2 cents a query). Programming language bindings are available for Java, PHP, Perl, Python, Ruby and .NET.
Their original web interface remains free, but the TOS specifies that it “may be used only by a human being using a conventional web browser to manually enter queries one at a time.”
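The per-query figures quoted above follow directly from the monthly prices. Here is a quick Python check of the arithmetic; the function name and the plan labels are just placeholders, and only the two price points mentioned in this post are used.

```python
def cents_per_query(monthly_price_usd, queries_per_month):
    """Effective cost of one query, in cents, when the full quota is used."""
    return 100.0 * monthly_price_usd / queries_per_month


# The two price points mentioned above; the labels are placeholders.
PLANS = {
    "smallest": (60, 1_000),
    "largest": (220_000, 10_000_000),
}

if __name__ == "__main__":
    for label, (price, queries) in PLANS.items():
        print(f"{label}: {cents_per_query(price, queries):.1f} cents/query")
```

The figures assume the whole monthly quota is consumed; anyone using fewer queries pays more per query.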
March 16th, 2009, by Tim Finin, posted in Ontologies, RDF, Semantic Web
Microsoft has announced an add-in for Word 2007 that lets authors annotate a word or phrase with terms defined in external ontologies.
“Addressing this critical challenge for researchers, Microsoft Corp. and Creative Commons announced today, before an industry panel at the O’Reilly Emerging Technology Conference (ETech 2009), the release of the Ontology Add-in for Microsoft Office Word 2007 that will enable authors to easily add scientific hyperlinks as semantic annotations, drawn from ontologies, to their documents and research papers. Ontologies are shared vocabularies created and maintained by different academic domains to model their fields of study. This Add-in will make it easier for scientists to link their documents to the Web in a meaningful way. Deployed on a wide scale, ontology-enabled scientific publishing will provide a Web boost to scientific discovery.”
The add-in is available for download from CodePlex, Microsoft’s open source project hosting website. It has support for a number of features, including syntax coloring of informative words, automatic detection of identifiers, and built-in access to ontologies and controlled vocabularies maintained by NCBO as well as biological databases such as Protein Data Bank, UniProtKB, and NCBI GenBank/RefSeq.
The add-in was produced by the UCSD BioLit group, hence the initial connections to bioinformatics ontologies. It would be great if future versions had built-in awareness of the more popular linked data vocabularies.
The annotation is done using a custom XML schema which can be extracted and mapped to RDF. This example, from the CodePlex site, shows the word “disease” being tagged with a term from the Human Disease ontology.
<w:smartTag w:uri="BioLitTags" w:element="tag1">
  <w:smartTagPr>
    <w:attr w:name="id" w:val="DOID:4" />
    <w:attr w:name="type" w:val="Human disease" />
    <w:attr w:name="status" w:val="true" />
    <w:attr w:name="OntName" w:val="Human disease" />
  </w:smartTagPr>
  <w:r><w:t>disease</w:t></w:r>
</w:smartTag>
It’s not pretty and is more verbose than RDFa, but it gets the job done. There are many interesting add-ins for Microsoft Office components, but most seem to be available only for Office 2007 and not for the Mac version, Office 2008.
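The extraction and mapping to RDF mentioned above might look something like the following Python sketch. It is not the add-in's own tooling. It assumes the attributes sit inside a w:smartTagPr element and the tagged word inside a w:r/w:t run, as in standard WordprocessingML, and the example.org subject and predicate URIs and the OBO-style term URI are placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# The smart tag from the post, with wrapper elements assumed (see lead-in).
SAMPLE = f"""<w:smartTag xmlns:w="{W}" w:uri="BioLitTags" w:element="tag1">
  <w:smartTagPr>
    <w:attr w:name="id" w:val="DOID:4" />
    <w:attr w:name="type" w:val="Human disease" />
    <w:attr w:name="status" w:val="true" />
    <w:attr w:name="OntName" w:val="Human disease" />
  </w:smartTagPr>
  <w:r><w:t>disease</w:t></w:r>
</w:smartTag>"""


def q(local):
    """Qualified name in ElementTree's {namespace}local form."""
    return f"{{{W}}}{local}"


def extract_annotations(xml_text):
    """Collect each smart tag's name/val attributes plus the text it wraps."""
    root = ET.fromstring(xml_text)
    annotations = []
    for tag in root.iter(q("smartTag")):
        props = {a.get(q("name")): a.get(q("val")) for a in tag.iter(q("attr"))}
        props["text"] = "".join(t.text or "" for t in tag.iter(q("t")))
        annotations.append(props)
    return annotations


def to_ntriples(annotation, subject):
    """Map one annotation to N-Triples lines; all URIs are illustrative."""
    prefix, local_id = annotation["id"].split(":", 1)
    term = f"http://purl.obolibrary.org/obo/{prefix}_{local_id}"  # assumed URI scheme
    return [
        f"<{subject}> <http://example.org/vocab#refersTo> <{term}> .",
        f'<{subject}> <http://www.w3.org/2000/01/rdf-schema#label> "{annotation["text"]}" .',
    ]


if __name__ == "__main__":
    for ann in extract_annotations(SAMPLE):
        print("\n".join(to_ntriples(ann, "http://example.org/doc#tag1")))
```

A real converter would also need to escape literals and pick an established annotation vocabulary rather than a made-up predicate.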
(h/t Frank van Harmelen)
March 15th, 2009, by Tim Finin, posted in KR, Ontologies, Semantic Web
A two-day event, Ontology Summit 2009: Toward Ontology-based Standards, will be held 6-7 April 2009 at NIST in Gaithersburg, MD. The Summit is co-organized by NIST and a number of other organizations and is part of NIST’s Interoperability Week.
“This summit will address the intersection of two active communities, namely the technical standards world, and the community of ontology and semantic technologies. This intersection is long overdue because each has much to offer the other. Ontologies represent the best efforts of the technical community to unambiguously capture the definitions and interrelationships of concepts in a variety of domains. Standards — specifically information standards — are intended to provide unambiguous specifications of information, for the purpose of error-free access and exchange. If the standards community is indeed serious about specifying such information unambiguously to the best of its ability, then the use of ontologies as the vehicle for such specifications is the logical choice. Conversely, the standards world can provide a large market for the industrial use of ontologies, since ontologies are explicitly focused on the precise representation of information. This will be a boost to worldwide recognition of the utility and power of ontological models. The goal of this Ontology Summit 2009 is to articulate the power of synergizing these two communities in the form of a communique in which a number of concrete challenges can be laid out. These challenges could serve as a roadmap that will galvanize both communities and bring this promising technical area to the attention of others.”
The meeting is free, but advance registration by March 31 is required. You can also register to participate remotely.
December 22nd, 2008, by Tim Finin, posted in AI, iswc, KR, Ontologies, Semantic Web
High quality videos of tutorials and talks from the Seventh International Semantic Web Conference are now available on the excellent VideoLectures.net site. It’s a great opportunity to benefit from the conference if you were not able to attend or, even if you were, to see presentations you were not able to attend.
Videolectures captured the slides for most of the presentations (which are available for downloading) and their site shows both the speaker’s video and slides in synchronization. Videolectures used three camera crews in parallel and so was able to capture almost all of the presentations. Here are some highlights from the ~90 videos to whet your appetite.
October 3rd, 2008, by Tim Finin, posted in AI, KR, Ontologies, Semantic Web
Conrad Barski, M.D. will give a talk on “How To Tell Stuff To Your Computer — The Enigmatic Art of Knowledge Representation” at UMBC at 1:00pm on Friday 17 October in Lecture Hall 8 in the ITE building.
Barski maintains an interesting site, Lisperati, that has graphical introductions to a number of topics, including Lisp, Haskell, Emacs, etc., as well as serving as the home of FringeDC, an informal group of people interested in “fringe” programming languages.
Here’s the abstract for his talk.
“Have you ever wondered how we take information from the “real world” and put it into our computers? When we do this, do we lose parts of the information? Are some concepts just too hard to turn into ones and zeroes? How is our ability to enter information limited by the data structures we use inside of our computers? These questions enter into a science that is rarely discussed: The science of Knowledge Representation.
My presentation on KR will include some navel gazing, but also some nitty-gritty practical examples of Description Logics, RDF, and other modern approaches to capturing complicated information within a computer. We will also discuss some likely future directions this field may head into.”
Dr. Barski is a Medical Software Developer working on cardiology procedure documentation for Wolters Kluwer Health. He is also currently working on a textbook on the Common Lisp programming language.
You can submit a question either before, during or after the talk here.