Microsoft has announced an add-in for Word 2007 that lets authors annotate a word or phrase with terms defined in external ontologies.
Addressing this critical challenge for researchers, Microsoft Corp. and Creative Commons announced today, before an industry panel at the O’Reilly Emerging Technology Conference (ETech 2009), the release of the Ontology Add-in for Microsoft Office Word 2007 that will enable authors to easily add scientific hyperlinks as semantic annotations, drawn from ontologies, to their documents and research papers. Ontologies are shared vocabularies created and maintained by different academic domains to model their fields of study. This Add-in will make it easier for scientists to link their documents to the Web in a meaningful way. Deployed on a wide scale, ontology-enabled scientific publishing will provide a Web boost to scientific discovery.
The add-in is available for download from CodePlex, Microsoft’s open source project hosting website. It supports a number of features, including syntax coloring of informative words, automatic detection of identifiers, and built-in access to ontologies and controlled vocabularies maintained by NCBO as well as biological databases such as Protein Data Bank, UniProtKB, and NCBI GenBank/RefSeq.
The add-in was produced by the UCSD BioLit group, hence the initial connections to bioinformatics ontologies. It would be great if future versions had built-in awareness of the more popular linked data vocabularies.
The annotation is done using a custom XML schema which can be extracted and mapped to RDF. This example, from the CodePlex site, shows the word “disease” being tagged with the Human Disease ontology.
It’s not pretty, and it’s more verbose than RDFa, but it gets the job done. There are many interesting add-ins for Microsoft Office components, but most seem to be available only for Office 2007 and not for the Mac version, Office 2008.
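To make the extract-and-map step mentioned above more concrete, here is a minimal Python sketch. The markup it parses is hypothetical and is not the add-in's actual schema: the element name term, the attributes ontology and id, the example.org URIs and the choice of dcterms:subject as the predicate are all assumptions made only for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation markup (NOT the add-in's real schema).
SAMPLE = """
<doc id="http://example.org/paper1">
  <p>Mutations in this gene cause
    <term ontology="http://example.org/ontology" id="TERM_1">disease</term>
    in mice.</p>
</doc>
"""

def annotations_to_ntriples(xml_text):
    """Turn each <term> annotation into one N-Triples statement."""
    root = ET.fromstring(xml_text)
    doc_uri = root.get("id")
    triples = []
    for term in root.iter("term"):
        # Build the concept URI from the (assumed) ontology and id attributes.
        concept_uri = f'{term.get("ontology")}/{term.get("id")}'
        # dcterms:subject is our own choice of predicate for this sketch.
        triples.append(
            f"<{doc_uri}> <http://purl.org/dc/terms/subject> <{concept_uri}> ."
        )
    return triples

if __name__ == "__main__":
    for line in annotations_to_ntriples(SAMPLE):
        print(line)
```

Each tagged word or phrase becomes one statement linking the document to an ontology term, which is about the simplest mapping one could choose. A real converter would follow whatever the add-in's schema actually records.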
A two-day event, Ontology Summit 2009: Toward Ontology-based Standards, will be held 6-7 April 2009 at NIST in Gaithersburg, MD. The Summit is co-organized by NIST and a number of other organizations and is part of NIST’s Interoperability Week.
“This summit will address the intersection of two active communities, namely the technical standards world, and the community of ontology and semantic technologies. This intersection is long overdue because each has much to offer the other. Ontologies represent the best efforts of the technical community to unambiguously capture the definitions and interrelationships of concepts in a variety of domains. Standards — specifically information standards — are intended to provide unambiguous specifications of information, for the purpose of error-free access and exchange. If the standards community is indeed serious about specifying such information unambiguously to the best of its ability, then the use of ontologies as the vehicle for such specifications is the logical choice. Conversely, the standards world can provide a large market for the industrial use of ontologies, since ontologies are explicitly focused on the precise representation of information. This will be a boost to worldwide recognition of the utility and power of ontological models. The goal of this Ontology Summit 2009 is to articulate the power of synergizing these two communities in the form of a communique in which a number of concrete challenges can be laid out. These challenges could serve as a roadmap that will galvanize both communities and bring this promising technical area to the attention of others.”
High quality videos of tutorials and talks from the Seventh International Semantic Web Conference are now available on the excellent VideoLectures.net site. It’s a great opportunity to benefit from the conference if you were not able to attend or, even if you were, to see presentations you were not able to attend.
VideoLectures captured the slides for most of the presentations (which are available for downloading), and their site shows both the speaker’s video and slides in synchronization. VideoLectures used three camera crews in parallel, so they were able to capture almost all of the presentations. Here are some highlights from the ~90 videos to whet your appetite.
An OWL 2 Far?: A panel that takes up the question of whether having standard languages based on formal methods with steadily increasing power is the right way to support the Semantic Web.
Conrad Barski, M.D. will give a talk on “How To Tell Stuff To Your Computer — The Enigmatic Art of Knowledge Representation” at UMBC at 1:00pm on Friday 17 October in Lecture Hall 8 in the ITE building.
Barski maintains an interesting site, Lisperati, that has graphical introductions to a number of topics, including Lisp, Haskell and Emacs, as well as serving as the home of FringeDC, an informal group of people interested in “fringe” programming languages.
Here’s the abstract for his talk.
“Have you ever wondered how we take information from the “real world” and put it into our computers? When we do this, do we lose parts of the information? Are some concepts just too hard to turn into ones and zeroes? How is our ability to enter information limited by the data structures we use inside of our computers? These questions enter into a science that is rarely discussed: The science of Knowledge Representation.
My presentation on KR will include some navel gazing, but also some nitty-gritty practical examples of Description Logics, RDF, and other modern approaches to capturing complicated information within a computer. We will also discuss some likely future directions this field may head into.”
Dr. Barski is a Medical Software Developer working on cardiology procedure documentation for Wolters Kluwer Health. He is also currently working on a textbook on the Common Lisp programming language.
You can submit a question either before, during or after the talk here.
Evri is another entry into the ‘semantic search’ space and has recently opened up a beta site with the slogan “Search less, understand more.” Evri is a startup launched by Vulcan Inc, a company founded by Paul Allen in 1986 as a private investment and R&D firm.
Here’s part of how Evri describes itself in its FAQ.
“What is Evri doing? Evri is creating a map of connections between people, places, and things on the web. You’ll use this map to find the things you’re interested in. Instead of searching by keywords and looking for relevant results, Evri will lead you to other relevant articles, images, and video based on what you’re reading.
… Where does Evri get its information? We search the World Wide Web and gather content from as many highly regarded information sources as we can find, and we’re adding more sources all the time.”
Saying that Evri does ‘semantic search’ is not quite right: their initial focus is on providing widgets for blogs and other web sites that use the text on the page to recommend links to other, related information.
Evri appears to have developed an underlying ontology that is used to organize their knowledge of “people, products and things”, capturing both a type taxonomy and relations. Some of this is revealed in the beta**2 part of their site, Evri’s Garden. There is also a query system over their knowledge base that supports complex search queries.
The current push, though, seems to be to get bloggers to add an Evri widget to their blogs that will pop up a window with links to related articles and information.
This is an interesting development that is worth watching.
Databases are a fundamental technology for most information systems, especially those based on the web. A group of senior database researchers met recently to assess the state of database research, as documented in their report. So, where did the Semantic Web fit into their vision?
“In late May, 2008, a group of database researchers, architects, users and pundits met at the Claremont Resort in Berkeley, California to discuss the state of the research field and its impacts on practice. This was the seventh meeting of this sort in twenty years, and was distinguished by a broad consensus that we are at a turning point in the history of the field, due both to an explosion of data and usage scenarios, and to major shifts in computing hardware and platforms. Given these forces, we are at a time of opportunity for research impact, with an unusually large potential for influential results across computing, the sciences and society. This report details that discussion, and highlights the group’s consensus view of new focus areas, including new database engine architectures, declarative programming languages, the interplay of structured and unstructured data, cloud data services, and mobile and virtual worlds.”
It’s a good report with lots of interesting things in it and definitely worth reading, but I was disappointed to find that it makes no mention of the Semantic Web, RDF, OWL, ontologies, AI, knowledge bases, or reasoning. Here’s a word cloud (generated with Wordle) from the report, which provides a 10,000-foot view of its content.
The report says that it was “surprisingly easy for the group to reach consensus on a set of research topics to highlight for investigation in coming years”. Those topics are:
Revisiting Database Engines
Declarative Programming for Emerging Platforms
The Interplay of Structured and Unstructured Data
Cloud Data Services
Mobile Applications and Virtual Worlds
There is clearly overlap between the database and semantic web communities in the first three topics.
David Huynh completed his PhD at MIT CSAIL last year and joined Metaweb a few months ago, where he has been working on new and better interfaces to explore the data encoded in their Freebase system. He recently released Parallax as a prototype browsing interface for Freebase. Here is a video that shows the interface in action.
Freebase is “an open database of the world’s information” that is constructed by a wiki-like collaborative community. In many ways it is like the Semantic Web model, with two big differences: (1) the data is stored centrally rather than distributed across the Web and (2) the representation system is not based on RDF but rather uses a custom-built object-oriented data representation language.
Freebase is a great resource. Much of the data is extracted from Wikipedia, so its content has a large overlap with DBpedia. But it is also relatively easy to upload additional information in various structured forms, and many have done so, resulting in extended coverage.
This is clearly a system in the Web of Data space, along with the Linking Open Data effort, and having both should offer a way for us all to explore the consequences of some of the underlying design decisions.
“2008-06-20: The Semantic Web Deployment Working Group has published a Candidate Recommendation of RDFa in XHTML: Syntax and Processing. Web documents contain significant amounts of structured data, which is largely unavailable to tools and applications. When publishers can express this data more completely, and when tools can read it, a new world of user functionality becomes available, letting users transfer structured data between applications and web sites, and allowing browsing applications to improve the user experience. RDFa is a specification for attributes to be used with languages such as HTML and XHTML to express structured data. See the group’s RDFa implementation report. The Working Group also updated the companion document RDFa Primer. Learn more about the Semantic Web and the HTML Activity.”
Achieving Candidate Recommendation status is a significant step toward becoming a W3C Recommendation. Congratulations to the working group for all of their efforts in developing RDFa.
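To give a feel for how RDFa attributes carry structured data inside ordinary markup, here is a minimal Python sketch that pulls statements out of a small XHTML fragment. It handles only the about, property and content attributes, ignores subject scoping and prefix resolution, and is in no way a conforming RDFa processor. The sample fragment, the example.org URI and the class and function names are all made up for illustration.

```python
from html.parser import HTMLParser

class RDFaPropertyParser(HTMLParser):
    """Collect (subject, property, value) tuples from a tiny RDFa subset."""

    def __init__(self):
        super().__init__()
        self.subject = None   # most recent @about value seen
        self.pending = None   # property still waiting for its text value
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if "about" in a:
            self.subject = a["about"]
        if "property" in a:
            if "content" in a:
                # @content overrides the element's text as the literal value.
                self.triples.append((self.subject, a["property"], a["content"]))
            else:
                self.pending = a["property"]

    def handle_data(self, data):
        # Use the first non-blank text after a @property element as its value.
        if self.pending and data.strip():
            self.triples.append((self.subject, self.pending, data.strip()))
            self.pending = None

# Illustrative fragment, not taken from any real page.
SAMPLE = '''
<div about="http://example.org/post/1">
  <h2 property="dc:title">RDFa reaches Candidate Recommendation</h2>
  <span property="dc:date" content="2008-06-20">20 June 2008</span>
</div>
'''

def extract(html):
    parser = RDFaPropertyParser()
    parser.feed(html)
    parser.close()
    return parser.triples

if __name__ == "__main__":
    for triple in extract(SAMPLE):
        print(triple)
```

The point of the sketch is that the same markup serves both readers: a person sees “20 June 2008” while a tool reads the machine-friendly value in the content attribute.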
Joshua Tauberer, a UPenn Linguistics graduate student, maintains rdf:about as a resource of information on the semantic web language RDF. It’s a concise collection of information that manages not to overwhelm and includes good Quick Intro and RDF in Depth pages.
“We invite submissions to the sixth annual Semantic Web Challenge, the premiere event for demonstrating practical progress towards achieving the vision of the Semantic Web. The central idea of the Semantic Web is to extend the current human-readable web by encoding some of the semantics of resources in a machine-processable form. Moving beyond syntax opens the door to more advanced applications and functionality on the Web. Computers will be better able to search, process, integrate and present the content of these resources in a meaningful, intelligent manner.
As the core technological building blocks are now in place, the next challenge is to show off the benefits of semantic technologies by developing integrated, easy to use applications that can provide new levels of Web functionality for end users on the Web or within enterprise settings. Applications submitted should demonstrate clear practical value that goes above and beyond what is possible with conventional web technologies alone.
Unlike in previous years, the Semantic Web Challenge of 2008 will consist of two tracks: the Open Track and the Billion Triples Track. The key difference between the two tracks is that the Billion Triples Track requires the participants to make use of the data set –a billion triples– provided by the organizers. The Open Track has no such restrictions.
As before, the Challenge is open to everyone from academia and industry. The authors of the best applications will be awarded prizes and featured prominently at special sessions during the conference”
“Swoogle has indexed millions of Semantic Web Documents, but how do I know that mine has been indexed?” Here is a simple way: try your URL with the Swoogle Track Back Service. Here are several examples that show how it works:
http://protege.stanford.edu/plugins/owl/protege
——————————————————————————–
About this URL
The latest ping on [2006-01-29] shows its status is [Succeed, changed into SWD].
Its latest cached original snapshot is [2006-01-29 (3373 bytes)]
Its latest cached NTriples snapshot is [2006-01-29 (41 triples)].
——————————————————————————–
We have found 7 cached versions.
2006-01-29: Original Snapshot (3373 bytes), NTriples Snapshot (41 triples)
2005-08-25: Original Snapshot (3373 bytes), NTriples Snapshot (41 triples)
2005-07-16: Original Snapshot (2439 bytes), NTriples Snapshot (35 triples)
2005-05-20: Original Snapshot (2173 bytes), NTriples Snapshot (30 triples)
2005-04-10: Original Snapshot (1909 bytes), NTriples Snapshot (28 triples)
2005-02-25: Original Snapshot (1869 bytes), NTriples Snapshot (27 triples)
2005-01-24: Original Snapshot, NTriples Snapshot (31 triples)
We may also check the growth of FOAF documents.
http://www.csee.umbc.edu/~dingli1/foaf.rdf
——————————————————————————–
About this URL
The latest ping on [2006-01-29] shows its status is [Succeed, changed into SWD].
Its latest cached original snapshot is [2006-01-29 (6072 bytes)]
Its latest cached NTriples snapshot is [2006-01-29 (98 triples)].
——————————————————————————–
We have found 6 cached versions.
2006-01-29: Original Snapshot (6072 bytes), NTriples Snapshot (98 triples)
2005-07-16: Original Snapshot (6072 bytes), NTriples Snapshot (98 triples)
2005-06-19: Original Snapshot (5053 bytes), NTriples Snapshot (80 triples)
2005-04-17: Original Snapshot (3142 bytes), NTriples Snapshot (50 triples)
2005-04-01: Original Snapshot (1761 bytes), NTriples Snapshot (29 triples)
2005-01-24: Original Snapshot, NTriples Snapshot (29 triples)
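The growth of this FOAF document can be read off the listing above with a few lines of Python. This is only a sketch that parses the text of the trackback report as shown in this post; it does not call any Swoogle API, and the function names are our own.

```python
import re

# The cached-version lines from the trackback report above.
HISTORY = """\
2006-01-29: Original Snapshot (6072 bytes), NTriples Snapshot (98 triples)
2005-07-16: Original Snapshot (6072 bytes), NTriples Snapshot (98 triples)
2005-06-19: Original Snapshot (5053 bytes), NTriples Snapshot (80 triples)
2005-04-17: Original Snapshot (3142 bytes), NTriples Snapshot (50 triples)
2005-04-01: Original Snapshot (1761 bytes), NTriples Snapshot (29 triples)
2005-01-24: Original Snapshot, NTriples Snapshot (29 triples)
"""

# Capture the snapshot date and the triple count; the byte count is optional.
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2}): .*\((\d+) triples\)")

def triple_history(text):
    """Return (date, triple_count) pairs, oldest first."""
    rows = []
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            rows.append((m.group(1), int(m.group(2))))
    return sorted(rows)  # ISO dates sort correctly as strings

def growth(rows):
    """Change in triple count between consecutive snapshots."""
    return [(later[0], later[1] - earlier[1])
            for earlier, later in zip(rows, rows[1:])]

if __name__ == "__main__":
    for date, delta in growth(triple_history(HISTORY)):
        print(date, f"{delta:+d}")
```

For this document the file grew from 29 to 98 triples between April and July 2005 and then stayed unchanged through January 2006.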
Finally, this service may also help us learn the life cycle of a semantic web document: it was created, actively maintained, lingered around for a while and finally died (i.e. went offline).
http://simile.mit.edu/repository/fresnel/style.rdfs.n3
——————————————————————————–
About this URL
The latest ping on [2006-02-02] shows its status is [Failed, http code is not 200 (or406)].
Its latest cached original snapshot is [2005-03-09 (15809 bytes)]
Its latest cached NTriples snapshot is [2005-03-09 (149 triples)].
——————————————————————————–
We have found 3 cached versions.
2005-03-09: Original Snapshot (15809 bytes), NTriples Snapshot (149 triples)
2005-02-25: Original Snapshot (12043 bytes), NTriples Snapshot (149 triples)
2005-01-26: Original Snapshot, NTriples Snapshot (145 triples)
NOTICE: Yesterday we posted a form that directed you to the Swoogle trackback service. Unfortunately, the form failed when it was called from outside our firewall because a Swoogle API key is required. We didn’t notice at first because we were inside the firewall when we tested it. When we did, we deleted the post, but PlanetRDF had already picked up the post and it was still in our database. Now the form has been removed, but you can still go to the Swoogle web site and try the trackback service there.
We’ve set up a Google group, Swooglers, for users of the Swoogle Semantic Web search engine. Anyone can browse the archive and join, but only members can post messages. Replies are sent to the whole group. We’re not exactly sure what Swooglers will have to talk about, but it might be a place to share your experiences in using Swoogle, ask other users for advice, etc.