Wikidata will create an editable, Semantic Web compatible version of Wikipedia

March 30th, 2012

Wikidata is a new project that “aims to create a free knowledge base about the world that can be read and edited by humans and machines alike.” The project was started by the German chapter of Wikimedia, the organization that oversees Wikipedia and related projects, and is the first new Wikimedia project since 2006.

Wikidata has its roots in the successful Semantic MediaWiki project, and the Wikidata development team is led by Dr. Denny Vrandecic, a well-known member of the Semantic Web research community and one of the creators of Semantic MediaWiki in 2005. The project is funded by Paul Allen’s AI2 foundation (which funded Semantic MediaWiki), Google, and the Gordon and Betty Moore Foundation.

Wikidata will expose the data that underlies Wikipedia and other sources as RDF and JSON, and will also allow people and programs to query the data as well as add to or edit it.

For more information, see Wikipedia’s Next Big Thing on TechCrunch or the Wikimedia press release on Wikidata. You can also see a recent Wikidata presentation by Denny and view his talk on the nascent project at the 2011 Wikimania conference.

Wikimedia fans in our area will find it easy to attend Wikimania 2012, which will be held July 12-15 at George Washington University in the Washington DC area.

Google semantic web search

March 15th, 2012

The Wall Street Journal’s Amir Efrati has an article (Google Gives Search a Refresh) and blog post (What Google’s Search Changes Might Mean for You) on upcoming changes Google is making to its search engine to exploit semantic data.

“Google is undergoing a major, long-term overhaul of its search-engine, using what’s called semantic Web search to enhance the current system in the coming years. The move, starting over the next few months, will impact the way people can use the search engine as well as how the search engine examines sites across the Web before ranking them in search results.

A Google spokesman said the company wouldn’t comment on future search-engine features. But people familiar with the initiative say that Google users will be able to browse through the company’s “knowledge graph,” or its ever-expanding database of information about “entities”—people, places and things—the “attributes” of those entities and how different entities are connected to one another.

Some open standards come from the W3C Semantic Web and, which the major search engine players including Google have agreed to recognize, Cornett said.

LinkData service helps produce RDF linked data from tables

March 7th, 2012

Link Data is a nicely done Web site that helps people produce RDF data from simple Excel spreadsheets. It appears to be the work of researchers at the RIKEN BASE group at the RIKEN Yokohama Institute in Japan. The approach is straightforward and consists of three steps: creating a template, downloading it as an Excel spreadsheet and adding your data, and uploading the result to the site for conversion and publishing.

In the first step, you use the site to create a template for your table, each row of which will be mapped to RDF data about a single subject. The first column of a row must represent the subject and the remaining columns its properties. After specifying the number of columns, you enter into each a string or a URI representing the property. If you enter a string (e.g., ‘employer’), the system shows some suggested URIs drawn from the OBO ontologies that you can select instead of the string. You can also specify that the cell values will be literals of type date, time, integer or float.
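The row-to-RDF mapping described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Link Data’s actual implementation: the `http://example.org/` namespace, the sample table, the function names, and the choice of N-Triples as the output format are all assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical namespace used when a subject or property is a plain string.
BASE = "http://example.org/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def is_uri(value):
    """Treat anything starting with http:// or https:// as a URI."""
    return value.startswith(("http://", "https://"))


def to_uri(value):
    """Use the value as-is if it is a URI, otherwise mint one under BASE."""
    return value if is_uri(value) else BASE + quote(value)


def to_object(value, datatype=None):
    """Render a cell as an N-Triples object: a URI or a (typed) literal."""
    if is_uri(value):
        return "<%s>" % value
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    if datatype:
        return '"%s"^^<%s%s>' % (escaped, XSD, datatype)
    return '"%s"' % escaped


def rows_to_ntriples(table_text, datatypes=None):
    """First column is the subject; each remaining column is a property."""
    datatypes = datatypes or {}
    reader = csv.reader(io.StringIO(table_text))
    header = next(reader)
    triples = []
    for row in reader:
        if not row:
            continue
        subject = to_uri(row[0])
        for name, cell in zip(header[1:], row[1:]):
            if cell == "":
                continue  # an empty cell produces no triple
            obj = to_object(cell, datatypes.get(name))
            triples.append("<%s> <%s> %s ." % (subject, to_uri(name), obj))
    return triples


# Made-up sample data: people, their employers and their birth years.
TABLE = (
    "person,employer,birthYear\n"
    "alice,http://example.org/acme,1980\n"
    "bob,Initech,\n"
)
TRIPLES = rows_to_ntriples(TABLE, datatypes={"birthYear": "integer"})
```

Calling `print("\n".join(TRIPLES))` lists one statement per non-empty cell. A real service would also need to validate URIs and handle the date, time and float types, which this sketch passes through without checking.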

After downloading your spreadsheet template to your computer, you will see that the metadata is embedded in the initial rows of the table. Your next task is to enter your data, either as strings or URIs, as appropriate.

The final step is to upload the spreadsheet with your data to the site, provide some additional data, and have it converted to RDF and made available there. Along the way you can see the results via the W3C validator as serialized in RDF/XML or depicted as a graph. You can also see how your data is connected to datasets in the LOD cloud.

Here’s the result of a simple test, in which I created a data set about people, their employers and their countries of residence.

The approach is simple and has many limitations, but I liked the Web interface and workflow. We’ve done some work in this space with RDF123 and are currently working on automating the process of producing five-star linked data.

LOV Secrets of the Real Ontologies of the LOD Cloud

March 4th, 2012

The Linked Open Vocabularies site collects metadata and statistics about the RDFS and OWL vocabularies used in the Linked Open Data cloud. It looks like an interesting and useful resource.

“Welcome to LOV, your entry point to the growing ecosystem of linked open vocabularies (RDFS or OWL ontologies) used in the Linked Data Cloud. Here you will find vocabularies listed and individually described by metadata, classified by vocabulary spaces, interlinked using the dedicated vocabulary VOAF. You will enjoy querying the LOV dataset either at vocabulary level or at element level, exploring the vocabulary content using full-text faceted search, and finding metrics about the use of vocabularies in the Semantic Web. Not finding your favourite one? Suggest a new vocabulary to add to LOV!”

Cray announces uRiKA as an RDF eating graph appliance

March 2nd, 2012

The Register has an article, Cray gets graphic with big data, on Cray’s uRiKA computer that is designed for analyzing large graphs. The specialized machine uses Cray’s Threadstorm processors and is designed to support up to 8,192 processors and 512TB of shared main memory. If you have to ask how much it costs, you probably can’t afford it: a starter rack costs “several hundreds of thousands of dollars”.

Cray is positioning uRiKA as a graph appliance that “complements an existing data warehouse or Hadoop cluster by offloading graph workloads and interoperating within the existing enterprise analytics workflow.” Most interesting to me is that it ships with a software suite based on open source Semantic Web technology.

“The hardware, while impressive, is not particularly useful without some software. The Urika stack is based on the Apache Jena project, which is a Java framework for building semantic web applications. SPARQL is the pattern-matching query language for graph applications, and Apache Fuseki is the SPARQL server that runs in conjunction with the Jena framework and that allows data stored in the RDF format, the special format for graph data, to be served up over the HTTP protocol.”
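To make the “served up over the HTTP protocol” part concrete, here is a minimal sketch of how a client might send a SPARQL query to a Fuseki-style endpoint using the standard SPARQL protocol (a GET request with a `query` parameter). It says nothing about how uRiKA itself is configured: the endpoint URL (a local server on port 3030 with a dataset named `ds`), the example property URI and the function names are all assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a Fuseki server running locally with a dataset named "ds".
ENDPOINT = "http://localhost:3030/ds/query"

# A basic graph pattern: find every ?person with an employer.
# The property URI is hypothetical.
QUERY = """
SELECT ?person ?employer
WHERE { ?person <http://example.org/employer> ?employer . }
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_select(endpoint, query):
    """Send the query and return the list of result bindings."""
    request = build_sparql_request(endpoint, query)
    with urllib.request.urlopen(request) as response:
        results = json.load(response)
    return results["results"]["bindings"]
```

With a server actually running at that address, `run_select(ENDPOINT, QUERY)` would return one dictionary per match, keyed by the variable names `person` and `employer`.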

Cynthia Parr at TED 2012 on the Encyclopedia of Life

March 1st, 2012

Congratulations to UMBC Ebiquity alumna Cynthia Parr for being selected to present at the 2012 TED conference on the Encyclopedia of Life project. Cyndy was a research professor at UMBC and worked with us on the SPIRE project. She now works for the Smithsonian Institution and is director of the EOL Species Pages Group. She gave her three-minute talk last night and we look forward to seeing it when it is available online. Here’s a note from the TED blog.

Cynthia Parr takes the stage to update us on a massive TED Prize project: The Encyclopedia of Life. Perhaps a quarter of TED talks feature living organisms. Whether it’s an urgent need for conservation or a creature that can teach us something, it’s obvious that you care deeply about biodiversity. And with your help we’ve made Ed Wilson’s grand vision a reality. To build the Encyclopedia of Life we started with databases from leading museums, libraries, and science projects. We’ve brought in content from Flickr and Wikipedia and invited everyone to add text directly to EOL.

Thanks to global partners we’ve now got information in Spanish and Arabic with more languages to come, because people should be able to learn about the species they care about in their own language. And because scientists describe 15,000 new species every year, we set it up so that if they publish in an open-access journal we automatically make a new page on EOL.

Everything on EOL, even the software itself, is free to use and to re-use. As of this week, we’ve got information on almost a million species. That’s an incredible number, nearly half the species in the tree of life and it has only been five years. But we are more than just a bunch of web pages.

Our next steps are to get even more eyeballs looking at EOL working with richer, more computable data. You helped us get off the ground, and we’re gaining momentum. The next five years are going to be even more exciting.