Microdata chosen over RDFa for semantics by Google, Bing and Yahoo!

June 2nd, 2011

Google, Bing and Yahoo! are cooperating on an approach to representing structured data in Web pages via the launch of schema.org. The approach is microdata and the schema.org site documents the schemas that are supported today.

“This site provides a collection of schemas, i.e., html tags, that webmasters can use to markup their pages in ways recognized by major search providers. Search engines including Bing, Google and Yahoo! rely on this markup to improve the display of search results, making it easier for people to find the right web pages. Many sites are generated from structured data, which is often stored in databases. When this data is formatted into HTML, it becomes very difficult to recover the original structured data. Many applications, especially search engines, can benefit greatly from direct access to this structured data. On-page markup enables search engines to understand the information on web pages and provide richer search results in order to make it easier for users to find relevant information on the web. Markup can also enable new tools and applications that make use of the structure. A shared markup vocabulary makes easier for webmasters to decide on a markup schema and get the maximum benefit for their efforts. So, in the spirit of sitemaps.org, Bing, Google and Yahoo! have come together to provide a shared collection of schemas that webmasters can use.”

That’s the good news. The bad news, or at least the less good news, is that it based on microdata and not RDFa. Microdata is a relatively new way to embed semantic information in HTML and designed to be part of the HTML5 suite. It is less expressive than RDFa but also simpler. It’s main advantage over microformats is that it is extensible — you can define new semantic vocabulary terms. Here is how the three companies described the choice.

Google: “Historically, we’ve supported three different standards for structured data markup: microdata, microformats, and RDFa. We’ve decided to focus on just one format for schema.org to create a simpler story for webmasters and to improve consistency across search engines relying on the data.”

Yahoo!:“Today’s announcement offers tremendous opportunity for growth. In addition to consolidating the schemas for the vocabularies we already support, there are schemas for more than a hundred newly created categories including movies, music, organizations, TV shows, products, places and more. We will continue to expand these categories by listening to feedback from the community and will continue publishing new schemas on a regular basis. Don’t worry if your site has already added RDFa or microformats currently supported by our Enhanced Displays program, that site will still appear with an Enhanced Display on Yahoo! – no changes required.”

Bing:“At Bing we understand the significant investment required to implement markup, and feel strongly that by partnering with Google and Yahoo! on standard schemas webmasters can be more efficient with the time they invest… Bing accepts a wide variety of markup formats today (Open Graph, microformat, etc.) for features like Tiles and will continue to do so, but by standardizing on schema.org we are looking to simplify the markup choices for webmasters and amplify the value the receive in return.

The scheme.org site has a FAQ that includes the question “Q: Why microdata? Why not RDFa or microformats?” which is answered thusly:

“Focusing on microdata was a pragmatic decision. Supporting multiple syntaxes makes documentation for webmasters more complex and introduces more overhead in terms of defining new formats. Microformats are concise and easy to understand, but they don’t offer an open extensibility mechanism and the reuse of the class tag can cause conflicts with website CSS. RDFa is extensible and very expressive, but the substantial complexity of the language has contributed to slower adoption. Microdata is the most recent well-known standard, created along with HTML5. It strikes a balance between extensibility and simplicity, and is most suitable for building the schema.org. Google and Yahoo! have in the past supported both microformats and RDFa for certain schemas and will continue to support these syntaxes for those schemas. We will also be monitoring the web for RDFa and microformats adoption and if they pick up, we will look into supporting these syntaxes. Also read the section on the data model for more on RDFa.”

Guha has a generous comment in his post on the official Google blog:

“While this collaborative initiative is new, we draw heavily from the decades of work in the database and knowledge representation communities, from projects such as Jim Gray’s SDSS Skyserver, Cyc and from ongoing efforts such as dbpedia.org and linked data. We feel privileged to build upon this great work. We look forward to seeing structured markup continue to grow on the web, powering richer search results and new kinds of applications.”

I’ve not studied microdata yet, so don’t know how I feel about the expressiveness/simplicity tradeoffs it has made. I wonder if it is possible to add an OWL-like layer on top ofMicrodata, for example.