This could be a big step toward the “web of data” vision of the Semantic Web.
“Yahoo! Search BOSS provides access to structured data acquired through SearchMonkey. Currently, we are only exposing data that has been semantically marked up and subsequently acquired by the Yahoo! Web Crawler. In the near future, we will also expose structured data shared with us in SearchMonkey data feeds. In both cases, we will respect site owner requests to opt-out of structured data sharing through BOSS.”
Here’s how it works:
- Sites use microformats or RDF (encoded using RDFa or eRDF) to add structured data to their pages
- Yahoo’s web crawler encounters embedded markup and indexes the structured data along with the unstructured text
- A BOSS developer specifies “view=searchmonkey_rdf” or “view=searchmonkey_feed” in API requests
- BOSS returns the structured data in its response, formatted as either XML or JSON
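To make the request side of these steps concrete, here is a minimal sketch that only builds a BOSS query URL asking for SearchMonkey data. The endpoint (`http://boss.yahooapis.com/ysearch/web/v1/`) and the `appid` and `format` parameter names are assumptions based on the BOSS v1 URL style; only the two `view` values come from the announcement. The function name `build_boss_url` is hypothetical, and because the service has since been retired, the code does not send the request.

```python
from urllib.parse import quote, urlencode

# ASSUMPTION: BOSS v1 web search endpoint. The service has been retired,
# so this sketch only constructs the URL and never sends it.
BOSS_WEB_ENDPOINT = "http://boss.yahooapis.com/ysearch/web/v1/"

# The two view values named in the announcement.
SEARCHMONKEY_VIEWS = ("searchmonkey_rdf", "searchmonkey_feed")


def build_boss_url(query: str, appid: str, view: str = "searchmonkey_rdf",
                   fmt: str = "json") -> str:
    """Build a (hypothetical) BOSS web search URL requesting structured data.

    `appid` and `format` are assumed parameter names; `view` selects
    crawled markup (searchmonkey_rdf) or feed data (searchmonkey_feed).
    """
    if view not in SEARCHMONKEY_VIEWS:
        raise ValueError("view must be 'searchmonkey_rdf' or 'searchmonkey_feed'")
    if fmt not in ("json", "xml"):
        raise ValueError("fmt must be 'json' or 'xml'")
    # The query is a path segment, so escape everything, including '/'.
    params = urlencode({"appid": appid, "format": fmt, "view": view})
    return BOSS_WEB_ENDPOINT + quote(query, safe="") + "?" + params


if __name__ == "__main__":
    print(build_boss_url("barack obama", "YOUR_APP_ID"))
    # http://boss.yahooapis.com/ysearch/web/v1/barack%20obama?appid=YOUR_APP_ID&format=json&view=searchmonkey_rdf
```

Switching `view` to `searchmonkey_feed` would, per the announcement, select data shared through SearchMonkey feeds rather than markup picked up by the crawler.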
Yahoo’s SearchMonkey only acquires structured data expressed in certain microformats or RDF vocabularies. The supported microformats are hAtom, hCalendar, hCard, hReview, XFN, Geo, rel-tag and adr. Supported RDF vocabularies include Dublin Core, FOAF, SIOC and “other supported vocabularies”. See the appendix on vocabularies in Yahoo’s SearchMonkey Guide for the full list and more information.
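As an illustration of what "supported microformats" means at the markup level, the following sketch scans HTML for the root class names and `rel` values that identify the microformats listed above. It is a simplified, hypothetical detector built on Python's standard `html.parser`, not a description of how Yahoo’s crawler works, and it ignores RDFa and eRDF entirely. The names `detect_microformats` and `MicroformatDetector` are made up for this example.

```python
from html.parser import HTMLParser

# Root class names that mark the class-based microformats listed above.
# Simplified illustration only; this is not Yahoo's crawler logic.
CLASS_ROOTS = {
    "hfeed": "hAtom",
    "hentry": "hAtom",
    "vevent": "hCalendar",
    "vcard": "hCard",
    "hreview": "hReview",
    "geo": "Geo",
    "adr": "adr",
}

# XFN 1.1 relationship values, carried in rel attributes on links.
XFN_RELS = {
    "friend", "acquaintance", "contact", "met", "co-worker", "colleague",
    "co-resident", "neighbor", "child", "parent", "sibling", "spouse",
    "kin", "muse", "crush", "date", "sweetheart", "me",
}


class MicroformatDetector(HTMLParser):
    """Collects the names of microformats whose markers appear in a page."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)  # valueless attributes map to None
        for cls in (attr_map.get("class") or "").split():
            if cls in CLASS_ROOTS:
                self.found.add(CLASS_ROOTS[cls])
        rels = set((attr_map.get("rel") or "").split())
        if "tag" in rels:
            self.found.add("rel-tag")
        if rels & XFN_RELS:
            self.found.add("XFN")


def detect_microformats(html: str) -> set:
    """Return the set of supported microformats detected in `html`."""
    parser = MicroformatDetector()
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    # Hypothetical page fragment with an hCard, an adr and a rel-tag link.
    sample = ('<div class="vcard"><span class="fn">Jane Doe</span>'
              '<div class="adr"><span class="locality">Springfield</span></div>'
              '</div><a rel="tag" href="/tag/semweb">semweb</a>')
    print(sorted(detect_microformats(sample)))  # ['adr', 'hCard', 'rel-tag']
```

Matching on root class names is enough to tell which microformat a block of markup uses; actually extracting the properties (such as `fn` or `locality`) would require walking each root's descendants.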
A post on the Yahoo! Search Blog also discusses this and other changes to the BOSS service, and includes a nice example of microformat-encoded structured data taken from President Obama’s LinkedIn page.