VentureBeat reports that Microsoft will acquire Powerset for a price “rumored to be slightly more than $100 million”. Powerset has been developing a Web search system that uses natural language processing technology acquired from PARC to more fully understand users’ queries and the text of the documents it indexes.
“By buying Powerset, Microsoft is hoping to close the perceived quality gap with Google’s search engine. The move comes as Microsoft CEO Steve Ballmer continues to argue that improving search is Microsoft’s most important task. Microsoft’s market share in search has steadily declined, dropping further and further behind first-place Google and second place Yahoo.
Google has generally dismissed Powerset’s semantic, or “natural language” approach as being only marginally interesting, even though Google has hired some semantic specialists to work on that approach in limited fashion. Google’s search results are still based primarily on the individual words you type into its search bar, and its approach does very little to understand the possible meaning created by joining two or more words together.”
If you put the query “Where is Mount Kilimanjaro” into the beta version of Powerset, it answers
Its response to “what is the Serengeti” is a little less precise. It reports seven things it knows about Serengeti: that it replaced “desert, Platinum, twilight and Caribbean Blue”, that it hosted ‘migration’, that it provided ‘draw’, that it gained ‘fame’, that it recorded ‘explorations’, that it rutted ‘season’ and that it boasted ‘Blue Wildebeests’. I’m just glad I don’t have a school report on the Serengeti due tomorrow!
Asking “Who is the president of Zimbabwe” results only in the fallback answer, which appears to be just the set of Wikipedia pages that the query words produce in an IR query. Compare this with the results of the Google query “who is the president of zimbabwe site:wikipedia.org”.
By the way, the AskWiki system often does a better job on these kinds of questions. Asking “where is the Serengeti” produces the answer “The Serengeti ecosystem is located in north-western Tanzania and extends to south-western Kenya between latitudes 1 and 3 S and longitudes 34 and 36 E. It spans some 30,000 km.” It’s a bit of a hack, though. It seems to work by selecting the sentence or two in Wikipedia that best serves as an answer. See our post on AskWiki from last fall for more examples.
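To make the "select the best sentence" idea concrete, here is a minimal sketch in Python. It is only a guess at the general technique, not AskWiki's actual method: it scores each sentence by how many non-stopword query terms it contains and returns the top scorer. The function names, the stopword list and the sample passage are all made up for illustration.

```python
import re

# Hypothetical stopword list: question words and function words carry
# little signal for matching, so they are ignored when scoring.
STOPWORDS = {
    "a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and",
    "who", "what", "where", "when", "which", "how",
}

def tokenize(text):
    """Lowercase the text and return its content words."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower())
            if w not in STOPWORDS]

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def best_sentence(question, passage):
    """Return the sentence sharing the most content words with the question,
    or None if no sentence shares any."""
    q_terms = set(tokenize(question))
    best, best_score = None, 0
    for sentence in split_sentences(passage):
        score = len(q_terms & set(tokenize(sentence)))
        if score > best_score:
            best, best_score = sentence, score
    return best

# Illustrative passage (not real Wikipedia text beyond the first sentence).
SAMPLE = (
    "The Serengeti ecosystem is located in north-western Tanzania. "
    "Mount Kilimanjaro is a dormant volcano in Tanzania. "
    "The wildebeest migration draws many visitors."
)

print(best_sentence("where is the Serengeti", SAMPLE))
print(best_sentence("Where is Mount Kilimanjaro", SAMPLE))
```

A real system would need much more than word overlap (it cannot tell a "where" answer from a "who" answer, for example), which is exactly the gap that deeper language understanding is meant to close.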
Still, Powerset is an ambitious system that shows promise. What they are trying to do is important and will eventually be done. They have shown real progress in the past two years, more than I had expected. I hope Microsoft can accelerate the development and find practical ways to improve Web search even if the ultimate goal of full language understanding is many years away.