“See the screen shot… users are shown a query and a number of results and are asked to evaluate the relevancy of each result from five choices. In this case, the query is â€œrevealing bikinis.â€ Users are asked to evaluate four sets of results within ten minutes, and are paid $0.02 for the effort.
I spoke with Powerset CEO Barney Pell this evening who confirmed that they are using Mechanical Turk to get human feedback on search results. He says the results are not all Powerset generated – rather, they show results from Powerset, Google and others to see which users prefer for a given query. He also says this is an ongoing project, and new ones will be added soon. Pell also said that Powerset plans to use Mechanical Turk over the long haul, even after launch. Theyâ€™ll put actual user queries into Mechanical Turk in real time, add Powerset and competitor results and see which results people find more relevant. If results suggest Powerset isnâ€™t more relevant, theyâ€™ll adjust their engine.” (link)
This is a good example of how Amazon’s service can work. I was surprised at the low cost — two cents for a judgment! We have a number of projects where we need to have human assessments for training data. Instead of turning to our usual source, students and faculty, maybe we should explore using the Mechanical Turk. In some cases, getting local people to do the judgments was difficult. For example, we were interested in expanding our splog detection system to languages other than English, but didn’t have access to native speakers to the right languages.