At the end of last week, AOL Research announced that it was releasing several datasets from its search engine for research purposes, including query streams for 500K users over three months. Adam D’Angelo points out that this could compromise the privacy of AOL users. The data has been anonymized, of course, by replacing user IDs within a query session with a unique number. But some query streams might contain enough information to allow someone to make a good guess at the user’s identity. This sort of query data is one of the things that Google refused to provide to the Department of Justice last spring. On the other hand, Microsoft was offering researchers similar query data earlier this year. I think it’s a close call.
Update: As of 10:00 pm Sunday, the link to the query stream data is no longer there.