Project Gaydar and privacy in Facebook and other online social networking systems

September 20th, 2009

Today’s Boston Globe has an article on online privacy provocatively titled Project ‘Gaydar’ that leads with a story of an class experiment done by two MIT students on predicting sexual orientation from social network information.

“Using data from the social network Facebook, they made a striking discovery: just by looking at a person’s online friends, they could predict whether the person was gay. They did this with a software program that looked at the gender and sexuality of a person’s friends and, using statistical analysis, made a prediction. The two students had no way of checking all of their predictions, but based on their own knowledge outside the Facebook world, their computer program appeared quite accurate for men, they said.”

I suspect that many will read the article and think that such an analysis can be easily done on their own Facebook information. While I’m not a Facebook expert, I assume that the vast majority of its users employ the default privacy settings which do not allow non-friends to see personal information including gender and the ‘interested in’ attribute, which can be used as a proxy for sexual orientation.

Still, the problem of protecting privacy in online social networking systems is a very real one. The Boston Globe story also mentions work by Murat Kantarcioglu on predicting political affiliations (see Inferring Private Information Using Social Network Data).

“He and a student – who later went to work for Facebook – took 167,000 profiles and 3 million links between people from the Dallas-Fort Worth network. They used three methods to predict a person’s political views. One prediction model used only the details in their profiles. Another used only friendship links. And the third combined the two sets of data. The researchers found that certain traits, such as knowing what groups people belonged to or their favorite music, were quite predictive of political affiliation. But they also found that they did better than a random guess when only using friendship connections. The best results came from combining the two approaches.”

The article also mentions Lise Getoor‘s work on discovering private information by integrating work across Facebook, Flickr, Dogster and BibSonomy (see To Join or not to Join: The Illusion of Privacy in Social Networks with Mixed Public and Private User Profiles).

“Those researchers blinded themselves to the profiles of half the people in each network, and launched a variety of “attacks” on the networks, to see what private information they could glean by simply looking at things like groups people belonged to, and their friendship links. On each network, at least one attack worked. Researchers could predict where Flickr users lived; Facebook users’ gender, a dog’s breed, and whether someone was likely to be a spammer on BibSonomy. The authors found that membership in a group gave away a significant amount of information, but also found that predictions using friend links weren’t as strong as they expected. “Using friends in classifying people has to be treated with care,” computer scientists Lise Getoor and Elena Zheleva wrote.”