Gaydar, Facebook and privacy

October 6th, 2009

In the Fall of 2007, two MIT students carried out a class project exploring how presumably private data could be inferred from an online social networking system. Their experiment was to predict the sexual orientation of Facebook users who make their basic information public by analyzing friendship associations. As reported in the Boston Globe last month, the students had not yet published their results.

Well, now they have, in the October issue of First Monday, “one of the first openly accessible, peer–reviewed journals on the Internet”.

The paper has a lot of detail on the methodology for collecting the data and how it was analyzed. Here’s the abstract.

“Public information about one’s coworkers, friends, family, and acquaintances, as well as one’s associations with them, implicitly reveals private information. Social networking Web sites, e–mail, instant messaging, telephone, and VoIP are all technologies steeped in network data — data relating one person to another. Network data shifts the locus of information control away from individuals, as the individual’s traditional and absolute discretion is replaced by that of his social network. Our research demonstrates a method for accurately predicting the sexual orientation of Facebook users by analyzing friendship associations. After analyzing 4,080 Facebook profiles from the MIT network, we determined that the percentage of a given user’s friends who self–identify as gay male is strongly correlated with the sexual orientation of that user, and we developed a logistic regression classifier with strong predictive power. Although we studied Facebook friendship ties, network data is pervasive in the broader context of computer–mediated communication, raising significant privacy issues for communication technologies to which there are no neat solutions.”

As we had previously noted, this data mining exercise only accesses information that Facebook users explicitly choose to make public. The authors note that their analysis “relies on public self–identification of same–gender interest in Facebook profiles as a sentinel value for LGB identity”. The privacy vulnerability is that friendship relations are public by default on a Facebook account, and you cannot control the privacy settings of your friends. So if you leave your friend list public and many of your Facebook friends open up their profiles, it may be possible to draw reasonable inferences about your age, gender, political leanings, sexual preferences and other attributes.
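To make the idea concrete, here is a minimal sketch of this kind of inference: a one-feature logistic regression that predicts a generic binary attribute from the fraction of a user's friends who publicly show it. This is not the authors' code or data. The function names, the synthetic numbers and the training settings are all assumptions made for illustration.

```python
import math
import random

def friend_fraction(user, friends, public_label):
    """Fraction of a user's friends whose public profile shows the attribute."""
    fs = friends[user]
    if not fs:
        return 0.0
    return sum(public_label.get(f, 0) for f in fs) / len(fs)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y
            gw += err * x
            gb += err
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

def predict(w, b, x):
    """Predicted probability that a user with friend fraction x has the attribute."""
    return sigmoid(w * x + b)

# Synthetic training data (invented, not from the paper): users who have the
# attribute tend to have a higher fraction of friends who publicly show it.
rng = random.Random(0)
xs, ys = [], []
for _ in range(200):
    y = 1 if rng.random() < 0.3 else 0
    x = min(1.0, max(0.0, rng.gauss(0.30 if y else 0.05, 0.05)))
    xs.append(x)
    ys.append(y)

w, b = fit_logistic(xs, ys)

# Toy friendship graph: only "bob" shows the attribute publicly.
friends = {"alice": ["bob", "carol"]}
public_label = {"bob": 1}
x_alice = friend_fraction("alice", friends, public_label)
print(x_alice, predict(w, b, x_alice))
```

Note that the user being scored never has to disclose anything: the only inputs are the friend list and what the friends chose to make public, which is exactly the vulnerability described above.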

Geographic distribution of social networking systems popularity

August 12th, 2008

Using Google’s Insights for Search, Pingdom has “looked at 12 of the top social networks to answer a simple, but highly interesting question: Where are they the most popular?”. In their post, Social network popularity around the world, they surveyed MySpace, Facebook, Hi5, Friendster, LinkedIn, Orkut, LiveJournal, Xanga, Bebo, Imeem and Twitter. Their technique was simple: search for a network’s name, such as MySpace, and use the “regional interest” estimates. Here are some observations they made:

  • Facebook is most popular in Turkey and Canada.
  • Friendster and Imeem are most popular in the Philippines.
  • LinkedIn is most popular in India.
  • Twitter is most popular in Japan.
  • LiveJournal is more popular in Russia than it is in the United States.
  • Orkut is more popular in Iran (10th country popularity-wise) than it is in the United States.
  • MySpace is the only social network which is most popular in the United States.
  • MySpace, LinkedIn, LiveJournal, Xanga, and Twitter are the only social networks in this survey which have the United States in their top five countries, popularity-wise. That is just five out of twelve.

The technique is simple and somewhat crude, but probably accurate enough for a first-order approximation. It also provides data that complements the data that these systems provide on the geographic distribution of their users.
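The ranking step behind observations like these is easy to sketch. The snippet below assumes you already have a per-region interest score for each network, which is roughly the shape of what a regional interest report gives you. The network names, region names and scores are invented placeholders, not Pingdom's or Google's data, and the helper functions are hypothetical.

```python
def top_regions(interest, k=5):
    """Return the k regions with the highest interest score, best first."""
    ranked = sorted(interest.items(), key=lambda kv: kv[1], reverse=True)
    return [region for region, _ in ranked[:k]]

def networks_with_region_in_top(interest_by_network, region, k=5):
    """Names of networks that have the given region among their top k."""
    return sorted(
        name
        for name, scores in interest_by_network.items()
        if region in top_regions(scores, k)
    )

# Placeholder scores on a 0-100 scale (invented for illustration only).
interest_by_network = {
    "NetworkA": {"CountryX": 100, "CountryY": 62, "CountryZ": 40},
    "NetworkB": {"CountryX": 15, "CountryY": 100, "CountryZ": 71},
}

for name, scores in interest_by_network.items():
    print(name, "is most popular in", top_regions(scores, k=1)[0])

print(networks_with_region_in_top(interest_by_network, "CountryX", k=2))
```

Because the scores are relative within each search term, this kind of ranking can say where a network is most popular, but not how two networks compare in absolute size.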

Is it Lindsay Lohan or your friends who make you a binge drinker?

June 23rd, 2008

What determines our behavior or beliefs? Are we influenced by the well-known and popular leaders (political, social, religious) in our society, or by the few hundred people in our immediate social network (family, friends and co-workers)? It’s reasonable to assume that it varies by domain or topic, with your music preferences falling in the first category and your spiritual orientation in the second.

Paul Ormerod and Greg Wiltshire have a preprint of a paper, ‘Binge’ drinking in the UK: a social network phenomenon (pdf), reporting a study which finds that binge drinking seems to spread through “small world” social networks rather than through imitation of influentials in a “scale free” network.

“We analyse the recent rapid growth of ‘binge’ drinking in the UK. This means the consumption of large amounts of alcohol, especially by young people, leading to serious anti-social and criminal behaviour in urban centres. We show how a simple agent-based model, based on binary choice with externalities, combined with a small amount of survey data can explain the phenomenon. We show that the increase in binge drinking is a fashion-related phenomenon, with imitative behaviour spreading across social networks. The results show that a small world network, rather than a random or scale free, offers the best description of the key aspects of the data.”
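For readers who want a feel for this class of model, here is a rough sketch of a binary choice with externalities on a small world network. It is not the authors' model or their calibration. The graph builder follows the usual Watts-Strogatz recipe (a ring lattice with random rewiring), and the adoption rule, thresholds, seed count and all parameter values are assumptions chosen only to illustrate the mechanism.

```python
import random

def small_world(n, k, p, rng):
    """Ring lattice where each node links to its k nearest neighbours (k even),
    then each lattice edge is rewired to a random node with probability p."""
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(1, k // 2 + 1):
            b = (i + j) % n
            nbrs[i].add(b)
            nbrs[b].add(i)
    for i in range(n):
        for j in range(1, k // 2 + 1):
            old = (i + j) % n
            if old in nbrs[i] and rng.random() < p:
                choices = [c for c in range(n) if c != i and c not in nbrs[i]]
                if choices:
                    new = rng.choice(choices)
                    nbrs[i].discard(old)
                    nbrs[old].discard(i)
                    nbrs[i].add(new)
                    nbrs[new].add(i)
    return nbrs

def simulate(nbrs, thresholds, seeds, steps):
    """Each step, a non-adopter adopts if the adopting share of its neighbours
    reaches its personal threshold. Adoption is permanent in this sketch."""
    adopted = set(seeds)
    history = [len(adopted) / len(nbrs)]
    for _ in range(steps):
        new = set()
        for i, ns in nbrs.items():
            if i in adopted or not ns:
                continue
            share = sum(1 for j in ns if j in adopted) / len(ns)
            if share >= thresholds[i]:
                new.add(i)
        adopted |= new
        history.append(len(adopted) / len(nbrs))
    return history

rng = random.Random(42)
graph = small_world(n=200, k=6, p=0.1, rng=rng)
thresholds = {i: rng.uniform(0.1, 0.5) for i in graph}   # assumed distribution
seeds = set(rng.sample(range(200), 10))                  # initial adopters
history = simulate(graph, thresholds, seeds, steps=30)
print([round(h, 2) for h in history])
```

Swapping `small_world` for a random or scale free graph generator and comparing the resulting adoption curves against survey data is the kind of experiment the abstract describes.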

It’s fascinating that with the right data, simulation models can help to answer such questions.