foaf:mbox_sha1sum considered harmful

December 17th, 2009

The foaf:mbox property is very useful since it is ‘inverse functional’ and can thus serve as an ID for a foaf individual. This lets us infer that two foaf profiles with the same mbox refer to the same person.
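As a rough illustration of that inference, the sketch below groups profiles that share an mbox value. The profile URIs, the (profile, mbox) pair representation and the group_by_mbox helper are all hypothetical, made up for this example; a real FOAF consumer would work over RDF triples and an OWL reasoner rather than a Python list.

```python
from collections import defaultdict

def group_by_mbox(profiles):
    """Group profile identifiers that share the same foaf:mbox value.

    `profiles` is a list of (profile_uri, mbox_uri) pairs; this flat
    representation is an assumption made for the sketch.
    """
    groups = defaultdict(list)
    for profile_uri, mbox in profiles:
        groups[mbox].append(profile_uri)
    # Keep only mboxes claimed by more than one profile: because mbox is
    # inverse functional, those profiles describe the same person.
    return {mbox: uris for mbox, uris in groups.items() if len(uris) > 1}

# Hypothetical example data.
profiles = [
    ("http://example.org/alice#me", "mailto:alice@example.org"),
    ("http://example.com/people/asmith", "mailto:alice@example.org"),
    ("http://example.org/bob#me", "mailto:bob@example.org"),
]
same_person = group_by_mbox(profiles)
```

Here same_person maps the shared mailbox to the two profile URIs that can be treated as one individual.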

Since publishing your email address invites spam, many people use the foaf:mbox_sha1sum property instead of mbox. mbox_sha1sum is also inverse functional but doesn’t reveal your private information (i.e., email address).
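For reference, an mbox_sha1sum value is the SHA1 digest of the mailbox's mailto: URI, written as hex. A minimal sketch, where the function name and the example address are placeholders of my own:

```python
import hashlib

def mbox_sha1sum(email):
    """Return the hex SHA1 digest of the mailto: URI for `email`."""
    # The hash covers the full URI, including the "mailto:" scheme,
    # not just the bare address.
    mailto_uri = "mailto:" + email
    return hashlib.sha1(mailto_uri.encode("utf-8")).hexdigest()

# Placeholder address, for illustration only.
digest = mbox_sha1sum("alice@example.org")
```

The digest is deterministic and unsalted, so anyone who can guess the address can recompute the same value.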

Abell has an interesting post, Gravatars: why publishing your email’s hash is not a good idea, that shows how to crack an MD5 hash of a person’s email address given a little information about the person. (Note: the Gravatar service provides globally recognized avatars.)

The idea exploits two facts: a few free email services (e.g., gmail, hotmail, yahoo, aol) account for a large fraction of email addresses, and a person’s full name suggests a small set of likely ‘username’ possibilities. Given an email hash and a person’s first and last name, one can generate hashes of likely email addresses until a match is found.
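The guessing loop can be sketched in a few lines of Python. This is not Abell's Haskell program; the username patterns, the domain list and all function names are my own assumptions, chosen only to show the shape of the attack. The hash function is passed in, so the same loop covers a Gravatar-style MD5 (of the trimmed, lowercased address) or a foaf SHA1 (of the mailto: URI).

```python
import hashlib
from itertools import product

# Assumed list of popular free email domains.
DOMAINS = ["gmail.com", "hotmail.com", "yahoo.com", "aol.com"]

def candidate_usernames(first, last):
    """Guess common username patterns from a non-empty first and last name."""
    first, last = first.lower(), last.lower()
    return [
        first + last,        # janedoe
        first + "." + last,  # jane.doe
        first + "_" + last,  # jane_doe
        first[0] + last,     # jdoe
        first + last[0],     # janed
        last + first,        # doejane
        first,
        last,
    ]

def gravatar_md5(email):
    """MD5 of the trimmed, lowercased address (Gravatar-style hash)."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()

def foaf_sha1(email):
    """SHA1 of the mailto: URI (foaf:mbox_sha1sum-style hash)."""
    return hashlib.sha1(("mailto:" + email).encode("utf-8")).hexdigest()

def crack(target_hash, first, last, hash_fn):
    """Return the first guessed address whose hash matches, else None."""
    for user, domain in product(candidate_usernames(first, last), DOMAINS):
        email = user + "@" + domain
        if hash_fn(email) == target_hash:
            return email
    return None

# Hypothetical target: the hash of an address we pretend not to know.
target = gravatar_md5("jane.doe@yahoo.com")
recovered = crack(target, "Jane", "Doe", gravatar_md5)
```

With eight patterns and four domains this tries at most 32 hashes per person, which is why the approach is cheap even across tens of thousands of users.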

Abell was able (!) to crack 10% of the email addresses for 80,871 users in an hour with a simple Haskell program.

The same attack can be used on foaf:mbox_sha1sum properties, especially since a foaf profile will very handily provide the other useful information about the person. Given the extra information available in many foaf profiles (e.g., nick, school homepage), one might even expect better results.

As vulnerabilities go, this doesn’t seem like a very dangerous one. The use of mbox_sha1sum is usually justified as a way to avoid having your email address harvested by spambots. I doubt that spammers would think it productive to spend an hour of computing time to get 1000 email addresses.