UMBC ebiquity
foaf:mbox_sha1sum considered harmful

Tim Finin, 11:52pm 17 December 2009

The foaf:mbox property is very useful since it is ‘inverse functional’ and can thus serve as an ID for a foaf individual. This lets us infer that two foaf profiles with the same mbox refer to the same person.
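
As a rough illustration of that inference, the sketch below merges profile descriptions that share an mbox value. It uses plain Python dictionaries as a stand-in for RDF descriptions; the function name `smush_by_mbox` and the sample data are hypothetical, and a real application would more likely work over an RDF store.

```python
from collections import defaultdict

def smush_by_mbox(profiles):
    """Group profile descriptions that share the same mbox value.

    Because mbox is inverse functional, two descriptions with the same
    mbox are taken to describe the same individual.
    """
    merged = defaultdict(dict)
    for profile in profiles:
        key = profile["mbox"]
        for prop, value in profile.items():
            # Collect every value seen for each property of this individual.
            merged[key].setdefault(prop, set()).add(value)
    return dict(merged)

# Hypothetical sample data: two descriptions of Alice, one of Bob.
profiles = [
    {"mbox": "mailto:alice@example.org", "name": "Alice Example"},
    {"mbox": "mailto:alice@example.org", "homepage": "http://example.org/~alice"},
    {"mbox": "mailto:bob@example.org", "name": "Bob Example"},
]
people = smush_by_mbox(profiles)
print(len(people))  # 2 distinct individuals
```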

Since publishing your email address invites spam, many people use the foaf:mbox_sha1sum property instead of mbox. mbox_sha1sum is also inverse functional but doesn’t reveal your private information (i.e., email address).
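
For reference, an mbox_sha1sum value is the SHA-1 digest of the full mailto: URI (including the "mailto:" prefix), written as hex. A minimal sketch using Python's standard hashlib follows; the function name and the example address are placeholders.

```python
import hashlib

def mbox_sha1sum(email):
    """Return the SHA-1 hex digest of the mailto: URI for an address."""
    uri = "mailto:" + email
    return hashlib.sha1(uri.encode("utf-8")).hexdigest()

# Placeholder address, used only for illustration.
digest = mbox_sha1sum("alice@example.org")
print(digest)  # 40 hex characters
```

The digest is deterministic, which is what lets it act as an identifier, and it is also what makes guessing attacks possible.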

Abell on developer.it has an interesting post, Gravatars: why publishing your email’s hash is not a good idea, that shows how to crack an MD5 hash of a person’s email address given a little information about the person. (Note: the Gravatar service supports globally recognized avatars.)

The idea exploits two facts: a few free email services (e.g., gmail, hotmail, yahoo, aol) account for a large fraction of email addresses, and a person’s full name suggests a small set of likely ‘username’ possibilities. Given an email hash and a person’s first and last name, one can generate hashes of likely email addresses until a match is found.
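
The guessing loop might look something like the sketch below. This is not Abell's program (his was written in Haskell); the provider list, the username patterns and the function names are all assumptions chosen for illustration, and the target is an MD5 hash of a lowercase address as used for Gravatars.

```python
import hashlib
from itertools import product

# Illustrative provider list; a real attack would likely use a longer one.
PROVIDERS = ["gmail.com", "hotmail.com", "yahoo.com", "aol.com"]

def candidate_usernames(first, last):
    """Build a few plausible usernames from a first and last name."""
    first, last = first.lower(), last.lower()
    return [
        first + last,
        first + "." + last,
        first + "_" + last,
        first[0] + last,
        first + last[0],
        last + first,
        first,
        last,
    ]

def crack_email_hash(target_hash, first, last):
    """Return the first candidate address whose MD5 matches, else None."""
    for user, domain in product(candidate_usernames(first, last), PROVIDERS):
        address = user + "@" + domain
        if hashlib.md5(address.encode("utf-8")).hexdigest() == target_hash:
            return address
    return None

# Made-up target: the hash of an address the patterns happen to cover.
target = hashlib.md5(b"jsmith@yahoo.com").hexdigest()
print(crack_email_hash(target, "John", "Smith"))  # jsmith@yahoo.com
```

Each name costs only a few dozen hash computations here, which is why even a simple program can work through tens of thousands of users quickly.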

Abell was able (!) to crack 10% of the email addresses for 80,871 stackoverflow.com users in an hour with a simple Haskell program.

The same attack can be used on foaf:mbox_sha1sum properties, especially since a foaf profile will very handily provide the other useful information about the person. Given the extra information available in many foaf profiles (e.g., nick, school homepage), one might even expect better results.
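
To make the point concrete, here is a hedged sketch of how the extra profile data could widen the search. The profile is a plain dictionary with hypothetical keys (`name`, `nick`, `school_domain`, `mbox_sha1sum`), where `school_domain` stands for a mail domain guessed from the school homepage; none of this is an existing tool's API.

```python
import hashlib

# Illustrative list of large free email providers.
PROVIDERS = ["gmail.com", "hotmail.com", "yahoo.com", "aol.com"]

def sha1_of_mailto(address):
    """SHA-1 hex digest of the mailto: URI, as used by mbox_sha1sum."""
    return hashlib.sha1(("mailto:" + address).encode("utf-8")).hexdigest()

def crack_foaf_profile(profile):
    """Try addresses built from the profile's own name, nick and school."""
    parts = profile["name"].lower().split()
    first, last = parts[0], parts[-1]
    users = [first + last, first + "." + last, first[0] + last]
    if profile.get("nick"):
        users.append(profile["nick"].lower())  # nick as a username guess
    domains = list(PROVIDERS)
    if profile.get("school_domain"):
        domains.append(profile["school_domain"])  # school as a mail domain
    for user in users:
        for domain in domains:
            address = user + "@" + domain
            if sha1_of_mailto(address) == profile["mbox_sha1sum"]:
                return address
    return None

# Made-up profile whose address is neither name-based nor on a big provider.
profile = {
    "name": "Alice Example",
    "nick": "wonderalice",
    "school_domain": "example.edu",
    "mbox_sha1sum": sha1_of_mailto("wonderalice@example.edu"),
}
print(crack_foaf_profile(profile))  # wonderalice@example.edu
```

Without the nick and the school domain, the name-only guesses would miss this address entirely.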

As vulnerabilities go, this doesn’t seem like a very dangerous one. The use of mbox_sha1sum is usually justified as a way to avoid having your email address harvested by spambots. I doubt that spammers would think it productive to spend an hour of computing time to get roughly 8,000 email addresses.

2 Responses to “foaf:mbox_sha1sum considered harmful”

  1. Dan Brickley Says:

    I think this trick has outlived its utility. We designed it when FOAF files were largely self-published; today it is more commonly used when large sites publish on behalf of their non-technical users. These days we have OpenID URIs, clarity in the RDF Core specs that a thing can have multiple URIs and the standard OWL relation owl:sameAs for connecting them. So I think we should probably start phasing it out…

  2. Henry Story Says:

    The real solution is not to publish all the information for all to see, but to publish views on people depending on who is looking. Essentially this is what Facebook or LinkedIn do. If you are logged into their service and you are looking at a friend’s info (perhaps this even now extends to foafs) you can see more information about them.

    The same can be done in a distributed way with foaf+ssl, or OpenID if the attribute exchange part is developed more RESTfully.