Google’s splog detection methodology for Blogger
Tim Finin, 1:00pm 23 April 2006
Google has a post on their Blogger Buzz blog mentioning their splog elimination strategy for Blogger:
“As others have noted, we’ve made good progress in the past six months in reducing the amount of spam on Blog*Spot. One of the tools we’re using is an automatic spam classifier. The risk in using a classifier is that we will mistakenly identify good content as spam. This percentage of false positives is both very low and one that we are reducing by further improving our systems.”
Irishwonder notes that Google still does manual reviews of splogs:
“A while ago, I posted about how Google’s manual review can be detected through your logs. Well, last week I could verify it’s still true - the URL of doom http://www.corp.google.com/~pong/spam/ has appeared in the logs of my other blog splog and it has ceased to exist.”
Having a manual check for automatically classified splogs is a good idea, especially if your classifier produces a certainty measure, so that the human checkers can focus on blogs near the blog/splog borderline.
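To make the idea concrete, here is a minimal sketch of certainty-based triage. It is not Google's method, which the posts above do not describe. The function name `triage`, the thresholds 0.2 and 0.8, and the example URLs are all assumptions chosen for illustration. It assumes only that the classifier outputs an estimated probability of spam for each blog.

```python
def triage(scores, low=0.2, high=0.8):
    """Split blogs into keep, remove, and manual-review queues.

    scores maps a blog URL to the classifier's estimated probability
    that the blog is a splog. The thresholds are hypothetical.
    """
    keep, remove, review = [], [], []
    for url, p_spam in scores.items():
        if p_spam >= high:
            remove.append(url)   # confidently spam
        elif p_spam <= low:
            keep.append(url)     # confidently legitimate
        else:
            review.append(url)   # borderline: send to a human
    # Put the most uncertain blogs (closest to 0.5) first in the queue.
    review.sort(key=lambda url: abs(scores[url] - 0.5))
    return keep, remove, review


# Hypothetical scores for illustration only.
scores = {
    "a.example": 0.05,
    "b.example": 0.95,
    "c.example": 0.55,
    "d.example": 0.35,
}
keep, remove, review = triage(scores)
```

Widening the band between the two thresholds sends more blogs to human review and lowers the risk of false positives, at the cost of more manual work.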

