UMBC ebiquity
How Google separates the blogs from the splogs


Tim Finin, 1:00pm 19 March 2007

Google’s patent application for “Ranking blog documents” (filed 13 September 2005) is being discussed around the web.

“A blog search engine may receive a search query. The blog search engine may determine scores for a group of blog documents in response to the search query, where the scores are based on a relevance of the group of blog documents to the search query and a quality of the group of blog documents. The blog search engine may also provide information regarding the group of blog documents based on the determined scores.”
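The abstract says each blog document's score depends on two things: its relevance to the query and its quality. It does not say how the two are combined. The sketch below assumes a simple weighted sum. The `BlogDoc` record, its precomputed `relevance` and `quality` values, and the weight `alpha` are all hypothetical illustrations, not anything taken from the application:

```python
from dataclasses import dataclass

@dataclass
class BlogDoc:
    url: str
    relevance: float  # hypothetical: query-dependent relevance in [0, 1]
    quality: float    # hypothetical: query-independent quality in [0, 1]

def rank(docs, alpha=0.5):
    """Sort docs by a weighted mix of relevance and quality.

    The linear blend and the default alpha are assumptions for illustration;
    the patent abstract only says both factors feed into the score.
    """
    def score(d):
        return alpha * d.relevance + (1 - alpha) * d.quality
    return sorted(docs, key=score, reverse=True)

docs = [
    BlogDoc("http://example.com/splog", relevance=0.9, quality=0.1),
    BlogDoc("http://example.org/blog", relevance=0.7, quality=0.8),
]
print([d.url for d in rank(docs)])
# ['http://example.org/blog', 'http://example.com/splog']
```

With `alpha=1.0` the ranking ignores quality, and the splog's higher relevance puts it first. Blending in quality is what demotes it.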

The Google Operating System blog has a nice summary of the features Google mentions as useful in separating the blogs from the splogs. No surprises here.

Positive features:
  • links from blogrolls (especially from high-quality blogrolls or blogrolls of “trusted bloggers”)
  • links from other sources (mail, chats)
  • using tags to categorize a post
  • PageRank
  • the number of feed subscriptions (from feed readers)
  • clicks in search results

Negative features:
  • posts added at a predictable time
  • different content between the site and the feed
  • the amount of duplicate content
  • using words/n-grams that appear frequently in spam blogs
  • posts that have identical size
  • linking to a single web page
  • a large number of ads
  • the location of ads (“the presence of ads in the recent posts part of a blog”)
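The summary names features but not how they are measured. As a rough sketch, two of the negative signals, posts added at a predictable time and posts of identical size, might be computed as below. The function names, the use of the coefficient of variation of posting gaps, and both thresholds in `looks_spammy` are hypothetical choices made for illustration only:

```python
from collections import Counter
from statistics import mean, pstdev

def posting_regularity(timestamps):
    """Coefficient of variation of the gaps between consecutive posts.

    Values near 0 mean posts arrive on a near-fixed schedule (a stand-in
    for the "predictable time" signal). Returns None with fewer than 3 posts.
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return None
    return pstdev(gaps) / mean(gaps)

def identical_size_fraction(posts):
    """Share of posts whose length equals the most common post length."""
    if not posts:
        return 0.0
    _, most = Counter(len(p) for p in posts).most_common(1)[0]
    return most / len(posts)

def looks_spammy(timestamps, posts, cv_threshold=0.05, size_threshold=0.8):
    """Toy rule: flag a blog only if both signals fire (thresholds are made up)."""
    cv = posting_regularity(timestamps)
    regular = cv is not None and cv < cv_threshold
    return regular and identical_size_fraction(posts) >= size_threshold
```

A real system would feed many such signals, positive and negative, into a learned quality score rather than a hand-set rule like this one.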

Spotted on Micro Persuasion.

