We present some updates on the Splogosphere as seen at a pingserver (weblogs.com). This follows our study from a year earlier, which reported on splogs in the English-speaking blogosphere. Our current update is based on 8.8 million pings on weblogs.com between January 23rd and January 26th. Though not fully representative, it does give a good sense of spam in the indexed blogosphere.
(i) 53% of all pings are spam, and 64% of all pings from blogs in English are spam. A year earlier we found that close to 75% of all pings from English blogs were spings. Dave Sifry reported seeing 70% spings in his last report. Clearly the growth of spings has plateaued, which is one less thing to worry about.
(ii) 56% of all pinging blogs are spam.
By collapsing these pings to their respective blogs, we chart the distribution of authentic blogs against splogs. These numbers have seen no change: 56% of all pinging blogs are splogs.
(iii) MySpace is now the biggest contributor to the blogosphere. The other key drivers, LiveJournal and blogs managed by SixApart (as seen at their update stream), contribute only 50-60% of what MySpace does. The growth of MySpace blogs has in fact dwarfed the growth of splogs! Further, if MySpace is discounted in our analysis, close to 84% of all pings are spings! Though MySpace is relatively splog free, we are beginning to notice splogs, something blog harvesters should keep an eye on.
(iv) Blogspot continues to be heavily spammed. Most of this spam, however, is now detected by blog search engines, a point also shared by Matt Cutts and Randy Morin. In all of the pings we processed, 51% of blogspot blogs were spam! [Note that not all blogspot blogs ping weblogs.com.]
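The difference between ping-level and blog-level percentages in (i) and (ii) comes from collapsing pings to their blogs before counting. A minimal sketch of that step follows, assuming each ping has already been labeled spam or not by some classifier; the record layout, the sample URLs, and the function names are hypothetical and are not taken from our actual pipeline.

```python
# Hypothetical ping records: (blog_url, is_spam). In practice the labels
# would come from a splog classifier, which is not shown here.
pings = [
    ("http://a.example/blog", True),
    ("http://a.example/blog", True),
    ("http://a.example/blog", True),
    ("http://b.example/blog", False),
    ("http://c.example/blog", True),
    ("http://d.example/blog", False),
]

def ping_spam_fraction(pings):
    """Fraction of individual pings that are spings."""
    return sum(1 for _, is_spam in pings if is_spam) / len(pings)

def blog_spam_fraction(pings):
    """Collapse pings to their blogs, then return the fraction of splogs."""
    blogs = {}
    for url, is_spam in pings:
        blogs[url] = is_spam  # assumes every ping from a blog carries the same label
    return sum(1 for is_spam in blogs.values() if is_spam) / len(blogs)

print(f"spings: {ping_spam_fraction(pings):.0%}")
print(f"splogs: {blog_spam_fraction(pings):.0%}")
```

Because a single splog can ping many times, the ping-level figure can sit well above the blog-level one, which is the pattern the toy data reproduces.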
(v) Most spam blogs are still hosted in the US. We ranked IPs associated with spam blogs based on their frequency of pings, and located them using ARIN.
Top locations included Mountain View, CA and San Francisco, CA.
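Ranking IPs by ping frequency, as described in (v), can be sketched roughly as below. The IP addresses are placeholders from the documentation ranges, the function name is hypothetical, and the ARIN lookup itself is left out since it is a separate whois step.

```python
from collections import Counter

# Hypothetical source IPs, one entry per ping from a blog labeled as spam.
spam_ping_ips = [
    "192.0.2.10",
    "198.51.100.7",
    "192.0.2.10",
    "203.0.113.5",
    "192.0.2.10",
    "198.51.100.7",
]

def rank_ips(ips, top_n=10):
    """Return (ip, ping_count) pairs, most frequent first."""
    return Counter(ips).most_common(top_n)

ranked = rank_ips(spam_ping_ips)
for ip, count in ranked:
    # Each top-ranked IP would then be located through a registry lookup.
    print(ip, count)
```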
Blogspot hosts the highest number of splogs, but we also found that most of the other top hosts were physically hosted in the US. Perhaps Jonathan Bailey knows more about the legal ramifications.
(vi) Content on .info domains continues to be a problem. 99.75% of all blogs hosted on these domains are spam. In other words, 1.65 million blogs were spam as opposed to only around 4K authentic blogs! As long as these domains are cheap and keyword rich, this trend is likely to continue. Sploggers are also exploiting private domain registration services.
(vii) High PPC contexts remain the primary motivation to spam. We identified the top keywords associated with spam blogs and generated a tag cloud using keyword frequency.
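A tag cloud driven by keyword frequency can be approximated as in the sketch below. The stopword list, the sample texts, the font-size range, and the function names are all illustrative assumptions, not the settings behind our cloud.

```python
import re
from collections import Counter

# Small illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "and", "of", "for", "to", "in"}

def keyword_counts(texts):
    """Count lowercase alphabetic tokens across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word not in STOPWORDS:
                counts[word] += 1
    return counts

def cloud_sizes(counts, min_size=10, max_size=40):
    """Map each keyword count linearly onto a font-size range (non-empty input)."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        word: min_size + (count - lo) * (max_size - min_size) / span
        for word, count in counts.items()
    }

# Hypothetical text snippets standing in for spam blog content.
sample_texts = [
    "cheap mortgage loans",
    "mortgage refinance rates",
    "online casino and mortgage",
]
counts = keyword_counts(sample_texts)
sizes = cloud_sizes(counts)
print(counts.most_common(3))
```

Linear scaling is the simplest choice; with heavily skewed frequencies a logarithmic scale may give a more readable cloud.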
We link these keywords to del.icio.us to depict an emerging problem that is quickly becoming serious. We posted on this recently, though references date to quite a while back. [See related tag spam notes on MyWeb.]
We will continue our effort on tackling spam. Our ongoing research on spam is catalogued in our tagged splog resources, or better still check out our tutorial at ICWSM this March!