Welcome to the Splogosphere: 75% of new pings are spings (splogs)

December 15th, 2005

In the blogosphere, pings are notifications that updated blogs send to PingServers. A major issue of late has been unjustified pings, also known as spings, sent by splogs (spam blogs). Splogs have been widely discussed recently, including in an interesting thread on post piracy that Steve Rubel initiated on Micropersuasion.

The problem of splogs prompted us to analyze pings from weblogs.com, which publishes hourly pings as changes.xml. We have been collecting these pings over the last 4 weeks, for a total of 40 million pings from around 14 million (so claimed) blogs. To begin with, we fetched these blogs and applied a language identification technique implemented by James Mayfield to determine the language of each one. As expected, most of the pings were from blogs authored in English, but we were able to identify blogs in many other languages as well. For instance, the charts below show the distribution of pings from blogs authored in Italian, over a day and over a week. Each bar denotes the number of pings per hour.


Pings over a day
Pings over 8 days

All times are in GMT. Clearly, Italian-authored blogs display a distinct blogging pattern.
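The per-hour bars described above can be reproduced with a small amount of code. The following is a minimal sketch, not our actual pipeline: it assumes a changes.xml layout with a root `updated` attribute (an RFC 822 date in GMT) and `weblog` elements whose `when` attribute gives the seconds elapsed since the ping. The sample document and the function name `pings_per_hour` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from datetime import timedelta
from email.utils import parsedate_to_datetime

# Hypothetical sample in the assumed changes.xml layout; not real data.
SAMPLE = """<weblogUpdates version="1" updated="Thu, 15 Dec 2005 10:00:00 GMT" count="3">
  <weblog name="Blog A" url="http://a.example/" when="30"/>
  <weblog name="Blog B" url="http://b.example/" when="1800"/>
  <weblog name="Blog C" url="http://c.example/" when="3700"/>
</weblogUpdates>"""


def pings_per_hour(xml_text):
    """Count pings per GMT hour of day from one changes.xml snapshot."""
    root = ET.fromstring(xml_text)
    # Time at which the snapshot was generated (assumed to be GMT).
    snapshot = parsedate_to_datetime(root.get("updated"))
    counts = Counter()
    for weblog in root.iter("weblog"):
        # "when" is assumed to be seconds elapsed before the snapshot time.
        ping_time = snapshot - timedelta(seconds=int(weblog.get("when")))
        counts[ping_time.hour] += 1
    return counts


if __name__ == "__main__":
    for hour, n in sorted(pings_per_hour(SAMPLE).items()):
        print(f"{hour:02d}:00 GMT  {n} ping(s)")
```

Restricting the input to blogs identified as a given language, and summing the counters over many snapshots, would give a chart like the ones above.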

In the next step, we used our work on splog detection to detect splogs (and hence spings) among the English blogs. Our detection mechanism is close to 90% accurate. As shown in the charts below, pings from blogs average around 8K per hour and those from splogs average around 25K per hour.


Blog Pings
Splog Pings

Clearly, about 3 out of 4 pings are spings! Going back further to the source of these spings, we observed that more than 50% of the claimed blogs pinging weblogs.com are splogs.
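The "3 out of 4" figure follows directly from the two hourly averages quoted above. As a quick check, here is the arithmetic as a short sketch; the function name `sping_share` is just illustrative, and the inputs are the rounded averages from the charts, not exact counts.

```python
def sping_share(blog_pings_per_hour, splog_pings_per_hour):
    """Fraction of all pings that come from splogs."""
    total = blog_pings_per_hour + splog_pings_per_hour
    return splog_pings_per_hour / total


# Rounded hourly averages: ~8K blog pings, ~25K splog pings.
share = sping_share(8_000, 25_000)
print(f"{share:.1%}")  # 75.8%
```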

Given how interesting these preliminary statistics are, the scope for further analysis, and the interest in the resulting dataset, we decided to continuously monitor the pingosphere. We now do so “live” on the updated blogs published by weblogs.com (delayed by an hour), and have made the results publicly available at http://memeta.umbc.edu. The site lists blogging patterns for many other languages and compares splogs with blogs. All of this work is part of a larger project, memeta, aimed at analyzing the content and structure of the blogosphere.

We hope our effort complements existing services (e.g., FightSplog, SplogReporter and SplogSpot) in combating splogs. We currently publish only simple ping statistics on this site, but do stay tuned for fresh splog and classified blog dumps and much more!

UPDATE: Matthew Hurst from BlogPulse points us to an interesting analysis he has done on a day of weblogs.com pings.