We recently analyzed data from three different sources: Bloglines, which manages feeds that users subscribe to; a sample of the Blogpulse index made available for the WWW Weblogging Ecosystems Workshop; and Blogwise, a popular blog directory.
Bloglines has more than 83,000 publicly listed users who subscribe to about 2,786,687 feeds in all, of which about 496,893 are unique. These are feeds that matter, since at least one user has actually subscribed to each of them. The above chart shows the top domains from these feeds. It is interesting to note that Blogspot accounts for 45% of the feeds that matter, followed by Xanga and Flickr. We also see a substantial presence of Web 2.0 sites such as Flickr, del.icio.us and Technorati that provide their content in RSS.
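As a rough illustration of how such a top-domain breakdown can be tallied, here is a minimal Python sketch. It is not the code used for the chart: the sample URLs and the helper names `hosting_domain` and `domain_shares` are hypothetical, and the "last two host labels" rule is a simplifying assumption that mis-groups hosts such as del.icio.us or anything under a suffix like .co.uk.

```python
from collections import Counter
from urllib.parse import urlsplit


def hosting_domain(feed_url):
    """Return a naive 'last two labels' domain, e.g. foo.blogspot.com -> blogspot.com."""
    host = (urlsplit(feed_url).hostname or "").lower()
    labels = host.split(".")
    return ".".join(labels[-2:]) if len(labels) >= 2 else host


def domain_shares(subscriptions):
    """Map each domain to its share of the unique feeds in `subscriptions`."""
    # Count each feed once, however many users subscribe to it.
    unique_feeds = set(subscriptions)
    counts = Counter(hosting_domain(url) for url in unique_feeds)
    total = sum(counts.values())
    return {domain: n / total for domain, n in counts.most_common()}


# Hypothetical subscription records, one entry per user subscription.
sample = [
    "http://alice.blogspot.com/atom.xml",
    "http://alice.blogspot.com/atom.xml",  # same feed, second subscriber
    "http://bob.blogspot.com/atom.xml",
    "http://www.xanga.com/rss.aspx?user=carol",
    "http://www.flickr.com/services/feeds/photos_public.gne?id=1",
]

shares = domain_shares(sample)
for domain, share in shares.items():
    print(f"{domain}: {share:.0%}")
```

Deduplicating before counting is what separates the unique feeds from the total number of subscriptions. A more careful version would probably group hosts using the Public Suffix List rather than the two-label shortcut.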
The Blogpulse data contains 1.3 million blogs from a 21-day period. LiveJournal accounts for 50% of the blogs among the top domains, and most of the other domains belong to blog hosting sites. More analysis of this data can be found in the paper on “Characterizing the Splogosphere”. A related post by Matthew Hurst talks about community structure in the blogosphere that cuts across different domains. Also compare this with last year’s post on ranking blog hosts and other related posts here and here.
While this data is only a sample of the Blogpulse index, it shows a very interesting difference between the content indexed by blog search engines and the feeds that users actually subscribe to in Bloglines. The difference is understandable: blog search engines should also cater to collective mining for trends, and sources like LiveJournal lend themselves well to that.
Blogwise is a blog directory with a relatively small index of 71,252 blogs, most of which are contributed by Blogger. The rest of the domains are mostly blog hosting sites.
- Based on Bloglines user subscriptions, Blogspot still contributes a significant portion of the feeds that matter in the blogosphere, even though it has had serious splog issues.
- A number of Bloglines users subscribe to Web 2.0 sites and to RSS feeds generated dynamically from customized queries.
- Finally, in any index of the blogosphere, the number of blogs that are indexed may not be as important as indexing the feeds that really matter to the user.
Thanks to Pranam Kolari for ideas and help with this post. Thanks also to Bloglines, Blogpulse and Blogwise for making some of their data publicly available.