Description: This dataset consists of 3000 blog homepages, of which 700 have been labeled as splogs and another 700 as authentic blogs. This training set was used to produce the results of three papers, focusing on identifying blogs [1], detecting spam blogs [2], and analysing the splogosphere [3]. The collection can be used for further experiments with splogs, or for building filters that could be deployed in real-world systems. We and our academic and industrial collaborators have been using such filters to eliminate spam blogs, with good results.

[1] Pranam Kolari, Tim Finin, Anupam Joshi. SVMs for the Blogosphere: Blog Identification and Splog Detection. AAAI Spring Symposium on Computational Approaches to Analysing Weblogs, March 2006.
[2] Pranam Kolari, Akshay Java, Tim Finin, Tim Oates, Anupam Joshi. Detecting Spam Blogs: A Machine Learning Approach. Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006), July 2006.
[3] Pranam Kolari, Akshay Java, Tim Finin. Characterizing the Splogosphere. 3rd Annual Workshop on Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 15th World Wide Web Conference, May 2006.

Type: Dataset
Authors: Pranam Kolari, Akshay Java, and Anupam Joshi
Date: November 14, 2006
Format: TAR.GZIP compressed
Number of downloads: 1517
Access Control: Publicly Available
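The description notes that this collection can be used to build splog filters, and the cited papers [1, 2] use SVMs for that task. The following is a minimal sketch, not the papers' code or feature set: it assumes the labeled homepages have already been extracted from the archive as plain text (the archive's file layout is not described here), labels splogs +1 and authentic blogs -1, and trains a linear SVM on binary bag-of-words features using Pegasos-style stochastic subgradient descent. The function names (`features`, `train_pegasos`, `is_splog`) and the parameter defaults are hypothetical choices for illustration.

```python
import random
import re
from typing import Dict, List, Set, Tuple


def features(text: str) -> Set[str]:
    """Binary bag-of-words: the set of lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def score(w: Dict[str, float], x: Set[str]) -> float:
    """Dot product of the sparse weight vector with a binary feature vector."""
    return sum(w.get(f, 0.0) for f in x)


def train_pegasos(data: List[Tuple[str, int]], lam: float = 0.01,
                  iterations: int = 1000, seed: int = 0) -> Dict[str, float]:
    """Linear SVM (hinge loss + L2 penalty) trained with Pegasos-style SGD.

    `data` holds (homepage_text, label) pairs with label +1 for a splog and
    -1 for an authentic blog (hypothetical convention). No bias term is
    learned, as in the basic Pegasos formulation.
    """
    rng = random.Random(seed)  # seeded so training is reproducible
    examples = [(features(text), label) for text, label in data]
    w: Dict[str, float] = {}
    for t in range(1, iterations + 1):
        x, y = rng.choice(examples)
        eta = 1.0 / (lam * t)  # Pegasos step size
        # The hinge-loss check uses the weights from before this step.
        violated = y * score(w, x) < 1.0
        # Regularizer step: (1 - eta * lam) simplifies to 1 - 1/t.
        shrink = 1.0 - 1.0 / t
        w = {f: v * shrink for f, v in w.items()}
        if violated:
            # Hinge-loss subgradient step toward the misclassified label.
            for f in x:
                w[f] = w.get(f, 0.0) + eta * y
    return w


def is_splog(w: Dict[str, float], text: str) -> bool:
    """Classify a homepage as a splog when its SVM score is positive."""
    return score(w, features(text)) > 0.0
```

On real homepages, a filter like this would be trained on the 1400 labeled pages and evaluated on held-out ones; the toy setup here only shows the mechanics of the hinge-loss update and sparse weight handling.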
Copyright © 1999-2009 UMBC ebiquity research group. Copyright © 2003-2009 Site design and RGB engine code by Filip Perich.