March 1, 2005 - December 1, 2008
Weblogs, or blogs, have become an important new way to publish information, engage in discussions and form communities. The memeta project is developing a framework for representing and studying the structure and content of communities of blogs. We are particularly interested in how metadata about blogs can be extracted, discovered and computed, and how that metadata can be used to analyze blogs and to provide new blog-related services.
Concrete problems we hope to solve include distinguishing blogs from non-blogs; recognizing spam blogs (splogs); recognizing comment spam and trackbacks; categorizing and clustering blogs; recommending blogs to people; modeling trust relationships in blog communities; and spotting trends in blog communities.
memeta's blog database is driven by a custom blog crawler that collects information on over six million blogs.
- P. Kolari, A. Java, T. Finin, J. Mayfield, A. Joshi, and J. Martineau, "Blog Track Open Task: Spam Blog Classification", InCollection, TREC 2006 Blog Track Notebook, November 2006, 7591 downloads, 11 citations.
- P. Kolari, A. Java, T. Finin, T. Oates, and A. Joshi, "Detecting Spam Blogs: A Machine Learning Approach", InProceedings, Proceedings of the 21st National Conference on Artificial Intelligence (AAAI 2006), July 2006, 10402 downloads, 75 citations.
- P. Kolari, A. Java, and T. Finin, "Characterizing the Splogosphere", InProceedings, Proceedings of the 3rd Annual Workshop on Weblogging Ecosystem: Aggregation, Analysis and Dynamics, 15th World Wide Web Conference, May 2006, 6753 downloads, 84 citations.
- P. Kolari, T. Finin, and A. Joshi, "SVMs for the Blogosphere: Blog Identification and Splog Detection", InProceedings, AAAI Spring Symposium on Computational Approaches to Analysing Weblogs, March 2006, 12090 downloads, 112 citations.
- P. Kolari and T. Finin, "Memeta: A Framework for Multi-Relational Analytics on the Blogosphere", InProceedings, AAAI 2006 Student Abstract Program, February 2006, 4090 downloads.