Community Detection in Twitter
May 25, 2011
Twitter has recently evolved into a source of social, political and real-time information, in addition to being a means of mass communication and marketing. Monitoring and analyzing information on Twitter can yield valuable insights that would otherwise be hard to obtain from conventional media. An important task in analyzing highly networked information sources like Twitter is identifying the communities that form within them. A community on Twitter can be defined as a set of users that have more links within the set than to users outside it.
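To make the community definition above concrete, here is a minimal sketch that counts links inside a candidate set of users against links crossing its boundary. The function names `link_counts` and `is_community`, and the representation of links as undirected user pairs, are assumptions made for illustration and are not taken from the paper.

```python
def link_counts(edges, members):
    """Count links inside a user set versus links crossing its boundary.

    edges   : iterable of (user, user) pairs, treated as undirected links
    members : iterable of users forming the candidate community
    """
    members = set(members)
    internal = external = 0
    for u, v in edges:
        u_in, v_in = u in members, v in members
        if u_in and v_in:
            internal += 1      # both endpoints inside the set
        elif u_in or v_in:
            external += 1      # exactly one endpoint inside the set
    return internal, external


def is_community(edges, members):
    """True if the set has more links within it than to users outside it."""
    internal, external = link_counts(edges, members)
    return internal > external
```

For example, with the links a-b, b-c, a-c and c-d, the set {a, b, c} has three internal links and one external link, so it satisfies the definition.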
We present a technique for deriving a similarity metric between any two users on Twitter based on the similarity of their content, links and metadata. The link structure on Twitter can be characterized using the Twitter notions of following and being followed, together with the @Mentions, @Reply and @RT tags in tweets. Content similarity is characterized by the words in the tweets combined with the hashtags they are annotated with. Metadata similarity is based on other sources of user information such as location, age and gender. We then use this similarity metric to cluster users into communities using spectral clustering and bottom-up agglomerative hierarchical clustering. We evaluate clustering performance using different similarity measures on different types of datasets. We also present a heuristic for finding communities on Twitter that takes advantage of Twitter's network characteristics.
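The following is a minimal sketch of how content, link and metadata similarity might be combined into one score and fed to bottom-up agglomerative clustering. It is not the paper's implementation: the choice of cosine similarity for content, Jaccard overlap for links, exact-match fraction for metadata, the weights `(0.4, 0.4, 0.2)`, average linkage, and the user record layout (`tweets`, `links`, `meta`) are all assumptions made for illustration. Spectral clustering is not shown.

```python
import math
from collections import Counter
from itertools import combinations


def content_similarity(tweets_a, tweets_b):
    """Cosine similarity over lowercased word and hashtag counts."""
    ca = Counter(w.lower() for t in tweets_a for w in t.split())
    cb = Counter(w.lower() for t in tweets_b for w in t.split())
    dot = sum(ca[w] * cb[w] for w in ca.keys() & cb.keys())
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def link_similarity(links_a, links_b):
    """Jaccard overlap of the accounts each user is linked to."""
    union = links_a | links_b
    return len(links_a & links_b) / len(union) if union else 0.0


def metadata_similarity(meta_a, meta_b):
    """Fraction of shared metadata fields (e.g. location) with equal values."""
    keys = meta_a.keys() & meta_b.keys()
    if not keys:
        return 0.0
    return sum(meta_a[k] == meta_b[k] for k in keys) / len(keys)


def user_similarity(a, b, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of the three similarities; the weights are hypothetical."""
    wc, wl, wm = weights
    return (wc * content_similarity(a["tweets"], b["tweets"])
            + wl * link_similarity(a["links"], b["links"])
            + wm * metadata_similarity(a["meta"], b["meta"]))


def agglomerative(users, k):
    """Average-linkage bottom-up clustering down to k clusters."""
    names = list(users)
    # Precompute pairwise similarity between all users.
    sim = {frozenset(p): user_similarity(users[p[0]], users[p[1]])
           for p in combinations(names, 2)}
    clusters = [[n] for n in names]  # start with one cluster per user

    def avg(c1, c2):
        total = sum(sim[frozenset((x, y))] for x in c1 for y in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > max(k, 1):
        # Merge the pair of clusters with the highest average similarity.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: avg(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # safe because i < j
    return clusters
```

Each user is assumed to be a dictionary with a list of tweet strings under `tweets`, a set of linked account names under `links`, and a dictionary of attributes under `meta`. Calling `agglomerative(users, k)` returns a list of `k` lists of user names.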
University of Maryland Baltimore County