Communities are central to online social media systems and detecting
their structure and membership is critical for many applications. A
community in the real world is represented in a graph as a set of nodes
that are more closely related to one another than to the rest of the
network. In social media, a community could be a set of blogs that are
topically related, a group of friends connected via Live Spaces or
even a set of users who share similar tags in their social bookmarks.
Graph structure has commonly been used to detect communities. However,
we can go beyond that by utilizing the special properties and
meta-data available in social media to identify such communities. For
instance, due to the sparsity and long-tail structure of social graphs,
it is possible to efficiently estimate communities by sampling only a
small portion of the entire graph. Another useful property of social
media datasets is the availability of tags, which provide free
meta-data. Community detection can benefit from not just how nodes
link to each other, but also what tags they use. Grouping blogs or
feeds via tags can help describe the topics that relate the set
(semantics). Communities can be a key to understanding the utility of
a certain social network and why people join it (user intentions).
By analyzing microblogging communities in Twitter, we describe some of
the user intentions that shed light on how people are participating in
such platforms. Finally, many blog posts are emotionally charged, yet
current models treat every hyperlink as an endorsement. We describe how
sentiment information around a link provides clues to its polarity and
can be used to identify influence
and bias in social media. There are several applications that can
benefit from these techniques: business intelligence, social
recommendation, filtering tools and advertising, to name a few.
Since social graphs are extremely large and we are dealing with
vast amounts of real-time data, we are exploring two approaches:
one that develops efficient approximation techniques, and another that
seeks to leverage the power of cell.