Data Clustering with a Relational Push-Pull Model
by Adam Anthony
Monday, April 30, 2007, 10:00am - 12:00pm
Relational data clustering is the task of grouping data objects when both object features and relations between objects are present. I present a new generative model for relational data in which a relation between objects can have either a binding or a separating effect. For example, if a group of students is separated into gender clusters, a "dating" relation would appear most frequently between clusters, whereas a "roommate" relation would appear more often within clusters. Visually, one can imagine the "dating" relation pushing clusters apart while the "roommate" relation pulls clusters into tighter formations. I use simulated annealing to search for optimal values of the unknown model parameters, with a Bayesian score derived from the generative model as the objective function. Using experiments with artificial data and two real-world data sets, a Hollywood actor database and an ecological food web, I show that assuming relations appear most frequently within clusters can lead to poor performance. The experiments show that push-type relations do exist, so the tendency of relations to pull clusters together cannot be assumed in general.
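As a rough illustration of the search described in the abstract, the following Python sketch runs simulated annealing over cluster labels on a toy version of the student example. It is not the model from the talk: the `score` function is a simplified maximum-likelihood stand-in for the Bayesian score, object features are left out entirely, and every name and parameter (`score`, `anneal`, `t0`, `cooling`, the six-student data) is a hypothetical choice made for illustration. The one idea it tries to keep is that each relation has its own within-cluster and between-cluster link rate, so a relation is free to act as either a pull or a push.

```python
import math
import random
from itertools import combinations


def bernoulli_loglik(hits, total):
    """Log-likelihood of `hits` links among `total` pairs at the MLE link rate."""
    if total == 0 or hits == 0 or hits == total:
        return 0.0
    p = hits / total
    return hits * math.log(p) + (total - hits) * math.log(1 - p)


def score(labels, relations):
    """Toy stand-in for the Bayesian score (assumption, not the talk's formula).

    Each relation gets separate within-cluster and between-cluster link
    rates, so it can concentrate inside clusters (pull) or across them (push).
    """
    n = len(labels)
    total = 0.0
    for edges in relations.values():
        counts = {True: [0, 0], False: [0, 0]}  # same cluster? -> [links, pairs]
        for i, j in combinations(range(n), 2):
            same = labels[i] == labels[j]
            counts[same][1] += 1
            if (i, j) in edges:
                counts[same][0] += 1
        total += sum(bernoulli_loglik(h, t) for h, t in counts.values())
    return total


def anneal(relations, n, k=2, steps=2000, t0=2.0, cooling=0.995, seed=0):
    """Simulated annealing over cluster labels using single-object moves."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in range(n)]
    current = score(labels, relations)
    best, best_score = labels[:], current
    temp = t0
    for _ in range(steps):
        i = rng.randrange(n)
        old = labels[i]
        labels[i] = rng.randrange(k)
        new = score(labels, relations)
        # Metropolis rule: keep improvements, sometimes keep a worse move.
        if new >= current or rng.random() < math.exp((new - current) / temp):
            current = new
            if current > best_score:
                best, best_score = labels[:], current
        else:
            labels[i] = old  # reject the move
        temp *= cooling  # geometric cooling schedule
    return best, best_score


# Hypothetical data: students 0-2 and 3-5 form the two intended clusters.
relations = {
    "roommate": {(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)},   # pull
    "dating": {(i, j) for i in range(3) for j in range(3, 6)},      # push
}
best, best_score = anneal(relations, n=6)
print(best, best_score)
```

In this toy data the intended split scores 0.0, the highest value the stand-in score can take, because "roommate" links appear only within clusters and "dating" links only between them. A score that rewarded within-cluster links alone would treat the "dating" relation as evidence against that same split.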