Estimating Temporal Boundaries For Events Using Social Media Data
Wednesday, June 15, 2011, 10:00am - Wednesday, June 15, 2011, 12:00pm
325b ITE, UMBC
MS Thesis Defense
Social media websites such as Twitter, Flickr, and YouTube generate a high volume of user-generated content while a major event unfolds. Our goal is to determine automatically, and as accurately as possible, when an event starts and when it ends by analyzing this social media content. Estimating these temporal boundaries segments the event-related data into three major phases: the buildup to the event, the event itself, and the post-event effects and repercussions.
We describe a technique that estimates the temporal boundaries of anticipated events and helps monitor changes as events unfold. In our approach, we train a multiclass support vector machine (SVM) to classify event data into the three phases described above. We then discuss an algorithm for choosing the two class boundaries (the start and the end of the event) such that the total error is minimized. We apply our technique to six events: Hurricane Igor (2010), Super Bowl XLV (2011), three games from the ICC Cricket World Cup 2011, and the Royal Wedding (2011). We train an individual classifier for each of these events. Finally, we train a general classifier and compare its performance with that of the individual classifiers.
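The abstract does not spell out the boundary-selection algorithm. The following is a minimal Python sketch of one way to pose the problem, assuming the classifier emits one phase label per time step (0 = buildup, 1 = event, 2 = post-event) and that "total error" means the number of time steps whose predicted label disagrees with the phase implied by the chosen start and end. The function name `choose_boundaries` and the label encoding are hypothetical, not taken from the thesis.

```python
from typing import List, Tuple


def choose_boundaries(preds: List[int]) -> Tuple[int, int, int]:
    """Return (start, end, errors) minimizing label disagreements.

    Hypothetical sketch: time steps t < start are assumed to be phase 0
    (buildup), start <= t < end phase 1 (event), and t >= end phase 2
    (post-event). Ties are broken in favor of the earliest (start, end).
    """
    n = len(preds)
    # prefix[c][i] = number of predictions equal to class c in preds[:i]
    prefix = [[0] * (n + 1) for _ in range(3)]
    for i, p in enumerate(preds):
        for c in range(3):
            prefix[c][i + 1] = prefix[c][i] + (1 if p == c else 0)

    best = (0, 0, n + 1)
    for start in range(n + 1):
        for end in range(start, n + 1):
            # Count predictions that agree with the phase implied by (start, end).
            correct = (
                prefix[0][start]
                + prefix[1][end] - prefix[1][start]
                + prefix[2][n] - prefix[2][end]
            )
            errors = n - correct
            if errors < best[2]:
                best = (start, end, errors)
    return best
```

For example, `choose_boundaries([0, 0, 1, 1, 2, 2])` places the event at time steps 2 and 3 with zero errors. The search is exhaustive over all O(n²) boundary pairs; the prefix counts make each candidate O(1) to score, which is adequate for hourly or per-minute bins but could be replaced by a linear-time dynamic program for long series.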
The contributions of this research are: a set of features for detecting the temporal boundaries of events; a reasonable value of the tradeoff parameter for multiclass SVMs; an evaluation of the effect of smoothing SVM predictions using sliding windows of different sizes; and results of our approach on real event data gathered from Twitter. Our approach could also be used to detect the presence and scope of significant sub-events that occur during the course of an event. When applied to natural disasters and man-made disturbances, the derived data can help organizations involved in mediation efforts track and analyze evolving events.
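One simple form of sliding-window smoothing replaces each predicted phase label with the majority label in a window centered on it. The abstract does not state which smoothing rule the thesis uses, so the majority vote, the function name `smooth_predictions`, the odd-window requirement, and the truncated windows at the ends of the series are all assumptions made for illustration.

```python
from collections import Counter
from typing import List


def smooth_predictions(preds: List[int], window: int) -> List[int]:
    """Replace each label with the majority label in a centered window.

    Hypothetical sketch: `window` must be a positive odd integer. Near the
    ends of the series the window is truncated. Ties go to the label that
    appears first in the window (Counter.most_common preserves first-seen
    order among equal counts).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(preds)):
        lo, hi = max(0, i - half), min(len(preds), i + half + 1)
        smoothed.append(Counter(preds[lo:hi]).most_common(1)[0][0])
    return smoothed
```

With `window=3`, the noisy sequence `[0, 0, 1, 0, 1, 1, 1, 2, 1, 2, 2]` becomes `[0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2]`: isolated misclassifications are absorbed by their neighbors. Larger windows suppress more noise but may also blur genuinely short sub-events, which is one reason to compare several window sizes.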