UMBC ebiquity

Detecting Domain Shift

Speaker: Tim Oates

Start: Friday, September 03, 2010, 11:00AM

Location: 325b ITE, UMBC

Abstract: Machine learning systems are typically trained in the lab and then deployed in the wild. But what happens when the data they encounter in the wild change in a way that hurts accuracy? For example, a system may be trained to classify movie reviews as either positive or negative (i.e., sentiment classification), but over time book reviews get mixed into the data stream. The problem of responding to such changes once they are known to have occurred has been studied extensively. In this talk I will describe recent work (with Mark Dredze and Christine Piatko) on the problem of automatically detecting such domain changes. We assume only a stream of unlabeled examples. To detect significant changes, we apply the A-distance, a measure of the difference between probability distributions, to margin values from large-margin classifiers (such as support vector machines). I will describe the application domain (statistical natural language processing), the approach, experiments on a variety of corpora and tasks, and a theoretical analysis of the A-distance that is used to automatically select parameters for the algorithm.
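
The detection idea in the abstract can be illustrated with a minimal Python sketch. It is not the speakers' implementation. It assumes the standard definition of the A-distance, d_A(P, P') = 2 sup_{A in A} |P(A) - P'(A)|, and takes the family A to be all intervals on the real line. The function names (a_distance, detect_shift), the fixed reference window, and the window and threshold values are hypothetical choices made for illustration. In practice the margins would be the signed decision values a trained classifier assigns to unlabeled examples; here they are simulated.

```python
from bisect import bisect_right
import random


def a_distance(ref, cur):
    """Empirical A-distance between two non-empty samples of real numbers.

    Uses d_A = 2 * sup_A |P_ref(A) - P_cur(A)|, with A ranging over all
    intervals of the real line (an assumed choice of family).
    """
    ref_sorted, cur_sorted = sorted(ref), sorted(cur)
    # D(x) = F_ref(x) - F_cur(x) at each observed point; D is 0 left of all points.
    diffs = [0.0]
    for x in sorted(set(ref_sorted) | set(cur_sorted)):
        f_ref = bisect_right(ref_sorted, x) / len(ref_sorted)
        f_cur = bisect_right(cur_sorted, x) / len(cur_sorted)
        diffs.append(f_ref - f_cur)
    # The mass difference on an interval is D(right) - D(left), so the
    # supremum of its absolute value is max(D) - min(D).
    return 2.0 * (max(diffs) - min(diffs))


def detect_shift(margins, window=50, threshold=1.0):
    """Return the index of the example at which a shift is first flagged, or None.

    Compares the first `window` margins (reference) with the most recent
    `window` margins. Both parameters are illustrative placeholders.
    """
    ref = margins[:window]
    for end in range(2 * window, len(margins) + 1):
        cur = margins[end - window:end]
        if a_distance(ref, cur) > threshold:
            return end - 1
    return None


if __name__ == "__main__":
    # Simulated margins: confident in-domain values, then values nearer zero.
    rng = random.Random(0)
    stream = [rng.gauss(2.0, 0.5) for _ in range(300)]
    stream += [rng.gauss(0.3, 0.5) for _ in range(300)]
    print(detect_shift(stream, window=100, threshold=1.0))
```

The threshold here is hand-picked. The theoretical analysis mentioned in the abstract, which selects such parameters automatically, is not reproduced in this sketch.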

Tags: learning, natural language processing, language

Host: Tim Finin

 

Assertions:

  1. (Event) Detecting Domain Shift has PowerPoint slides (Resource) Detecting Domain Shift