UMBC ebiquity

Detecting Domain Shift

Description: Machine learning systems are typically trained in the lab and then deployed in the wild. But what happens when the data they encounter in the wild change in a way that hurts accuracy? For example, a system may be trained to classify movie reviews as positive or negative (i.e., sentiment classification), but over time book reviews get mixed into the data stream. The problem of responding to such changes when they are known to have occurred has been studied extensively. In this talk I will describe recent work (with Mark Dredze and Christine Piatko) on automatically detecting such domain changes. We assume access only to a stream of unlabeled examples, and we detect significant changes by applying the A-distance, a measure of the difference between probability distributions, to margin values produced by large margin classifiers such as support vector machines. I will describe the application domain (statistical natural language processing), the approach itself, experiments on a variety of corpora and tasks, and a theoretical analysis of the A-distance that is used to automatically select the algorithm's parameters.

Type: Presentation

Authors: Tim Oates

Date: September 03, 2010

Tags: natural language processing, information extraction, learning

Format: Microsoft PowerPoint

Number of downloads: 465

Access Control: Publicly Available

 

Available for download as Microsoft PowerPoint

Size: 4911104 bytes (≈ 4.7 MB)

Assertions:

  1. (Resource) Detecting Domain Shift is the PowerPoint slides of (Event) Detecting Domain Shift