UMBC ebiquity
Search the Enron email corpus online

Tim Finin, 11:03am 5 February 2006

The Enron email corpus is a collection of hundreds of thousands of email messages from the infamous Enron corporation that researchers have been using to improve and evaluate techniques for analyzing email, e.g., NLP analysis, information extraction, sentiment detection, social network analysis, and information flow. It’s become important because it is the only substantial collection of real email that is public. In the ebiquity lab, for example, Akshay Java has worked with UMBC’s Institute for Language and Information Technologies to apply their NLP technology to the messages.

InBoxer has put up an Enron Email site that lets anyone explore and search the collection on the Web. InBoxer is not a research group, but a company that sells an “anti-risk appliance” that detects when email that has been sent, or is about to be sent, violates policy. (There should be a good market for this in the government, too!)

You can also surf the corpus via a simple database interface at UC Berkeley.

William Cohen of CMU describes the collection:

This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation. … The dataset here does not include attachments, and some messages have been deleted “as part of a redaction effort due to requests from affected employees”.
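For readers who would rather poke at the messages locally than through a web interface, here is a minimal sketch of a keyword search in Python. It assumes each message is available as plain text with standard email headers, which is an assumption about the downloaded files rather than something stated above. The sample message, the addresses, and the helper names `parse_message` and `search` are all hypothetical and used only for illustration.

```python
from email import message_from_string

# Hypothetical sample message for illustration; NOT taken from the corpus.
SAMPLE = """\
From: alice@example.com
To: bob@example.com
Subject: Quarterly forecast

Please review the attached forecast before Friday.
"""


def parse_message(raw):
    """Parse raw message text into a small dict of searchable fields."""
    msg = message_from_string(raw)
    return {
        "from": msg.get("From", ""),
        "to": msg.get("To", ""),
        "subject": msg.get("Subject", ""),
        # For a non-multipart message, get_payload() returns the body text.
        "body": msg.get_payload(),
    }


def search(messages, term):
    """Return the parsed messages whose subject or body contains the term."""
    term = term.lower()
    return [
        m for m in messages
        if term in m["subject"].lower() or term in m["body"].lower()
    ]


records = [parse_message(SAMPLE)]
hits = search(records, "forecast")
for hit in hits:
    print(hit["from"], "|", hit["subject"])
```

A case-insensitive substring match is about the simplest thing that could work; a real search site would presumably use a proper full-text index.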

Now it’s convenient to explore corporate malfeasance on the Web.


2 Responses to “Search the Enron email corpus online”

  1. mayfield Says:

    The Enron collection is an incredibly valuable resource for research. But not everyone whose email appears in the collection is a criminal, and none of them asked for or authorized their personal emails to be put on display (they came into the public domain because they were subpoenaed, and the records of the court case are now public). InBoxer has a couple of pretty vile contests up on their site, the likes of which I’d encourage ebiquitons to eschew in their own use of the data.

  2. Tim Finin Says:

    I’ll admit that I’m withholding my opinion on InBoxer. It’s ironic that they offer a product that’s sold to companies some of whom are motivated, I imagine, by a desire to be less transparent and vulnerable if they are accused of criminal or other wrong behavior.