infochimps Amazon Machine Image for data analysis and viz

February 14th, 2009

Infochimps has registered a community image for Amazon’s Elastic Compute Cloud (EC2) designed for data processing, analysis, and visualization. Great idea!

Doing experimental computer science research requires the right infrastructure (hardware, bandwidth, software environments, and data), and tackling some interesting problems requires a lot of it. Cloud computing services such as EC2 are a great boon to researchers who aren’t part of a well-equipped lab already set up to support the kind of research they want to do.

EC2 allows users to instantiate a virtual computer from a saved image, called an Amazon Machine Image, or AMI. Users can configure a system with the operating system, software packages, and pre-loaded data they want and then save it as a shared community AMI, making it available to others.

The initial announcement, Hacking through the Amazon with a shiny new MachetEC2, says

“MachetEC2 is an effort by a group of Infochimps to create an AMI for data processing, analysis, and visualization. If you create an instance of MachetEC2, you’ll have an environment with tools designed for working with data ready to go. You can load in your own data, grab one of our datasets, or try grabbing the data from one of Amazon’s Public Data Sets. No matter what, you’ll be hacking in minutes.

We’re taking suggestions for what software the community would be most interested in having installed on the image … When we feel that the AMI is getting too bloated, we’ll split it up: MachetEC2-ML (machine learning), MachetEC2-viz, MachetEC2-lang, MachetEC2-bio, etc.”

And a second post gave some more details:

“When you SSH into an instance of machetEC2 (brief instructions after the jump), check the README files: they describe what’s installed, how to deal with volumes and Amazon Public Datasets, and how to use X11-based applications. You can also visit the machetEC2 GitHub page to see the full list of packages installed, the list of gems, and the list of programs installed from source.

To launch an instance of machetEC2, log into the AWS Console, click “AMIs”, search for “machetEC2” or ami-29ef0840, and click “Launch”. If you’re on the command-line, simply run

    $ ec2-run-instances ami-29ef0840 -k [your-keypair-name]

By the time you’ve grabbed some coffee, you’ll be able to access an EC2 instance with all the tools you need for working with data already installed, configured, and ready to hack.”
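
As a small sketch of the command-line route quoted above, the launch command can be wrapped in a helper so the AMI ID lives in one place. The function name machetec2_launch_cmd and the keypair name my-keypair are my own placeholders, not part of the announcement. The script only prints the command instead of running it, and it assumes the EC2 command-line tools are installed and your AWS credentials are already configured.

```shell
#!/bin/sh
# AMI ID as quoted in the machetEC2 announcement.
AMI_ID="ami-29ef0840"

# Hypothetical helper: print the launch command for a given keypair name.
# It does not contact AWS; it only assembles the command line.
machetec2_launch_cmd() {
    keypair="$1"
    if [ -z "$keypair" ]; then
        echo "usage: machetec2_launch_cmd KEYPAIR_NAME" >&2
        return 1
    fi
    printf 'ec2-run-instances %s -k %s\n' "$AMI_ID" "$keypair"
}

# "my-keypair" is a placeholder; substitute the name of your own keypair.
machetec2_launch_cmd "my-keypair"
```

Printing first gives you a chance to check the AMI ID and keypair before anything is launched, since a running instance starts costing money right away.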

This is a valuable contribution to the data wrangling community and to the larger research community as an example of what can be done. I can imagine similar community AMIs to support research on the Semantic Web, social network analysis, game development or multi-agent systems.