Yahoo! Research has released an open source version of Pig, an “infrastructure to support ad-hoc analysis of very large data sets” that runs on clusters using massively parallel processing.
We are creating infrastructure to support ad-hoc analysis of very large data sets. Parallel processing is the name of the game. Our system runs on a cluster computing architecture, on top of which sit several layers of abstraction that ultimately bring the power of parallel computing into the hands of ordinary users. The layers in between automatically translate user queries into efficient parallel evaluation plans, and orchestrate their execution on the raw cluster hardware.
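To make the idea of translating a user query into a parallel evaluation plan more concrete, here is a minimal sketch in Python. It is not Pig's API or its query language. The function names (`partition`, `count_by_key_local`, `group_count`) are hypothetical, and a thread pool stands in for the cluster. It shows a simple "group by key and count" request being split into per-worker partial counts that are then merged.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, n):
    """Split records into n chunks, one per hypothetical worker."""
    return [records[i::n] for i in range(n)]


def count_by_key_local(chunk):
    """Per-worker step: count how often each key appears in one chunk."""
    return Counter(key for key, _ in chunk)


def group_count(records, workers=3):
    """Evaluate 'group by key, count' as a small parallel plan.

    Plan: partition the input, count locally in parallel, merge the partials.
    The thread pool is only a stand-in for real cluster nodes (an assumption).
    """
    chunks = partition(records, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_by_key_local, chunks))

    # Merge step: combine the partial counts from every worker.
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


if __name__ == "__main__":
    records = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("a", 5)]
    print(group_count(records))
```

The caller only states what should be computed (a count per key). How the work is split and recombined stays inside `group_count`, which is roughly the division of labor the layered design describes.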