Yahoo PIG is aimed at parallel semantic search

April 28th, 2007

Yahoo! Research has released an open source version of Pig, an “infrastructure to support ad-hoc analysis of very large data sets” that runs using massively parallel processing on clusters.

We are creating infrastructure to support ad-hoc analysis of very large data sets. Parallel processing is the name of the game. Our system runs on a cluster computing architecture, on top of which sit several layers of abstraction that ultimately bring the power of parallel computing into the hands of ordinary users. The layers in between automatically translate user queries into efficient parallel evaluation plans, and orchestrate their execution on the raw cluster hardware.

As others have pointed out, Pig addresses some of the same issues as Google’s Sawzall and uses Hadoop, an open source clone of Google’s MapReduce.
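To give a feel for the MapReduce model that Hadoop implements and Pig builds on, here is a minimal single-machine sketch of its three phases. The function names and data are ours for illustration, not Hadoop's actual API; on a real cluster the map and reduce calls run in parallel across many machines.

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework
    does automatically between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine each key's values -- here, sum word counts."""
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["pig runs on hadoop", "hadoop runs on clusters"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
```

Pig's contribution is the layer above this: users write high-level queries, and the system compiles them down into chains of map and reduce jobs like the one sketched here.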