Map Reduce for Scientific Applications

Tuesday, November 17, 2009, 10:15am - 11:30am

ITE 325 B

In this week's Ebiquity Lab meeting, David Chapman will talk about Map Reduce for Scientific Applications.

Abstract:
Map Reduce is a programming paradigm popularized by Google for computations over very large data sets. It is a meta-algorithm generic enough to solve a large number of problems. However, unless care is taken, this generality can easily come at the price of a significant performance drop. Current infrastructure, such as Apache Hadoop, offers a practical solution for many data mining and text analysis problems, but leaves much performance to be desired for the scientific domain as a whole. Science presents several fundamentally new challenges for the paradigm. The relatively modest data scale magnifies any initial overhead. Multi-dimensional data locality is not exploited by current implementations. Additional challenges are imposed by hybrid accelerated clusters, which use various multicore architectures to distribute massive floating-point scientific workloads. Our approach is to develop and benchmark a new Map Reduce system using hybrid accelerated computing for science problems. So far we have tested our Map Reduce system across 8 QS22 Cell B.E. blades on the infrared remote sensing gridding problem. As this is a work in progress, we are still discovering novel ways to improve performance. Separately, we have elaborated on the very definition of Map Reduce and developed ideas toward new science problems that may be possible with this paradigm.
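
For readers unfamiliar with the paradigm, the following is a minimal single-process sketch of how a gridding problem might be phrased as map and reduce steps. It is not the system described in the talk: the function names, the (lat, lon, value) observation layout, the cell_size parameter, and the choice of averaging as the reduce step are all assumptions made purely for illustration.

```python
from collections import defaultdict


def map_observation(obs, cell_size):
    """Map step: emit a (grid cell key, value) pair for one observation.

    The (lat, lon, value) layout is a hypothetical record format.
    """
    lat, lon, value = obs
    # Floor division assigns each point to the cell that contains it,
    # including for negative coordinates.
    key = (int(lat // cell_size), int(lon // cell_size))
    return key, value


def reduce_cell(values):
    """Reduce step: combine all values in one cell (here, a plain mean)."""
    return sum(values) / len(values)


def map_reduce_grid(observations, cell_size=1.0):
    """Run map, group by key (the 'shuffle'), then reduce each group."""
    groups = defaultdict(list)
    for obs in observations:
        key, value = map_observation(obs, cell_size)
        groups[key].append(value)
    return {key: reduce_cell(vals) for key, vals in groups.items()}


if __name__ == "__main__":
    # Made-up sample points, not real sensor data.
    sample = [(10.2, 20.7, 280.0), (10.8, 20.1, 290.0), (-0.5, 5.5, 300.0)]
    print(map_reduce_grid(sample, cell_size=1.0))
```

In a distributed setting the map calls and the per-cell reduce calls would run on different nodes; this sketch only shows the shape of the computation, not how it is scheduled or accelerated.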

Hosted by: Tim Finin
