Map Reduce for Scientific Applications
Speaker: David Chapman
Start: Tuesday, November 17, 2009, 10:15AM
End: Tuesday, November 17, 2009, 11:30AM
Location: ITE 325 B
Abstract: In this week's Ebiquity Lab meeting, David Chapman will talk about Map Reduce for Scientific Applications.

Map Reduce is a programming paradigm popularized by Google for computations
over very large data sets. It is a meta-algorithm generic enough to solve a
large number of problems. However, unless care is taken, this generality can
easily come at the price of a significant performance drop. Current
infrastructure, such as Apache Hadoop, offers a practical solution for
many data mining and text analysis problems, but leaves much to be desired
in performance for the scientific domain as a whole. Science presents
several fundamentally new challenges for the paradigm. The relatively
modest data scale exaggerates any initial overhead. Multi-dimensional
data locality is not exploited by current implementations. Additional
challenges are imposed by hybrid accelerated clusters, which use various
multicore architectures to distribute massive floating point scientific
workloads. Our approach is to develop and benchmark a new Map Reduce
system using hybrid accelerated computing for science problems. So far, we
have tested this system across 8 QS22 Cell B.E. blades on the infrared
remote sensing gridding problem. As this is a work in progress, we are
still discovering novel ways to improve performance. Separately, we have
elaborated on the very definition of Map Reduce and on ideas for new
science problems that may become possible with this paradigm.
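
As a rough illustration of the paradigm described in the abstract (not the
system presented in the talk), a gridding problem can be sketched in plain
Python: a map step assigns each observation to a grid cell, and a reduce
step averages the values that land in each cell. The function names, the
1-degree cell size, and the sample observations are all assumptions made
for this sketch.

```python
import math
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as a Map Reduce runtime does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the values collected for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def grid_mapper(observation, cell_deg=1.0):
    """Hypothetical mapper: key each (lat, lon, value) by its grid cell."""
    lat, lon, value = observation
    cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
    yield cell, value


def mean_reducer(cell, values):
    """Hypothetical reducer: average all values that fell into one cell."""
    return sum(values) / len(values)


def grid(observations):
    """Run the map, shuffle, and reduce steps in sequence on one machine."""
    pairs = map_phase(observations, grid_mapper)
    return reduce_phase(shuffle(pairs), mean_reducer)


# Made-up sample observations: (latitude, longitude, measured value).
observations = [
    (10.2, 20.7, 280.0),
    (10.8, 20.1, 290.0),
    (-0.5, 3.3, 300.0),
]
result = grid(observations)
```

Here the first two observations fall into the same cell and are averaged,
while the third lands in a cell of its own. A distributed implementation
would run the map and reduce steps on separate nodes; this sketch only shows
the shape of the computation.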
Host: Tim Finin