Archive for the 'MC2' Category
September 11th, 2010, by Tim Finin, posted in cloud computing, High performance computing, MC2
UMBC’s Multicore Computational Center will host the Second Workshop on Frontiers of Multi-Core Computing on 22-23 September 2010. The workshop will involve a wide range of people from universities, industry and government who will exchange ideas, discuss issues, and develop the strategies for coping with the challenges of parallel and multicore computing.
“Multi- (e.g., Intel Westmere and IBM Power7) and many-core (e.g., NVIDIA Tesla and AMD FireStream GPUs) microprocessors are enabling more compute- and data-intensive computation in desktop computers, clusters, and leadership supercomputers. However efficient utilization of these microprocessors is still a very challenging issue. Their differing architectures require significantly different programming paradigms when adapting real-world applications. The actual porting costs are actively debated, as well as the relative performance between GPUs and CPUs.”
The workshop is free but those interested should register online. See the workshop schedule for details on presentations and timing.
June 15th, 2009, by Tim Finin, posted in cloud computing, High performance computing, MC2
Joab Jackson (UMBC ’90) wrote a nice article on UMBC’s Multicore Computational Center for the current issue of UMBC Magazine. From The Power of Parallels:
“In July 2007, IBM gave UMBC computer science professors Milton Halem and Yelena Yesha a grant to launch the center with cash and equipment that have totaled more than $1 million over the past three years. Supporting funding from NASA also helped the effort.
“Not only are we ahead of the curve,” says Charles Nicholas, chair of the department of computer science and electrical engineering, “but we hope to stay ahead of the curve…. The partnerships with IBM will let us keep the technologies up to date.”
Halem says that government and private enterprise are in dire need of “trained graduate students who know how to apply the new methods of parallel programming to the problems they face,” adding, “We’re one of the few schools in the nation that is teaching these courses.”
April 9th, 2009, by Tim Finin, posted in cloud computing, High performance computing, MC2, Multicore Computation Center, Programming, Semantic Web
Hadoop has become one of the most popular frameworks to exploit parallelism on a computing cluster. You don’t actually need access to a cluster to try Hadoop, learn how to use it, and develop code to solve your own problems.
UMBC Ph.D. student Vlad Korolev has written an excellent tutorial, Hadoop on Windows with Eclipse, showing how to install and use Hadoop on a single computer running Microsoft Windows. It also covers the Eclipse Hadoop plugin, which enables you to create and run Hadoop projects from Eclipse. In addition to step-by-step instructions, the tutorial has short videos documenting the process.
If you want to explore Hadoop and are comfortable developing Java programs in Eclipse on a Windows box, this tutorial will get you going. Once you have mastered Hadoop and have developed your first project using it, you can go about finding a cluster to run it on.
April 7th, 2009, by Tim Finin, posted in cloud computing, High performance computing, MC2
In tomorrow’s ebiquity meeting (10 am EDT Wed, April 8), PhD student David Chapman will talk about his work on Map Reduce on Heterogeneous Multi-Core Clusters. From the abstract:
“We have extended the Map Reduce programming paradigm to clusters with multicore accelerators. Map Reduce is a simple programming model designed for parallel computations with large distributed datasets. Google has reinforced the practical effectiveness of this approach with over 1000 commercial Map Reduce applications. Typical Map Reduce implementations, such as Apache Hadoop, exploit parallel file systems for use in homogeneous clusters. Unfortunately, the multicore accelerators such as Cell B.E. used in modern supercomputers such as Roadrunner require additional layers of parallelism, which cannot be addressed from parallel file systems alone. Related work has explored Map Reduce on a single Cell B.E. accelerator machine using hash and sort based techniques. We are incorporating techniques from Apache Hadoop as well as early multicore Map Reduce research to produce an implementation optimized for a hybrid multicore cluster. We are evaluating our implementation on a cluster of 24 Cell Q series nodes and 48 multicore PowerPC J series nodes at the UMBC Multicore Computational Center.”
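The abstract mentions hash and sort based techniques for Map Reduce. As a rough illustration only (this is not the authors' implementation, and the function names are made up for this sketch), here are two ways to group intermediate (key, value) pairs before the reduce step, in plain Python 3:

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def group_by_hash(pairs):
    """Group (key, value) pairs with a hash table; keys come out unsorted."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def group_by_sort(pairs):
    """Group (key, value) pairs by sorting on the key, then scanning runs."""
    ordered = sorted(pairs, key=itemgetter(0))  # stable sort keeps value order
    return {key: [value for _, value in run]
            for key, run in groupby(ordered, key=itemgetter(0))}


def word_count(lines, group):
    """Map each word to (word, 1), group the pairs, then reduce by summing."""
    pairs = [(word, 1) for line in lines for word in line.split()]
    return {word: sum(ones) for word, ones in group(pairs).items()}


if __name__ == "__main__":
    text = ["map reduce map", "reduce map"]
    print(word_count(text, group_by_hash))
    print(word_count(text, group_by_sort))
```

Both grouping functions feed the same reduce step; which one is faster on a given accelerator depends on memory layout and data size, which this sketch does not try to model.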
We will stream the talk live and share the raw recording.
March 18th, 2009, by Tim Finin, posted in cloud computing, Google, High performance computing, MC2, Multicore Computation Center, Semantic Web, Social media
We are early in the era of big data (including social and/or semantic) and more and more of us need the tools to handle it. Monday’s NYT had a story, Hadoop, a Free Software Program, Finds Uses Beyond Search, on Hadoop and Cloudera, a new startup that is offering its own Hadoop distribution designed to be easier to install and configure.
“In the span of just a couple of years, Hadoop, a free software program named after a toy elephant, has taken over some of the world’s biggest Web sites. It controls the top search engines and determines the ads displayed next to the results. It decides what people see on Yahoo’s homepage and finds long-lost friends on Facebook.”
Three top engineers from Google, Yahoo and Facebook, along with a former executive from Oracle, are betting it will. They announced a start-up Monday called Cloudera, based in Burlingame, Calif., that will try to bring Hadoop’s capabilities to industries as far afield as genomics, retailing and finance. The company has just released its own version of Hadoop. The software remains free, but Cloudera hopes to make money selling support and consulting services for the software. It has only a few customers, but it wants to attract biotech, oil and gas, retail and insurance customers to the idea of making more out of their information for less.
Cloudera’s distribution, currently based on Hadoop v0.18.3, uses RPM and comes with a Web-based configuration aid. The company also offers some free basic training in MapReduce concepts, using Hadoop, developing appropriate algorithms and using Hive.
February 8th, 2009, by Tim Finin, posted in cloud computing, Database, High performance computing, MC2
A Hadoop User Group (HUG) has formed for the Washington DC area via meetup.com.
“We’re a group of Hadoop & Cloud Computing technologists / enthusiasts / curious people who discuss emerging technologies, Hadoop & related software development (HBase, Hypertable, PIG, etc). Come learn from each other, meet nice people, have some food/drink.”
The group defines its geographic location as Columbia MD and its first HUG meetup was held last Wednesday at the BWI Hampton Inn. In addition to informal social interactions, it featured two presentations:
- Amir Youssefi from Yahoo! presented an overview of Hadoop. Amir is a member of the Cloud Computing and Data Infrastructure group at Yahoo!, and discussed Multi-Dataset Processing (Joins) using Hadoop and Hadoop Table.
- Introduction to complex, fault tolerant data processing workflows using Cascading and Hadoop by Scott Godwin & Bill Oley
If you’re in Maryland and interested you can join the group at meetup.com and get announcements for future meetings. It might provide a good way to learn more about new software to exploit computing clusters and cloud computing.
(Thanks to Chris Diehl for alerting me to this)
January 2nd, 2009, by Tim Finin, posted in cloud computing, MC2, Programming
The amount of free, interesting, and useful data is growing explosively. Luckily, computers are getting cheaper as we speak, they are all connected with a robust communication infrastructure, and software for analyzing data is better than ever. That’s why everyone is interested in easy-to-use frameworks like MapReduce that let everyday programmers run their data crunching in parallel.
octo.py is a very simple MapReduce-like system inspired by Ruby’s Starfish.
“Octo.py doesn’t aim to meet all your distributed computing needs, but its simple approach is amenable to a large proportion of parallelizable tasks. If your code has a for-loop, there’s a good chance that you can make it distributed with just a few small changes. If you’re already using Python’s map() and reduce() functions, the changes needed are trivial!”
triangular.py is the simple example given in the documentation that is used with octo.py to compute the first 100 triangular numbers.
# triangular.py compute first 100 triangular numbers. Do
# 'octo.py server triangular.py' on server with address IP
# and 'octo.py client IP' on each client. Server uses source
# & final, sends tasks to clients, integrates results. Clients
# get tasks from server, use mapfn & reducefn, return results.
source = dict(zip(range(100), range(100)))

def final(key, value):
    print key, value

def mapfn(key, value):
    for i in range(value + 1):
        yield key, i

def reducefn(key, value):
    # summing 0..n gives the nth triangular number
    return sum(value)
Put octo.py on all of the machines you want to use. On the machine you will use as a server (with IP address <ip>), also install triangular.py, and then execute:
python octo.py server triangular.py &
On each of your clients, run
python octo.py client <ip> &
You can try this out using the same machine to run the server process and one or more client processes, of course.
When the clients register with the server, they will get a copy of triangular.py and wait for tasks from the server. The server accesses the data from source and distributes tasks to the clients. These in turn use mapfn and reducefn to complete the tasks, returning the results. The server integrates these and, when all have completed, invokes final, which in this case just prints the answers, and halts. The clients continue to run, waiting for more tasks to do.
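To see the data flow without any networking, the same map, group, and reduce steps can be run in a single process. The sketch below is not how octo.py itself is implemented. The run_local helper is a made-up name, the code is written for Python 3, and it assumes reducefn simply sums the values it receives for a key:

```python
from collections import defaultdict

# Same style of job definition as triangular.py, cut down to 5 numbers.
source = dict(zip(range(5), range(5)))


def mapfn(key, value):
    # Emit (key, 0), (key, 1), ..., (key, value).
    for i in range(value + 1):
        yield key, i


def reducefn(key, values):
    # Assumed reducer: the sum 0 + 1 + ... + n is the nth triangular number.
    return sum(values)


def run_local(source, mapfn, reducefn):
    """Run map, group, and reduce in one process (no server or clients)."""
    intermediate = defaultdict(list)
    for key, value in source.items():  # map phase
        for out_key, out_value in mapfn(key, value):
            intermediate[out_key].append(out_value)
    # Reduce phase: one call per distinct intermediate key.
    return {key: reducefn(key, values)
            for key, values in intermediate.items()}


if __name__ == "__main__":
    results = run_local(source, mapfn, reducefn)
    for key in sorted(results):
        print(key, results[key])  # plays the role of final()
```

In the distributed version, the map and reduce calls are what get farmed out to clients, while the grouping and the final step stay on the server.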
Octo.py is not a replacement for more sophisticated frameworks like Hadoop or Disco, but if you are working in Python, its KISS approach is a good way to get started with the MapReduce paradigm and might be all you need for small projects.
(Note: The package has not been updated since April 2008, so its status is not clear. But further development would run the risk of making it more complex and would be self-defeating.)
December 9th, 2008, by Tim Finin, posted in cloud computing, High performance computing, MC2, Multicore Computation Center, Programming
There’s a very interesting late addition to UMBC’s spring schedule — CMSC 491/691A, a special topics class on parallel programming. Programming multi-core and cell-based processors is likely to be an important skill in the coming years, especially for systems that require high performance such as those involving scientific computing, graphics and interactive games.
The class will meet Tu/Thr from 7:00pm to 8:15pm in the “Game Lab” in ECS 005A and will be taught by research professors John Dorband and Shujia Zhou. Both are very experienced in high-performance and parallel programming. Professor Dorband helped to design and build the first Beowulf cluster computer in the mid 1990s when he worked at NASA’s Goddard Space Flight Center. Shujia Zhou has worked at Northrop Grumman and NASA/Goddard on a wide range of projects using high-performance and parallel computing for climate modeling and simulation.
CMSC 491/691a Special Topics in Computer Science:
Introduction to parallel computing emphasizing the
use of the IBM Cell B.E.
3 credits. Grade Method: REG/P-F/AUD Course meets in
ENG 005A. Prerequisites: CMSC 345 and CMSC 313 or
permission of instructor.
[7735/7736] 0101 TuTh 7:00pm- 8:15pm
August 18th, 2008, by Tim Finin, posted in cloud computing, MC2, Multicore Computation Center, UMBC
The UMBC Multicore Computation Center is hosting a free workshop on Frontiers of Multicore Computing 26-28 August 2008 at UMBC. The workshop will feature leading computational researchers who will share their current experiences with multicore applications. A number of computer architects and major vendors have also been invited to describe their road maps to near and long-term future system developments. The FMC workshop will focus on applications in the fields of geosciences, aerospace, defense, interactive digital media and bioinformatics. The workshop has no registration fees but you must register to attend. More information regarding hotel accommodations, tutorials, exhibits and access to the campus can also be found at the website.
Members of the UMBC ebiquity lab will make presentations on our current and planned use of multicore and cloud computing for research in exploiting Wikipedia as a knowledge base and also in extracting communities from very large social network graphs.
August 7th, 2008, by Anupam Joshi, posted in CS, GENERAL, High performance computing, MC2, Multicore Computation Center, Programming
My colleague Marc Olano recently blogged about the new Larrabee chip from Intel, which will be described in a SIGGRAPH paper in a session he is chairing. This chip, with multiple old Pentium-type cores running at 1GHz, seems a logical culmination of the recent multi/many-core trend. IBM’s plans with the Cell/BE, and perhaps with the newer generation Power chips, are also headed in a similar direction. Short of materials scientists doing some magic with high-k dielectrics or airgaps or CNFETs or whatever, the trend seems to be away from a single CPU with more transistors running faster and faster and toward multicore chips that are not clocked very fast. There’s a good reason for it (heat), as anyone who’s had a high-end laptop and actually put it on their lap can testify. Further down the road, even more complex parallel architectures are proposed, with MCMs on chip connecting optically, and perhaps even memory stacked on top of the CPU layer talking optically back and forth! In other words, a few years down the road, the default box on which a system builder will write code will be something other than a single-core CPU. Bernie Meyerson from IBM discusses such issues in his talks. I can’t lay my hands on a publicly available PowerPoint, but some of the ideas are discussed in a recent interview.
Do these developments mean that we should be rethinking Programming 1 and 2, especially for CS majors? Do students now need to think about parallel or multi-threaded programming from day one? Can that be done without first doing standard imperative programming? Given the less than ideal state of high school CS education, is it realistic to expect that students will get Programming 1 (and maybe 2) in high school? In our department, we’re offering a class on programming the Cell/BE, and a course related to GPU programming, but those are typically meant for seniors. How about courses further upstream? Should data structures and algorithms change, perhaps by introducing concepts like transactional memory? Should OS change to talk much more about virtualization, and about redoing virtual memory when ample NVRAM is available and accessible from a core?
May 5th, 2008, by Tim Finin, posted in GENERAL, High performance computing, MC2
Next Monday (3:00pm, May 13), Fabrizio Petrini will visit and give a presentation on Streaming Applications on the Cell B.E. Processor. Here’s the abstract:
“We increasingly need to process large and complex data volumes to enable near-real-time informed human decisions or automated response actions. Current limitations in I/O and processing capabilities hinder the timely acquisition, processing, and presentation of information to decision makers for rapid response. Multi-core processors, such as the Cell B.E. processor, provide an unprecedented computational capability to curb this data deluge. In this talk I will describe the challenge in designing new data streaming algorithms for multi-core processors and present some recent results obtained with the Cell B.E. processor.”
May 3rd, 2008, by Tim Finin, posted in Earth science, High performance computing, MC2
David Chapman will defend his MS thesis, A General Algorithm for Gridding Earth Sensing Scanning Instruments, at 10:00am Monday May 5 in room 325 ITE. The abstract is below.
Gridding in remote sensing must re-project observations from their original coordinate system based on satellite orbit and attitude to a grid defined by Earth coordinates. Primitive methods assume that observations are located at points on Earth and typically average observations in grid cells, or interpolate geolocated observations. These approaches are inaccurate, because they do not make use of the instrument’s footprint geometry and spatial response. Observation Coverage (Obscov) gridding techniques make use of the satellite optics and geometry to more accurately describe the coverage of a footprint within each grid cell. Obscov gridding provides significant accuracy improvements exceeding 1 Kelvin Brightness Temperature over most regions on Earth for a 12 micron window channel on-board the Atmospheric Infrared Sounder (AIRS). Existing Obscov algorithms are only applicable to specific instruments and depend heavily on implicitly defined spatial response functions. We make use of raycasting and adaptive grid numerical integration to compute Obscov for the spatial response function of any instrument while processing streaming satellite observation data faster than 400 Megabits/second on a 6 machine cluster. We discuss the quality benefits of our algorithm by analyzing the results of gridded AIRS infrared sensor data with 324 operational spectral channels. We also address parallel processing issues to integrate AIRS Obscov gridding with SOAR, an on demand climate processing system built on a 122 processor blade server.
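For readers unfamiliar with gridding, here is a minimal Python sketch of the "primitive" point-averaging approach the abstract describes, in which each observation is treated as a point and averaged into the lat/lon cell it falls in. It is only an illustration. The function name, the tuple layout, and the 1 degree default cell size are assumptions of this sketch, and it does not attempt the Obscov method, which accounts for footprint geometry and spatial response.

```python
import math
from collections import defaultdict


def grid_average(observations, cell_deg=1.0):
    """Average point observations per lat/lon grid cell.

    observations: iterable of (lat_deg, lon_deg, value) tuples.
    Returns {(row, col): mean value}, where row and col index cells of
    size cell_deg counted from latitude -90 and longitude -180.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in observations:
        row = int(math.floor((lat + 90.0) / cell_deg))
        col = int(math.floor((lon + 180.0) / cell_deg))
        sums[(row, col)] += value
        counts[(row, col)] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


if __name__ == "__main__":
    # Hypothetical brightness temperatures in Kelvin, for illustration only.
    obs = [(0.2, 0.3, 280.0), (0.8, 0.9, 290.0), (10.5, -20.5, 250.0)]
    print(grid_average(obs))
```

Because every observation counts equally in whichever cell contains its center point, a footprint that straddles several cells contributes to only one of them, which is the inaccuracy coverage-based gridding is meant to address.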