 | Multicore Computation Center 
Archive for the 'Multicore Computation Center' Category
February 22nd, 2008, by Tim Finin, posted in MC2, Multicore Computation Center
Sandia and Oak Ridge national laboratories have established the Institute for Advanced Architectures to work toward computers that are a million times faster than todays supercomputers.
“An exaflop is a thousand times faster than a petaflop, itself a thousand times faster than a teraflop. Teraflop computers —the first was developed 10 years ago at Sandia — currently are the state of the art. They do trillions of calculations a second. Exaflop computers would perform a million trillion calculations per second.” (link)
Initial funding of $7.4M is provided by congressional mandate from the National Nuclear Security Administration and the Department of Energy’s Office of Science.
Edit | Bookmark@del.icio.us | Trackback | No Comments »
January 9th, 2008, by Tim Finin, posted in Google, Multicore Computation Center, Social media, Semantic Web
The latest CACM has an article by Google fellows Jeffrey Dean and Sanjay Ghemawat with interesting details on Google’s text processing engines. Niall Kennedy summarized it this way on his blog post, Google processes over 20 petabytes of data per day.
“Google currently processes over 20 petabytes of data per day through an average of 100,000 MapReduce jobs spread across its massive computing clusters. The average MapReduce job ran across approximately 400 machines in September 2007, crunching approximately 11,000 machine years in a single month.”
If big numbers numb your mind, 20 petabytes is 20,000,000,000,000,000 bytes (or 22,517,998,136,852,480 for the obsessive-compulsives among us) — enough data to fill up over five million 4G ipods a day, which, if laid end to end would …
Kevin Burton has a copy of the paper on his blog.
Jeffrey Dean and Sanjay Ghemawat, MapReduce: simplified data processing on large clusters, Communications of the ACM, pp 107-113, 51:1, January 2008.
MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google’s clusters every day, processing a total of more than twenty petabytes of data per day.
Dean and Ghemawat conclude their paper by summarizing the key reasons why MapReduce has worked so well for Google.
“First, the model is easy to use, even for programmers without experience with parallel and distributed systems, since it hides the details of parallelization, fault tolerance, locality optimization, and load balancing. Second, a large variety of problems are easily expressible as MapReduce computations. For example, MapReduce is used for the generation of data for Google’s production Web search service, for sorting, data mining, machine learning, and many other systems. Third, we have developed an implementation of MapReduce that scales to large clusters of machines comprising thousands of machines. The implementation makes efficient use of these machine resources and therefore is suitable for use on many of the large computational problems encountered at Google.”
Edit | Bookmark@del.icio.us | Trackback | 4 Comments »
December 26th, 2007, by Tim Finin, posted in Multicore Computation Center, NLP, AI, Semantic Web
The Web has become the repository of most the world’s pubic knowledge. Almost all of it is still bound up in text, images, audio and video, which are easy for people to understand but less accessible for machines. While the computer interpretation of visual and audio information is still challenging, text is within reach. The Web’s infrastructure makes access to all this information trivial, opening up tremendous opportunities to mine text to extract information that can be republished in a more structured representation (e.g., RDF, databases) or used by machine learning systems to discover new knowledge. Current technologies for human language understanding are far from perfect, but can harvest the low hanging fruit and are constantly improving. All that’s needed is an Internet connection and cycles — lots of them.
The latest approach to focusing lots of computing cycles on a problem is cloud computing, inspired in part by Google’s successful architecture and MapReduce software infrastructure.
Business Week had an article a few weeks ago, The Two Flavors of Google, that touches on some of the recent developments, including Hadoop and IBM and Google’s university cloud computing program. Hadoop is the produce of an Apache Lucene project that provides a Java-based software framework to distribute processing over a cluster of processors. The BW article notes
“Cutting, a 44-year-old search veteran, started developing Hadoop 18 months ago while running the nonprofit Nutch Foundation. After he later joined Yahoo, he says, the Hadoop project (named after his son’s stuffed elephant) was just “sitting in the corner.” But in short order, Yahoo saw Hadoop as a tool to enhance the operations of its own search engine and to power its own computing clouds.” (source)
and adds this significant anecdote
“In early November, for example, the tech team at The New York Times rented computing power on Amazon’s cloud and used Hadoop to convert 11 million archived articles, dating back to 1851, to digital and searchable documents. They turned around in a single day a job that otherwise would have taken months.” (source)
The NYT’s Derek Gottfrid described he process in some detail in a post on the NTY Open blog, Self-service, Prorated Super Computing Fun!.
The Hadoop Quickstart page describes how to run it on a single node, enabling any high school geek who knows Java and has a laptop to try it out before finding (or renting) time on a cluster. This is just what we need in for several upcoming projects and I am looking forward to trying it out soon. One requires processing the 1M documents in the Trec 8 collection and another the 10K documents in ACE 2008 collection.
Edit | Bookmark@del.icio.us | Trackback | No Comments »
September 25th, 2007, by Tim Finin, posted in Multicore Computation Center, UMBC
An item on UMBC’s Multicore Computational Center was featured in engadget today. Technorati judges this to be the world’s most popular blog, receiving about one million pageviews a day. Engadget was said to get 10M on the day the iphone was released.
Last Friday UMBC held an event to launch MC**2, the UMBC Multicore Computational Center. MC**2 will focus on supercomputing research related to aerospace/defense, financial services, medical imaging and weather/climate change prediction. IBM awarded UMBC a significant gift to support the development of this new center, which researchers describe as an “orchestra” of one of the world’s most powerful supercomputing processors, the Cell Broadband Engine (Cell/B.E.). This was jointly developed by IBM, Sony, and Toshiba and is used in Sony’s PlayStation3.
Stories on engadget and WBAL skewed the facts somewhat. UMBC is not building a supercomputer out of PS3s. Rather IBM is giving UMBC a number of of their new Cell Broadband Engine processor blades that will be added to our existing IBM based beowulf system. The blades include QS20s and soon to be released QS21s. These have processors that are based on the the processor in the PS3 but with much higher performance characteristics. For example, the Qs21 has two 3.2 GHz Cell/B.E. processors, 2 GB XDR memory, integrated dual GB ethernet, and an InfiniBand adapter. One the goals in the IBM/UMBC partnership is to collaborate on exploring how cell processors can be used for business, science and engineering applications.
Edit | Bookmark@del.icio.us | Trackback | No Comments »
|  | You are currently browsing the archives for the Multicore Computation Center category.
  Home
|
Archive
|
Login
|
Feed
Recent postsThe Psychology of Social Networking on KQED Forum showStudents: brand yourself with a blogSocial Data on the Web workshop at ISWC 2008Petrini: Streaming Applications on the Cell BE Processor, 3pm 5/13 UMBCGossip-Based Outlier Detection for Mobile Ad Hoc Networks
Ebiquity communityFieldmarking data blog
Geospatial Semantic Web
Harry Chen thinks aloud
Planet social media research
Social media research blog
TrackForward by Kolari
UMBC GAIM
|  |