UMBC ebiquity
High performance computing

Archive for the 'High performance computing' Category

Tracking Provenance and Reproducibility of Big Data Experiments

February 8th, 2014, by Tim Finin, posted in Big data, High performance computing, Ontologies, Semantic Web

In the first Ebiquity meeting of the semester, Vlad Korolev will talk about his work on using RDF to capture, represent, and use provenance information for big data experiments.

PROB: A tool for Tracking Provenance and Reproducibility of Big Data Experiments

10-11:30am, ITE346, UMBC

Reproducibility of computations and data provenance are important goals for improving the quality of one’s research. Unfortunately, despite some efforts made in the past, it is still very hard to reproduce computational experiments with a high degree of certainty. The Big Data phenomenon of recent years makes this goal even harder to achieve. In this work, we propose a tool that helps researchers improve the reproducibility of their experiments through automated keeping of provenance records.
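For a flavor of what an RDF provenance record can look like, here is a minimal sketch using the rdflib Python library and the W3C PROV-O vocabulary. PROB's own vocabulary and API are not described in this post, so PROV-O and all of the identifiers below are illustrative stand-ins.

    # Minimal sketch: record that one experiment run used an input dataset
    # and generated an output, as RDF triples using PROV-O terms.
    # All names below are hypothetical; PROB's actual vocabulary may differ.
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF, XSD

    PROV = Namespace("http://www.w3.org/ns/prov#")
    EX = Namespace("http://example.org/experiment/")  # made-up namespace

    g = Graph()
    g.bind("prov", PROV)

    run = EX["mapreduce-run-42"]    # the activity (one experiment run)
    data = EX["input-corpus-v2"]    # the entity it consumed
    out = EX["results-v2"]          # the entity it produced

    g.add((run, RDF.type, PROV.Activity))
    g.add((data, RDF.type, PROV.Entity))
    g.add((out, RDF.type, PROV.Entity))
    g.add((run, PROV.used, data))
    g.add((out, PROV.wasGeneratedBy, run))
    g.add((run, PROV.endedAtTime,
           Literal("2014-02-08T11:30:00", datatype=XSD.dateTime)))

    print(g.serialize(format="turtle"))

Records like these, kept automatically for every run, are what make it possible to later ask which inputs and which code produced a given result.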

Public tutorials on high performance computing research and technologies

December 13th, 2012, by Tim Finin, posted in cloud computing, High performance computing, Machine Learning

 

The Center for Hybrid Multicore Productivity Research is a collaborative research center sponsored by the National Science Foundation with two university partners (UMBC and the University of California, San Diego), six government members, and seven industry members. The Center’s research focuses on addressing productivity, performance, and scalability issues in meeting the insatiable computational demands of its members’ applications through the continuous evolution of multicore architectures and open source tools.

As part of its annual industrial advisory board meeting next week, the center will hold an afternoon of public tutorials from 1:00pm to 4:00pm on Monday, 17 December 2012 in room 456 of the ITE building at UMBC. The tutorials will be presented by students doing research sponsored by the Center and feature some of the underlying technologies being used and some of their applications. The tutorials are:

  • GPGPUs – Tim Blattner and Fahad Zafa
  • Cloud Policies – Karuna Joshi
  • Human Sensor Networks – Oleg Aulov
  • Machine Learning Disaster Warnings – Han Dong
  • Graph 500 – Tyler Simon
  • HBase – Phuong Nguyen

The tutorial talks are free and open to the public. If you plan to attend, please RSVP by email to Dr. Valerie L. Thomas, valeriet@umbc.edu.

Make mincemeat out of MapReduce with Python

October 1st, 2011, by Tim Finin, posted in cloud computing, High performance computing

mincemeat.py is a super-lightweight, open source Python implementation of the popular MapReduce distributed computing framework that depends only on the Python standard library.

Just install the single source file on a set of machines and invoke it on each with a password (for authentication) and the IP address of the host, and your workers are good to go. Then, using the same package, run a simple server program that defines your map and reduce functions and your data source.
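Here is a rough sketch of the server side, following the word-count pattern in the project's README; the password and the data are placeholders.

    # server.py -- a minimal mincemeat.py word-count server.
    import mincemeat

    data = ["Humpty Dumpty sat on a wall",
            "Humpty Dumpty had a great fall",
            "All the king's horses and all the king's men",
            "Couldn't put Humpty together again"]

    def mapfn(k, v):
        # Emit (word, 1) for every token in the input value.
        for w in v.split():
            yield w, 1

    def reducefn(k, vs):
        # Sum the counts for each word.
        return sum(vs)

    s = mincemeat.Server()
    s.datasource = dict(enumerate(data))  # any dict-like of key -> value
    s.mapfn = mapfn
    s.reducefn = reducefn

    results = s.run_server(password="changeme")
    print(results)

Each worker machine then runs the installed file directly, pointing it at the server: python mincemeat.py -p changeme <server-ip>.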

While it’s only 350 lines of Python, the package looks great for teaching or experimenting with the MapReduce concept, and potentially useful if you work in Python.

Programming with Hadoop: a hands on introduction

September 20th, 2011, by Tim Finin, posted in High performance computing

In this week’s ebiquity meeting (10:30am Tue 9/20 in ITE 325b) we will dive right into writing MapReduce programs, skipping all the gory details of Hadoop setup and MapReduce theory. In one hour, we will write a MapReduce Java program in Eclipse that creates an inverted index, test it on a local box, and run it on an already-configured Hadoop cluster. If we have time, we will also see how to do the same using Python instead of Java.
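To give a flavor of the inverted-index idea, here is a rough sketch of it as a pair of Hadoop Streaming scripts in Python. This is an illustration of the concept, not the code from the talk, and it assumes a hypothetical input layout where each line is "docid<TAB>text".

    #!/usr/bin/env python
    # mapper.py -- emit one (word, docid) pair per token; assumes each
    # input line looks like "docid<TAB>text" (a hypothetical layout).
    import sys

    for line in sys.stdin:
        docid, _, text = line.rstrip("\n").partition("\t")
        for word in text.lower().split():
            print("%s\t%s" % (word, docid))

Hadoop sorts the mapper output by key before it reaches the reducer, so all pairs for a given word arrive together:

    #!/usr/bin/env python
    # reducer.py -- collect the docids for each word into a posting list.
    import sys

    current, postings = None, []
    for line in sys.stdin:
        word, _, docid = line.rstrip("\n").partition("\t")
        if word != current:
            if current is not None:
                print("%s\t%s" % (current, ",".join(sorted(set(postings)))))
            current, postings = word, []
        postings.append(docid)
    if current is not None:
        print("%s\t%s" % (current, ",".join(sorted(set(postings)))))

You would submit the pair with the streaming jar, something like: hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*streaming*.jar -input docs -output index -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py (the exact jar path depends on your Hadoop install).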

You are encouraged to do the following before the meeting if you want to code along.

  • Review the Yahoo Introduction to MapReduce tutorial
  • Download a free virtual machine image with Hadoop pre-installed, so you can get started quickly. Options are available for Linux, Windows and Mac OS X.
  • Make sure you have JDK 1.6.x and Eclipse (or your favorite IDE) installed on your laptop.

Addenda (9/19):

  • If you are planning to code along during the demo, download the latest stable release of Hadoop (0.20.2)
  • Some people have been having problems with Cloudera’s 64-bit VM image. If you do, try this virtual machine from the Yahoo Developer Network, which contains a pre-installed Hadoop 0.20.
  • Even if you are not able to get the VM running for now, you can still run the program(s) locally on your laptop using Eclipse.

CloudCamp Baltimore, 6-10pm Wed Mar 9, 2011

February 24th, 2011, by Tim Finin, posted in cloud computing, High performance computing

There will be a free CloudCamp meeting in Baltimore from 6:00pm to 10:00pm on Wednesday, March 9th at the Baltimore Marriott Waterfront. CloudCamps are participant-driven unconferences where users of cloud computing technologies meet to network and share ideas, experiences, challenges and solutions. The event is free, but participants are asked to register to ensure there is enough food and refreshments.

Here is the current, tentative schedule:

6:00pm – Registration & Networking (food/drink)
6:30pm – Opening Introductions
6:45pm – Lightning Talks (5 minutes each)
7:30pm – Unpanel
8:00pm – Organize Unconference
8:15pm – Unconference Breakout Session Round 1
9:00pm – Unconference Breakout Session Round 2
9:45pm – Wrap-up
10:00pm – Find somewhere for post-event networking

Contact the organizers if you are interested in giving a five-minute lightning talk or leading a breakout session.

Chinese Tianhe-1A is fastest supercomputer

October 28th, 2010, by Tim Finin, posted in High performance computing, Multicore Computation Center

China’s Tianhe-1A is being recognized as the world’s fastest supercomputer. It has 7,168 NVIDIA Tesla GPUs and achieved a Linpack score of 2.507 petaflops, a 40% speedup over Oak Ridge National Lab’s Jaguar, the previous top machine. Today’s WSJ has an article:

“Supercomputers are massive machines that help tackle the toughest scientific problems, including simulating commercial products like new drugs as well as defense-related applications such as weapons design and breaking codes. The field has long been led by U.S. technology companies and national laboratories, which operate systems that have consistently topped lists of the fastest machines in the world.

But Nvidia says the new system in Tianjin—which is being formally announced Thursday at an event in China—was able to reach 2.5 petaflops. That is a measure of calculating speed ordinarily translated into a thousand trillion operations per second. It is more than 40% higher than the mark set last June by a system called Jaguar at Oak Ridge National Laboratory that previously stood at No. 1 on a twice-yearly ranking of the 500 fastest supercomputers.”

The NYT and HPCwire also have good overview articles. The HPCwire article points out that the Tianhe-1A has a relatively low Linpack efficiency compared to the Jaguar:

“Although the Linpack performance is a stunning 2.5 petaflops, the system left a lot of potential FLOPS in the machine. Its peak performance is 4.7 petaflops, yielding a Linpack efficiency of just over 50 percent. To date, this is a rather typical Linpack yield for GPGPU-accelerated supers. Because the GPUs are stuck on the relatively slow PCIe bus, the overhead of sending calculations to the graphics processors chews up quite a few cycles on both the CPUs and GPUs. By contrast, the CPU-only Jaguar has a Linpack/peak efficiency of 75 percent. Even so, Tianhe-1A draws just 4 megawatts of power, while Jaguar uses nearly 7 megawatts and yields 30 percent less Linpack.”
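The efficiency figures are just Rmax divided by Rpeak, which is easy to check. Tianhe-1A's numbers are from the article; Jaguar's Rmax and Rpeak are the commonly reported TOP500 values, assumed here.

    # Linpack efficiency = Rmax (achieved) / Rpeak (theoretical peak).
    # Tianhe-1A figures are from the article; Jaguar's are the commonly
    # reported TOP500 values (an assumption, not from this post).
    systems = {
        "Tianhe-1A": (2.507, 4.701),  # petaflops: (Rmax, Rpeak)
        "Jaguar":    (1.759, 2.331),
    }
    for name, (rmax, rpeak) in sorted(systems.items()):
        print("%-10s %.0f%% efficient" % (name, 100.0 * rmax / rpeak))
    # Prints roughly 53% for Tianhe-1A and 75% for Jaguar, matching the
    # "just over 50 percent" and "75 percent" figures quoted above.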

The (unofficial) “official” list of the fastest supercomputers is the TOP500, which seems to be inaccessible at the moment, no doubt due to the heavy load caused by the news stories above. The TOP500 list is due for a refresh next month.

UMBC hosts Frontiers of Multi-Core Computing Workshop

September 11th, 2010, by Tim Finin, posted in cloud computing, High performance computing, MC2

UMBC’s Multicore Computational Center will host the Second Workshop on Frontiers of Multi-Core Computing on 22-23 September 2010. The workshop will bring together a wide range of people from universities, industry and government to exchange ideas, discuss issues, and develop strategies for coping with the challenges of parallel and multicore computing.

“Multi- (e.g., Intel Westmere and IBM Power7) and many-core (e.g., NVIDIA Tesla and AMD FireStream GPUs) microprocessors are enabling more compute- and data-intensive computation in desktop computers, clusters, and leadership supercomputers. However, efficient utilization of these microprocessors is still a very challenging issue. Their differing architectures require significantly different programming paradigms when adapting real-world applications. The actual porting costs are actively debated, as well as the relative performance between GPUs and CPUs.”

The workshop is free but those interested should register online. See the workshop schedule for details on presentations and timing.

Can cloud computing be entirely trusted?

November 10th, 2009, by Tim Finin, posted in High performance computing, Privacy, Security, Semantic Web

The Economist has been running a series of online Oxford Union style debates on topical issues — CEO pay, healthcare, climate change, etc. The latest is on cloud computing: This house believes that the cloud can’t be entirely trusted.

In his opening remarks, moderator Ludwig Siegele says

“The participants in this debate, including the three guest speakers, all agree that computing is moving into the cloud. “We are experiencing a disruptive moment in the history of technology, with the expansion of the role of the internet and the advent of cloud-based computing”, says Stephen Elop, president of Microsoft’s business division, which generates about a third of the firm’s revenues ($13 billion) and more than half of its profits ($4.5 billion) in the most recent quarter. Marc Benioff, chief executive of Salesforce.com, the world’s largest SaaS provider with over $1.2 billion in sales in the past 12 months, is no less bullish: ‘Like the shift [from the mainframe to the client/server architecture] that roiled our industry in decades past, the transition to cloud computing is happening now because of major discontinuities in cost, value and function.'”

While the debate’s proposition suggests that security or privacy is its focus, it’s really a broader argument about how software services will be delivered in the future in which security is just one aspect.

“Whether and to what extent companies and consumers elect to hand their computing over to others, of course, depends on how much they trust the cloud. And customers still have many questions. How reliable are such services? What about privacy? Don’t I lose too much control? What if Salesforce.com, for instance, changes its service in a way I do not like? Are such web-based services really cheaper than traditional software? And how easy is it to get my data if I want to change providers? Are there open technical standards that would make this easier?”

UMBC Multicore Computational Center

June 15th, 2009, by Tim Finin, posted in cloud computing, High performance computing, MC2

Joab Jackson (UMBC ’90) wrote a nice article on UMBC’s Multicore Computational Center for the current issue of UMBC Magazine. From The Power of Parallels:

“In July 2007, IBM gave UMBC computer science professors Milton Halem and Yelena Yesha a grant to launch the center with cash and equipment that have totaled more than $1 million over the past three years. Supporting funding from NASA also helped the effort.

    “Not only are we ahead of the curve,” says Charles Nicholas, chair of the department of computer science and electrical engineering, “but we hope to stay ahead of the curve…. The partnerships with IBM will let us keep the technologies up to date.”

Government and private enterprise are in dire need of “trained graduate students who know how to apply the new methods of parallel programming to the problems they face,” Halem says. “We’re one of the few schools in the nation that is teaching these courses.”

Tutorial: Hadoop on Windows with Eclipse

April 9th, 2009, by Tim Finin, posted in cloud computing, High performance computing, MC2, Multicore Computation Center, Programming, Semantic Web

Hadoop has become one of the most popular frameworks to exploit parallelism on a computing cluster. You don’t actually need access to a cluster to try Hadoop, learn how to use it, and develop code to solve your own problems.

UMBC Ph.D. student Vlad Korolev has written an excellent tutorial, Hadoop on Windows with Eclipse, showing how to install and use Hadoop on a single computer running Microsoft Windows. It also covers the Eclipse Hadoop plugin, which enables you to create and run Hadoop projects from Eclipse. In addition to step-by-step instructions, the tutorial has short videos documenting the process.

If you want to explore Hadoop and are comfortable developing Java programs in Eclipse on a Windows box, this tutorial will get you going. Once you have mastered Hadoop and developed your first project using it, you can go about finding a cluster to run it on.

Map reduce on heterogeneous multicore clusters

April 7th, 2009, by Tim Finin, posted in cloud computing, High performance computing, MC2

In tomorrow’s ebiquity meeting (10 am EDT Wed, April 8), PhD student David Chapman will talk about his work on Map Reduce on Heterogeneous Multi-Core Clusters. From the abstract:

“We have extended the Map Reduce programming paradigm to clusters with multicore accelerators. Map Reduce is a simple programming model designed for parallel computations with large distributed datasets. Google has reinforced the practical effectiveness of this approach with over 1000 commercial Map Reduce applications. Typical Map Reduce implementations, such as Apache Hadoop, exploit parallel file systems for use in homogeneous clusters. Unfortunately, multicore accelerators such as the Cell B.E. used in modern supercomputers such as Roadrunner require additional layers of parallelism, which cannot be addressed by parallel file systems alone. Related work has explored Map Reduce on a single Cell B.E. accelerator machine using hash- and sort-based techniques. We are incorporating techniques from Apache Hadoop as well as early multicore Map Reduce research to produce an implementation optimized for a hybrid multicore cluster. We are evaluating our implementation on a cluster of 24 Cell Q series nodes and 48 multicore PowerPC J series nodes at the UMBC Multicore Computational Center.”

We will stream the talk live and share the raw recording.

Cloudera offers a simpler Hadoop distribution

March 18th, 2009, by Tim Finin, posted in cloud computing, Google, High performance computing, MC2, Multicore Computation Center, Semantic Web, Social media

We are early in the era of big data (including social and/or semantic data), and more and more of us need the tools to handle it. Monday’s NYT had a story, Hadoop, a Free Software Program, Finds Uses Beyond Search, on Hadoop and Cloudera, a new startup offering its own Hadoop distribution designed to be easier to install and configure.

“In the span of just a couple of years, Hadoop, a free software program named after a toy elephant, has taken over some of the world’s biggest Web sites. It controls the top search engines and determines the ads displayed next to the results. It decides what people see on Yahoo’s homepage and finds long-lost friends on Facebook.”

Three top engineers from Google, Yahoo and Facebook, along with a former executive from Oracle, are betting it will. They announced a start-up Monday called Cloudera, based in Burlingame, Calif., that will try to bring Hadoop’s capabilities to industries as far afield as genomics, retailing and finance. The company has just released its own version of Hadoop. The software remains free, but Cloudera hopes to make money selling support and consulting services for the software. It has only a few customers, but it wants to attract biotech, oil and gas, retail and insurance customers to the idea of making more out of their information for less.

Cloudera’s distribution, currently based on Hadoop v0.18.3, uses RPM and comes with a web-based configuration aid. The company also offers some free basic training in MapReduce concepts, using Hadoop, developing appropriate algorithms, and using Hive.
