Tutorial: Hadoop on Windows with Eclipse

April 9th, 2009

Hadoop has become one of the most popular frameworks to exploit parallelism on a computing cluster. You don’t actually need access to a cluster to try Hadoop, learn how to use it, and develop code to solve your own problems.

UMBC Ph.D. student Vlad Korolev has written an excellent tutorial, Hadoop on Windows with Eclipse, showing how to install and use Hadoop on a single computer running Microsoft Windows. It also covers the Eclipse Hadoop plugin, which lets you create and run Hadoop projects from within Eclipse. In addition to step-by-step instructions, the tutorial has short videos documenting the process.

If you want to explore Hadoop and are comfortable developing Java programs in Eclipse on a Windows box, this tutorial will get you going. Once you have mastered Hadoop and developed your first project with it, you can go about finding a cluster to run it on.

Map reduce on heterogeneous multicore clusters

April 7th, 2009

In tomorrow’s ebiquity meeting (10 am EDT Wed, April 8), PhD student David Chapman will talk about his work on Map Reduce on Heterogeneous Multi-Core Clusters. From the abstract:

“We have extended the Map Reduce programming paradigm to clusters with multicore accelerators. Map Reduce is a simple programming model designed for parallel computations with large distributed datasets. Google has reinforced the practical effectiveness of this approach with over 1000 commercial Map Reduce applications. Typical Map Reduce implementations, such as Apache Hadoop, exploit parallel file systems for use in homogeneous clusters. Unfortunately, the multicore accelerators such as Cell B.E. used in modern supercomputers such as Roadrunner require additional layers of parallelism, which cannot be addressed from parallel file systems alone. Related work has explored Map Reduce on a single Cell B.E. accelerator machine using hash and sort based techniques. We are incorporating techniques from Apache Hadoop as well as early multicore Map Reduce research to produce an implementation optimized for a hybrid multicore cluster. We are evaluating our implementation on a cluster of 24 Cell Q series nodes and 48 multicore PowerPC J series nodes at the UMBC Multicore Computational Center.”
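For readers new to the model the abstract describes, here is a minimal single-machine sketch of the Map Reduce pattern in plain Python, using the usual word-count example. It is only an illustration of the programming model, not the implementation discussed in the talk and not the Hadoop API. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for this sketch.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between the phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values emitted for one key."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["map reduce on clusters", "map reduce on multicore"])
print(counts)
```

On a real cluster the map and reduce calls run in parallel on different nodes and the framework does the grouping. The sketch runs them one after another, just to show the data flow.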

We will stream the talk live and share the raw recording.

Nano-content: 1st 2 words

April 6th, 2009

Not only do you have to choose the titles of your papers, posts and web pages well, but their first two words should be chosen to carry the message. Jakob Nielsen reports on UI research showing that the first 11 characters of links and headlines are important in forming some idea of what the item is about.

First 2 Words: A Signal for the Scanning Eye
“Our newest usability study … tests how well users understand the first 11 characters of a website’s links and headlines. For example, we’d represent this article by the “First 2 Wor” string. … Why test text that’s so severely truncated? Because online reading is often dominated by the F-pattern. That is, people read the first few listed items somewhat thoroughly — thus the cross-bars of the “F” — but read less and less as they continue down the list, eventually passing their eyes down the text’s left side in a fairly straight line. At this point, users see only the very beginning of the items in a list. …”

Nielsen calls the initial few words in a title “nano-content”. While it’s hard to pack some ideas into 11 characters, it sounds like a good goal.
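To see what your own titles look like under this test, a few lines of Python will do. This is a small sketch that assumes the truncation is a plain cut at 11 characters, which matches the “First 2 Wor” example in the quote. The function name nano_content is made up here.

```python
def nano_content(headline, length=11):
    """Return the first `length` characters of a headline, as a scanning reader might see it."""
    return headline[:length]


headlines = [
    "First 2 Words: A Signal for the Scanning Eye",
    "Tutorial: Hadoop on Windows with Eclipse",
    "Map reduce on heterogeneous multicore clusters",
]

for headline in headlines:
    # Show the truncated form next to the full title.
    print(repr(nano_content(headline)), "<-", headline)
```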

Choosing the words for a link or title carefully is key to influencing search engines, which give these words higher weight when indexing the associated content. But search engines don’t scan the way humans do, so putting the most relevant words early in the string matters mainly when a person is shown a list of results.

When every byte counts: URL shortening service review

April 4th, 2009

Searchengineland has a useful post on URL shortening services, Analysis: Which URL Shortening Service Should You Use?.

“URL shortening services are experiencing a renaissance in the age of Twitter. When every character counts, these services reduce long URLs to tiny forms. But which is the best to use, when so many are offered and new ones seem to appear each day? Below, issues to consider and a breakdown of popular services, including recommendations and services to avoid (the new DiggBar being one of these).”

They review 15 services and discuss some of the underlying issues (e.g., 301 vs. 302 redirects).

The venerable Tinyurl.com? Too long! The popular bit.ly made the cut even though it’s longer than tr.im.
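The difference is easy to quantify. The sketch below compares how many characters each service's short URL would use and how many that leaves for the message. Two things are assumptions for illustration: a 140-character message limit, and a five-character code after the domain (real services issue codes of varying length).

```python
TWEET_LIMIT = 140  # assumption: per-message character limit
CODE_LENGTH = 5    # hypothetical code length; real services vary


def short_url_length(domain, code_length):
    """Length of a URL of the form http://<domain>/<code>."""
    return len("http://") + len(domain) + 1 + code_length


domains = ["tinyurl.com", "bit.ly", "tr.im"]

# Characters used by the short URL, and characters left for the message.
lengths = {d: short_url_length(d, CODE_LENGTH) for d in domains}
remaining = {d: TWEET_LIMIT - n for d, n in lengths.items()}

for d in domains:
    print(d, lengths[d], remaining[d])
```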

Twitter vs. Facebook: fad vs. need?

April 3rd, 2009

Earlier this week the Baltimore Sun’s Andrew Ratner had a story on Twitter, When did Twitter take over the universe?. The story had this interesting quote from UMBC’s Zeynep Tufekci:

Some people who study technology aren’t sure Twitter will endure.

“Frankly, I think a lot of twittering is somewhat faddish, whereas I never thought Facebook was. … People I interviewed and surveyed would talk of serious feeling of deprivation without Facebook and I’ve hardly heard anyone say that about twitter,” Zeynep Tufekci, an assistant professor who teaches the sociology of technology at the University of Maryland, Baltimore County, wrote in an e-mail. “Will people Twitter five years from now? Perhaps, but I would not be surprised if they did not, or at least as much.”

Scantegrity cryptographic voting system to be used in binding governmental election

April 2nd, 2009

This November will be the first time any end-to-end cryptographic system will be used in a binding governmental election.

UMBC Professor Alan Sherman and his students have been helping develop the Scantegrity open source election verification technology for optical scan voting systems. It uses privacy-preserving confirmation numbers to allow each voter to verify that her vote was counted and that all the votes were counted correctly.

The group has been working with Takoma Park, MD, to use this in a binding governmental election later this year. Alan recently wrote:

“On Saturday April 11, there will be a mock election in Takoma Park, MD, using the Scantegrity II high-integrity voting system being developed in part at the UMBC Cyber Defense Lab. Anyone is welcome to come and vote – polls will be open 10am-2pm in the Community Center at 7500 Maple Ave. This mock election is preparation for the Nov 2009 municipal election in Takoma Park which will also use Scantegrity – the first time any end-to-end cryptographic system will have been used in a binding governmental election.”

Here’s the text of a short article on the election from the April 2009 Takoma Park newsletter.

This Arbor Day: Plant the Seeds for Election Verifiability

Election integrity is a major issue both nationally and internationally. During the City’s annual Arbor Day celebration, Takoma Park will try out what may be one solution. From 10 a.m. until 2 p.m. on April 11, City residents and their families and friends are invited to participate in a mock election administered by the City and its Board of Elections. The point of this mock election is to give voters an opportunity to test out and provide feedback to the City on the voting system it will use in the November 2009 municipal elections.

First among the many characteristics that set this system apart from those previously used by the City is that voters will be able to confirm that their ballots were counted.

As part of their ballot, voters will receive a confirmation code that they can write down, take home and check online to make sure their votes were counted. The confirmation number does not say how you voted and your vote remains private. What it does say, however, is that your vote is included in the final tally and that the machine read your vote correctly.

The system is paper-based and works like an optical scan voting system, making it easy to use. The only difference is that when you vote, instead of a completely black bubble, you will see the confirmation number appear as shown in the illustration above.

Writing down and checking the confirmation number is optional. So, this Arbor Day, while enjoying the festivities, drop by the Community Center Azalea Room to see how the system works. Try it out, ask questions, give feedback, and enjoy the refreshments!

To obtain more information on the Arbor Day Mock Election, visit the City’s website at www.takomaparkmd.gov. Questions may also be addressed to the City Clerk’s office at 301-891-7267 or Clerk@takomagov.org.