  1. Introduction
  2. Prerequisites
  3. Set Environment Variables
  4. Set up SSH daemon
  5. Download Hadoop and place it in the home directory
  6. Unpack Hadoop
  7. Configure Hadoop
  8. Format the namenode
  9. Set up Hadoop plugin
  10. Start the cluster
  11. Set up Hadoop location
  12. Upload data
  13. Create and run a test project

Prerequisites

Before we begin, make sure the following components are installed on your workstation:

This tutorial was written for and tested with Hadoop version 0.19.1. If you are using another version, some things might not work for you.

Make sure that you have exactly the same versions of the software as shown above. Hadoop will not work with versions of Java earlier than 1.6, and it will not work with versions of Eclipse later than 3.3.2 because of plugin API incompatibilities.
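The version limits above can be checked with a small script. The following is only a minimal sketch, not part of the original tutorial: the helper names (`parse_version`, `java_ok`, `eclipse_ok`, `installed_java_version`) are hypothetical, and it assumes the version can be read as the first dotted number in a string such as the first line printed by `java -version`.

```python
import re
import subprocess

# Limits taken from the tutorial text: Java 1.6 or later, Eclipse 3.3.2 or earlier.
MIN_JAVA = (1, 6)
MAX_ECLIPSE = (3, 3, 2)


def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("no version number found in %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def java_ok(version_text):
    """True if the Java version is 1.6 or later."""
    return parse_version(version_text) >= MIN_JAVA


def eclipse_ok(version_text):
    """True if the Eclipse version is 3.3.2 or earlier."""
    return parse_version(version_text) <= MAX_ECLIPSE


def installed_java_version():
    """Return the first line printed by `java -version` (assumes java is on the PATH)."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    # `java -version` writes its report to stderr rather than stdout.
    return (result.stderr or result.stdout).splitlines()[0]
```

For example, `java_ok(installed_java_version())` reports whether the Java found on the PATH meets the minimum, and `eclipse_ok("3.3.2")` can be called with the version shown in the Eclipse About dialog.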

 

Installing Cygwin

Once the prerequisites above are installed, the next step is to install the Cygwin environment. Cygwin is a set of UNIX packages ported to Microsoft Windows. It is needed to run the scripts supplied with Hadoop, because they are all written for the UNIX platform.

To install the Cygwin environment, follow these steps:

  1. Download the Cygwin installer from here.
  2. Run the downloaded file. You will see the window shown in the screenshot below.


    Cygwin installer

  3. When you see the above screen, keep pressing the 'Next' button until you reach the package selection screen shown below. Make sure that the 'openssh' package is selected. This package is required for the Hadoop cluster and the Eclipse plugin to function correctly.


  4. After you have selected the required packages, press the 'Next' button to complete the installation.
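Once the installer finishes, it can be useful to confirm that the 'openssh' package was really installed. The sketch below is an illustration under stated assumptions, not part of the original tutorial: the function name `missing_openssh_files` is hypothetical, and the file locations assume the usual Cygwin layout, with the client at `bin/ssh.exe` and the daemon at `usr/sbin/sshd.exe` under the installation root.

```python
from pathlib import Path

# Assumed locations of the openssh programs relative to the Cygwin root.
# Adjust these if your installation is laid out differently.
OPENSSH_FILES = ("bin/ssh.exe", "usr/sbin/sshd.exe")


def missing_openssh_files(cygwin_root):
    """Return the expected openssh files that are not present under cygwin_root."""
    root = Path(cygwin_root)
    return [rel for rel in OPENSSH_FILES if not (root / rel).is_file()]
```

Calling `missing_openssh_files(r"C:\cygwin")` (the path is an assumption, use your own installation directory) returns an empty list when both files are found. If anything is listed, run the Cygwin installer again and select the 'openssh' package.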

