Saturday, November 27, 2010

How to install standalone Hadoop for development and debugging purposes

It is very easy to set up a standalone Hadoop installation for development and testing purposes. In fact, having a standalone Hadoop installation on your local Linux machine can be of great help in debugging issues.

The following are the steps involved in setting up a standalone installation of Hadoop 0.20 on Java 6 (the JDK downloaded below is 6u22). Please note that setting up a Hadoop cluster is very different from setting up a standalone version.

INSTALL
- Get a Linux machine
   - Suppose your username is jeka and your home directory is /home/jeka (or ~)
   - Create directories ~/java and ~/hadoop

- Download the required software, if you do not already have it
   - Download the Java release (file jdk-6u22-linux-i586.bin) from here to directory ~/java
   - Download the Hadoop release (file hadoop-0.20.2.tar.gz) from here to directory ~/hadoop

- Install Java
   - Make the Java installer executable
             chmod a+x ~/java/jdk-6u22-linux-i586.bin
   - Run it from ~/java, because the installer unpacks into the current directory
            cd ~/java
            ./jdk-6u22-linux-i586.bin

- Install Hadoop
   - Unzip (this replaces the .tar.gz file with hadoop-0.20.2.tar)
          gunzip ~/hadoop/hadoop-0.20.2.tar.gz
   - Untar into ~/hadoop
          cd ~/hadoop
          tar -xvf hadoop-0.20.2.tar
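
The unzip and untar steps can also be combined into a single tar call. The following is only a sketch: unpack_tgz is a hypothetical helper name, and the demo builds a throwaway archive (sample-1.0.tar.gz) in a temporary directory so that it runs without the real Hadoop download.

```shell
#!/bin/sh
# Sketch: unpack a .tar.gz in one step into a chosen directory.
# unpack_tgz is a hypothetical helper, not something shipped with Hadoop.
unpack_tgz() {
    archive="$1"   # path to the .tar.gz file
    dest="$2"      # directory to extract into
    mkdir -p "$dest" && tar -xzf "$archive" -C "$dest"
}

# Demo with a throwaway archive instead of the real Hadoop release.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/src/sample-1.0/bin"
echo "hello" > "$demo_dir/src/sample-1.0/README.txt"
tar -czf "$demo_dir/sample-1.0.tar.gz" -C "$demo_dir/src" sample-1.0

# Extract it and list what was unpacked.
unpack_tgz "$demo_dir/sample-1.0.tar.gz" "$demo_dir/out"
ls "$demo_dir/out/sample-1.0"
```

For the real release, the equivalent call would be unpack_tgz ~/hadoop/hadoop-0.20.2.tar.gz ~/hadoop, assuming the archive is still in its original .tar.gz form.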

That's it! The installation is done and Hadoop is ready to be used, but to make life a little easier we should set up some environment variables.

CONFIGURE
Both Java and Hadoop provide command line clients (or executables), java and hadoop respectively. These executables can be found in the bin directory of each installation.
- Create a file ~/.hadoop_profile and add the following lines to it

export JAVA_HOME="$HOME/java/jdk1.6.0_22"
export HADOOP_HOME="$HOME/hadoop/hadoop-0.20.2"
export PATH="${PATH}:${JAVA_HOME}/bin:${HADOOP_HOME}/bin"

Save this file and source it 
    source ~/.hadoop_profile

Now instead of running hadoop as ~/hadoop/hadoop-0.20.2/bin/hadoop you can simply run it as hadoop.
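
A quick way to confirm that the profile points at real installations is to check that each home directory contains an executable under bin. This is a minimal sketch: check_home is a hypothetical helper, and the demo uses a fake installation in a temporary directory so it runs anywhere. It also shows that a tilde inside double quotes is kept literally by the shell, which is why $HOME is the safer choice in a profile file.

```shell
#!/bin/sh
# check_home is a hypothetical helper: it succeeds only if
# <home_dir>/bin/<tool> exists and is executable.
check_home() {
    home_dir="$1"
    tool="$2"
    if [ -x "$home_dir/bin/$tool" ]; then
        echo "ok: $home_dir/bin/$tool"
    else
        echo "missing: $home_dir/bin/$tool" >&2
        return 1
    fi
}

# A tilde inside double quotes stays a literal "~"; $HOME is expanded.
quoted="~/java"
expanded="$HOME/java"
echo "quoted=$quoted expanded=$expanded"

# Demo against a fake installation directory with a stub executable.
demo_home=$(mktemp -d)
mkdir -p "$demo_home/bin"
printf '#!/bin/sh\necho fake\n' > "$demo_home/bin/hadoop"
chmod +x "$demo_home/bin/hadoop"
check_home "$demo_home" hadoop
```

After sourcing the profile, the same helper can be pointed at the real variables: check_home "$JAVA_HOME" java and check_home "$HADOOP_HOME" hadoop.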


RUN YOUR FIRST HADOOP JOB
Note: This job will run on your local machine and read from the local file system, not HDFS
File ~/hadoop/hadoop-0.20.2/hadoop-0.20.2-examples.jar comes with some examples. We can use one of them, "grep".

In the following example, we will use it to count the number of times the word "copyright" appears in the file LICENSE.txt. The output directory ~/tmp/out must not already exist, otherwise the job fails.


 cd ~/hadoop/hadoop-0.20.2
 hadoop jar hadoop-0.20.2-examples.jar grep LICENSE.txt ~/tmp/out "copyright"

View the result:
 cat ~/tmp/out/*
Output: 4
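
To get a feel for what the grep example computes, here is a rough local equivalent using ordinary command line tools. This is only a sketch and not how Hadoop implements it: count_matches is a hypothetical helper, the sample file is made up, and it assumes case-sensitive matching, with every occurrence counted, including several on one line.

```shell
#!/bin/sh
# count_matches is a hypothetical helper: it prints one line per distinct
# matched string, prefixed by how many times that string occurred.
count_matches() {
    # grep -o prints every match on its own line; uniq -c counts duplicates.
    grep -o -E -e "$1" "$2" | sort | uniq -c
}

# Made-up sample text standing in for LICENSE.txt.
sample=$(mktemp)
cat > "$sample" <<'EOF'
This copyright notice covers the copyright holder.
Copyright with a capital C is a different string.
No match on this line.
Retain the above copyright notice.
EOF

count_matches "copyright" "$sample"
```

Running count_matches "copyright" LICENSE.txt from ~/hadoop/hadoop-0.20.2 gives a figure to compare with the job output.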

It's very simple to create your own jar and run it instead of using the examples jar. See the blog post
http://hadoop-blog.blogspot.com/2010/11/how-to-run-and-compile-hadoop-program.html for more details.




