Installing CDH4 with MapReduce on a Single Linux Node in Pseudo-distributed mode

Download and install the package

1. Download the CDH4 repository package (cdh4-repository_1.0_all.deb) for an Ubuntu Lucid or Precise system.

2. Save the package in a directory to which you have access.

3. Go to the directory where you saved the package and run this command:

  $ sudo dpkg -i cdh4-repository_1.0_all.deb
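
If you want to confirm what the repository package set up, you can list the files it installed (the package name cdh4-repository is inferred from the .deb filename above); it should include an entry under /etc/apt/sources.list.d/:

  $ dpkg -L cdh4-repository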



Install CDH4

1. Add the Cloudera Public GPG Key to your repository by executing the appropriate command for your system:

For Ubuntu Precise systems:

$ curl -s http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -


For Ubuntu Lucid systems:

$ curl -s http://archive.cloudera.com/cdh4/ubuntu/lucid/amd64/cdh/archive.key | sudo apt-key add -
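
To confirm the key was imported, you can search the APT keyring (a quick check, not part of the original steps); the output should mention Cloudera:

  $ apt-key list | grep -i cloudera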


2. Install Hadoop in pseudo-distributed mode:


$ sudo apt-get update
$ sudo apt-get install hadoop-0.20-conf-pseudo


Type Y when prompted to continue.
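
Once the installation finishes, a quick sanity check (not in the original steps) is to confirm the Hadoop client is on your PATH; it should report a CDH4 version string:

  $ hadoop version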


3. Start Hadoop and verify that it is working properly.

To view the files installed by the hadoop-0.20-conf-pseudo package on Ubuntu systems, run:

   $ dpkg -L hadoop-0.20-conf-pseudo

 

To start Hadoop, proceed as follows. 

 

Step 1: Format the NameNode.

Before starting the NameNode for the first time, you must format the file system with the following command:

         $ sudo -u hdfs hdfs namenode -format
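
If the format succeeds, the command's output should include a log line similar to the following (the exact storage directory depends on your configuration, so the path is elided here):

  INFO common.Storage: Storage directory ... has been successfully formatted.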
 
 

Step 2: Start HDFS with the following command:

 $ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
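
If a JDK is installed, you can also confirm the HDFS daemons are running with jps (jps ships with the JDK, so this assumes one is on your PATH). In this configuration you should see NameNode, DataNode, and SecondaryNameNode among the Java processes:

  $ sudo jps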

 

 
To verify that the services have started, you can check the web console. The NameNode provides a web console at http://localhost:50070/ for viewing your Distributed File System (DFS) capacity, the number of DataNodes, and logs. In this pseudo-distributed configuration, you should see one live DataNode, named localhost.
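
If you prefer the command line to a browser, a minimal reachability check of the NameNode console (assuming curl is installed) is to ask for its HTTP status code; 200 means the console is up:

  $ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/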

 

 

Step 3: Create the /tmp Directory

Create the /tmp directory:

  $ sudo -u hdfs hadoop fs -mkdir /tmp

Set permissions on the /tmp directory so that any user can write to it:

$ sudo -u hdfs hadoop fs -chmod -R 777 /tmp
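
To confirm the directory and its permissions, list the root of HDFS; the /tmp entry should show drwxrwxrwx, reflecting the chmod 777 above:

  $ sudo -u hdfs hadoop fs -ls /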

 

 

Step 4: Create the MapReduce system directories

 

$ sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chmod 777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
  
Step 5: Verify the HDFS File Structure:

  $ sudo -u hdfs hadoop fs -ls -R /
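
If the previous steps succeeded, the listing should include entries along these lines (the owners and permissions are the important part; dates and sizes are omitted here because they will vary):

  drwxrwxrwx   - hdfs   supergroup  ... /tmp
  drwxr-xr-x   - hdfs   supergroup  ... /var
  drwxr-xr-x   - hdfs   supergroup  ... /var/lib
  drwxr-xr-x   - hdfs   supergroup  ... /var/lib/hadoop-hdfs
  drwxr-xr-x   - hdfs   supergroup  ... /var/lib/hadoop-hdfs/cache
  drwxr-xr-x   - mapred supergroup  ... /var/lib/hadoop-hdfs/cache/mapred
  drwxr-xr-x   - mapred supergroup  ... /var/lib/hadoop-hdfs/cache/mapred/mapred
  drwxrwxrwx   - mapred supergroup  ... /var/lib/hadoop-hdfs/cache/mapred/mapred/staging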

 


Step 6: Start MapReduce 

$ for x in `cd /etc/init.d ; ls hadoop-0.20-mapreduce-*` ; do sudo service $x start ; done
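
As with HDFS, you can confirm the MapReduce daemons with jps if a JDK is available; in this pseudo-distributed setup you should see JobTracker and TaskTracker among the Java processes:

  $ sudo jps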



To verify that the services have started, you can check the web console. The JobTracker provides a web console at http://localhost:50030/ for viewing running, completed, and failed jobs, along with their logs.


Step 7: Create User Directories 

 

Create a home directory for each MapReduce user. It is best to do this on the NameNode; for example:  

$ sudo -u hdfs hadoop fs -mkdir /user/<user> 
$ sudo -u hdfs hadoop fs -chown <user> /user/<user>
 

where <user> is the Linux username of each user. 
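
For example, to create a home directory for the user you are currently logged in as, you can substitute the shell's $USER variable for <user>:

  $ sudo -u hdfs hadoop fs -mkdir /user/$USER
  $ sudo -u hdfs hadoop fs -chown $USER /user/$USER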

 


