Installing CDH4 with MapReduce on a Single Linux Node in Pseudo-distributed mode

Download and install the package

1. Download the CDH4 repository package (cdh4-repository_1.0_all.deb) for an Ubuntu Lucid or Precise system.

2. Save the package in a directory to which you have access.

3. Go to the directory where you saved the package and run this command:

  $ sudo dpkg -i cdh4-repository_1.0_all.deb
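
If you want to confirm what the repository package set up, you can list the files it installed (the package name cdh4-repository is inferred from the .deb filename above); it should include an entry under /etc/apt/sources.list.d/:

  $ dpkg -L cdh4-repository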



Install CDH4

1. Add the Cloudera Public GPG Key to your repository by executing the appropriate command for your system:

For Ubuntu Precise systems:

$ curl -s http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/archive.key | sudo apt-key add -


For Ubuntu Lucid systems:

$ curl -s http://archive.cloudera.com/cdh4/ubuntu/lucid/amd64/cdh/archive.key | sudo apt-key add -
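
To confirm the key was imported, you can search the APT keyring (a quick check, not part of the original steps); the output should mention Cloudera:

  $ apt-key list | grep -i cloudera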


2. Install Hadoop in pseudo-distributed mode:


$ sudo apt-get update
$ sudo apt-get install hadoop-0.20-conf-pseudo


Type Y when prompted to continue.
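
Once the installation finishes, a quick sanity check (not in the original steps) is to confirm the Hadoop client is on your PATH; it should report a CDH4 version string:

  $ hadoop version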


3. Start Hadoop and verify that it is working properly.

To view the files installed by the hadoop-0.20-conf-pseudo package on Ubuntu systems, run:

   $ dpkg -L hadoop-0.20-conf-pseudo

 

To start Hadoop, proceed as follows. 

 

Step 1: Format the NameNode.

Before starting the NameNode for the first time, you must format the file system with the following command:

         $ sudo -u hdfs hdfs namenode -format
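
If the format succeeds, the command's output should include a log line similar to the following (the exact storage directory depends on your configuration, so the path is elided here):

  INFO common.Storage: Storage directory ... has been successfully formatted.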
 
 

Step 2: Start HDFS with the following command:

 $ for x in `cd /etc/init.d ; ls hadoop-hdfs-*` ; do sudo service $x start ; done
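
If a JDK is installed, you can also confirm the HDFS daemons are running with jps (jps ships with the JDK, so this assumes one is on your PATH). In this configuration you should see NameNode, DataNode, and SecondaryNameNode among the Java processes:

  $ sudo jps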

 

 
To verify that the services have started, you can check the web console. The NameNode provides a web console at http://localhost:50070/ for viewing your Distributed File System (DFS) capacity, the number of DataNodes, and logs. In this pseudo-distributed configuration, you should see one live DataNode, named localhost.
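
If you prefer the command line to a browser, a minimal reachability check of the NameNode console (assuming curl is installed) is to ask for its HTTP status code; 200 means the console is up:

  $ curl -s -o /dev/null -w "%{http_code}\n" http://localhost:50070/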

 

 

Step 3: Create the /tmp Directory

Create the /tmp directory:

  $ sudo -u hdfs hadoop fs -mkdir /tmp

Set permissions on the /tmp directory so that any user can write to it:

$ sudo -u hdfs hadoop fs -chmod -R 777 /tmp
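
To confirm the directory and its permissions, list the root of HDFS; the /tmp entry should show drwxrwxrwx, reflecting the chmod 777 above:

  $ sudo -u hdfs hadoop fs -ls /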

 

 

Step 4: Create the MapReduce system directories

 

$ sudo -u hdfs hadoop fs -mkdir -p /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chmod 777 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
$ sudo -u hdfs hadoop fs -chown -R mapred /var/lib/hadoop-hdfs/cache/mapred
  
Step 5: Verify the HDFS File Structure:

  $ sudo -u hdfs hadoop fs -ls -R /
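
If the previous steps succeeded, the listing should include entries along these lines (the owners and permissions are the important part; dates and sizes are omitted here because they will vary):

  drwxrwxrwx   - hdfs   supergroup  ... /tmp
  drwxr-xr-x   - hdfs   supergroup  ... /var
  drwxr-xr-x   - hdfs   supergroup  ... /var/lib
  drwxr-xr-x   - hdfs   supergroup  ... /var/lib/hadoop-hdfs
  drwxr-xr-x   - hdfs   supergroup  ... /var/lib/hadoop-hdfs/cache
  drwxr-xr-x   - mapred supergroup  ... /var/lib/hadoop-hdfs/cache/mapred
  drwxr-xr-x   - mapred supergroup  ... /var/lib/hadoop-hdfs/cache/mapred/mapred
  drwxrwxrwx   - mapred supergroup  ... /var/lib/hadoop-hdfs/cache/mapred/mapred/staging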

 


Step 6: Start MapReduce 

$ for x in `cd /etc/init.d ; ls hadoop-0.20-mapreduce-*` ; do sudo service $x start ; done
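
As with HDFS, you can confirm the MapReduce daemons with jps if a JDK is available; in this pseudo-distributed setup you should see JobTracker and TaskTracker among the Java processes:

  $ sudo jps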



To verify that the services have started, you can check the web console. The JobTracker provides a web console at http://localhost:50030/ for viewing running, completed, and failed jobs, along with their logs.


Step 7: Create User Directories 

 

Create a home directory for each MapReduce user. It is best to do this on the NameNode; for example:  

$ sudo -u hdfs hadoop fs -mkdir /user/<user> 
$ sudo -u hdfs hadoop fs -chown <user> /user/<user>
 

where <user> is the Linux username of each user. 
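
For example, to create a home directory for the user you are currently logged in as, you can substitute the shell's $USER variable for <user>:

  $ sudo -u hdfs hadoop fs -mkdir /user/$USER
  $ sudo -u hdfs hadoop fs -chown $USER /user/$USER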

 


