Sunday, April 15, 2012

Hadoop Cluster Setup

Pre-setup
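1. Install JDK/JRE 1.6 or later on every node.
2. Install ssh and set up passwordless login from the master to every slave:
     a. On the master, create an ssh key with an empty passphrase and authorize it locally:
         ssh-keygen -t dsa -P "" -f ~/.ssh/id_dsa
         cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
     b. Copy the master's public key to each slave:
         scp ~/.ssh/id_dsa.pub slaveN:~/.ssh/master.pub
     c. On each slave, append the master's public key to authorized_keys:
         cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys
     d. From the master, ssh to each slaveN and check that it logs in without asking for a passphrase.
3. Edit /etc/hosts and /etc/hostname on every node so that each node has its own hostname and all nodes can resolve master and slaveN by name.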
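Step 2 can also be scripted from the master. This is a minimal POSIX sh sketch, not part of Hadoop: the function names and the DRY_RUN switch (which only prints the commands) are made up for illustration, and it assumes each slave already has a ~/.ssh directory. It keeps the post's DSA key; OpenSSH 7.0 and later disable DSA keys by default, so on newer systems an RSA key (-t rsa, id_rsa) is the usual substitute.

```shell
#!/bin/sh
# Sketch of pre-setup step 2 (passwordless ssh), run as the Hadoop user on the master.
# DRY_RUN=1 prints each command instead of executing it.

run() {
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# 2a. Create a DSA key with an empty passphrase and authorize it locally.
create_master_key() {
    run mkdir -p -m 700 "$HOME/.ssh"
    run ssh-keygen -t dsa -P "" -f "$HOME/.ssh/id_dsa"
    run sh -c 'cat "$HOME/.ssh/id_dsa.pub" >> "$HOME/.ssh/authorized_keys"'
    # With the default StrictModes, sshd rejects a group/world-writable authorized_keys.
    run chmod 600 "$HOME/.ssh/authorized_keys"
}

# 2b + 2c. Copy the key to each slave and append it to that slave's
# authorized_keys (each slave still asks for its password at this point).
push_key() {
    for host in "$@"; do
        run scp "$HOME/.ssh/id_dsa.pub" "$host:.ssh/master.pub"
        run ssh "$host" 'cat ~/.ssh/master.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys'
    done
}

# 2d. BatchMode makes ssh fail instead of prompting, so a prompt shows up as an error.
check_login() {
    for host in "$@"; do
        run ssh -o BatchMode=yes "$host" hostname
    done
}
```

Typical use on the master: create_master_key, then push_key slave1 slave2 and check_login slave1 slave2, where slave1 and slave2 stand for your own slave hostnames.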

Setup
1. Set up conf/hadoop-env.sh: export JAVA_HOME there, pointing at the JDK installation.

2. conf/core-site.xml (name node) --for master & slaves

       <property>
         <name>fs.default.name</name>
         <value>hdfs://master</value>
       </property>
       
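3. conf/hdfs-site.xml (name node and data node directories) --for master & slaves

       <property>
         <name>dfs.name.dir</name>
         <value>/home/hduser/hddata/name</value>
       </property>
       <property>
         <name>dfs.data.dir</name>
         <value>/home/hduser/hddata/data</value>
       </property>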
         
4. conf/mapred-site.xml (job tracker) --for master & slaves

       <property>
         <name>mapred.job.tracker</name>
         <value>master:54311</value>
       </property>
           
5. List all slave hostnames in conf/slaves, one per line --for master (name node/job tracker) only

6. Remove group write permission (chmod g-w) from all data and name directories.
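Setup steps 1 to 6 can be collected into a small script. The sketch below assumes the Hadoop 1.x conf/ layout used in this post; property, write_site, configure and fix_perms are hypothetical helpers, and the JDK path and slave hostnames in the usage line are placeholders. configure appends the export to the end of hadoop-env.sh, so it overrides any earlier JAVA_HOME line in that file.

```shell
#!/bin/sh
# Sketch of setup steps 1-6 for a Hadoop 1.x conf/ directory.

# Print one <property> element for a *-site.xml file.
property() {
    printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# write_site FILE NAME VALUE [NAME VALUE ...]: write a complete *-site.xml.
write_site() {
    site_file=$1
    shift
    {
        printf '<?xml version="1.0"?>\n<configuration>\n'
        while [ "$#" -ge 2 ]; do
            property "$1" "$2"
            shift 2
        done
        printf '</configuration>\n'
    } > "$site_file"
}

# configure CONF_DIR JDK_PATH SLAVE...: setup steps 1-5.
configure() {
    conf=$1
    jdk=$2
    shift 2
    # 1. Appended at the end of the file, so it overrides any earlier JAVA_HOME export.
    printf 'export JAVA_HOME=%s\n' "$jdk" >> "$conf/hadoop-env.sh"
    # 2-4. The same three files go on the master and every slave.
    write_site "$conf/core-site.xml" fs.default.name hdfs://master
    write_site "$conf/hdfs-site.xml" \
        dfs.name.dir /home/hduser/hddata/name \
        dfs.data.dir /home/hduser/hddata/data
    write_site "$conf/mapred-site.xml" mapred.job.tracker master:54311
    # 5. One slave hostname per line; only the master reads this file.
    printf '%s\n' "$@" > "$conf/slaves"
}

# fix_perms DIR...: step 6, create each directory and drop group write permission.
fix_perms() {
    for dir in "$@"; do
        mkdir -p "$dir" && chmod g-w "$dir"
    done
}
```

For example, run configure conf /usr/lib/jvm/java-6-sun slave1 slave2 on the master, copy hadoop-env.sh and the three *-site.xml files to each slave's conf/ directory, and run fix_perms /home/hduser/hddata/name /home/hduser/hddata/data on every node.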

** start-dfs.sh reads conf/slaves on the name node and starts a data node on every host listed there.
** start-mapred.sh reads conf/slaves on the job tracker node and starts a task tracker on every host listed there.
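The loop behind these two notes looks roughly like the sketch below. It is a simplified illustration, not Hadoop's actual script (the real one handles more options); read_slaves, on_each_slave and the print-only DRY_RUN switch are hypothetical names. hadoop-daemon.sh, used in the usage line, is the per-host daemon script in Hadoop 1.x's bin/ directory, assumed here to be on each slave's PATH.

```shell
#!/bin/sh
# Simplified illustration of how start-dfs.sh / start-mapred.sh use conf/slaves.
# DRY_RUN=1 prints each command instead of executing it.

run() {
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Print the hosts in a slaves file, dropping # comments and blank lines.
read_slaves() {
    sed -e 's/#.*$//' -e '/^[[:space:]]*$/d' "$1"
}

# on_each_slave SLAVES_FILE COMMAND...: run COMMAND on every listed host over ssh.
on_each_slave() {
    slaves_file=$1
    shift
    for host in $(read_slaves "$slaves_file"); do
        run ssh "$host" "$*"
    done
}
```

on_each_slave conf/slaves hadoop-daemon.sh start datanode corresponds to what start-dfs.sh does for the data nodes; the same call with tasktracker corresponds to start-mapred.sh.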

Startup
 1. Run "hadoop namenode -format" on the name node (first start only: formatting erases any existing HDFS metadata).

 2. Run "start-dfs.sh" on the name node.

 3. Run "start-mapred.sh" on the job tracker.
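A sketch of the startup sequence for this post's layout, where master is both name node and job tracker. To avoid re-formatting by accident, it formats only when the name directory has no current/ subdirectory, which is an assumption about how a formatted Hadoop 1.x name directory looks; NAME_DIR, cluster_start, missing_daemons and DRY_RUN are hypothetical names. missing_daemons checks the output of jps, the JDK tool that lists running Java processes by class name.

```shell
#!/bin/sh
# Sketch of the startup sequence, run on master (name node + job tracker).
# NAME_DIR should match dfs.name.dir; DRY_RUN=1 only prints the commands.

NAME_DIR=${NAME_DIR:-/home/hduser/hddata/name}

run() {
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

cluster_start() {
    # 1. Format only on the first start (assumes a formatted dir contains current/).
    if [ ! -d "$NAME_DIR/current" ]; then
        run hadoop namenode -format
    fi
    # 2. Name node here, data nodes on every host in conf/slaves.
    run start-dfs.sh
    # 3. Job tracker here, task trackers on every host in conf/slaves.
    run start-mapred.sh
}

# missing_daemons JPS_OUTPUT NAME...: print each NAME absent from the jps output.
missing_daemons() {
    jps_out=$1
    shift
    for name in "$@"; do
        printf '%s\n' "$jps_out" | grep -q " $name\$" || printf '%s\n' "$name"
    done
}
```

After cluster_start, missing_daemons "$(jps)" NameNode JobTracker on the master and missing_daemons "$(jps)" DataNode TaskTracker on each slave should print nothing; any name it prints is a daemon that did not start.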

Shutdown
 1. Run "stop-mapred.sh" on the job tracker.

 2. Run "stop-dfs.sh" on the name node.
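Shutdown is the startup order reversed. A minimal sketch, assuming the name node and job tracker are both on master as in this post; cluster_stop and the print-only DRY_RUN switch are hypothetical names.

```shell
#!/bin/sh
# Sketch of the shutdown sequence on master; DRY_RUN=1 only prints the commands.

run() {
    if [ -n "$DRY_RUN" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

cluster_stop() {
    # 1. Job tracker and all task trackers first ...
    run stop-mapred.sh
    # 2. ... then the name node and all data nodes.
    run stop-dfs.sh
}
```

If the job tracker and name node ever live on different hosts, run each stop script on its own host, in the same order.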



