Saturday, February 16, 2013

Install Ubuntu 12.04 + hadoop 1.0 on Xen server

This article sets up a Hadoop cluster on Xen, with Ubuntu 12.04 installed.  Only a brief description is given here.  For anything else, consult the documentation and search the web for answers.

Step 1: Select the Ubuntu Precise Pangolin 12.04 (32-bit) template.  We install Ubuntu over the Internet from the following URL:
http://tw.archive.ubuntu.net/ubuntu
Note that another mirror may be chosen depending on the distance between you and the servers.  The CPU count is set to 1, the memory to 512MB, and the disk size is left at the default (8GB).

Step 2: Boot Ubuntu on Xen.  The time zone is Taipei/Taiwan, and the locale is set to en-US.  Set up the network connection manually if no DHCP server is found.  Note that downloading and installing from the server may take a while, so be patient.  After entering a user name and password, partition the disk with only a swap space and one ext4 partition for root (/).  Conveniently, that is the default layout.  Select LVM if disk space expansion is needed in the future.  Disable automatic updates for easier test-bed maintenance.  Some additional software should be selected, including the OpenSSH server.

Step 3: Install Xen Server Tools with the following commands.
sudo mount /dev/xvdd /mnt
sudo /mnt/Linux/install.sh
sudo reboot

Step 4: Install OpenJDK.  According to this wiki page, there is no difference when using OpenJDK, except when building the native Hadoop library.  ssh also needs to be installed.
sudo apt-get install openjdk-6-jre openjdk-6-jdk
sudo apt-get install ssh rsync

Step 5: Install Hadoop.  The version we use is Hadoop 1.0.4, the current stable version at the time of writing.  Use the following command to un-compress the package.
tar -xzvf hadoop-1.0.4.tar.gz
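If the tarball is not on the VM yet, it can be fetched first.  A download sketch follows; the Apache archive URL below is an assumption, so check the official Hadoop releases page if it has moved.

```shell
# Assumed location of the 1.0.4 tarball on the Apache archive -- verify
# against the Hadoop releases page before relying on it.
HADOOP_VER=1.0.4
URL="http://archive.apache.org/dist/hadoop/core/hadoop-${HADOOP_VER}/hadoop-${HADOOP_VER}.tar.gz"
echo "$URL"
# wget "$URL"                              # fetch the tarball on the VM
# tar -xzvf "hadoop-${HADOOP_VER}.tar.gz"  # then un-compress as above
```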

Step 6: Copy other environment files to the new machine with scp, including /etc/hosts.

Step 7: Shut down the VM and export it.  Make replicas on Xen.

Step 8: Set up the hostname, hosts file, and network settings on each replica.
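As a sketch, the per-node files might look like the following for a four-node cluster (the 192.168.1.x addresses are hypothetical; substitute your own subnet).  The files are written to example paths here; on the real VMs they go to /etc/hostname and /etc/hosts.

```shell
# Example /etc/hostname for the first node (each replica gets its own name)
echo "hdp201" > hostname.example

# Example /etc/hosts shared by all nodes; IP addresses are hypothetical
cat > hosts.example <<'EOF'
127.0.0.1       localhost
192.168.1.201   hdp201
192.168.1.202   hdp202
192.168.1.203   hdp203
192.168.1.204   hdp204
EOF
```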

Step 9: Install Ganglia.  The steps are described as follows:
Step 9.1: Install the Ganglia monitor daemon (gmond) on each machine that you want to monitor
sudo apt-get install ganglia-monitor
Edit /etc/ganglia/gmond.conf as follows:
cluster {
  name = "HDP Group"
  owner = "CTFan"
  latlong = "unspecified"
  url = "url.to.here"
}
host {
  location = "this machine name"
}
udp_send_channel {
  host = hdpNameNode
  port = 8649
  ttl = 1
}
udp_recv_channel {
  port = 8649
}
Step 9.2: Install the Ganglia metadata daemon (gmetad) on the machine that you want to access via the web page
sudo apt-get install gmetad ganglia-webfrontend rrdtool
Edit /etc/ganglia/gmetad.conf as follows:
data_source "HDP Group" hdpNameNode
Step 9.3: Copy the Ganglia web page
sudo mkdir -p /var/www/ganglia
sudo cp /usr/share/ganglia-webfrontend/* /var/www/ganglia
Restart the daemons (sudo service ganglia-monitor restart on each node, and sudo service gmetad restart on the head node), then browse to http://localhost/ganglia to verify that it works.

Step 10: Set up Hadoop.  Edit the following configuration files:
conf/masters
hdp201
conf/slaves
hdp202
hdp203
hdp204
conf/mapred-site.xml
mapred.job.tracker → hdp201:9001
conf/core-site.xml
fs.default.name → hdfs://hdp201:9000
hadoop.tmp.dir → /home/hadoop/tmp
conf/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk-i386/jre/
conf/hdfs-site.xml
dfs.replication → 1
dfs.data.dir → /home/hadoop/dfs/data
dfs.name.dir → /home/hadoop/dfs/name
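The *-site.xml files all follow the same Hadoop 1.x property format.  As an illustration, the core-site.xml entries above can be written out like this (the heredoc is just for demonstration; normally you edit conf/core-site.xml directly):

```shell
# Generate a minimal core-site.xml with the two properties listed above;
# each setting is a <name>/<value> pair inside <configuration>.
cat > core-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hdp201:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/tmp</value>
  </property>
</configuration>
EOF
grep -c '<property>' core-site.xml
```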

Finally, format the namenode and start the daemons from the hadoop-1.0.4 directory:
bin/hadoop namenode -format
bin/start-all.sh

This is the end of this article!