The following are the procedures for installing hadoop version 2.7.1.
1. Install Ubuntu 14.04.3
At present, Ubuntu 14.04.3 is the long-term support (LTS) release, which is recommended for long-term tests.
In order to have a static IP, two network adapters are installed on each VM. Adapter 1 uses NAT, which allows the VM to update its packages. Adapter 2 uses the Host-only Adapter, which allows the host computer (Windows 7) to access the VM directly. Usually, Windows 7 has the IP 192.168.56.1; this can be checked in Windows 7's Ethernet adapter named "VirtualBox Host-Only Network".
2. Upgrade all the packages and install some necessary packages
The following packages are required to run hadoop.
- openjdk-7-jre openjdk-7-jdk
- ssh rsync
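On Ubuntu 14.04, these can be installed with apt-get; a minimal sketch:

```shell
# Refresh the package index and upgrade the installed packages first.
sudo apt-get update
sudo apt-get -y upgrade
# Java 7 for hadoop itself; ssh and rsync are used by the hadoop scripts.
sudo apt-get -y install openjdk-7-jre openjdk-7-jdk ssh rsync
```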
3. Install hadoop 2.7.1
Go to the hadoop website and download the hadoop binary tarball. Extract it with the following commands.
tar -xzvf hadoop-2.7.1.tar.gz
mv hadoop-2.7.1 hadoop
4. Setup hadoop's configuration
All the configuration files are under the hadoop/etc/hadoop folder. The following are the files I have modified; this setup seems to work on my computer.
For brevity, the following shorthand is used:
name of the property → value of the property
In the core-site.xml.
fs.default.name → hdfs://master:9000
(fs.default.name is deprecated in hadoop 2.x in favor of fs.defaultFS, but it still works.)
In the hadoop-env.sh. Note that this file tells hadoop where the JRE is (JAVA_HOME). The path may differ from system to system; in my example, the 64-bit OpenJDK 7 is installed.
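For example, with the 64-bit OpenJDK 7 package on Ubuntu 14.04, the relevant line in hadoop-env.sh would look like the following (the exact path may differ on your system; check under /usr/lib/jvm):

```shell
# Tell hadoop where the JRE lives; adjust the path to your system.
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
```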
In the hdfs-site.xml.
dfs.namenode.name.dir → /home/hdfs
dfs.datanode.data.dir → /home/hdfs
dfs.permissions → false
In the mapred-site.xml.
mapreduce.framework.name → yarn
In the slaves file.
Enter each slave's host name, one per line.
In the yarn-site.xml.
yarn.nodemanager.aux-services → mapreduce_shuffle
yarn.resourcemanager.hostname → master
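For reference, each name → value pair above corresponds to one <property> element inside the <configuration> block of the file in question. For example, core-site.xml becomes:

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>
```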
5. Create another account for hdfs
According to the Apache hadoop documentation, hdfs and yarn should be executed under different accounts. Use the following commands to add an account named "hdfs".
sudo useradd -d /home/hdfs -m hdfs
Next, edit the hdfs line in /etc/passwd so that it matches the original user's (for example, set the login shell to /bin/bash, since useradd defaults to /bin/sh). Then set a password and add the account to the sudo group:
sudo passwd hdfs
sudo usermod -aG sudo hdfs
6. Copy the hadoop folder in step 4 to the hdfs account
Notice: you may have to use sudo to copy the directory. In that case, use chown and chgrp to change the owner and the group of the copied directory back to hdfs.
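Assuming the configured hadoop folder sits in the current user's home directory, the copy and the ownership fix might look like this (the paths are assumptions for illustration):

```shell
# Copy the configured hadoop folder into the hdfs account's home...
sudo cp -r ~/hadoop /home/hdfs/hadoop
# ...then hand the files over to the hdfs user and group.
sudo chown -R hdfs /home/hdfs/hadoop
sudo chgrp -R hdfs /home/hdfs/hadoop
```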
7. Setup /etc/hosts
Use /etc/hosts to let the VMs find one another by host name.
The following Python code can be used to generate a list of host names (the 192.168.56.x subnet and names of the form vmN are just an example; use your own scheme).
f = open('mHosts', 'w')
for i in range(0, 256):
    f.write('192.168.56.%d vm%d\n' % (i, i))
f.close()
After that, insert the generated file (mHosts) at the beginning of /etc/hosts. The line in the original /etc/hosts that contains the machine's own host name should be removed. It is suggested to map the machine's own host name to its true IP, not 127.0.0.1 or another loopback address such as 127.0.1.1, to prevent some unwanted problems (e.g., daemons binding only to the loopback interface).
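As an illustration, assuming the host-only subnet 192.168.56.x and the host names master, slave1, and slave2 (the specific IPs here are assumptions), the top of /etc/hosts would contain lines such as:

```
192.168.56.101 master
192.168.56.102 slave1
192.168.56.103 slave2
```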
8. Setup the ssh channel for password-less login
Use the following commands to create a password-less login.
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
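Since the VM will be cloned in step 9, every machine ends up with the same key pair, so the key appended above already authorizes logins between all of them. A quick check from the master (slave1 is an assumed host name from /etc/hosts) might be:

```shell
# Accept the host key on first contact and run a trivial remote command.
ssh -o StrictHostKeyChecking=no hdfs@slave1 hostname
```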
9. Copy VM
Yes, now we copy the VM, since setting up each of these minor parts on every machine individually is tedious. When duplicating the VM, remember to have VirtualBox reinitialize the MAC address of each adapter. The following should be modified after the VM is duplicated: the host name in /etc/hostname, the static IP in /etc/network/interfaces, and /etc/hosts.
Note that there is a trick when modifying /etc/hosts: the entry for the machine itself should 1) be set to the VM's dedicated IP and 2) be placed after the line that contains its own IP. It is suggested to remove the 'localhost' line that was initially created at the end of the file.
After these files are updated, reboot the VM, and ssh from the master to every slave under both accounts, so that the first-connection host-key prompts are out of the way.
10. Format Namenode and start hdfs
Use the following command to format the Namenode.
hadoop/bin/hdfs namenode -format
Use the following command to start hdfs under the hdfs account.
hadoop/sbin/start-dfs.sh
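A quick way to confirm that hdfs started is the jps tool that ships with the JDK. On the master you would expect to see NameNode and SecondaryNameNode listed, and DataNode on each slave (slave1 is an assumed host name):

```shell
# List the running Java processes on the master.
jps
# Check a slave remotely; DataNode should appear there.
ssh slave1 jps
```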
11. Start yarn
Use the following command to start yarn.
hadoop/sbin/start-yarn.sh
12. Try to run an example to verify the installation
Use the following command to verify the installation.
hadoop/bin/hadoop jar hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 5 5
This is the end of this article. The above procedure has been re-tested to verify that there are no problems in it.