In this blog post, we will set up a Hortonworks single-node Hadoop cluster for learning, testing, or just playing around with this awesome distributed file system. A production setup is very similar to the process described here - the only difference is in the security aspect, which should be taken much more seriously.
Hortonworks also provides sandbox images for VirtualBox, VMware, and Docker which you can use to skip all steps described below - but if you want to know what is going on “under the hood” and what you’ll need to do for production setup - stick around.
Unfortunately, at the time of writing, HDP (Hortonworks Data Platform) does not support Ubuntu 16, so we’ll set up our single-node cluster on Ubuntu 14 (server, 64-bit) using a Hyper-V VM with 8GB of RAM and 50GB of free disk space. You can go with a minimum of 4GB of RAM, but we wouldn’t recommend anything below 8GB.
Switch to root user
We’ll use Ambari - an awesome and easy-to-use Hadoop management web UI which can save you hours of work on Hadoop setup and maintenance.
Ambari requires the root user to set up cluster nodes. This can be an issue for developers who are trying to install Hadoop on Ubuntu using the Ambari node installation. If you’re not willing to use this approach, you’ll need to install the Ambari Agents manually.
Switch to the root user and change the password:
$ sudo su root
$ sudo passwd
Install and set up SSH server
$ sudo apt-get install ssh
$ sudo vi /etc/ssh/sshd_config
In the sshd_config file, set PermitRootLogin to yes.
Save and exit, then restart the ssh service:
$ service ssh restart
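If you would rather script the sshd_config change than edit it in vi, the following is a minimal sketch. The helper name `set_permit_root_login` is hypothetical (it is not part of OpenSSH or Ambari), and it assumes the directive appears at most once in the file, possibly commented out. Try it on a copy of the file first.

```shell
# Hypothetical helper: set "PermitRootLogin yes" in the sshd_config file
# passed as the first argument. Assumes at most one PermitRootLogin line.
set_permit_root_login() {
    config_file="$1"
    if grep -Eq '^[#[:space:]]*PermitRootLogin[[:space:]]' "$config_file"; then
        # Rewrite the existing (possibly commented-out) directive.
        sed -E 's/^[#[:space:]]*PermitRootLogin[[:space:]].*/PermitRootLogin yes/' \
            "$config_file" > "$config_file.tmp" && mv "$config_file.tmp" "$config_file"
    else
        # No directive present yet: append one.
        printf 'PermitRootLogin yes\n' >> "$config_file"
    fi
}

# Usage (as root, followed by a restart of the ssh service):
#   set_permit_root_login /etc/ssh/sshd_config
```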
Check which JDK version you currently have on the machine using:
$ java -version
$ javac -version
If you don’t have a JDK installed on your server, use the following command to install it:
$ sudo apt-get install openjdk-7-jdk
After JDK installation is done, check your JDK versions again.
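The version check can also be done in a script. The sketch below extracts the Java major version from the first line that `java -version` prints (that command writes to standard error, hence the `2>&1` in the usage line). The function name `java_major_version` is hypothetical, and it assumes the usual `version "x.y.z"` output format.

```shell
# Hypothetical helper: print the Java major version found in a line such as
#   java version "1.7.0_121"
# The version line is passed in as the first argument.
java_major_version() {
    version=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$version" in
        1.*) printf '%s\n' "$version" | cut -d. -f2 ;;  # legacy scheme: 1.7.0_121 is Java 7
        *)   printf '%s\n' "$version" | cut -d. -f1 ;;  # newer scheme: 11.0.2 is Java 11
    esac
}

# Usage:
#   java_major_version "$(java -version 2>&1 | head -n 1)"
```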
Change hostname and set up FQDN
Check the current hostname:
$ cat /etc/hostname
Set up a new hostname:
$ sudo vi /etc/hostname
We entered “node.monocluster.com” as our hostname. Save and exit the file.
To set up the FQDN (Fully Qualified Domain Name), you’ll need to find your IP address using:
$ ip addr show
Open the hosts file and enter your IP and FQDN (e.g. “192.168.0.30 node.monocluster.com”) using:
$ sudo vi /etc/hosts
Once the hosts file is edited, set the machine hostname:
$ hostname <your FQDN name>
In our case that would be:
$ hostname node.monocluster.com
Check your hostnames using:
$ hostname
$ hostname -f
Both commands should now return the name you entered in the “hostname” and “hosts” files.
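The hosts-file edit from this section can be sketched as a small idempotent helper, so that running it twice does not add a duplicate line. The function name `add_hosts_entry` is hypothetical, and the IP and FQDN in the usage line are the example values used in this post.

```shell
# Hypothetical helper: append "<ip> <fqdn>" to a hosts file unless some
# non-comment line already maps an address to that exact name.
add_hosts_entry() {
    hosts_file="$1"
    ip="$2"
    fqdn="$3"
    if awk -v name="$fqdn" '
        $1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }' "$hosts_file"; then
        return 0  # entry already present, nothing to do
    fi
    printf '%s %s\n' "$ip" "$fqdn" >> "$hosts_file"
}

# Usage (as root):
#   add_hosts_entry /etc/hosts 192.168.0.30 node.monocluster.com
```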
Generate SSH keys
Go to the .ssh folder and generate SSH keys:
$ cd /root/.ssh
$ ssh-keygen
$ sudo cat id_rsa.pub >> authorized_keys
Try to ssh with the newly created keys:
$ ssh root@<hostname>
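For a non-interactive variant of the key setup, here is a minimal sketch. The function name `setup_ssh_key` is hypothetical; the 4096-bit key size and the empty passphrase are assumptions made for this example (the steps above simply accept whatever you enter at the `ssh-keygen` prompts).

```shell
# Hypothetical helper: create an RSA key pair in the given directory (if one
# does not exist yet) and authorize its public key for logins.
setup_ssh_key() {
    ssh_dir="$1"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        # Assumption: 4096-bit RSA key with an empty passphrase.
        ssh-keygen -q -t rsa -b 4096 -N "" -f "$ssh_dir/id_rsa"
    fi
    touch "$ssh_dir/authorized_keys"
    # Append the public key only if that exact line is not already there.
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}

# Usage (as root):
#   setup_ssh_key /root/.ssh
```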
Check firewall status:
$ sudo ufw status
If the firewall is active, turn it off:
$ sudo ufw disable
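The "disable only if active" logic can be sketched as below. The helper name `ufw_is_active` is hypothetical; it relies on `ufw status` printing a `Status: active` line when the firewall is on and `Status: inactive` when it is off.

```shell
# Hypothetical helper: succeed if the given `ufw status` output reports an
# active firewall. The status text is passed in as the first argument.
ufw_is_active() {
    printf '%s\n' "$1" | grep -qx 'Status: active'
}

# Usage (needs root and changes the firewall state):
#   if ufw_is_active "$(sudo ufw status)"; then sudo ufw disable; fi
```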
Check the AppArmor status, then stop it and remove it from startup:
$ apparmor_status
$ sudo /etc/init.d/apparmor stop
$ sudo update-rc.d -f apparmor remove
Add the Ambari repository, import its key, and update the package lists:
$ sudo wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/188.8.131.52/ambari.list -O /etc/apt/sources.list.d/ambari.list
$ sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
$ apt-get update
Check downloaded packages:
$ apt-cache showpkg ambari-server
$ apt-cache showpkg ambari-agent
$ apt-cache showpkg ambari-metrics-assembly
Install Hadoop using Ambari
If all Ambari packages are in place, we’re ready to install and run the Ambari server:
$ apt-get install ambari-server
$ ambari-server setup
$ ambari-server start
$ ambari-server status
Accept all default values in the Ambari setup process. Now we’re ready to install our node(s) using the Ambari dashboard.
Set up cluster using Ambari
Fire up your browser and go to <your-server-ip>:8080
Log in using the default credentials (admin/admin) and click the [Launch Install Wizard] button.
Enter the cluster name and click next. Choose the desired HDP version and proceed to the next step.
In the Target Host field, you’ll need to enter the hostname from the hosts file.
In a multi-node installation, you would enter the hostnames of all your nodes. Later, you’ll have the option to assign master (name) and slave (worker) nodes.
For the SSH private key, use the key generated in the /root/.ssh folder. The user name should be “root” and the port set to 22.
Click [Next] and wait until the HDP installation process is done (this can take some time).
After HDP is successfully installed, choose which services you would like to install, and go to the next step. Note: For the “SmartSense” service you’ll need a “Customer account name” and a “SmartSense ID,” so we’ll uncheck this service along with HBase and Accumulo.
Because we’re installing a single-node cluster and everything will be on one server, there is nothing interesting for us in the next two steps (Assign Master and Assign Slaves and Clients).
In the step that follows, you will need to provide:
- Hive (Advanced tab) - password for the new MySQL Database
- Oozie - password for the new Derby Database
- Knox - Knox master secret
Proceed to the next step, review your setup and click Deploy. That’s it!
Congratulations on your first Hadoop cluster setup.