In this blog post, we will set up a Hortonworks single-node Hadoop cluster for learning, testing, or just playing around with this awesome distributed framework. A production setup is very similar to the process described here - the only difference is the security aspect, which should be taken much more seriously.

Hortonworks also provides sandbox images for VirtualBox, VMware, and Docker which you can use to skip all the steps described below - but if you want to know what is going on “under the hood” and what you’ll need to do for a production setup - stick around.

Minimum requirements

Unfortunately, at the time of writing this blog, HDP (Hortonworks Data Platform) does not support Ubuntu 16, so we’ll set up our single-node cluster on Ubuntu 14 (server, 64-bit) using a Hyper-V VM with 8GB of RAM and 50GB of free disk space. You can go with a minimum of 4GB of RAM, but we wouldn’t recommend anything below 8GB.
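Before going further, it doesn’t hurt to verify that the VM actually has the resources we’re aiming for. A quick check from the shell:

```shell
# quick sanity check of the VM resources (aim for ~8GB RAM, ~50GB free disk)
free -h       # total and available memory
df -h /       # free space on the root partition
```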

Setup environment

Switch to root user

We’ll use Ambari - an awesome and easy-to-use Hadoop management web UI which can save you hours of work on Hadoop setup and maintenance.

Ambari requires root access to set up cluster nodes - it connects to each host over SSH as root. This can be an issue for developers who are wary of enabling root login over SSH. If you’re not willing to use this approach, you’ll need to install the Ambari Agents manually on each node.

Switch to the root user and change the password:

$ sudo su root
$ passwd

Install and setup SSH server

$ sudo apt-get install ssh
$ sudo vi /etc/ssh/sshd_config

In the sshd_config file, set PermitRootLogin to “yes”. Save and exit, then restart the SSH service:

$ sudo service ssh restart
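If you prefer scripting the change instead of editing the file by hand, a sed one-liner can do the same - this is a sketch assuming the stock sshd_config layout, where the directive may still be commented out:

```shell
# force root login over SSH; keeps a backup of the original file
sudo sed -i.bak 's/^#\?PermitRootLogin.*/PermitRootLogin yes/' /etc/ssh/sshd_config
sudo service ssh restart
```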

Install JDK

Check which JDK version is currently installed on the machine:

$ java -version
$ javac -version

If you don’t have a JDK installed on your server, use the following command to install it:

$ sudo apt-get install openjdk-7-jdk

After JDK installation is done, check your JDK versions again.
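Hadoop services locate the JDK via the JAVA_HOME variable, so it’s worth exporting it now. A sketch, assuming the default install path of the Ubuntu openjdk-7-jdk package:

```shell
# persist JAVA_HOME for all users (path assumes Ubuntu's openjdk-7 layout)
echo 'export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64' | sudo tee /etc/profile.d/java.sh
source /etc/profile.d/java.sh
```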

Change hostname and setup FQDN

Check the current hostname:

$ cat /etc/hostname

Set up a new hostname:

$ sudo vi /etc/hostname

We entered “node.monocluster.com” as our hostname. Save and exit the file.

To setup FQDN (Fully Qualified Domain Name) you’ll need to find your IP using:

$ ip addr show

Open the hosts file and add a line mapping your IP to your FQDN (e.g. “192.168.0.30 node.monocluster.com”):

$ sudo vi /etc/hosts
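Alternatively, you can append the mapping from the command line - the IP and name below are the example values from this post, so substitute your own:

```shell
# add the IP/FQDN mapping to /etc/hosts (example values)
echo "192.168.0.30 node.monocluster.com node" | sudo tee -a /etc/hosts
```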

After the hosts file is edited, set up the machine hostname:

$ hostname <your FQDN name>

In our case that would be:

$ hostname node.monocluster.com

Check your hostnames using:

$ hostname 
$ hostname -f

Both commands should now print the names you entered in the “hostname” and “hosts” files.
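Ambari’s host registration also needs the FQDN to resolve locally, so it’s worth confirming the /etc/hosts entry is actually picked up:

```shell
# should print the IP/FQDN line you added to /etc/hosts
getent hosts "$(hostname -f)"
```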

Generate SSH keys

Go to the root user’s .ssh folder (creating it if it doesn’t exist yet) and generate SSH keys:

$ mkdir -p /root/.ssh
$ cd /root/.ssh
$ ssh-keygen
$ cat id_rsa.pub >> authorized_keys
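For scripted setups, the same steps can be done non-interactively. An empty passphrase is assumed here, which is acceptable for a lab cluster but not for production:

```shell
# generate a passphrase-less key and authorize it for root logins
ssh-keygen -t rsa -b 4096 -N "" -f /root/.ssh/id_rsa
cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
```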

Try to ssh with the newly created keys:

$ ssh root@<hostname>

Disable firewall

Check firewall status:

$ sudo ufw status

If the firewall is active, turn it off:

$ sudo ufw disable 

Disable AppArmor

Note that Ubuntu ships AppArmor rather than SELinux; stop the service and remove it from startup:

$ apparmor_status
$ sudo /etc/init.d/apparmor stop
$ sudo update-rc.d -f apparmor remove

Download Ambari

$ sudo wget -nv http://public-repo-1.hortonworks.com/ambari/ubuntu14/2.x/updates/2.2.1.0/ambari.list -O /etc/apt/sources.list.d/ambari.list
$ sudo apt-key adv --recv-keys --keyserver keyserver.ubuntu.com B9733A7A07513CAD
$ sudo apt-get update

Check that the Ambari packages are available:

$ apt-cache showpkg ambari-server
$ apt-cache showpkg ambari-agent
$ apt-cache showpkg ambari-metrics-assembly

Install Hadoop using Ambari

If all Ambari packages are in place, we’re ready to install and run the Ambari server:

$ apt-get install ambari-server
$ ambari-server setup
$ ambari-server start
$ ambari-server status

Accept all the default values in the Ambari setup process. Now we’re ready to install our node(s) using the Ambari dashboard.
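If you’d rather accept the defaults without sitting through the interactive prompts, ambari-server has a silent setup mode:

```shell
# -s (silent) answers every setup prompt with its default value
ambari-server setup -s
ambari-server start
```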

Setup cluster using Ambari

Fire up your browser and go to http://<your-server-ip>:8080

Log in using the default credentials (admin/admin) and click the [Launch Install Wizard] button.

Ambari Dashboard

Enter the cluster name and click next. Choose the desired HDP version and proceed to the next step.

In the Target Hosts field, enter the hostname from the hosts file (the output of hostname -f). In a multi-node installation you would enter the hostnames of all your nodes. Later, you’ll have the option to assign master (name) and slave (worker) roles.

Ambari Hosts

For the SSH private key, use the generated id_rsa file from the /root/.ssh folder. The user name should be “root” and the port set to 22. Click [Next] and wait until the HDP installation process is done (this can take some time).

Ambari Installation

After HDP is successfully installed, choose which services you would like to install and go to the next step. Note: the “SmartSense” service requires a “Customer account name” and “SmartSense ID,” so we’ll uncheck it along with HBase and Accumulo.

Ambari Services

Because we’re installing a single-node cluster and everything will run on one server, there is nothing interesting for us in the next two steps (Assign Masters and Assign Slaves and Clients).

Customize Services

Ambari Services Customization

In this step you will need to provide:

  • Hive (Advanced tab) - password for the new MySQL Database
  • Oozie - password for the new Derby Database
  • Knox - the Knox master secret

Proceed to the next step, review your setup and click Deploy. That’s it!

Congratulations on your first Hadoop cluster setup.
