
Ansible Playbook - Setup Hadoop CDH5 Using tarball.



This is a simple Hadoop playbook to quickly get Hadoop running in a cluster.
Here is the Script Location on Github: https://github.com/zubayr/ansible_hadoop_tarball
Below are the steps to get started.

Get the script from Github.

Below is the command to clone.
[ahmed@ahmed-server ~]$ git clone https://github.com/zubayr/ansible_hadoop_tarball

Before we start.

Download hadoop-2.3.0-cdh5.1.2.tar.gz to the file_archives directory.
Download jdk-7u75-linux-x64.tar.gz to the file_archives directory.

Details about each Role.

commons

This role updates OS parameters by editing the files below (see the sketch after this list).
  1. sysctl.conf Updates swappiness, networking and more. Values are in defaults/main.yml.
  2. limits.conf Updates soft and hard limits.
  3. 90-nproc.conf Updates per-user limits and adds a limits file for hadoop_user.
  4. /etc/hosts Updates the hosts file on the server, using host_name from the hosts file.
The /etc/hosts entries are built from the [allnodes] group in the hosts file.
NOTE: this role also updates the HOSTNAME of the server to match these entries.
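
For orientation, here is a minimal sketch of the kind of tasks such a role runs; the real tasks live under roles/commons/tasks/ and the actual values come from defaults/main.yml.
# Illustrative sketch only - not the repo's exact tasks.
- name: Tune swappiness (real value comes from defaults/main.yml)
  sysctl:
    name: vm.swappiness
    value: "0"
    state: present
    reload: yes

- name: Add an /etc/hosts entry for every host in [allnodes]
  lineinfile:
    dest: /etc/hosts
    line: "{{ item }} {{ hostvars[item].host_name }}"
  with_items: "{{ groups['allnodes'] }}"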

jdk

This role installs JDK 1.7. The installation path comes from group_vars/all via the java_home variable.
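
A tarball-based JDK install usually boils down to an unarchive plus a symlink; here is a sketch under that assumption (the extracted directory name jdk1.7.0_75 is assumed, not taken from the repo).
# Sketch only - task names and the extracted directory are assumptions.
- name: Extract the JDK tarball to the install base
  unarchive:
    src: file_archives/jdk-7u75-linux-x64.tar.gz
    dest: "{{ common.install_base_path }}"

- name: Point java_home at the extracted JDK
  file:
    src: "{{ common.install_base_path }}/jdk1.7.0_75"
    dest: "{{ java_home }}"
    state: link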

ssh_known_hosts

This role creates SSH known-hosts entries for all the hosts in the hosts file.
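
One common way to do this is with ssh-keyscan; a sketch, assuming the role scans every host in [allnodes] (the repo may template the file instead).
# Sketch only - appending like this is not idempotent.
- name: Record the RSA host key of every cluster node
  shell: "ssh-keyscan -t rsa {{ hostvars[item].host_name }} >> /etc/ssh/ssh_known_hosts"
  with_items: "{{ groups['allnodes'] }}"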

ssh_password_less

This role sets up passwordless SSH for hadoop_user across the Hadoop nodes.
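
Conceptually this means generating a key pair for hadoop_user and authorizing the public key on every node; a sketch under that assumption (the public-key path below is hypothetical).
# Sketch only - in practice the public key must first be fetched from one
# node (or generated on the control machine) before it can be pushed out.
- name: Ensure hadoop_user exists and has an SSH key pair
  user:
    name: "{{ hadoop_user }}"
    generate_ssh_key: yes

- name: Authorize the public key for hadoop_user on every node
  authorized_key:
    user: "{{ hadoop_user }}"
    key: "{{ lookup('file', 'files/hadoop_user_id_rsa.pub') }}"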

cdh5_hadoop_commons_tarball

This role installs and configures Hadoop, updating the files below (a task sketch follows the list).
  1. core-site.xml Adds the NameNode address.
  2. hdfs-site.xml Updates HDFS parameters from default/main.yml.
  3. mapred-site.xml Updates MapReduce information.
  4. yarn-site.xml Updates YARN parameters.
  5. slaves Updates slave-node information from the hosts file.
  6. hadoop-env.sh Updates JAVA_HOME from group_vars.
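
Each of these files is typically rendered from a Jinja2 template; a sketch of one such task (the template name and destination path are assumptions, not the repo's exact layout).
# Sketch only - core-site.xml.j2 and the dest path are assumed names.
- name: Render core-site.xml with the NameNode address
  template:
    src: core-site.xml.j2
    dest: "{{ common.soft_link_base_path }}/hadoop/etc/hadoop/core-site.xml"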

post_install_setups

This role creates Hadoop users after installation. If more users are needed, add them to the role post_install_setups.
Currently it creates a user called stormadmin. More details are in roles/post_install_setups/tasks/create_hadoop_user.yml; the role is hooked into the play as shown below.
#
# Creating a Storm User on the Namenode. This will eventually be an edge node.
#
- hosts: namenodes
  remote_user: root
  roles:
    - post_install_setups
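
For reference, a sketch of what roles/post_install_setups/tasks/create_hadoop_user.yml plausibly contains (the group and shell below are assumptions).
# Sketch only - group and shell are assumed, not taken from the repo.
- name: Create the storm admin user
  user:
    name: stormadmin
    group: "{{ hadoop_group }}"
    shell: /bin/bash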

Step 1. Update the variables below as required.

Global Vars can be found in group_vars/all.
# --------------------------------------
# USERs
# --------------------------------------

hadoop_user: hdadmin
hadoop_group: hdadmin
hadoop_password: $6$rounds=40000$1qjG/hovLZOkcerH$CK4Or3w8rR3KabccowciZZUeD.nIwR/VINUa2uPsmGK/2xnmOt80TjDwbof9rNvnYY6icCkdAR2qrFquirBtT1

# Common Location information.
common:
  install_base_path: /usr/local
  soft_link_base_path: /opt

Step 2. User information comes from group_vars.

The username can be changed in the Global Vars via hadoop_user. Currently the password is hdadmin@123.
The password hash can be generated using the Python snippet below.
# Password Generated using python command below.
python -c "from passlib.hash import sha512_crypt; import getpass; print sha512_crypt.encrypt(getpass.getpass())"
Here is an example run. After entering the password you get the encrypted hash, which is what goes into the user creation.
[ahmed@ahmed-server ~]$ python -c "from passlib.hash import sha512_crypt; import getpass; print sha512_crypt.encrypt(getpass.getpass())"
Enter Password: *******
$6$rounds=40000$1qjG/hovLZOkcerH$CK4Or3w8rR3KabccowciZZUeD.nIwR/VINUa2uPsmGK/2xnmOt80TjDwbof9rNvnYY6icCkdAR2qrFquirBtT1
[ahmed@ahmed-server ~]$
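
The resulting hash is what goes into hadoop_password in group_vars/all; inside a role it is typically consumed by a user task along these lines (a sketch, not the repo's exact task).
# Sketch only - the user module accepts a pre-hashed password as-is.
- name: Create hadoop_user with the pre-hashed password
  user:
    name: "{{ hadoop_user }}"
    group: "{{ hadoop_group }}"
    password: "{{ hadoop_password }}"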

Step 3. Update Host File.

IMPORTANT: update the contents of the hosts file. The host_name value set in the hosts file is used to build the /etc/hosts file.
#
# All pre-prod nodes. 
#
[allnodes]
10.10.18.30 host_name=ahmd-namenode
10.10.18.31 host_name=ahmd-datanode-01
10.10.18.32 host_name=ahmd-datanode-02
10.10.18.34 host_name=ahmd-resourcemanager
10.10.18.93 host_name=ahmd-secondary-namenode
10.10.18.94 host_name=ahmd-datanode-03
10.10.18.95 host_name=ahmd-datanode-04


# 
# hadoop cluster
#

[namenodes]
10.10.18.30

[secondarynamenode]
10.10.18.93

[resourcemanager]
10.10.18.34

[jobhistoryserver]
10.10.18.34

[datanodes]
10.10.18.31
10.10.18.32
10.10.18.94
10.10.18.95

[hadoopcluster:children]
namenodes
secondarynamenode
resourcemanager
jobhistoryserver
datanodes

#
# sshknown hosts list.
#

[sshknownhosts:children]
hadoopcluster

Step 4. Execute the playbook.

Execute the command below; --ask-pass prompts for the SSH password of the remote user.
ansible-playbook ansible_hadoop.yml -i hosts --ask-pass
