Creating a Multi-node Cassandra Cluster on CentOS 6.5.




This is a basic multi-node Cassandra setup.

Initial Server Setup

Hardware Information

All the servers had the configuration below.
CPU : 40 Cores
RAM : 192GB

Setting Hosts for Cassandra

Set up the servers and update /etc/hosts on each of them as below.
#Adding CASSANDRA NODES
10.10.18.35    CASSANDRA01      #SEED
10.10.18.93    CASSANDRA02      #Worker
10.10.18.98    CASSANDRA03      #Worker
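For reference, the same entries can be appended on each node with a heredoc (run it on every server):
sudo tee -a /etc/hosts > /dev/null <<'EOF'
#Adding CASSANDRA NODES
10.10.18.35    CASSANDRA01      #SEED
10.10.18.93    CASSANDRA02      #Worker
10.10.18.98    CASSANDRA03      #Worker
EOF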

Updating hostname on all servers.

Update hostnames as required.
sudo vim /etc/sysconfig/network
Update the hostname as below; do the same on all servers [CASSANDRA01, CASSANDRA02, CASSANDRA03].
NETWORKING=yes
HOSTNAME=CASSANDRA01
To update the hostname without a reboot, execute the command below.
sudo hostname CASSANDRA01
NOTE: The hostname command only keeps the new hostname until the next reboot, so we also need to update the /etc/sysconfig/network file.
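A quick way to confirm both the running hostname and the persistent setting:
hostname
grep HOSTNAME /etc/sysconfig/network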

Creating cassandra user with sudo permissions.

We have a script which will create the user on each server.
wget https://raw.githubusercontent.com/zubayr/create_user_script/master/create_user_script.sh
sh create_user_script.sh -s cassandra
This will create a cassandra user with sudo permissions.
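If you prefer not to use the script, a rough manual equivalent on CentOS 6 would be something like the below (this grants passwordless sudo, which is an assumption about what the script does):
sudo useradd cassandra
sudo passwd cassandra
echo 'cassandra ALL=(ALL) NOPASSWD:ALL' | sudo tee /etc/sudoers.d/cassandra
sudo chmod 440 /etc/sudoers.d/cassandra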

Creating passwordless SSH from the SEED (CASSANDRA01) to the other servers.

Create an RSA key on CASSANDRA01.
ssh-keygen -t rsa
Create the .ssh directory on the other two servers.
ssh cassandra@CASSANDRA02 mkdir -p .ssh
ssh cassandra@CASSANDRA03 mkdir -p .ssh
Add the id_rsa.pub to authorized_keys
cat ~/.ssh/id_rsa.pub | ssh cassandra@CASSANDRA02 'cat >> .ssh/authorized_keys'
cat ~/.ssh/id_rsa.pub | ssh cassandra@CASSANDRA03 'cat >> .ssh/authorized_keys'
Make sure we have the right permissions.
ssh cassandra@CASSANDRA02 chmod 744 -R .ssh 
ssh cassandra@CASSANDRA03 chmod 744 -R .ssh 
Testing.
ssh cassandra@CASSANDRA02
ssh cassandra@CASSANDRA03
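A small loop from the seed confirms passwordless access works; it should print each worker's hostname without prompting for a password.
for host in CASSANDRA02 CASSANDRA03; do ssh cassandra@${host} hostname; done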

Extracting Files.

Extract the files to /opt and create a symlink.
sudo tar xvzf apache-cassandra-2.1.3-bin.tar.gz -C /opt
sudo ln -s /opt/apache-cassandra-2.1.3 /opt/cassandra
sudo chown cassandra:cassandra -R /opt/cassandra
sudo chown cassandra:cassandra -R /opt/apache-cassandra-2.1.3
Creating Required Directories.
sudo mkdir -p /data1/cassandra/commitlog
sudo mkdir -p /data1/cassandra/data
sudo mkdir -p /data1/cassandra/saved_caches
sudo chown cassandra:cassandra -R /data1/cassandra
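A sketch to repeat the same extraction and directory setup on the worker nodes from the seed, assuming the tarball has already been copied into the cassandra user's home directory on each node (ssh -t keeps sudo happy with CentOS 6's requiretty default):
for host in CASSANDRA02 CASSANDRA03; do
  ssh -t cassandra@${host} 'sudo tar xvzf ~/apache-cassandra-2.1.3-bin.tar.gz -C /opt && \
    sudo ln -s /opt/apache-cassandra-2.1.3 /opt/cassandra && \
    sudo chown cassandra:cassandra -R /opt/apache-cassandra-2.1.3 /opt/cassandra && \
    sudo mkdir -p /data1/cassandra/{commitlog,data,saved_caches} && \
    sudo chown cassandra:cassandra -R /data1/cassandra'
done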

Updating Configuration File.

Setting initial_token as below.

Node 0: 0
Node 1: 3074457345618258602
Node 2: 6148914691236517205
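These values follow the referenced tutorial. If you need to recompute evenly spaced tokens yourself (for example, for a different node count), one common approach for the Murmur3 partitioner is the one-liner below; note that it spreads tokens across the full ring, so its output will not match the values above exactly.
python -c 'num=3; print("\n".join(str(i * (2**64 // num) - 2**63) for i in range(num)))'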

On Node CASSANDRA01

cluster_name: 'MyCassandraCluster'
initial_token: 0
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
         - seeds: "10.10.18.35"
listen_address: 10.10.18.35
endpoint_snitch: SimpleSnitch

data_file_directories:
    - /data1/cassandra/data

commitlog_directory: /data1/cassandra/commitlog
saved_caches_directory: /data1/cassandra/saved_caches

On Node CASSANDRA02

cluster_name: 'MyCassandraCluster'
initial_token: 3074457345618258602
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
         - seeds: "10.10.18.35"
listen_address: 10.10.18.93
endpoint_snitch: SimpleSnitch

data_file_directories:
    - /data1/cassandra/data

commitlog_directory: /data1/cassandra/commitlog
saved_caches_directory: /data1/cassandra/saved_caches

On Node CASSANDRA03

cluster_name: 'MyCassandraCluster'
initial_token: 6148914691236517205
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
         - seeds: "10.10.18.35"
listen_address: 10.10.18.98
endpoint_snitch: SimpleSnitch

data_file_directories:
    - /data1/cassandra/data

commitlog_directory: /data1/cassandra/commitlog
saved_caches_directory: /data1/cassandra/saved_caches
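Since passwordless SSH is already in place from the seed, one option is to push the yaml out and then edit the per-node fields; this is only a sketch, and initial_token and listen_address must still be changed on CASSANDRA02 and CASSANDRA03 after copying.
for host in CASSANDRA02 CASSANDRA03; do
  scp /opt/cassandra/conf/cassandra.yaml cassandra@${host}:/opt/cassandra/conf/
done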

Starting Cassandra.

On Server CASSANDRA01.
sh /opt/cassandra/bin/cassandra
Wait until the seed node has initialized, then start the rest of the nodes.
On Server CASSANDRA02.
sh /opt/cassandra/bin/cassandra
On Server CASSANDRA03.
sh /opt/cassandra/bin/cassandra
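To confirm a node is up before starting the next one, tail its log; for a tarball install the log is written under the install directory by default.
tail -f /opt/cassandra/logs/system.log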

Checking Cluster Information.

[cassandra@CASSANDRA01 bin]$ ./nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address       Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.10.18.98  72.09 KB   1       33.3%             1a5a0c77-b5e6-4057-87b4-a8e788786244  rack1
UN  10.10.18.35  46.24 KB   1       83.3%             67de1b1f-8070-48c1-ad88-2c0d4dd7a988  rack1
UN  10.10.18.93  55.64 KB   1       83.3%             7fba7cd0-6f99-4ce8-8194-c9a8b23488cd  rack1

Logging into CQL Shell.

We need to export CQLSH_HOST before launching cqlsh.
[cassandra@CASSANDRA01 bin]$ export CQLSH_HOST=10.10.18.35
[cassandra@CASSANDRA01 bin]$ cqlsh
Connected to MyCassandraCluster at 10.10.18.35:9042.
[cqlsh 5.0.1 | Cassandra 2.1.3 | CQL spec 3.2.0 | Native protocol v3]
Use HELP for help.
cqlsh>
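As a quick smoke test, create a keyspace replicated across all three nodes and read a row back (the keyspace and table names below are just examples):
cqlsh> CREATE KEYSPACE test_ks WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
cqlsh> CREATE TABLE test_ks.kv (key text PRIMARY KEY, value text);
cqlsh> INSERT INTO test_ks.kv (key, value) VALUES ('hello', 'world');
cqlsh> SELECT * FROM test_ks.kv;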

Data Location on CASSANDRA01, CASSANDRA02, CASSANDRA03

[cassandra@CASSANDRA01 bin]$ ls -l /data1/cassandra/
total 12
drwxr-xr-x 2 cassandra cassandra 4096 Mar 19 14:23 commitlog
drwxr-xr-x 4 cassandra cassandra 4096 Mar 19 14:23 data
drwxr-xr-x 2 cassandra cassandra 4096 Mar 19 13:18 saved_caches
[cassandra@CASSANDRA01 bin]$

Performance Tuning.

Updating cassandra.yaml file.

# For workloads with more data than can fit in memory, Cassandra's
# bottleneck will be reads that need to fetch data from
# disk. "concurrent_reads" should be set to (16 * number_of_drives) in
# order to allow the operations to enqueue low enough in the stack
# that the OS and drives can reorder them. Same applies to
# "concurrent_counter_writes", since counter writes read the current
# values before incrementing and writing them back.
#
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.

#concurrent_reads: 32
#concurrent_writes: 32

# Changed for our 40-core machine; the shipped defaults are commented out above.
concurrent_reads: 32
concurrent_writes: 240
concurrent_counter_writes: 32

Updating cassandra-env.sh file.

# Override these to set the amount of memory to allocate to the JVM at
# start-up. For production use you may wish to adjust this for your
# environment. MAX_HEAP_SIZE is the total amount of memory dedicated
# to the Java heap; HEAP_NEWSIZE refers to the size of the young
# generation. Both MAX_HEAP_SIZE and HEAP_NEWSIZE should be either set
# or not (if you set one, set the other).
#
# The main trade-off for the young generation is that the larger it
# is, the longer GC pause times will be. The shorter it is, the more
# expensive GC will be (usually).
#
# The example HEAP_NEWSIZE assumes a modern 8-core+ machine for decent pause
# times. If in doubt, and if you do not particularly want to tweak, go with
# 100 MB per physical CPU core.

# Important: HEAP_NEWSIZE should be roughly 100MB * number of cores (40 cores in our case, hence 4G).

#MAX_HEAP_SIZE="4G"
#HEAP_NEWSIZE="800M"
MAX_HEAP_SIZE="15G"
HEAP_NEWSIZE="4G"

Updating cassandra-topology.properties file.

If the servers are in data centers at different locations, then we need to update this file as well, specifying the rack within each DC. Each entry has the form:
{{Node IP}}={{Data Center}}:{{Rack}}
NOTE: This has to match the cassandra-rackdc.properties file on each node.
10.10.18.35=DC1:RAC1
10.10.18.93=DC2:RAC1
10.10.18.98=DC2:RAC2
When using this format we need to update cassandra-rackdc.properties on each node and set endpoint_snitch: GossipingPropertyFileSnitch in cassandra.yaml.
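For example, with the layout above, cassandra-rackdc.properties on CASSANDRA01 would contain:
dc=DC1
rack=RAC1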

Useful Links

https://www.digitalocean.com/community/tutorials/how-to-configure-a-multi-node-cluster-with-cassandra-on-a-ubuntu-vps
https://www.datastax.com/documentation/cassandra/1.2/cassandra/initialize/initializeSingleDS.html
http://www.rackspace.com/knowledge_center/article/centos-hostname-change
http://www.datastax.com/documentation/cassandra/2.0/cassandra/initialize/initializeSingleDS.html
http://www.datastax.com/documentation/cassandra/2.1/cassandra/configuration/configCassandra_yaml_r.html
http://whatizee.blogspot.in/2013/12/passwordless-login-from-ahmedamd-to.html?q=passwordless
