
Getting Started with Cloudera API

These are the basic steps to get connected to Cloudera Manager.
Here are some of the cool things you can do with Cloudera Manager via the API:
  • Deploy an entire Hadoop cluster programmatically. Cloudera Manager supports HDFS, MapReduce, YARN, ZooKeeper, HBase, Hive, Oozie, Hue, Flume, Impala, Solr, Sqoop, Spark and Accumulo.
  • Configure various Hadoop services and get config validation.
  • Take admin actions on services and roles, such as start, stop, restart, failover, etc. Also available are the more advanced workflows, such as setting up high availability and decommissioning.
  • Monitor your services and hosts, with intelligent service health checks and metrics.
  • Monitor user jobs and other cluster activities.
  • Retrieve timeseries metric data.
  • Search for events in the Hadoop system.
  • Administer Cloudera Manager itself.
  • Download the entire deployment description of your Hadoop cluster in a JSON file.
Additionally, with the appropriate licenses, the API lets you:
  • Perform rolling restart and rolling upgrade.
  • Audit user activities and accesses in Hadoop.
  • Perform backup and cross data-center replication for HDFS and Hive.
  • Retrieve per-user HDFS usage report and per-user MapReduce resource usage report.

API Installations.

Getting Connected To Cloudera Manager.

First we get an API handle that we use to connect to the Cloudera Manager services and the cluster services. The config comes from a YAML file.
 @property
 def cm_api_handle(self):
     """
     Create the handle to CM on first use and reuse it afterwards.
     :return: cm_api_handle
     """
     if self._cm_api_handle is None:
         # Positional arguments: host, port, username, password, use_tls
         self._cm_api_handle = ApiResource(self.config['cm']['host'],
                                           self.config['cm']['port'],
                                           self.config['cm']['username'],
                                           self.config['cm']['password'],
                                           self.config['cm']['tls'],
                                           version=self.config['cm']['version'])
     return self._cm_api_handle
A simpler way to write it is shown below (I am using version=13 here).
 cm_api_handle = ApiResource(cm_host, username="admin", password="admin", version=13)
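For orientation, the handle is a client for Cloudera Manager's REST API. The sketch below is not part of cm_api: `cm_base_url` is a hypothetical helper that only illustrates the kind of base URL such a handle talks to, assuming the conventional `/api/v<version>` path and the default ports 7180 (plain HTTP) and 7183 (TLS).

```python
def cm_base_url(host, port=None, use_tls=False, version=13):
    """Hypothetical helper: build the REST base URL for a CM host."""
    scheme = "https" if use_tls else "http"
    if port is None:
        # Assumed defaults: 7180 for plain HTTP, 7183 for TLS.
        port = 7183 if use_tls else 7180
    return "{0}://{1}:{2}/api/v{3}".format(scheme, host, port, version)


if __name__ == "__main__":
    print(cm_base_url("127.0.0.1"))
```

Printing the URL like this can be a quick sanity check that the host, port, and API version in your config point where you expect.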
Now we can use this handle to connect to CM or to a cluster. Let's look at what we can do once we have the cloudera_manager object.
All of the methods of the ClouderaManager class can be called on it.
 cloudera_manager = cm_api_handle.get_cloudera_manager()
 cloudera_manager.get_license()

 cm_api_response = cloudera_manager.get_service()
Here the API response cm_api_response is an ApiService.
Similarly, we can call methods on the cluster, but there the services come back as a list, since there are many services on the cluster.
 cloudera_cluster = cm_api_handle.get_cluster("CLUSTER_NAME")
 cluster_api_response = cloudera_cluster.get_all_services()
Again, each item in the response is an ApiService.
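Because get_all_services() returns a list, a small loop is usually enough to work with it. The sketch below groups service names by type. `services_by_type` is a hypothetical helper, and the `FakeService` stand-in exists only so the snippet runs without a live Cloudera Manager; with a real cluster you would pass the result of `cloudera_cluster.get_all_services()` instead. It relies only on the `name` and `type` attributes of each service.

```python
from collections import namedtuple

# Stand-in for ApiService, exposing only the two attributes used here.
FakeService = namedtuple("FakeService", ["name", "type"])


def services_by_type(services):
    """Group service names by their type, e.g. {'HDFS': ['hdfs1']}."""
    grouped = {}
    for service in services:
        grouped.setdefault(service.type, []).append(service.name)
    return grouped


if __name__ == "__main__":
    sample = [
        FakeService("ZOOKEEPER", "ZOOKEEPER"),
        FakeService("HDFS", "HDFS"),
    ]
    for service_type, names in sorted(services_by_type(sample).items()):
        print("{0}: {1}".format(service_type, ", ".join(names)))
```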

Example Code to get started.

First, let's create a YAML file (the script reads it as cloudera.yaml).
 # Cloudera Manager config
 cm:
   host: 127.0.0.1
   port: 7180
   username: admin
   password: admin
   tls: false
   version: 13

 # Basic cluster information
 cluster:
   name: AutomatedHadoopCluster
   version: CDH5
   fullVersion: 5.8.3
   hosts:
      - 127.0.0.1
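Before wiring this file into a script, it can help to check that the expected keys are present, since a missing key would otherwise surface as a KeyError halfway through. The sketch below assumes the config has already been loaded into a dict with the same shape as the YAML above; `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced for illustration.

```python
# Keys the example config is expected to contain, per section (assumption).
REQUIRED_KEYS = {
    "cm": ["host", "port", "username", "password", "tls", "version"],
    "cluster": ["name", "version", "fullVersion", "hosts"],
}


def missing_keys(config):
    """Return a sorted list of 'section.key' entries absent from config."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        # Treat an absent or empty section as having no keys at all.
        section_data = config.get(section) or {}
        for key in keys:
            if key not in section_data:
                missing.append("{0}.{1}".format(section, key))
    return sorted(missing)


if __name__ == "__main__":
    # Same shape as the YAML above, written as a dict for illustration.
    config = {
        "cm": {"host": "127.0.0.1", "port": 7180, "username": "admin",
               "password": "admin", "tls": False, "version": 13},
        "cluster": {"name": "AutomatedHadoopCluster", "version": "CDH5",
                    "fullVersion": "5.8.3", "hosts": ["127.0.0.1"]},
    }
    print(missing_keys(config))
```

An empty list means every expected key is there; anything else names exactly what to add to the file.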
Next we create a script to process that data.
 import yaml
 import sys
 from cm_api.api_client import ApiResource, ApiException


 def fail(msg):
     print (msg)
     sys.exit(1)

 if __name__ == '__main__':

     try:
         with open('cloudera.yaml', 'r') as cluster_yaml:
             # safe_load parses plain data only and needs no Loader argument
             config = yaml.safe_load(cluster_yaml)

         api_handle = ApiResource(config['cm']['host'],
                                  config['cm']['port'],
                                  config['cm']['username'],
                                  config['cm']['password'],
                                  config['cm']['tls'],
                                  version=config['cm']['version'])

         # Checking CM services
         cloudera_manager = api_handle.get_cloudera_manager()
         cm_api_response = cloudera_manager.get_service()

         print("\nCLOUDERA MANAGER SERVICES\n----------------------------")
         print("Complete ApiService: " + str(cm_api_response))
         print("Check URL for details : https://cloudera.github.io/cm_api/apidocs/v15/ns0_apiService.html")
         print("name: " + str(cm_api_response.name))
         print("type: " + str(cm_api_response.type))
         print("serviceUrl: " + str(cm_api_response.serviceUrl))
         print("roleInstancesUrl: " + str(cm_api_response.roleInstancesUrl))
         print("displayName: " + str(cm_api_response.displayName))

         # Checking Cluster services
         cm_cluster = api_handle.get_cluster(config['cluster']['name'])
         cluster_api_response = cm_cluster.get_all_services()
         print("\n\nCLUSTER SERVICES\n----------------------------")
         # Each item in the list is a single ApiService
         for api_service in cluster_api_response:
             print("Complete ApiService: " + str(api_service))
             print("Check URL for details : https://cloudera.github.io/cm_api/apidocs/v15/ns0_apiService.html")
             print("name: " + str(api_service.name))
             print("type: " + str(api_service.type))
             print("serviceUrl: " + str(api_service.serviceUrl))
             print("roleInstancesUrl: " + str(api_service.roleInstancesUrl))
             print("displayName: " + str(api_service.displayName))

     except IOError as e:
         fail("Error reading cloudera.yaml: {}".format(e))
     except ApiException as e:
         fail("Error calling the Cloudera Manager API: {}".format(e))
Output
 CLOUDERA MANAGER SERVICES
 ----------------------------
 Complete ApiService: <ApiService>: mgmt (cluster: None)
 Check URL for details : https://cloudera.github.io/cm_api/apidocs/v15/ns0_apiService.html
 name: mgmt
 type: MGMT
 serviceUrl: http://mycmhost.ahmed.com:7180/cmf/serviceRedirect/mgmt
 roleInstancesUrl: http://mycmhost.ahmed.com:7180/cmf/serviceRedirect/mgmt/instances
 displayName: Cloudera Management Service


 CLUSTER SERVICES
 ----------------------------
 Complete ApiService: <ApiService>: ZOOKEEPER (cluster: AutomatedHadoopCluster)
 Check URL for details : https://cloudera.github.io/cm_api/apidocs/v15/ns0_apiService.html
 name: ZOOKEEPER
 type: ZOOKEEPER
 serviceUrl: http://mycmhost.ahmed.com:7180/cmf/serviceRedirect/ZOOKEEPER
 roleInstancesUrl: http://mycmhost.ahmed.com:7180/cmf/serviceRedirect/ZOOKEEPER/instances
 displayName: ZOOKEEPER
These are the basics; we will build on top of this in coming blog posts.
