
SFTP Data Collector

An easy way to collect files recursively from an SFTP server is to connect to the server over scp and run scp -r.
The problem was that the device we were connecting to did not support recursive copies over a wildcard expression.

Example:

scp -r ahmed@remote-host:/home/ahmed/*file_123*
This did not work, so here is a simple SFTP Data Collector script.
It can be used when the source device/server is unable to get files recursively.

Steps in this script:

  1. Get a listing of the files present.
  2. Select the required files from the list using a regex or a pattern.
  3. Download the selected files.

Below is the command usage.
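The selection step (2) can be sketched as a small regex filter over the directory listing from step (1). This is an illustrative sketch only: `select_files` and the sample file names are hypothetical, not names from the actual script.

```python
import re

def select_files(listing, pattern):
    # Step 2: keep only the file names that match the regex/pattern.
    rx = re.compile(pattern)
    return [name for name in listing if rx.search(name)]

# Hypothetical listing as returned by step 1 (the SFTP directory listing).
listing = ["data_20160101_00.csv", "data_20160101_01.csv", "notes.txt"]
print(select_files(listing, "20160101_01"))  # only the matching file remains
```

Step 3 would then call the SFTP client's download for each selected name.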

Usage:

usage: sftp_data_collector.py [-h] -sh SRC_HOST_NAME -su SRC_USERNAME
                              (-sp SRC_PASSWORD | -es) -dh DEST_HOST_NAME -du
                              DEST_USERNAME (-dp DEST_PASSWORD | -ed)
                              [-c SRC_DIRECTORY] [-y DEST_DIRECTORY]
                              (-t YYYYMMDD_HH | -p PATTERN_IN_FILE) [-d]
                              [--version]

SFTP Data Collector.
----------------------

This script can be used, if the source device/server is unable to "get" file recursively.
To use password from environment, set values for :

    PASS_SRC for source.
    PASS_DEST for destination.

Steps in this script:

    1. "get" listing of the files present.
    2. select files required from the list using a reg_ex or a pattern. (currently this is yyyymmdd_hh)
    3. Download select files.

----------------------

optional arguments:
  -h, --help            show this help message and exit
  -sh SRC_HOST_NAME, --src-host-name SRC_HOST_NAME
                        Source Host name to get Files from.
  -su SRC_USERNAME, --src-username SRC_USERNAME
                        Source Host - Username.
  -sp SRC_PASSWORD, --src-password SRC_PASSWORD
                        Source Host - Password.
  -es, --src-env-password
                        Source Host - Password. Pick From Environment Variable
                        PASS_SRC
  -dh DEST_HOST_NAME, --dest-host-name DEST_HOST_NAME
                        Destination Host name to send Files to.
  -du DEST_USERNAME, --dest-username DEST_USERNAME
                        Destination Host - Username.
  -dp DEST_PASSWORD, --dest-password DEST_PASSWORD
                        Destination Host - Password.
  -ed, --dest-env-password
                        Destination Host - Password. Pick From Environment
                        Variable PASS_DEST
  -c SRC_DIRECTORY, --cd-src-directory SRC_DIRECTORY
                        Source Directory, If not provided then "."
  -y DEST_DIRECTORY, --cd-dest-directory DEST_DIRECTORY
                        Destination Directory, If not provided then "."
  -t YYYYMMDD_HH, --date-hour YYYYMMDD_HH
                        Enter date_hour in yyyymmdd_hh format,File has
                        date_hour pattern in the filename
  -p PATTERN_IN_FILE, --pattern-in-file PATTERN_IN_FILE
                        Enter pattern in filename which needs to be collected
                        from sFTP server.
  -d, --debug           Running Debug mode - More Verbose
  --version             show program's version number and exit

Code Usage:

Call the function with its parameters (you might want to validate the pattern first):
sftp_data_collector.get_file_from_src(src_host_name, dest_host_name,
                                      src_username, src_passwd,
                                      dest_username, dest_passwd, pattern,
                                      cd_src_directory_args=current_dir_src,
                                      cd_dest_directory_args=current_dir_dest)
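When the passwords come from the environment (the -es/-ed flags), a small helper can fetch them before making the call above. A minimal sketch, assuming the PASS_SRC/PASS_DEST variables documented in the usage; `passwords_from_env` is a hypothetical name, not part of the script:

```python
import os

def passwords_from_env(environ=os.environ):
    # Read the variables the script documents: PASS_SRC and PASS_DEST.
    try:
        return environ["PASS_SRC"], environ["PASS_DEST"]
    except KeyError as err:
        raise SystemExit("missing environment variable: %s" % err)

# Illustrated with an explicit dict instead of the real environment.
src_passwd, dest_passwd = passwords_from_env({"PASS_SRC": "s3cret",
                                              "PASS_DEST": "t0psecret"})
print(src_passwd, dest_passwd)
```

Failing fast on a missing variable avoids passing None as a password to the SFTP connection.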

Code Location:

The code can be found on GitHub: https://github.com/zubayr/sftp_data_collector
