From: rlei@apache.org
To: commits@hawq.incubator.apache.org
Date: Thu, 17 Dec 2015 03:42:18 -0000
Subject: [1/3] incubator-hawq git commit: HAWQ-254. Fix init standby fails if segments in 'slaves' file not inicialized

Repository: incubator-hawq
Updated Branches:
  refs/heads/master 3e46d3d9b -> d9e1578c4


HAWQ-254. Fix init standby fails if segments in 'slaves' file not inicialized


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq/commit/3fac65fe
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq/tree/3fac65fe
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq/diff/3fac65fe

Branch: refs/heads/master
Commit: 3fac65fe1ba0d95c0f0d883f63539828b2cd24d4
Parents: 3e46d3d
Author: rlei
Authored: Thu Dec 10 15:36:42 2015 +0800
Committer: rlei
Committed: Thu Dec 17 11:41:27 2015 +0800

----------------------------------------------------------------------
 tools/bin/hawq                       |   2 +-
 tools/bin/hawq_ctl                   | 182 ++++++++++++++++++------------
 tools/bin/hawqpylib/hawqlib.py       |  75 ++++++++++--
 tools/bin/hawqstate                  |   6 +-
 tools/bin/lib/hawq_bash_functions.sh |  18 +--
 tools/bin/lib/hawqinit.sh            |  26 ++---
 6 files changed, 199 insertions(+), 110 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/3fac65fe/tools/bin/hawq
----------------------------------------------------------------------
diff --git a/tools/bin/hawq b/tools/bin/hawq
index 2931fb9..5254608 100755
--- a/tools/bin/hawq
+++ b/tools/bin/hawq
@@ -83,7
+83,7 @@ def main():
             print START_HELP
             sys.exit(1)
         cmd = "%s; hawq_ctl stop %s" % (source_hawq_env, sub_args)
-        result = local_ssh(cmd)
+        check_return_code(local_ssh(cmd))
         cmd = "%s; hawq_ctl start %s" % (source_hawq_env, sub_args)
         result = local_ssh(cmd)
     elif hawq_command == "activate":

http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/3fac65fe/tools/bin/hawq_ctl
----------------------------------------------------------------------
diff --git a/tools/bin/hawq_ctl b/tools/bin/hawq_ctl
index 993dc57..7ffdb83 100755
--- a/tools/bin/hawq_ctl
+++ b/tools/bin/hawq_ctl
@@ -16,23 +16,25 @@
 # specific language governing permissions and limitations
 # under the License.
 
-import os
-import sys
-import re
-import logging, time
-from optparse import OptionParser
-import subprocess
-import threading
-import Queue
-import signal
 
 try:
+    import os
+    import sys
+    import re
+    import logging, time
+    import subprocess
+    import threading
+    import Queue
+    import signal
+    from optparse import OptionParser
     from gppylib.gplog import setup_hawq_tool_logging, quiet_stdout_logging, enable_verbose_logging
     from gppylib.commands.unix import getLocalHostname, getUserName
     from gppylib.commands import gp
     from gppylib import userinput
+    from gppylib.commands import unix
     from hawqpylib.hawqlib import local_ssh, HawqCommands, HawqXMLParser, parse_hosts_file,\
-        remove_property_xml, sync_hawq_site, check_return_code
+        remove_property_xml, sync_hawq_site, check_return_code, check_file_exist, check_postgres_running, \
+        check_syncmaster_running, create_cluster_directory
     from hawqpylib.HAWQ_HELP import *
     from gppylib.db import dbconn
     from pygresql.pg import DatabaseError
@@ -271,14 +273,14 @@ class HawqInit:
     def _resync_standby(self):
         logger.info("Re-sync standby")
-        cmd = "%s; hawq stop cluster -a" % source_hawq_env
+        cmd = "%s; hawq stop master -a" % source_hawq_env
         check_return_code(local_ssh(cmd, logger), logger, "Stop hawq cluster failed, exit")
         cmd = "cd %s; %s; %s/bin/lib/pysync.py -x
gpperfmon/data -x pg_log -x db_dumps %s %s:%s" % \ (self.master_data_directory, source_hawq_env, self.GPHOME, self.master_data_directory, self.standby_host_name, self.master_data_directory) result = local_ssh(cmd, logger) check_return_code(result, logger, "Re-sync standby master failed, exit") - cmd = "%s; hawq start cluster -a" % source_hawq_env + cmd = "%s; hawq start master -a" % source_hawq_env result = local_ssh(cmd, logger) check_return_code(result, logger, "Start hawq cluster failed") @@ -299,7 +301,17 @@ class HawqInit: logger.info("Start to init master node: '%s'" % self.master_host_name) check_return_code(local_ssh(master_cmd), logger, "Master init failed, exit", \ "Master init successfully") + if self.standby_host_name.lower() not in ('', 'none'): + check_return_code(self._init_standby(), logger, \ + "Init standby failed, exit", \ + "Init standby successfully") + check_return_code(self._init_all_segments(), logger, \ + "Segments init failed, exit", \ + "Segments init successfully on nodes '%s'" % self.host_list) + logger.info("Init HAWQ cluster successfully") + + def _init_all_segments(self): segment_cmd_str = self._get_segment_init_cmd() # Execute segment init command on each segment nodes. 
logger.info("Init segments in list: %s" % self.host_list) @@ -310,22 +322,12 @@ class HawqInit: scpcmd = "scp %s/etc/_mgmt_config %s:%s/etc/_mgmt_config > /dev/null" % (self.GPHOME, host, self.GPHOME) local_ssh(scpcmd) work_list.append({"func":remote_ssh,"args":(segment_cmd_str, host, self.user, q)}) - work_list.append({"func":check_progress,"args":(q, self.hosts_count_number, 'init', self.quiet)}) - node_init = HawqCommands(name='HAWQ-SEG-INIT') + work_list.append({"func":check_progress,"args":(q, self.hosts_count_number, 'init', 0, self.quiet)}) + node_init = HawqCommands(name='HAWQ', action_name = 'init', logger = logger) node_init.get_function_list(work_list) node_init.start() - if node_init.return_flag != 0: - logger.info("Segments init failed, exit") - sys.exit(1) - else: - logger.info("Segments init successfully on nodes '%s'" % self.host_list) - if self.standby_host_name.lower() not in ('', 'none'): - check_return_code(self._init_standby(), logger, \ - "Init standby failed, exit", \ - "Init standby successfully") - logger.info("Init HAWQ cluster successfully") - return None + return node_init.return_flag def run(self): if self.node_type == "master": @@ -416,6 +418,7 @@ class HawqStart: logger.info("No standby host configured") self.standby_host_name = '' + def _start_master_cmd(self): logger.info("Start master service") if self.masteronly: @@ -474,22 +477,10 @@ class HawqStart: check_return_code(self.start_master(), logger, "Master start failed, exit", \ "Master started successfully") - segment_cmd_str = self._start_segment_cmd() - logger.info("Start segments in list: %s" % self.host_list) - work_list = [] - q = Queue.Queue() - for host in self.host_list: - work_list.append({"func":remote_ssh,"args":(segment_cmd_str, host, self.user, q)}) - work_list.append({"func":check_progress,"args":(q, self.hosts_count_number, 'start', self.quiet)}) - node_init = HawqCommands() - node_init.get_function_list(work_list) - node_init.start() - if node_init.return_flag != 
0: - logger.error("Segments start failed") - else: - logger.info("Segments started successfully") + segments_return_flag = self._start_all_segments() + if segments_return_flag: logger.info("HAWQ cluster started successfully") - return node_init.return_flag + return segments_return_flag def _start_all_segments(self): logger.info("Start all the segments in hawq cluster") @@ -499,8 +490,8 @@ class HawqStart: q = Queue.Queue() for host in self.host_list: work_list.append({"func":remote_ssh,"args":(segment_cmd_str, host, self.user, q)}) - work_list.append({"func":check_progress,"args":(q, self.hosts_count_number, 'start', self.quiet)}) - node_init = HawqCommands(name='HAWQ-SEG-START') + work_list.append({"func":check_progress,"args":(q, self.hosts_count_number, 'start', 0, self.quiet)}) + node_init = HawqCommands(name = 'HAWQ', action_name = 'start', logger = logger) node_init.get_function_list(work_list) node_init.start() logger.info("Total threads return value is : %d" % node_init.return_flag) @@ -555,6 +546,7 @@ class HawqStop: self.hawq_dict = hawq_dict self.hawq_reload = opts.hawq_reload self.lock = threading.Lock() + self.skip_segments = [] self._get_config() def _get_config(self): @@ -585,6 +577,34 @@ class HawqStop: logger.info("No standby host configured") self.standby_host_name = '' + + def _check_segment_running(self, host): + + segment_running = True + segment_pid_file_path = self.segment_data_directory + '/postmaster.pid' + + if check_file_exist(segment_pid_file_path, host, logger): + if not check_postgres_running(self.GPHOME, self.segment_data_directory, self.user, host, logger): + logger.warning("Have a postmaster.pid file but no segment process running") + + lockfile="/tmp/.s.PGSQL.%s" % self.segment_port + logger.info("Clearing segment instance lock files and pid file") + cmd = "rm -rf %s %s" % (lockfile, segment_pid_file_path) + remote_ssh(cmd, host, self.user) + segment_running = False + else: + segment_running = True + + else: + if 
check_postgres_running(self.GPHOME, self.segment_data_directory, self.user, host, logger): + logger.warning("postmaster.pid file does not exist, but hawq process running.") + segment_running = True + else: + logger.warning("HAWQ segment seems not running on %s, skip" % host) + segment_running = False + + return segment_running + def _stop_master_cmd(self): logger.info("Stop hawq master") if self.hawq_reload: @@ -615,9 +635,14 @@ class HawqStop: return cmd_str def _stop_segment(self): - cmd = self._stop_segment_cmd() - result = remote_ssh(cmd, 'localhost', self.user) - return result + segment_running = self._check_segment_running('localhost') + if segment_running: + cmd = self._stop_segment_cmd() + result = remote_ssh(cmd, 'localhost', self.user) + return result + else: + logger.warning('') + return True def _stop_standby_cmd(self): logger.info("Stop hawq standby master") @@ -638,47 +663,44 @@ class HawqStop: def _stopAll(self): logger.info("Stop hawq cluster") - result = self._stop_master() - if result != 0: + master_result = self._stop_master() + if master_result != 0: logger.error("Master stop failed") else: logger.info("Master stopped successfully") if self.standby_host_name.lower() not in ('', 'none'): - result = self._stop_standby() - if result != 0: + standby_result = self._stop_standby() + if standby_result != 0: logger.error("Standby master stop failed") else: logger.info("Standby master stopped successfully") - segment_cmd_str = self._stop_segment_cmd() - # Execute segment start command on each segment nodes. 
- logger.info("Stop segments in list: %s" % self.host_list) - work_list = [] - q = Queue.Queue() - for host in self.host_list: - work_list.append({"func":remote_ssh,"args":(segment_cmd_str, host, self.user, q)}) - work_list.append({"func":check_progress,"args":(q, self.hosts_count_number, 'stop', self.quiet)}) - node_init = HawqCommands() - node_init.get_function_list(work_list) - node_init.start() - if node_init.return_flag != 0: - logger.error("Segments stop failed") + # Execute segment stop command on each node. + segments_return_flag = self._stopAllSegments() + cluster_result = master_result + standby_result + segments_return_flag + if cluster_result != 0: + logger.error("Cluster stop failed") else: - logger.info("Segments stopped successfully") - return node_init.return_flag + logger.info("Cluster stopped successfully") + return cluster_result - def _stopAllSegments(self): - logger.info("Stop hawq cluster") + def _stopAllSegments(self): segment_cmd_str = self._stop_segment_cmd() - # Execute segment start command on each segment nodes. + # Execute segment stop command on each nodes. 
logger.info("Stop segments in list: %s" % self.host_list) work_list = [] + self.running_segment_num = self.hosts_count_number q = Queue.Queue() for host in self.host_list: - work_list.append({"func":remote_ssh,"args":(segment_cmd_str, host, self.user, q)}) - work_list.append({"func":check_progress,"args":(q, self.hosts_count_number, 'stop', self.quiet)}) - node_init = HawqCommands() + if self._check_segment_running(host): + work_list.append({"func":remote_ssh,"args":(segment_cmd_str, host, self.user, q)}) + else: + self.skip_segments.append(host) + self.running_segment_num = self.running_segment_num - 1 + + work_list.append({"func":check_progress,"args":(q, self.running_segment_num, 'stop', len(self.skip_segments), self.quiet)}) + node_init = HawqCommands(name = 'HAWQ', action_name = 'stop', logger = logger) node_init.get_function_list(work_list) node_init.start() if node_init.return_flag != 0: @@ -729,6 +751,8 @@ def get_args(): opts.node_type = ARGS[1] if opts.node_type in ['master', 'standby', 'segment', 'cluster', 'allsegments'] and opts.hawq_command in ['start', 'stop', 'restart', 'init', 'activate']: + if opts.log_dir and not os.path.exists(opts.log_dir): + os.makedirs(opts.log_dir) global logger, log_filename if opts.verbose: enable_verbose_logging() @@ -763,6 +787,17 @@ def get_args(): hawqsite = HawqXMLParser(opts.GPHOME) hawqsite.get_all_values() hawq_dict = hawqsite.hawq_dict + cluster_host_list = list() + cluster_host_list.append(hawq_dict['hawq_master_address_host']) + + if 'hawq_standby_address_host' in hawq_dict: + cluster_host_list.append(hawq_dict['hawq_standby_address_host']) + + segments_host_list = parse_hosts_file(opts.GPHOME) + for host in segments_host_list: + cluster_host_list.append(host) + + create_cluster_directory(opts.log_dir, cluster_host_list) Capital_Action = opts.hawq_command.title() logger.info("%s hawq with args: %s" % (Capital_Action, ARGS)) @@ -793,7 +828,7 @@ def remote_ssh_nowait(cmd, host, user): return result -def 
check_progress(q, total_num, action, quiet=False): +def check_progress(q, total_num, action, skip_num = 0, quiet=False): working_num = total_num success_num = 0 pnum = 0 @@ -811,7 +846,10 @@ def check_progress(q, total_num, action, quiet=False): time.sleep(1) if not quiet: sys.stdout.write("\n") - logger.info("%d of %d segments %s successfully" % (success_num, total_num, action)) + if skip_num != 0: + logger.info("%d of %d segments %s successfully, %d segments %s skipped" % (success_num, total_num, action, skip_num, action)) + else: + logger.info("%d of %d segments %s successfully" % (success_num, total_num, action)) return 0 http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/3fac65fe/tools/bin/hawqpylib/hawqlib.py ---------------------------------------------------------------------- diff --git a/tools/bin/hawqpylib/hawqlib.py b/tools/bin/hawqpylib/hawqlib.py index 23fb381..59f909b 100755 --- a/tools/bin/hawqpylib/hawqlib.py +++ b/tools/bin/hawqpylib/hawqlib.py @@ -25,17 +25,22 @@ from gppylib.db import dbconn class HawqCommands(object): - def __init__(self, function_list=None, name='HAWQ'): + def __init__(self, function_list=None, name='HAWQ', action_name = 'execute', logger = None): self.function_list = function_list self.name = name + self.action_name = action_name self.return_flag = 0 self.thread_list = [] + if logger: + self.logger = logger def get_function_list(self, function_list): self.function_list = function_list def exec_function(self, func, *args, **kwargs): result = func(*args, **kwargs) + if result != 0 and self.logger and func.__name__ == 'remote_ssh': + self.logger.error("%s %s failed on %s" % (self.name, self.action_name, args[1])) self.return_flag += result def start(self): @@ -113,6 +118,7 @@ def local_ssh(cmd, logger = None, warning = False): def remote_ssh(cmd, host, user): + if user == "": remote_cmd_str = "ssh -o 'StrictHostKeyChecking no' %s \"%s\"" % (host, cmd) else: @@ -133,19 +139,41 @@ def check_return_code(result, logger = 
None, error_msg = None, info_msg = None, sys.exit(0) return result -def parse_hosts_file(GPHOME): - host_file = "%s/etc/slaves" % GPHOME - host_list = list() - with open(host_file) as f: - hosts = f.readlines() - for host in hosts: - host = host.split("#",1)[0].strip() - if host: - host_list.append(host) - return host_list + +def check_postgres_running(GPHOME, data_directory, user, host = 'localhost', logger = None): + cmd='ps -ef | grep postgres | grep %s | grep -v grep > /dev/null || exit 1;' % data_directory + result = remote_ssh(cmd, host, user) + if result == 0: + return True + else: + if logger: + logger.debug("postgres process is not running on %s" % host) + return False + + +def check_syncmaster_running(GPHOME, data_directory, user, host = 'localhost', logger = None): + cmd='ps -ef | grep gpsyncmaster | grep %s | grep -v grep > /dev/null || exit 1;' % data_directory + result = remote_ssh(cmd, host, user) + if result == 0: + return True + else: + if logger: + logger.debug("syncmaster process is not running on %s" % host) + return False + + +def check_file_exist(file_path, host = 'localhost', logger = None): + cmd = "if [ -f %s ]; then exit 0; else exit 1;fi" % file_path + result = remote_ssh(cmd, host, '') + if result == 0: + return True + else: + if logger: + logger.debug("%s not exist on %s." % (file_path, host)) + return False -def check_file_exist(file_path, hostlist, user): +def check_file_exist_list(file_path, hostlist, user): if user == "": user = os.getenv('USER') file_exist_host_list = {} @@ -156,6 +184,29 @@ def check_file_exist(file_path, hostlist, user): return file_exist_host_list +def create_cluster_directory(directory_path, hostlist, user = ''): + if user == "": + user = os.getenv('USER') + file_exist_host_list = {} + for host in hostlist: + try: + remote_ssh("if [ ! 
-d %s ]; then mkdir -p %s; fi;" % (directory_path, directory_path), host, user) + except : + pass + + +def parse_hosts_file(GPHOME): + host_file = "%s/etc/slaves" % GPHOME + host_list = list() + with open(host_file) as f: + hosts = f.readlines() + for host in hosts: + host = host.split("#",1)[0].strip() + if host: + host_list.append(host) + return host_list + + def remove_property_xml(property_name, org_config_file): tree = ElementTree() tree.parse(org_config_file) http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/3fac65fe/tools/bin/hawqstate ---------------------------------------------------------------------- diff --git a/tools/bin/hawqstate b/tools/bin/hawqstate index a445196..3344f20 100755 --- a/tools/bin/hawqstate +++ b/tools/bin/hawqstate @@ -19,7 +19,7 @@ import os import sys from optparse import Option, OptionParser -from hawqpylib.hawqlib import HawqXMLParser, parse_hosts_file, check_file_exist +from hawqpylib.hawqlib import HawqXMLParser, parse_hosts_file, check_file_exist_list from gppylib.db import dbconn from pygresql.pg import DatabaseError from gppylib.gplog import get_default_logger, setup_hawq_tool_logging, quiet_stdout_logging, enable_verbose_logging @@ -89,7 +89,7 @@ def show_brief_status(hawq_site, segment_list, standby_host): total_seg_num_valid = len(valid_seg_host_list) total_seg_num_failure = total_seg_num - total_seg_num_valid seg_pid_file_path = hawq_site.hawq_dict['hawq_segment_directory'] + "/postmaster.pid" - total_seg_pid_file_found = len(check_file_exist(seg_pid_file_path, segment_list, '' )) + total_seg_pid_file_found = len(check_file_exist_list(seg_pid_file_path, segment_list, '' )) total_seg_pid_file_miss = total_seg_num - total_seg_pid_file_found logger.info("-HAWQ instance status summary") logger.info("-----------------------------------------------------") @@ -141,4 +141,4 @@ if __name__ == '__main__': if options.show_brief_status: show_brief_status(hawq_site, segment_list, standby_host="None") else: - 
show_brief_status(hawq_site, segment_list, standby_host="None")
\ No newline at end of file
+        show_brief_status(hawq_site, segment_list, standby_host="None")

http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/3fac65fe/tools/bin/lib/hawq_bash_functions.sh
----------------------------------------------------------------------
diff --git a/tools/bin/lib/hawq_bash_functions.sh b/tools/bin/lib/hawq_bash_functions.sh
index 2a4f51e..b4f5765 100755
--- a/tools/bin/lib/hawq_bash_functions.sh
+++ b/tools/bin/lib/hawq_bash_functions.sh
@@ -17,15 +17,15 @@
 # under the License.
 
 #Check that SHELL is /bin/bash
-if [ $SHELL != /bin/bash ] && [ `ls -al /bin/sh|grep -c bash` -ne 1 ];then
-    echo "[FATAL]:-Scripts must be run by a user account that has SHELL=/bin/bash"
-    if [ -f /bin/bash ];then
-        echo "[INFO]:-/bin/bash exists, please update user account shell"
-    else
-        echo "[WARN]:-/bin/bash does not exist, does bash need to be installed?"
-    fi
-    exit 2
-fi
+#if [ $SHELL != /bin/bash ] && [ `ls -al /bin/sh|grep -c bash` -ne 1 ];then
+#    echo "[FATAL]:-Scripts must be run by a user account that has SHELL=/bin/bash"
+#    if [ -f /bin/bash ];then
+#        echo "[INFO]:-/bin/bash exists, please update user account shell"
+#    else
+#        echo "[WARN]:-/bin/bash does not exist, does bash need to be installed?"
+#    fi
+#    exit 2
+#fi
 
 declare -a CMDPATH
 CMDPATH=(/usr/kerberos/bin /usr/sfw/bin /opt/sfw/bin /usr/local/bin /bin /usr/bin /sbin /usr/sbin /usr/ucb /sw/bin)

http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/3fac65fe/tools/bin/lib/hawqinit.sh
----------------------------------------------------------------------
diff --git a/tools/bin/lib/hawqinit.sh b/tools/bin/lib/hawqinit.sh
index 61d6766..741b525 100755
--- a/tools/bin/lib/hawqinit.sh
+++ b/tools/bin/lib/hawqinit.sh
@@ -284,16 +284,7 @@ standby_init() {
     fi
 
     LOG_MSG ""
-    LOG_MSG "[INFO]:-Stopping HAWQ cluster"
-    ${SSH} -o 'StrictHostKeyChecking no' ${hawqUser}@${master_host_name} \
-        "${SOURCE_PATH}; hawq stop allsegments -a -M fast;" >> ${STANDBY_LOG_FILE} 2>&1
-    if [ $? -ne 0 ] ; then
-        LOG_MSG "[ERROR]:-Stop segments failed" verbose
-        exit 1
-    else
-        LOG_MSG "[INFO]:-HAWQ segments stopped" verbose
-    fi
-
+    LOG_MSG "[INFO]:-Stopping HAWQ master"
     ${SSH} -o 'StrictHostKeyChecking no' ${hawqUser}@${master_host_name} \
         "${SOURCE_PATH}; hawq stop master -a -M fast;" >> ${STANDBY_LOG_FILE} 2>&1
     if [ $? -ne 0 ] ; then
@@ -361,12 +352,21 @@ standby_init() {
     fi
 
     ${SSH} -o 'StrictHostKeyChecking no' ${hawqUser}@${master_host_name} \
-        "${SOURCE_PATH}; hawq start cluster -a;" >> ${STANDBY_LOG_FILE}
+        "${SOURCE_PATH}; hawq start master -a;" >> ${STANDBY_LOG_FILE}
+    if [ $? -ne 0 ] ; then
+        LOG_MSG "[ERROR]:-Start HAWQ master failed" verbose
+        exit 1
+    else
+        LOG_MSG "[INFO]:-HAWQ master started" verbose
+    fi
+
+    ${SSH} -o 'StrictHostKeyChecking no' ${hawqUser}@${master_host_name} \
+        "${SOURCE_PATH}; hawq start standby -a;" >> ${STANDBY_LOG_FILE}
     if [ $? -ne 0 ] ; then
-        LOG_MSG "[ERROR]:-Start HAWQ cluster failed" verbose
+        LOG_MSG "[ERROR]:-Start HAWQ standby failed" verbose
         exit 1
     else
-        LOG_MSG "[INFO]:-HAWQ cluster started" verbose
+        LOG_MSG "[INFO]:-HAWQ standby started" verbose
     fi
 
     ${SSH} -o 'StrictHostKeyChecking no' ${hawqUser}@${master_host_name} \