From: rlei@apache.org
To: commits@hawq.incubator.apache.org
Reply-To: dev@hawq.incubator.apache.org
Message-Id: <8668dd0ac75f4cd3be3da0ab0fe030bc@git.apache.org>
Subject: incubator-hawq git commit: HAWQ-901 Add retries to standby master start check
Date: Mon, 11 Jul 2016 05:19:59 +0000 (UTC)

Repository: incubator-hawq
Updated Branches:
  refs/heads/master e3ea4896b -> c5a3f42fd


HAWQ-901 Add retries to standby master start check


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq/commit/c5a3f42f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq/tree/c5a3f42f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq/diff/c5a3f42f

Branch: refs/heads/master
Commit: c5a3f42fdbc98715294dd2add72c79611814398a
Parents: e3ea489
Author: rlei
Authored: Mon Jul 11 10:22:29 2016 +0800
Committer: rlei
Committed: Mon Jul 11 13:18:46 2016 +0800

----------------------------------------------------------------------
 tools/bin/hawq_ctl             |  2 +-
 tools/sbin/hawqstandbywatch.py | 22 ++++++++++++++++------
 2 files changed, 17 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/c5a3f42f/tools/bin/hawq_ctl
----------------------------------------------------------------------
diff --git a/tools/bin/hawq_ctl b/tools/bin/hawq_ctl
index 50070f6..211f599 100755
--- a/tools/bin/hawq_ctl
+++ b/tools/bin/hawq_ctl
@@ -638,7 +638,7 @@ class HawqStart:
         cmd = self._start_standby_cmd()
         check_return_code(remote_ssh(cmd, self.standby_host_name, self.user))
         cmd = "%s; %s/sbin/hawqstandbywatch.py %s debug" % (source_hawq_env, self.GPHOME, self.master_data_directory)
-        result = remote_ssh(cmd, self.standby_host_name, self.user)
+        result = remote_ssh_nowait(cmd, self.standby_host_name, self.user)
         return result
 
     def _check_standby_sync(self):

http://git-wip-us.apache.org/repos/asf/incubator-hawq/blob/c5a3f42f/tools/sbin/hawqstandbywatch.py
----------------------------------------------------------------------
diff --git a/tools/sbin/hawqstandbywatch.py b/tools/sbin/hawqstandbywatch.py
index 82cf699..ca7ad1d 100755
--- a/tools/sbin/hawqstandbywatch.py
+++ b/tools/sbin/hawqstandbywatch.py
@@ -102,7 +102,7 @@ class SyncmasterWatcher:
         self.handles = {}
         self.maxlines = 1000
 
-        self.timelimit = 5
+        self.timelimit = 3
         self.delay = 0.1
 
 
@@ -188,10 +188,20 @@ class SyncmasterWatcher:
                 break
 
         logger.info("checking if syncmaster is running")
-        pid = gp.getSyncmasterPID('localhost', self.datadir)
-        if not pid > 0:
-            logger.warning("syncmaster not running")
-            return 1
+        count = 0
+        counter = 20
+        while True:
+            pid = gp.getSyncmasterPID('localhost', self.datadir)
+            if not pid > 0:
+                if count >= counter:
+                    logger.error("Standby master start timeout")
+                    return 1
+                else:
+                    logger.warning("syncmaster not running, waiting...")
+            else:
+                break
+            count += 1
+            time.sleep(3)
 
         # syncmaster is running and there are no obvious errors in the log
         logger.info("syncmaster appears ok, pid %s" % pid)
@@ -219,7 +229,7 @@ if __name__ == '__main__':
 
     # watch syncmaster logs
     if len(sys.argv) > 2 and sys.argv[2] == 'debug':
-        print "Checking standby master status"
+        logger.info("Checking standby master status")
         watcher = SyncmasterWatcher( sys.argv[1] )
         rc = watcher.monitor_logs()
         watcher.close()
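
For readers skimming the diff, the retry logic added to hawqstandbywatch.py can be sketched in isolation as below. This is an illustrative sketch only, not code from the commit: the helper name `wait_for_pid` and its parameters (`get_pid`, `max_retries`, `interval`, `sleep`) are hypothetical, with `get_pid` standing in for the `gp.getSyncmasterPID('localhost', self.datadir)` call. As in the patch, a failed check is retried until the retry count reaches the limit, so the defaults give at most 21 checks with 20 sleeps of 3 seconds between them (roughly 60 seconds in total).

```python
import time


def wait_for_pid(get_pid, max_retries=20, interval=3.0, sleep=time.sleep):
    """Poll get_pid() until it returns a positive PID or retries run out.

    get_pid is a hypothetical stand-in for the PID lookup in the patch.
    Returns the PID on success, or None once max_retries re-checks have failed.
    """
    attempts = 0
    while True:
        pid = get_pid()
        if pid > 0:
            # The process is up; stop polling.
            return pid
        if attempts >= max_retries:
            # Mirrors the "Standby master start timeout" branch of the patch.
            return None
        attempts += 1
        sleep(interval)
```

Passing `sleep` as a parameter is a choice made for this sketch so the loop can be exercised without real delays; the patch itself calls `time.sleep(3)` directly.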