Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm
Reply-To: dev@zookeeper.apache.org
Date: Thu, 27 Jul 2017 00:49:00 +0000 (UTC)
From: "Michael Han (JIRA)"
To: dev@zookeeper.apache.org
Subject: [jira] [Assigned] (ZOOKEEPER-2849) Quorum port binding needs exponential back-off retry
archived-at: Thu, 27 Jul 2017 00:49:05 -0000

     [ https://issues.apache.org/jira/browse/ZOOKEEPER-2849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Han reassigned ZOOKEEPER-2849:
--------------------------------------

    Assignee: Brian Lininger

> Quorum port binding needs exponential back-off retry
> ----------------------------------------------------
>
>                 Key: ZOOKEEPER-2849
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2849
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: quorum
>    Affects Versions: 3.4.6, 3.5.3
>            Reporter: Brian Lininger
>            Assignee: Brian Lininger
>            Priority: Minor
>
> Recently we upgraded the AWS instance type we use for running our ZooKeeper nodes, and since then we have intermittently hit an issue where ZooKeeper cannot bind to the server election port because the IP is incorrect. This is due to name resolution in Route53 not being in sync when ZooKeeper starts on the more powerful EC2 instances. Currently in QuorumCnxManager.Listener, we only attempt to bind 3 times with a 1s sleep between retries, which is not long enough.
> I'm proposing to change this to follow an exponential back-off strategy, where each failed attempt causes a longer sleep before the next retry. This would allow ZooKeeper to recover gracefully when the host is misconfigured and subsequently corrected, without requiring the process to be restarted, while also minimizing the impact on the running instance.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
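
The back-off strategy proposed in the description could look roughly like the sketch below. This is not the ZooKeeper patch and not the code in QuorumCnxManager.Listener. The class `BackoffBindSketch`, the methods `backoffDelayMs` and `bindWithBackoff`, and all parameter names are hypothetical and exist only for illustration. The sketch assumes the delay doubles after each failed bind, is capped at a maximum, and that the host name is looked up again before every attempt so that a corrected DNS record can be picked up (subject to JVM DNS caching).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical helper for illustration only; not part of ZooKeeper.
final class BackoffBindSketch {

    // Delay before the retry that follows failed attempt number `attempt` (0-based):
    // baseMs doubled `attempt` times, never exceeding maxMs.
    static long backoffDelayMs(int attempt, long baseMs, long maxMs) {
        if (attempt < 0 || baseMs <= 0 || maxMs < baseMs) {
            throw new IllegalArgumentException("invalid back-off parameters");
        }
        long delay = baseMs;
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            // Compare against maxMs / 2 so the doubling cannot overflow.
            delay = (delay > maxMs / 2) ? maxMs : delay * 2;
        }
        return delay;
    }

    // Tries to bind up to maxAttempts times, sleeping longer after each failure.
    static ServerSocket bindWithBackoff(String host, int port, int maxAttempts,
                                        long baseMs, long maxMs)
            throws IOException, InterruptedException {
        IOException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            ServerSocket socket = new ServerSocket();
            try {
                socket.setReuseAddress(true);
                // A new InetSocketAddress triggers a fresh name lookup for this attempt.
                socket.bind(new InetSocketAddress(host, port));
                return socket;
            } catch (IOException e) {
                last = e;
                socket.close();
                if (attempt + 1 < maxAttempts) {
                    Thread.sleep(backoffDelayMs(attempt, baseMs, maxMs));
                }
            }
        }
        throw (last != null) ? last : new IOException("maxAttempts must be positive");
    }
}
```

With a 1s base and a 60s cap (both assumed values), the waits after successive failures would be 1s, 2s, 4s, 8s and so on, so a handful of retries covers a far longer window than three attempts spaced 1s apart.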