Return-Path:
X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io
Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io
Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183])
    by cust-asf2.ponee.io (Postfix) with ESMTP id DADAA200B46
    for ; Fri, 1 Jul 2016 12:45:12 +0200 (CEST)
Received: by cust-asf.ponee.io (Postfix)
    id D98BD160A5D; Fri, 1 Jul 2016 10:45:12 +0000 (UTC)
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by cust-asf.ponee.io (Postfix) with SMTP id 2A81D160A61
    for ; Fri, 1 Jul 2016 12:45:12 +0200 (CEST)
Received: (qmail 25502 invoked by uid 500); 1 Jul 2016 10:45:11 -0000
Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@zookeeper.apache.org
Delivered-To: mailing list dev@zookeeper.apache.org
Received: (qmail 25476 invoked by uid 99); 1 Jul 2016 10:45:11 -0000
Received: from arcas.apache.org (HELO arcas) (140.211.11.28)
    by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 Jul 2016 10:45:11 +0000
Received: from arcas.apache.org (localhost [127.0.0.1])
    by arcas (Postfix) with ESMTP id 148C02C02A1
    for ; Fri, 1 Jul 2016 10:45:11 +0000 (UTC)
Date: Fri, 1 Jul 2016 10:45:11 +0000 (UTC)
From: "Vishal Khandelwal (JIRA)"
To: dev@zookeeper.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (ZOOKEEPER-2447) Zookeeper adds good delay when one of the quorum host is not reachable
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
archived-at: Fri, 01 Jul 2016 10:45:13 -0000

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15358775#comment-15358775 ]

Vishal Khandelwal commented on ZOOKEEPER-2447:
----------------------------------------------

[~eribeiro] : I will also be happy
to look at the fix. It's always good to fix the problem in the right fashion. I also ran into the limitation of the "InetAddress#isReachable()" method with respect to firewalls, but could not find a better solution for that. Even a socket connect did not work in my case :(.
There would be an additional requirement to change the design a little, as there is no provision yet to add the failed host back into the list. I opened a JIRA for that but have not had time to work on it: ZOOKEEPER-2449
Maybe [~dbenediktson]'s fix will help to get rid of "ZOOKEEPER-2449" as well.

> Zookeeper adds good delay when one of the quorum host is not reachable
> -----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2447
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2447
>             Project: ZooKeeper
>          Issue Type: Bug
>   Affects Versions: 3.4.6, 3.5.0
>            Reporter: Vishal Khandelwal
>            Assignee: Vishal Khandelwal
>             Fix For: 3.5.3, 3.6.0
>
>         Attachments: ZOOKEEPER-2447.3.5.patch, withfix.txt, withoutFix.txt
>
>
> The StaticHostProvider --> resolveAndShuffle method adds all of the addresses that are valid in the quorum to the list, shuffles them, and sends them back to the client connection class. If, after shuffling, the first node happens to be the one that is not reachable, ClientCnxn.SendThread.run will keep trying to connect to the failed host until a timeout and then move to a different node. This adds a random delay to the ZooKeeper connection when a host is down. Instead, we could check whether the host is reachable in StaticHostProvider and ignore it if isReachable is false, the same as we do for UnknownHostException.
> This can be tested using the following test code by providing a valid host that is not reachable.
> For a quick test, comment out Collections.shuffle(tmpList, sourceOfRandomness); in StaticHostProvider.resolveAndShuffle.
> {code}
> @Test
> public void test() throws Exception {
>     EventsWatcher watcher = new EventsWatcher();
>     QuorumUtil qu = new QuorumUtil(1);
>     qu.startAll();
>
>     ZooKeeper zk =
>         new ZooKeeper("
>     watcher.waitForConnected(CONNECTION_TIMEOUT * 5);
>     Assert.assertTrue("connection Established", watcher.isConnected());
>     zk.close();
> }
> {code}
> The following fix can be added to StaticHostProvider.resolveAndShuffle:
> {code}
> // Timeout is in milliseconds; 4000 is only an example value.
> if (taddr.isReachable(4000)) {
>     tmpList.add(new InetSocketAddress(taddr, address.getPort()));
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
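
The filtering idea from the issue description can be sketched as a standalone class, separate from the real StaticHostProvider. This is only an illustration under stated assumptions, not the attached patch: the class name ReachableHostFilter, the method names, and the injectable probe are all hypothetical. The fallback to the unfiltered list when every host fails the probe is also an assumption, added because InetAddress#isReachable() can report false negatives behind a firewall, as noted in the comment above.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

// Hypothetical helper, not part of ZooKeeper.
public class ReachableHostFilter {

    // Wraps InetAddress#isReachable; an I/O error counts as "not reachable".
    static Predicate<InetAddress> isReachableWithin(int timeoutMs) {
        return addr -> {
            try {
                return addr.isReachable(timeoutMs);
            } catch (IOException e) {
                return false;
            }
        };
    }

    // Keeps only resolved servers accepted by the probe, then shuffles them.
    static List<InetSocketAddress> filterAndShuffle(
            Collection<InetSocketAddress> servers,
            Predicate<InetAddress> probe,
            Random sourceOfRandomness) {
        List<InetSocketAddress> resolved = new ArrayList<>();
        List<InetSocketAddress> tmpList = new ArrayList<>();
        for (InetSocketAddress address : servers) {
            InetAddress taddr = address.getAddress();
            if (taddr == null) {
                continue; // unresolved host: skipped, like UnknownHostException
            }
            InetSocketAddress candidate =
                    new InetSocketAddress(taddr, address.getPort());
            resolved.add(candidate);
            if (probe.test(taddr)) {
                tmpList.add(candidate);
            }
        }
        // Assumption: if the probe rejects everything (for example ICMP is
        // blocked by a firewall), fall back to all resolved hosts.
        if (tmpList.isEmpty()) {
            tmpList.addAll(resolved);
        }
        Collections.shuffle(tmpList, sourceOfRandomness);
        return tmpList;
    }
}
```

A caller would pass ReachableHostFilter.isReachableWithin(4000) as the probe. Taking the probe as a parameter keeps the sketch testable without network access and leaves room to swap in a different reachability check.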