Date: Wed, 3 Aug 2016 17:12:20 +0000 (UTC)
From: "Dan Benediktson (JIRA)"
To: dev@zookeeper.apache.org
Reply-To: dev@zookeeper.apache.org
Subject: [jira] [Commented] (ZOOKEEPER-2447) Zookeeper adds good delay when one of the quorum host is not reachable

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15406240#comment-15406240 ]

Dan Benediktson commented on ZOOKEEPER-2447:
--------------------------------------------

Correct, I don't think there is a patch
in a completed state. The patch I provided only avoids one specific problem with the most recently proposed solution: the client tries to connect using only a slice of the connection time, and the chosen slice can be too small to reasonably expect a connection to succeed. I think [~eribeiro] offered to look at a comprehensive solution, so I'm assigning the issue over to him for the time being.

> Zookeeper adds good delay when one of the quorum host is not reachable
> -----------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2447
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2447
>             Project: ZooKeeper
>          Issue Type: Bug
>    Affects Versions: 3.4.6, 3.5.0
>            Reporter: Vishal Khandelwal
>            Assignee: Dan Benediktson
>             Fix For: 3.5.3, 3.6.0
>
>         Attachments: ZOOKEEPER-2447-MinConnectTimeoutOnly.patch, ZOOKEEPER-2447.3.5.patch, withfix.txt, withoutFix.txt
>
>
> The StaticHostProvider --> resolveAndShuffle method adds all of the addresses which are valid in the quorum to the list, shuffles them, and sends them back to the client connection class. If, after shuffling, the first node happens to be one which is not reachable, ClientCnxn.SendThread.run will keep trying to connect to the failed host until a timeout and only then moves to a different node. This adds a random delay to the ZooKeeper connection whenever a host is down. Instead, we could check whether the host is reachable in StaticHostProvider and ignore it if isReachable is false, the same as we do for UnknownHostException.
> This can be tested using the following test code by providing a valid host which is not reachable.
> For a quick test, comment out Collections.shuffle(tmpList, sourceOfRandomness); in StaticHostProvider.resolveAndShuffle.
> {code}
> @Test
> public void test() throws Exception {
>     EventsWatcher watcher = new EventsWatcher();
>     QuorumUtil qu = new QuorumUtil(1);
>     qu.startAll();
>
>     ZooKeeper zk =
>         new ZooKeeper("
>     watcher.waitForConnected(CONNECTION_TIMEOUT * 5);
>     Assert.assertTrue("connection Established", watcher.isConnected());
>     zk.close();
> }
> {code}
> The following fix can be added to StaticHostProvider.resolveAndShuffle:
> {code}
> // Only keep addresses that respond within the timeout (in milliseconds).
> if (taddr.isReachable(4000 /* can be some value */)) {
>     tmpList.add(new InetSocketAddress(taddr, address.getPort()));
> }
> {code}
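
The "slice too small" problem mentioned in the comment can be illustrated with a minimal sketch. It assumes the per-host connect timeout is the total connect time divided evenly by the number of hosts, and that a floor is applied so no single attempt gets an unreasonably short window. The class name, method name, and the 1000 ms floor are hypothetical and chosen for illustration only; this is not the content of the attached patches and not ZooKeeper API.

```java
public final class ConnectTimeoutSlice {
    private ConnectTimeoutSlice() {}

    /**
     * Hypothetical helper (not part of ZooKeeper): splits the total connect
     * time evenly across hosts, but never returns less than minPerHostMs.
     */
    public static int perHostTimeoutMs(int totalConnectTimeMs, int hostCount, int minPerHostMs) {
        if (hostCount <= 0) {
            throw new IllegalArgumentException("hostCount must be positive");
        }
        // Even slice of the total time for each host in the list.
        int slice = totalConnectTimeMs / hostCount;
        // Apply the floor so a long host list does not shrink the slice too far.
        return Math.max(slice, minPerHostMs);
    }

    public static void main(String[] args) {
        // 3 hosts, 30 s total: the even slice is already above the floor.
        System.out.println(perHostTimeoutMs(30000, 3, 1000)); // prints 10000
        // 10 hosts, 2 s total: the 200 ms slice is raised to the assumed floor.
        System.out.println(perHostTimeoutMs(2000, 10, 1000)); // prints 1000
    }
}
```

A floor like this trades a longer worst-case total connect time for a realistic chance that each individual attempt can succeed; whether that trade is right for the client is the open question in this issue.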