From: Cameron McKenzie
To: user@zookeeper.apache.org
Date: Thu, 1 May 2014 07:36:22 +1000
Subject: Re: ZOOKEEPER-900 / 901 / 1678

I've done a bit more testing this morning, and it appears that the leader election is actually completing. However, just after the election completes, the connection attempt to the dead host (server 2, at 10.0.0.0) times out, and this seems to trigger another leader election. The same thing happens after the next leader election, and so on.

2014-04-30 04:07:25,383 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2183:Leader@358] - LEADING - LEADER ELECTION TOOK - 14662
2014-04-30 04:07:25,756 [myid:3] - WARN [WorkerSender[myid=3]:QuorumCnxManager@382] - Cannot open channel to 2 at election address /10.0.0.0:3889
java.net.SocketTimeoutException: connect timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:368)
    at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:341)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:449)
    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:430)
    at java.lang.Thread.run(Thread.java:744)
2014-04-30 04:07:25,757 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@597] - Notification: 1 (message format version), 3 (n.leader), 0xc00000001 (n.zxid), 0xb (n.round), LOOKING (n.state), 3 (n.sid), 0xd (n.peerEpoch) LEADING (my state)

cheers

On Wed, Apr 30, 2014 at 6:48 PM, Cameron McKenzie wrote:
> hey Flavio,
> Thanks for the quick reply.
>
> I'm running ZK 3.4.6. Having looked into the code a bit more, I think that
> I was slightly presumptuous about the root cause. The actual socket
> connects seem to be passing a timeout correctly, and based on the logs, I
> can see the timeouts on connect occurring.
>
> I can reproduce the issue on a VM running two instances of ZK. These
> instances are configured as a 3-node cluster (the 2 real ZK instances,
> plus one bogus IP address that doesn't reach any live host).
> Specifically, this bogus host is configured 2nd in the server list. When I
> configured it 3rd, the cluster would occasionally form a quorum (though
> still not consistently). I've attached the config and logs from both of the
> ZK instances.
>
> Any help would be much appreciated!
> cheers
>
> On Wed, Apr 30, 2014 at 6:09 PM, FPJ wrote:
>
>> Hi Cameron,
>>
>> Which version of ZK are you using? Also, if you can share logs, then it
>> might be easier for us to help you out.
>>
>> -Flavio
>>
>> > -----Original Message-----
>> > From: Cameron McKenzie [mailto:mckenzie.cam@gmail.com]
>> > Sent: 30 April 2014 08:44
>> > To: zookeeper-user@hadoop.apache.org
>> > Subject: ZOOKEEPER-900 / 901 / 1678
>> >
>> > ZooKeeper users,
>> > Does anyone know the status of these issues? They don't seem to have had
>> > anything done to them since late 2010.
>> >
>> > I think that we're experiencing the same issue currently. If we have a
>> > 3-node cluster, for example, and 1 of these nodes is completely dead
>> > (i.e. the entire host is not contactable due to a power outage), I would
>> > expect that a quorum could still be formed, but this does not appear to
>> > be the case.
>> >
>> > I haven't delved into the code too much, but it appears that blocking IO
>> > is being used for the connect. This doesn't respect the socket's
>> > SO_TIMEOUT setting, so the connect() call can block for some arbitrary
>> > amount of time (based on the OS-level TCP settings?). This in turn means
>> > that leader election will fail because it times out before the socket
>> > connect does, even though there are enough live hosts present to form a
>> > quorum.
>> >
>> > This seems like a fairly fundamental problem, unless I'm missing
>> > something. If a single host goes down due to a power failure, for
>> > example, it can prevent any further hosts from joining the cluster. In
>> > addition, if after a power failure enough hosts come back online to form
>> > a quorum but some don't, a quorum may still fail to form.
>> > cheers
>> > Cam
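The thread turns on two small facts. First, under the usual strict-majority rule a 3-server ensemble needs only 2 live servers, so one dead host should not by itself block a quorum. Second, in `java.net.Socket`, `setSoTimeout` bounds blocking reads, not `connect()`; only the `connect(SocketAddress, int)` overload bounds the connect, and it throws `SocketTimeoutException` when that timeout expires, which is consistent with the "connect timed out" in the log and with the later observation that a connect timeout is being passed. Below is a minimal plain-Java sketch of both points. The class and method names (`QuorumConnectSketch`, `hasMajority`, `connectBounded`) are hypothetical, and the code makes no claim about how `QuorumCnxManager.connectOne` is actually written.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical helper class for illustration only; this is not ZooKeeper code.
final class QuorumConnectSketch {

    private QuorumConnectSketch() {}

    // Strict-majority rule: a quorum needs more than half of the configured servers.
    // With 3 servers, 3 / 2 == 1 (integer division), so 2 live servers suffice.
    static boolean hasMajority(int liveServers, int ensembleSize) {
        if (ensembleSize <= 0 || liveServers < 0 || liveServers > ensembleSize) {
            throw new IllegalArgumentException("invalid server counts");
        }
        return liveServers > ensembleSize / 2;
    }

    // Opens a TCP connection whose connect() is bounded by connectTimeoutMs.
    // setSoTimeout() only bounds later blocking reads; it does not bound connect().
    // If connectTimeoutMs expires first, connect() throws java.net.SocketTimeoutException.
    static Socket connectBounded(InetSocketAddress address, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        Socket socket = new Socket();
        try {
            socket.setSoTimeout(readTimeoutMs);
            socket.connect(address, connectTimeoutMs);
            return socket;
        } catch (IOException | RuntimeException e) {
            // Do not leak the unconnected socket if setup or connect fails.
            try {
                socket.close();
            } catch (IOException closeFailure) {
                e.addSuppressed(closeFailure);
            }
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(hasMajority(2, 3)); // prints true
        System.out.println(hasMajority(1, 3)); // prints false

        // Connect to a local listener so the example runs without network access.
        try (ServerSocket listener = new ServerSocket(0);
             Socket socket = connectBounded(
                     new InetSocketAddress("127.0.0.1", listener.getLocalPort()), 1000, 5000)) {
            System.out.println(socket.isConnected()); // prints true
        }
    }
}
```

Running `main` prints `true`, `false`, `true`. Pointed at a host that silently drops packets, like the bogus server in the report, `connectBounded` would typically fail with a `SocketTimeoutException` after about `connectTimeoutMs` instead of blocking for as long as the OS TCP settings allow.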