From: Michi Mutsuzaki
To: "user@zookeeper.apache.org"
Date: Mon, 10 Mar 2014 17:40:32 -0700
Subject: Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"

StandaloneDisabledTest.startSingleServerTest seems to be failing because
of the same issue. We should fix this soon.

https://issues.apache.org/jira/browse/ZOOKEEPER-1870

On Mon, Mar 10, 2014 at 5:33 PM, Deepak Jagtap wrote:

> Hello,
>
> Another query regarding 1805.
> I am observing that the zookeeper rolling upgrade always succeeds when I
> apply the 1805 patch.
> When I apply both the 1810 and 1805 patches, the rolling upgrade fails due
> to the issue mentioned earlier.
>
> Please advise whether it's fine to use only patch 1805 for the trunk.
>
> Thanks & Regards,
> Deepak
>
>
> On Mon, Mar 10, 2014 at 3:11 PM, Deepak Jagtap wrote:
>
>> Hi German,
>>
>> I have applied patch 1810 and 1805 against trunk revision 1574686 (recent
>> revision against which 1810 patch build succeeded).
>> But I am observing the following error in the zookeeper log on the new node
>> joining the quorum:
>>
>> 2014-03-10 21:11:25,126 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (3, 1)
>> 2014-03-10 21:11:25,127 [myid:1] - INFO [/169.254.44.1:3888:QuorumCnxManager$Listener@540] - Received connection request /169.254.44.3:51507
>> 2014-03-10 21:11:25,193 [myid:1] - ERROR [WorkerReceiver[myid=1]:NIOServerCnxnFactory$1@92] - Thread Thread[WorkerReceiver[myid=1],5,main] died
>> java.lang.OutOfMemoryError: Java heap space
>> at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerReceiver.run(FastLeaderElection.java:273)
>> at java.lang.Thread.run(Unknown Source)
>>
>> Followed by these messages getting printed repeatedly:
>> 2014-03-10 21:11:25,328 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 400
>> 2014-03-10 21:11:25,729 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 800
>> 2014-03-10 21:11:26,530 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 1600
>> 2014-03-10 21:11:28,131 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 3200
>> 2014-03-10 21:11:31,332 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 6400
>>
>> Thanks & Regards,
>> Deepak
>>
>>
>> On Wed, Mar 5, 2014 at 11:50 AM, Deepak Jagtap wrote:
>>
>>> Hi,
>>>
>>> I have applied only the 1805 patch, not 1810.
>>> And the upgrade is from 3.5.0.1458648 to 3.5.0.1562289 (not from 3.4.5).
>>> It was failing very consistently in our environment, and after the 1805 patch
>>> it went smoothly.
>>>
>>> Regards,
>>> Deepak
>>>
>>>
>>> On Wed, Mar 5, 2014 at 7:36 AM, German Blanco <german.blanco.blanco@gmail.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> do you mean the ZOOKEEPER-1810 patch?
>>>> That one alone doesn't solve the problem. On the other hand, the problem
>>>> doesn't always happen, so after a rolling start it might get solved.
>>>> We need 1818 as well, but it is easier to go step by step and get 1810 in
>>>> trunk first.
>>>> I hope that as soon as 3.4.6 is out this might get some attention.
>>>>
>>>> Regards,
>>>>
>>>> German.
>>>>
>>>>
>>>> On Wed, Mar 5, 2014 at 2:17 AM, Deepak Jagtap wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > Please ignore the previous comment; I used the wrong jar file and hence
>>>> > the rolling upgrade failed.
>>>> > After applying patch for bug on zookeeper-3.5.0.1562289
>>>> > revision, the rolling upgrade went fine.
>>>> >
>>>> > I have patched our in-house zookeeper version, but it would be convenient
>>>> > if we apply the patch on trunk and use the latest trunk.
>>>> > Please advise if I can apply the patch on the trunk and test it for you.
>>>> >
>>>> > Thanks & Regards,
>>>> > Deepak
>>>> >
>>>> >
>>>> > On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
>>>> >
>>>> > > Hi German,
>>>> > >
>>>> > > I tried applying the patch for 1805 but the problem still persists.
>>>> > > Following are the notification messages logged repeatedly by the node
>>>> > > which fails to join the quorum:
>>>> > >
>>>> > > 2014-03-04 20:00:54,398 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 51200
>>>> > > 2014-03-04 20:00:54,400 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>>>> > > 2014-03-04 20:00:54,401 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>>>> > > 2014-03-04 20:00:54,403 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>>>> > >
>>>> > > Patch for 1732 is already included in the trunk.
>>>> > >
>>>> > > Thanks & Regards,
>>>> > > Deepak
>>>> > >
>>>> > >
>>>> > > On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
>>>> > >
>>>> > >> Hi Flavio, German,
>>>> > >>
>>>> > >> Since this fix is critical for zookeeper rolling upgrade is it ok if I
>>>> > >> apply this patch to 3.5.0 trunk?
>>>> > >> Is it straightforward to apply this patch to trunk?
>>>> > >>
>>>> > >> Thanks & Regards,
>>>> > >> Deepak
>>>> > >>
>>>> > >>
>>>> > >> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
>>>> > >>
>>>> > >>> Thanks German!
>>>> > >>> Just wondering, is there any chance that this patch may be applied to
>>>> > >>> trunk in the near future?
>>>> > >>> If it's fine with you guys, I would be more than happy to apply the
>>>> > >>> fixes (from 3.4.5) to trunk and test them.
>>>> > >>>
>>>> > >>> Thanks & Regards,
>>>> > >>> Deepak
>>>> > >>>
>>>> > >>>
>>>> > >>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <german.blanco.blanco@gmail.com> wrote:
>>>> > >>>
>>>> > >>>> Hello Deepak,
>>>> > >>>>
>>>> > >>>> due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some cases in
>>>> > >>>> which an ensemble can be formed so that it doesn't allow any other
>>>> > >>>> zookeeper server to join.
>>>> > >>>> This has been fixed in branch 3.4, but it hasn't been fixed in trunk yet.
>>>> > >>>> Check if the Notifications sent around contain different values for the
>>>> > >>>> vote in the members of the ensemble.
>>>> > >>>> If you force a new election (e.g. by killing the leader) I guess everything
>>>> > >>>> should work normally, but don't take my word for it.
>>>> > >>>> Flavio should know more about this.
>>>> > >>>>
>>>> > >>>> Cheers,
>>>> > >>>>
>>>> > >>>> German.
>>>> > >>>>
>>>> > >>>>
>>>> > >>>> On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
>>>> > >>>>
>>>> > >>>> > Hi,
>>>> > >>>> >
>>>> > >>>> > I am replacing one of the zookeeper servers in a 3-node quorum.
>>>> > >>>> > Initially all zookeeper servers were running version 3.5.0.1515976.
>>>> > >>>> > I successfully replaced Node3 with the newer version 3.5.0.1551730.
>>>> > >>>> > Now I am trying to replace Node2 with the same zookeeper version.
>>>> > >>>> > I couldn't start the zookeeper server on Node2, as it is continuously
>>>> > >>>> > stuck in a leader election loop, printing the following messages:
>>>> > >>>> >
>>>> > >>>> > 2014-02-26 02:45:23,709 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 60000
>>>> > >>>> > 2014-02-26 02:45:23,710 [myid:3] - INFO [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (5, 3)
>>>> > >>>> > 2014-02-26 02:45:23,712 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>>>> > >>>> >
>>>> > >>>> > Network connections and configuration of the node being upgraded are fine.
>>>> > >>>> > The other 2 nodes in the quorum are fine and serving requests.
>>>> > >>>> >
>>>> > >>>> > Any idea what might be causing this?
>>>> > >>>> >
>>>> > >>>> > Thanks & Regards,
>>>> > >>>> > Deepak
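
German's suggestion earlier in the thread (check whether the Notifications carry different vote values among the members of the ensemble) can be done by hand, but a small script may help on longer logs. The following is only a rough sketch: the regular expression is an assumption based on the "Notification:" lines quoted in this thread once the mail wrapping is undone, and the names `NOTIFICATION`, `votes_by_sender`, and `member_leaders` are hypothetical helpers, not part of ZooKeeper. Treating every sender whose state is not LOOKING as an ensemble member is one reading of the suggestion.

```python
import re

# Assumed layout of the FastLeaderElection "Notification:" lines quoted in this
# thread. Other ZooKeeper revisions may log a different layout.
NOTIFICATION = re.compile(
    r"Notification: (?P<leader>\d+) \(n\.leader\), "
    r"(?P<zxid>0x[0-9a-f]+) \(n\.zxid\), "
    r"(?P<round>0x[0-9a-f]+) \(n\.round\), "
    r"(?P<state>\w+) \(n\.state\), "
    r"(?P<sid>\d+) \(n\.sid\)"
)


def votes_by_sender(log_lines):
    """Map each sender id (n.sid) to (proposed leader, sender state).

    Only the last notification seen from each sender is kept.
    """
    votes = {}
    for line in log_lines:
        match = NOTIFICATION.search(line)
        if match:
            votes[int(match["sid"])] = (int(match["leader"]), match["state"])
    return votes


def member_leaders(votes):
    """Return the set of leaders proposed by senders that are not LOOKING."""
    return {leader for leader, state in votes.values() if state != "LOOKING"}


# The three notifications from the 2014-03-04 message, unwrapped.
sample = [
    "Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), "
    "2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
    "Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), "
    "1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
    "Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), "
    "LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
]

votes = votes_by_sender(sample)
for sid, (leader, state) in sorted(votes.items()):
    print(f"sid {sid} ({state}) votes for {leader}")
print("members disagree:", len(member_leaders(votes)) > 1)
```

To use it on a real log, pass the lines of the file instead of `sample`, for example `votes_by_sender(open(path))` with `path` pointing at the log of the node that cannot join. More than one value in `member_leaders` would indicate the kind of disagreement German describes.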