Date: Mon, 10 Mar 2014 18:11:50 -0700
Subject: Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"
From: Deepak Jagtap
To: user@zookeeper.apache.org, michi@cs.stanford.edu

Thanks Michi!

On Mon, Mar 10, 2014 at 5:40 PM, Michi Mutsuzaki wrote:
> StandaloneDisabledTest.startSingleServerTest seems to be failing from
> the same issue. We should fix this soon.
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1870
>
> On Mon, Mar 10, 2014 at 5:33 PM, Deepak Jagtap wrote:
> > Hello,
> >
> > Another query regarding 1805.
> > I am observing that the zookeeper rolling upgrade always succeeds when I apply
> > the 1805 patch.
> > When I apply both the 1810 and 1805 patches, the rolling upgrade fails due to the
> > issue mentioned earlier.
> >
> > Please advise if it's fine to use only patch 1805 for the trunk.
> >
> > Thanks & Regards,
> > Deepak
> >
> >
> > On Mon, Mar 10, 2014 at 3:11 PM, Deepak Jagtap wrote:
> >
> >> Hi German,
> >>
> >> I have applied patches 1810 and 1805 against trunk revision 1574686 (a recent
> >> revision against which the 1810 patch build succeeded).
> >> But I am observing the following error in the zookeeper log on the new node joining
> >> the quorum:
> >>
> >> 2014-03-10 21:11:25,126 [myid:1] - INFO [WorkerSender[myid=1]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (3, 1)
> >> 2014-03-10 21:11:25,127 [myid:1] - INFO [/169.254.44.1:3888:QuorumCnxManager$Listener@540] - Received connection request /169.254.44.3:51507
> >> 2014-03-10 21:11:25,193 [myid:1] - ERROR [WorkerReceiver[myid=1]:NIOServerCnxnFactory$1@92] - Thread Thread[WorkerReceiver[myid=1],5,main] died
> >> java.lang.OutOfMemoryError: Java heap space
> >>         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerReceiver.run(FastLeaderElection.java:273)
> >>         at java.lang.Thread.run(Unknown Source)
> >>
> >> This is followed by these messages, printed repeatedly:
> >> 2014-03-10 21:11:25,328 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 400
> >> 2014-03-10 21:11:25,729 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 800
> >> 2014-03-10 21:11:26,530 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 1600
> >> 2014-03-10 21:11:28,131 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 3200
> >> 2014-03-10 21:11:31,332 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 6400
> >>
> >> Thanks & Regards,
> >> Deepak
> >>
> >>
> >> On Wed, Mar 5, 2014 at 11:50 AM, Deepak Jagtap wrote:
> >>
> >>> Hi,
> >>>
> >>> I have applied only the 1805 patch, not 1810.
> >>> And the upgrade is from 3.5.0.1458648 to 3.5.0.1562289 (not from 3.4.5).
> >>> It was failing very consistently in our environment, and after the 1805 patch
> >>> it went smoothly.
> >>>
> >>> Regards,
> >>> Deepak
> >>>
> >>>
> >>> On Wed, Mar 5, 2014 at 7:36 AM, German Blanco <german.blanco.blanco@gmail.com> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> do you mean the ZOOKEEPER-1810 patch?
> >>>> That one alone doesn't solve the problem. On the other hand, the problem
> >>>> doesn't always happen, so after a rolling start it might get solved.
> >>>> We need 1818 as well, but it is easier to go step by step and get 1810 into
> >>>> trunk first.
> >>>> I hope that as soon as 3.4.6 is out this might get some attention.
> >>>>
> >>>> Regards,
> >>>>
> >>>> German.
> >>>>
> >>>>
> >>>> On Wed, Mar 5, 2014 at 2:17 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>>
> >>>> > Hi,
> >>>> >
> >>>> > Please ignore the previous comment; I used the wrong jar file and hence the rolling
> >>>> > upgrade failed.
> >>>> > After applying the patch for the bug on the zookeeper-3.5.0.1562289
> >>>> > revision, the rolling upgrade went fine.
> >>>> >
> >>>> > I have patched our in-house zookeeper version, but it would be more convenient if we
> >>>> > applied the patch on trunk and used the latest trunk.
> >>>> > Please advise if I can apply the patch on the trunk and test it for you.
> >>>> >
> >>>> > Thanks & Regards,
> >>>> > Deepak
> >>>> >
> >>>> >
> >>>> > On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>> >
> >>>> > > Hi German,
> >>>> > >
> >>>> > > I tried applying the patch for 1805 but the problem still persists.
> >>>> > > Following are the notification messages logged repeatedly by the node
> >>>> > > that fails to join the quorum:
> >>>> > >
> >>>> > > 2014-03-04 20:00:54,398 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 51200
> >>>> > > 2014-03-04 20:00:54,400 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > > 2014-03-04 20:00:54,401 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > > 2014-03-04 20:00:54,403 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > >
> >>>> > > The patch for 1732 is already included in the trunk.
> >>>> > >
> >>>> > > Thanks & Regards,
> >>>> > > Deepak
> >>>> > >
> >>>> > >
> >>>> > > On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>> > >
> >>>> > >> Hi Flavio, German,
> >>>> > >>
> >>>> > >> Since this fix is critical for the zookeeper rolling upgrade, is it ok if I
> >>>> > >> apply this patch to the 3.5.0 trunk?
> >>>> > >> Is it straightforward to apply this patch to trunk?
> >>>> > >>
> >>>> > >> Thanks & Regards,
> >>>> > >> Deepak
> >>>> > >>
> >>>> > >>
> >>>> > >> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>> > >>
> >>>> > >>> Thanks German!
> >>>> > >>> Just wondering: is there any chance that this patch may be applied to
> >>>> > >>> trunk in the near future?
> >>>> > >>> If it's fine with you guys, I would be more than happy to apply the
> >>>> > >>> fixes (from 3.4.5) to trunk and test them.
> >>>> > >>>
> >>>> > >>> Thanks & Regards,
> >>>> > >>> Deepak
> >>>> > >>>
> >>>> > >>>
> >>>> > >>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <german.blanco.blanco@gmail.com> wrote:
> >>>> > >>>
> >>>> > >>>> Hello Deepak,
> >>>> > >>>>
> >>>> > >>>> due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some cases in
> >>>> > >>>> which an ensemble can be formed so that it doesn't allow any other
> >>>> > >>>> zookeeper server to join.
> >>>> > >>>> This has been fixed in branch 3.4, but it hasn't been fixed in trunk
> >>>> > >>>> yet.
> >>>> > >>>> Check whether the Notifications sent around contain different values for
> >>>> > >>>> the vote in the members of the ensemble.
> >>>> > >>>> If you force a new election (e.g. by killing the leader), I guess everything
> >>>> > >>>> should work normally, but don't take my word for it.
> >>>> > >>>> Flavio should know more about this.
> >>>> > >>>>
> >>>> > >>>> Cheers,
> >>>> > >>>>
> >>>> > >>>> German.
> >>>> > >>>>
> >>>> > >>>>
> >>>> > >>>> On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>> > >>>>
> >>>> > >>>> > Hi,
> >>>> > >>>> >
> >>>> > >>>> > I am replacing one of the zookeeper servers in a 3-node quorum.
> >>>> > >>>> > Initially all zookeeper servers were running version 3.5.0.1515976.
> >>>> > >>>> > I successfully replaced Node3 with the newer version 3.5.0.1551730.
> >>>> > >>>> > When I try to replace Node2 with the same zookeeper version,
> >>>> > >>>> > I can't start the zookeeper server on Node2, as it is continuously stuck in
> >>>> > >>>> > a leader election loop printing the following messages:
> >>>> > >>>> >
> >>>> > >>>> > 2014-02-26 02:45:23,709 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 60000
> >>>> > >>>> > 2014-02-26 02:45:23,710 [myid:3] - INFO [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (5, 3)
> >>>> > >>>> > 2014-02-26 02:45:23,712 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > >>>> >
> >>>> > >>>> > Network connections and configuration of the node being upgraded are fine.
> >>>> > >>>> > The other 2 nodes in the quorum are fine and serving requests.
> >>>> > >>>> >
> >>>> > >>>> > Any idea what might be causing this?
> >>>> > >>>> >
> >>>> > >>>> > Thanks & Regards,
> >>>> > >>>> > Deepak
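A note on the "Have smaller server identifier, so dropping the connection" lines, which appear both in the 2014-02-26 log as "(5, 3)" and in the 2014-03-10 log as "(3, 1)". As far as I understand QuorumCnxManager, this message is informational and not the failure itself. To avoid duplicate sockets between two peers, a connection is kept only when it was initiated by the server with the larger id. The smaller-id side drops its own attempt and waits for the larger-id server to connect back, and the pair in the message reads (remote sid, my id). The sketch below models only that tie-break rule under those assumptions. The class, enum, and method names are hypothetical, and this is not the actual QuorumCnxManager code.

```java
/**
 * Hypothetical model of the connection tie-break between two quorum peers:
 * only a connection initiated by the peer with the larger server id is kept.
 * Not the real QuorumCnxManager implementation.
 */
final class ConnectionTieBreak {

    enum Action { KEEP, DROP_AND_WAIT_FOR_PEER, DROP_AND_CONNECT_BACK }

    /** Decision on the side that initiated a connection to remoteSid. */
    static Action onOutgoing(long mySid, long remoteSid) {
        // We have the smaller id: give up our attempt and let the peer dial us.
        return remoteSid > mySid ? Action.DROP_AND_WAIT_FOR_PEER : Action.KEEP;
    }

    /** Decision on the side that accepted a connection from remoteSid. */
    static Action onIncoming(long mySid, long remoteSid) {
        // The connection came from a smaller id: close it and dial the peer ourselves.
        return remoteSid < mySid ? Action.DROP_AND_CONNECT_BACK : Action.KEEP;
    }

    /** Builds a message shaped like the logged one: "(remote sid, my id)". */
    static String logLine(long mySid, long remoteSid) {
        return "Have smaller server identifier, so dropping the connection: ("
                + remoteSid + ", " + mySid + ")";
    }

    public static void main(String[] args) {
        // Server 1 dials server 3, as in the 2014-03-10 log.
        System.out.println(onOutgoing(1, 3) + ": " + logLine(1, 3));
        // Server 3 dials server 1: server 1 keeps this connection.
        System.out.println(onIncoming(1, 3));
    }
}
```

If that reading is right, the message simply records which side yields, and the real problem lies elsewhere in the election.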
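The "Notification time out" values in these logs (400, 800, 1600, 3200, 6400 on 2014-03-10, then 51200 and 60000 in the older reports) double each time the election loop waits without receiving a notification, and then stop growing. The sketch reproduces that sequence assuming a starting wait of 200 ms and a cap of 60000 ms. Both constants are inferred from the logged values (they appear to correspond to finalizeWait and maxNotificationInterval in FastLeaderElection), so treat them as assumptions. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Reproduces the doubling-with-cap pattern seen in the
 * "Notification time out" log lines. Constants are assumptions
 * inferred from the logged values, not read from ZooKeeper.
 */
final class NotificationBackoff {

    static final int INITIAL_WAIT_MS = 200;   // assumed starting wait
    static final int MAX_WAIT_MS = 60_000;    // assumed upper bound

    /** Next wait after a poll that returned no notification. */
    static int next(int currentMs) {
        int doubled = currentMs * 2;
        return doubled < MAX_WAIT_MS ? doubled : MAX_WAIT_MS;
    }

    /** The first n values that would be logged, starting from INITIAL_WAIT_MS. */
    static List<Integer> loggedTimeouts(int n) {
        List<Integer> out = new ArrayList<>();
        int wait = INITIAL_WAIT_MS;
        for (int i = 0; i < n; i++) {
            wait = next(wait);   // the logged value is the new, doubled wait
            out.add(wait);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(loggedTimeouts(10));
    }
}
```

Once the cap is reached, every further empty wait logs 60000 again. A node printing that value is still LOOKING and has not managed to finish the election.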
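Comparing the votes in the 2014-03-04 notifications requires decoding their hex fields. A ZooKeeper zxid carries the epoch in its high 32 bits and a counter within that epoch in its low 32 bits. So 0x100003e84, the zxid reported by servers 1 and 3, is epoch 1 with counter 0x3e84. The leader's round of 0xffffffffffffffff is -1 when read as a signed 64-bit value. The helpers below illustrate that decoding. Their names are hypothetical; they are not ZooKeeper's own utility classes.

```java
/** Hypothetical helpers for reading hex fields printed in FastLeaderElection notifications. */
final class NotificationFields {

    /** High 32 bits of a zxid: the epoch. */
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    /** Low 32 bits of a zxid: the counter within that epoch. */
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    /** Parses a logged hex value such as "0xffffffffffffffff" into a signed 64-bit long. */
    static long parseLoggedHex(String s) {
        String digits = s.startsWith("0x") ? s.substring(2) : s;
        return Long.parseUnsignedLong(digits, 16);
    }

    public static void main(String[] args) {
        long zxid = parseLoggedHex("0x100003e84");
        System.out.println("epoch=" + epochOf(zxid) + " counter=" + counterOf(zxid)); // epoch=1 counter=16004
        System.out.println("round=" + parseLoggedHex("0xffffffffffffffff"));        // round=-1
    }
}
```

Decoded this way, servers 1 and 3 agree on the zxid (epoch 1), while their n.peerEPoch values (0x1 and 0x2) differ.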
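German suggests checking whether the notifications carry different values for the vote. The check can be sketched with the three notifications that server 2 logged on 2014-03-04, treated as a strict-majority count over a 3-member ensemble. In this simplified, hypothetical model a vote is the pair (n.leader, n.peerEPoch). Keying on exactly those two fields is an assumption made for illustration. The comparison FastLeaderElection actually performs is not reproduced here and may differ between releases.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

/** Simplified, hypothetical vote-agreement check; not FastLeaderElection's actual logic. */
final class VoteAgreement {

    /** One logged notification, reduced to the fields used here. */
    static final class Notification {
        final long sid;        // sender id (kept for readability; not used in the count)
        final long leader;     // n.leader
        final long peerEpoch;  // n.peerEPoch
        Notification(long sid, long leader, long peerEpoch) {
            this.sid = sid; this.leader = leader; this.peerEpoch = peerEpoch;
        }
    }

    /** Assumed agreement key: proposed leader plus peer epoch. */
    static final class VoteKey {
        final long leader, peerEpoch;
        VoteKey(long leader, long peerEpoch) { this.leader = leader; this.peerEpoch = peerEpoch; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof VoteKey)) return false;
            VoteKey k = (VoteKey) o;
            return leader == k.leader && peerEpoch == k.peerEpoch;
        }
        @Override public int hashCode() { return Objects.hash(leader, peerEpoch); }
    }

    /** Returns the leader backed by a strict majority of ensembleSize, if any. */
    static Optional<Long> majorityLeader(List<Notification> votes, int ensembleSize,
                                         boolean comparePeerEpoch) {
        Map<VoteKey, Integer> counts = new HashMap<>();
        for (Notification n : votes) {
            // When epochs are ignored, every vote for the same leader shares one key.
            VoteKey key = new VoteKey(n.leader, comparePeerEpoch ? n.peerEpoch : 0L);
            counts.merge(key, 1, Integer::sum);
        }
        int quorum = ensembleSize / 2 + 1;
        for (Map.Entry<VoteKey, Integer> e : counts.entrySet()) {
            if (e.getValue() >= quorum) return Optional.of(e.getKey().leader);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Votes seen by server 2 on 2014-03-04: (n.sid, n.leader, n.peerEPoch).
        List<Notification> seen = Arrays.asList(
                new Notification(2, 2, 0x0),
                new Notification(1, 3, 0x1),
                new Notification(3, 3, 0x2));
        System.out.println(majorityLeader(seen, 3, true));   // Optional.empty
        System.out.println(majorityLeader(seen, 3, false));  // Optional[3]
    }
}
```

When the peer epoch is compared, the follower (server 1, epoch 0x1) and the leader (server 3, epoch 0x2) do not agree, so no majority forms and server 2 would keep looking. When the epoch is ignored, both back leader 3. This only illustrates how mismatched vote fields can block a joining server; it does not by itself confirm the root cause behind ZOOKEEPER-1732 or ZOOKEEPER-1805.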