Date: Tue, 4 Mar 2014 17:17:16 -0800
Subject: Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"
From: Deepak Jagtap
To: user@zookeeper.apache.org

Hi,

Please ignore my previous message: I used the wrong jar file, which is why the rolling upgrade failed.
After applying the patch for the bug on revision zookeeper-3.5.0.1562289, the rolling upgrade went fine.
I have patched our in-house ZooKeeper version, but it would be more convenient if the patch were applied to trunk so that we could use the latest trunk.
Please let me know if I can apply the patch to trunk and test it for you.

Thanks & Regards,
Deepak

On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap wrote:
> Hi German,
>
> I tried applying the patch for ZOOKEEPER-1805, but the problem still persists.
> Following are the notification messages logged repeatedly by the node
> that fails to join the quorum:
>
> 2014-03-04 20:00:54,398 [myid:2] - INFO [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 51200
> 2014-03-04 20:00:54,400 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> 2014-03-04 20:00:54,401 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> 2014-03-04 20:00:54,403 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>
> The patch for ZOOKEEPER-1732 is already included in trunk.
>
> Thanks & Regards,
> Deepak
>
> On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap wrote:
>
>> Hi Flavio, German,
>>
>> Since this fix is critical for ZooKeeper rolling upgrades, is it OK if I
>> apply this patch to the 3.5.0 trunk?
>> Is it straightforward to apply this patch to trunk?
>>
>> Thanks & Regards,
>> Deepak
>>
>> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap wrote:
>>
>>> Thanks German!
>>> Just wondering: is there any chance that this patch will be applied to
>>> trunk in the near future?
>>> If it's fine with you, I would be more than happy to apply the
>>> fixes (from 3.4.5) to trunk and test them.
>>>
>>> Thanks & Regards,
>>> Deepak
>>>
>>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <german.blanco.blanco@gmail.com> wrote:
>>>
>>>> Hello Deepak,
>>>>
>>>> due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some cases in
>>>> which an ensemble can be formed in a way that doesn't allow any other
>>>> ZooKeeper server to join.
>>>> This has been fixed in branch 3.4, but it hasn't been fixed in trunk yet.
>>>> Check whether the notifications sent around contain different values for
>>>> the vote among the members of the ensemble.
>>>> If you force a new election (e.g. by killing the leader), I guess
>>>> everything should work normally, but don't take my word for it.
>>>> Flavio should know more about this.
>>>>
>>>> Cheers,
>>>>
>>>> German.
>>>>
>>>> On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap wrote:
>>>>
>>>> > Hi,
>>>> >
>>>> > I am replacing one of the ZooKeeper servers in a 3-node quorum.
>>>> > Initially all ZooKeeper servers were running version 3.5.0.1515976.
>>>> > I successfully replaced Node3 with the newer version 3.5.0.1551730.
>>>> > I then tried to replace Node2 with the same ZooKeeper version.
>>>> > However, I couldn't start the ZooKeeper server on Node2: it is continuously
>>>> > stuck in a leader election loop, printing the following messages:
>>>> >
>>>> > 2014-02-26 02:45:23,709 [myid:3] - INFO [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 60000
>>>> > 2014-02-26 02:45:23,710 [myid:3] - INFO [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (5, 3)
>>>> > 2014-02-26 02:45:23,712 [myid:3] - INFO [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
>>>> >
>>>> > The network connections and configuration of the node being upgraded are fine.
>>>> > The other 2 nodes in the quorum are healthy and serving requests.
>>>> >
>>>> > Any idea what might be causing this?
>>>> >
>>>> > Thanks & Regards,
>>>> > Deepak
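German suggested checking whether the notifications sent around carry different vote values among the ensemble members. A rough way to do that check is sketched below in Python (the thread itself uses no code): it parses the three "Notification:" lines from the Mar 4 log and reports which vote fields disagree between the servers that have already left the election (FOLLOWING or LEADING). The regular expression is an assumption derived only from the log lines quoted in this thread, and the names `parse` and `differing_vote_fields` are hypothetical; this is a simplified diagnostic, not ZooKeeper's actual termination logic in FastLeaderElection.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Pattern for the "Notification: ..." part of a FastLeaderElection log line.
# Assumption: derived only from the log lines quoted in this thread.
NOTIFICATION_RE = re.compile(
    r"Notification: (?P<leader>\d+) \(n\.leader\), "
    r"(?P<zxid>0x[0-9a-fA-F]+) \(n\.zxid\), "
    r"(?P<round>0x[0-9a-fA-F]+) \(n\.round\), "
    r"(?P<state>[A-Z]+) \(n\.state\), "
    r"(?P<sid>\d+) \(n\.sid\), "
    r"(?P<peer_epoch>0x[0-9a-fA-F]+) \(n\.peerEPoch\)"
)


@dataclass(frozen=True)
class Notification:
    sid: int
    state: str
    leader: int
    zxid: int
    election_round: int
    peer_epoch: int


def parse(line: str) -> Optional[Notification]:
    """Return the notification carried by a log line, or None if there is none."""
    m = NOTIFICATION_RE.search(line)
    if m is None:
        return None
    return Notification(
        sid=int(m["sid"]),
        state=m["state"],
        leader=int(m["leader"]),
        zxid=int(m["zxid"], 16),  # int(..., 16) accepts the "0x" prefix
        election_round=int(m["round"], 16),
        peer_epoch=int(m["peer_epoch"], 16),
    )


def differing_vote_fields(notifications: List[Notification]) -> List[str]:
    """List the vote fields on which servers that already left the election
    (FOLLOWING or LEADING) disagree. A simplified diagnostic only."""
    members = [n for n in notifications if n.state in ("FOLLOWING", "LEADING")]
    fields = ("leader", "zxid", "election_round", "peer_epoch")
    return [f for f in fields if len({getattr(n, f) for n in members}) > 1]


# The three Notification lines from the Mar 4 log (node with myid=2).
LOG = [
    "2014-03-04 20:00:54,400 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - "
    "Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), "
    "2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
    "2014-03-04 20:00:54,401 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - "
    "Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), "
    "1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
    "2014-03-04 20:00:54,403 [myid:2] - INFO [WorkerReceiver[myid=2]:FastLeaderElection@605] - "
    "Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), "
    "3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
]

notes = [n for n in map(parse, LOG) if n is not None]
print(differing_vote_fields(notes))  # -> ['election_round', 'peer_epoch']
```

On the Mar 4 log, server 1 (FOLLOWING) and server 3 (LEADING) agree on the leader (3) and the zxid (0x100003e84) but report different rounds (0x2 vs 0xffffffffffffffff) and peer epochs (0x1 vs 0x2), which is the kind of vote mismatch German asked about.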
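The "Have smaller server identifier, so dropping the connection: (5, 3)" line in the original report comes from QuorumCnxManager. The sketch below models the tie-break that message describes, assuming (from the message text and the `[myid:3]` prefix, not from a documented API) that the first number is the remote server's sid and the second is the local sid. In this model, when the connecting server has the smaller sid it closes its socket and waits for the larger-sid server to connect back, so each pair of servers ends up with one election connection opened by the larger sid. The function `surviving_connection` is hypothetical and only illustrates the rule.

```python
from typing import Tuple


def surviving_connection(initiator: int, acceptor: int) -> Tuple[int, int]:
    """Hypothetical model of the election-connection tie-break implied by the
    log message "Have smaller server identifier, so dropping the connection:
    (remote sid, local sid)". Returns (connecting sid, accepting sid) for the
    connection that remains open between the two servers."""
    if initiator == acceptor:
        raise ValueError("a server does not open an election connection to itself")
    if initiator < acceptor:
        # The initiator has the smaller sid: it logs the "Have smaller server
        # identifier" message, closes its socket, and waits for the acceptor
        # (the larger sid) to connect back.
        return (acceptor, initiator)
    # The initiator has the larger sid, so its connection is kept.
    return (initiator, acceptor)


# The case from the original report: server 3 tries to reach server 5.
print(surviving_connection(3, 5))  # -> (5, 3)
print(surviving_connection(5, 3))  # -> (5, 3)
```

Under this model the message is expected whenever a smaller-sid server tries to reach a larger-sid one, so by itself it does not necessarily indicate a fault; it only becomes a problem if the larger-sid server never connects back.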