zookeeper-user mailing list archives

From German Blanco <german.blanco.bla...@gmail.com>
Subject Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"
Date Wed, 05 Mar 2014 15:36:11 GMT
Hello,

do you mean the ZOOKEEPER-1810 patch?
That one alone doesn't solve the problem. On the other hand, the problem
doesn't always happen, so it might go away after a rolling restart.
We need ZOOKEEPER-1818 as well, but it is easier to go step by step and get
1810 into trunk first.
I hope this gets some attention as soon as 3.4.6 is out.

Regards,

German.


On Wed, Mar 5, 2014 at 2:17 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:

> Hi,
>
> Please ignore the previous comment. I used the wrong jar file, and hence the
> rolling upgrade failed.
> After applying patch for bug  on zookeeper-3.5.0.1562289
> revision, rolling upgrade went fine.
>
> I have patched our in-house zookeeper version, but it would be convenient if
> the patch were applied on trunk so that we could use the latest trunk.
> Please advise if I can apply the patch on the trunk and test it for you.
>
> Thanks & Regards,
> Deepak
>
>
> On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
>
> > Hi German,
> >
> > I tried applying the patch for 1805, but the problem still persists.
> > The following notification messages are logged repeatedly by the node
> > that fails to join the quorum:
> >
> >
> > 2014-03-04 20:00:54,398 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 51200
> > 2014-03-04 20:00:54,400 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> > 2014-03-04 20:00:54,401 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> > 2014-03-04 20:00:54,403 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >
> >
> >
> > Patch for 1732 is already included in the trunk.
> >
> >
> > Thanks & Regards,
> > Deepak
> >
> >
> > On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >
> >> Hi Flavio, German,
> >>
> >> Since this fix is critical for zookeeper rolling upgrades, is it ok if I
> >> apply this patch to 3.5.0 trunk?
> >> Is it straightforward to apply this patch to trunk?
> >>
> >> Thanks & Regards,
> >> Deepak
> >>
> >>
> >> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>
> >>> Thanks German!
> >>> Just wondering, is there any chance that this patch may be applied to
> >>> trunk in the near future?
> >>> If it's fine with you guys, I would be more than happy to apply the
> >>> fixes (from 3.4.5) to trunk and test them.
> >>>
> >>> Thanks & Regards,
> >>> Deepak
> >>>
> >>>
> >>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <german.blanco.blanco@gmail.com> wrote:
> >>>
> >>>> Hello Deepak,
> >>>>
> >>>> due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some cases in
> >>>> which an ensemble can be formed so that it doesn't allow any other
> >>>> zookeeper server to join.
> >>>> This has been fixed in branch 3.4, but it hasn't been fixed in trunk
> >>>> yet.
> >>>> Check whether the Notifications sent around contain different values for
> >>>> the vote among the members of the ensemble.
> >>>> If you force a new election (e.g. by killing the leader) I guess
> >>>> everything should work normally, but don't take my word for it.
> >>>> Flavio should know more about this.
> >>>>
> >>>> Cheers,
> >>>>
> >>>> German.
> >>>>
> >>>>
> >>>> On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>>
> >>>> > Hi,
> >>>> >
> >>>> > I am replacing one of the zookeeper servers in a 3 node quorum.
> >>>> > Initially all zookeeper servers were running version 3.5.0.1515976.
> >>>> > I successfully replaced Node3 with the newer version 3.5.0.1551730.
> >>>> > However, when I try to replace Node2 with the same zookeeper version,
> >>>> > I can't start the zookeeper server on Node2: it is continuously stuck
> >>>> > in a leader election loop, printing the following messages:
> >>>> >
> >>>> > 2014-02-26 02:45:23,709 [myid:3] - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 60000
> >>>> > 2014-02-26 02:45:23,710 [myid:3] - INFO  [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (5, 3)
> >>>> > 2014-02-26 02:45:23,712 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> >
> >>>> >
> >>>> > Network connections and configuration of the node being upgraded are
> >>>> > fine.
> >>>> > The other 2 nodes in the quorum are fine and serving requests.
> >>>> >
> >>>> > Any idea what might be causing this?
> >>>> >
> >>>> > Thanks & Regards,
> >>>> > Deepak
> >>>> >
> >>>>
> >>>
> >>>
> >>
> >
>
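
German's suggestion in the thread, to check whether the Notifications carry
different votes among the members of the ensemble, can be done by hand, but a
small script makes it easier on a long log. The following is only a minimal
sketch, not part of ZooKeeper: the function names votes_by_sender and
established_leaders are hypothetical, the regular expression is written
against the "Notification:" format quoted in this thread only, and it assumes
the log text has been copied without the email quote markers.

```python
import re
from collections import defaultdict

# Matches the vote fields of a FastLeaderElection "Notification:" entry in the
# format quoted in this thread (other ZooKeeper versions may log differently).
NOTIFICATION = re.compile(
    r"Notification:\s*(?P<leader>\d+)\s*\(n\.leader\),\s*"
    r"(?P<zxid>0x[0-9a-fA-F]+)\s*\(n\.zxid\),\s*"
    r"(?P<round>0x[0-9a-fA-F]+)\s*\(n\.round\),\s*"
    r"(?P<state>[A-Z]+)\s*\(n\.state\),\s*"
    r"(?P<sid>\d+)\s*\(n\.sid\)"
)

# Notifications from the 2014-03-04 log above, wrapped as a mail client would.
SAMPLE = """
2014-03-04 20:00:54,400 [myid:2] - INFO
 [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2
(n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0
(n.peerEPoch), LOOKING (my state)1 (n.config version)
2014-03-04 20:00:54,401 [myid:2] - INFO
 [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
(n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1
(n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
2014-03-04 20:00:54,403 [myid:2] - INFO
 [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3
(n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING
(n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config
version)
"""


def votes_by_sender(log_text):
    """Map each sending server id (n.sid) to the set of (leader, state) it sent."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    votes = defaultdict(set)
    for m in NOTIFICATION.finditer(flat):
        votes[int(m.group("sid"))].add((int(m.group("leader")), m.group("state")))
    return dict(votes)


def established_leaders(votes):
    """Leaders voted for by servers that are already FOLLOWING or LEADING."""
    return {
        leader
        for entries in votes.values()
        for leader, state in entries
        if state in ("FOLLOWING", "LEADING")
    }


if __name__ == "__main__":
    votes = votes_by_sender(SAMPLE)
    for sid in sorted(votes):
        print(sid, sorted(votes[sid]))
    print("leaders seen by established members:", sorted(established_leaders(votes)))
```

If established_leaders returns more than one server id, the members of the
ensemble disagree on the vote, which is the condition German asks about. A
single id, as with the sample above, means the established members agree and
the cause should be looked for elsewhere.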
