zookeeper-user mailing list archives

From Deepak Jagtap <deepak.jag...@maxta.com>
Subject Re: New zookeeper server fails to join quorum with msg "Have smaller server identifie"
Date Tue, 11 Mar 2014 01:11:50 GMT
Thanks Michi!


On Mon, Mar 10, 2014 at 5:40 PM, Michi Mutsuzaki <michi@cs.stanford.edu> wrote:

> StandaloneDisabledTest.startSingleServerTest seems to be failing from
> the same issue. We should fix this soon.
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-1870
>
> On Mon, Mar 10, 2014 at 5:33 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> > Hello,
> >
> > Another query regarding 1805.
> > I am observing that the zookeeper rolling upgrade always succeeds when I
> > apply only the 1805 patch.
> > When I apply both the 1810 and 1805 patches, the rolling upgrade fails due
> > to the issue mentioned earlier.
> >
> > Please advise whether it's fine to use only patch 1805 for the trunk.
> >
> > Thanks & Regards,
> > Deepak
> >
> >
> > On Mon, Mar 10, 2014 at 3:11 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >
> >> Hi German,
> >>
> >> I have applied patches 1810 and 1805 against trunk revision 1574686 (a
> >> recent revision against which the 1810 patch build succeeded).
> >> But I am observing the following error in the zookeeper log on the new
> >> node joining the quorum:
> >>
> >> 2014-03-10 21:11:25,126 [myid:1] - INFO  [WorkerSender[myid=1]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (3, 1)
> >> 2014-03-10 21:11:25,127 [myid:1] - INFO  [/169.254.44.1:3888:QuorumCnxManager$Listener@540] - Received connection request /169.254.44.3:51507
> >> 2014-03-10 21:11:25,193 [myid:1] - ERROR [WorkerReceiver[myid=1]:NIOServerCnxnFactory$1@92] - Thread Thread[WorkerReceiver[myid=1],5,main] died
> >> java.lang.OutOfMemoryError: Java heap space
> >>    at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerReceiver.run(FastLeaderElection.java:273)
> >>    at java.lang.Thread.run(Unknown Source)
> >>
> >> Followed by these messages getting printed repeatedly:
> >> 2014-03-10 21:11:25,328 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 400
> >> 2014-03-10 21:11:25,729 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 800
> >> 2014-03-10 21:11:26,530 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 1600
> >> 2014-03-10 21:11:28,131 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 3200
> >> 2014-03-10 21:11:31,332 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@900] - Notification time out: 6400
> >>
> >> Thanks & Regards,
> >> Deepak
> >>
> >>
> >>
> >>
> >>
> >> On Wed, Mar 5, 2014 at 11:50 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>
> >>> Hi,
> >>>
> >>> I have applied only 1805 patch, not 1810.
> >>> And upgrade is from 3.5.0.1458648 to 3.5.0.1562289 (not from 3.4.5).
> >>> It was failing very consistently in our environment, and after the 1805
> >>> patch it went smoothly.
> >>>
> >>> Regards,
> >>> Deepak
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> On Wed, Mar 5, 2014 at 7:36 AM, German Blanco <
> >>> german.blanco.blanco@gmail.com> wrote:
> >>>
> >>>> Hello,
> >>>>
> >>>> do you mean ZOOKEEPER-1810 patch?
> >>>> That one alone doesn't solve the problem. On the other hand, the problem
> >>>> doesn't always happen, so after a rolling start it might get solved.
> >>>> We need 1818 as well, but it is easier to go step by step and get 1810
> >>>> in trunk first.
> >>>> I hope that as soon as 3.4.6 is out this might get some attention.
> >>>>
> >>>> Regards,
> >>>>
> >>>> German.
> >>>>
> >>>>
> >>>> On Wed, Mar 5, 2014 at 2:17 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>>
> >>>> > Hi,
> >>>> >
> >>>> > Please ignore the previous comment; I used the wrong jar file and hence
> >>>> > the rolling upgrade failed.
> >>>> > After applying patch for bug  on zookeeper-3.5.0.1562289
> >>>> > revision, rolling upgrade went fine.
> >>>> >
> >>>> > I have patched our in-house zookeeper version, but it would be
> >>>> > convenient if we apply the patch on trunk and use the latest trunk.
> >>>> > Please advise if I can apply the patch on the trunk and test it for
> >>>> > you.
> >>>> >
> >>>> > Thanks & Regards,
> >>>> > Deepak
> >>>> >
> >>>> >
> >>>> > On Tue, Mar 4, 2014 at 12:09 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>> >
> >>>> > > Hi German,
> >>>> > >
> >>>> > > I tried applying the patch for 1805 but the problem still persists.
> >>>> > > Following are the notification messages logged repeatedly by the
> >>>> > > node which fails to join the quorum:
> >>>> > >
> >>>> > >
> >>>> > > 2014-03-04 20:00:54,398 [myid:2] - INFO  [QuorumPeer[myid=2]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 51200
> >>>> > > 2014-03-04 20:00:54,400 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > > 2014-03-04 20:00:54,401 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > > 2014-03-04 20:00:54,403 [myid:2] - INFO  [WorkerReceiver[myid=2]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > >
> >>>> > >
> >>>> > >
> >>>> > > Patch for 1732 is already included in the trunk.
> >>>> > >
> >>>> > >
> >>>> > > Thanks & Regards,
> >>>> > > Deepak
> >>>> > >
> >>>> > >
> >>>> > > On Fri, Feb 28, 2014 at 2:58 PM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>> > >
> >>>> > >> Hi Flavio, German,
> >>>> > >>
> >>>> > >> Since this fix is critical for zookeeper rolling upgrade, is it OK
> >>>> > >> if I apply this patch to 3.5.0 trunk?
> >>>> > >> Is it straightforward to apply this patch to trunk?
> >>>> > >>
> >>>> > >> Thanks & Regards,
> >>>> > >> Deepak
> >>>> > >>
> >>>> > >>
> >>>> > >> On Wed, Feb 26, 2014 at 11:46 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>> > >>
> >>>> > >>> Thanks German!
> >>>> > >>> Just wondering, is there any chance that this patch may be applied
> >>>> > >>> to trunk in the near future?
> >>>> > >>> If it's fine with you guys, I would be more than happy to apply
> >>>> > >>> the fixes (from 3.4.5) to trunk and test them.
> >>>> > >>>
> >>>> > >>> Thanks & Regards,
> >>>> > >>> Deepak
> >>>> > >>>
> >>>> > >>>
> >>>> > >>> On Wed, Feb 26, 2014 at 1:29 AM, German Blanco <
> >>>> > >>> german.blanco.blanco@gmail.com> wrote:
> >>>> > >>>
> >>>> > >>>> Hello Deepak,
> >>>> > >>>>
> >>>> > >>>> due to ZOOKEEPER-1732 and then ZOOKEEPER-1805, there are some
> >>>> > >>>> cases in which an ensemble can be formed so that it doesn't allow
> >>>> > >>>> any other zookeeper server to join.
> >>>> > >>>> This has been fixed in branch 3.4, but it hasn't been fixed in
> >>>> > >>>> trunk yet.
> >>>> > >>>> Check if the Notifications sent around contain different values
> >>>> > >>>> for the vote in the members of the ensemble.
> >>>> > >>>> If you force a new election (e.g. by killing the leader) I guess
> >>>> > >>>> everything should work normally, but don't take my word for it.
> >>>> > >>>> Flavio should know more about this.
> >>>> > >>>>
> >>>> > >>>> Cheers,
> >>>> > >>>>
> >>>> > >>>> German.
> >>>> > >>>>
> >>>> > >>>>
> >>>> > >>>> On Wed, Feb 26, 2014 at 4:04 AM, Deepak Jagtap <deepak.jagtap@maxta.com> wrote:
> >>>> > >>>>
> >>>> > >>>> > Hi,
> >>>> > >>>> >
> >>>> > >>>> > I am replacing one of the zookeeper servers in a 3 node quorum.
> >>>> > >>>> > Initially all zookeeper servers were running version
> >>>> > >>>> > 3.5.0.1515976.
> >>>> > >>>> > I successfully replaced Node3 with the newer version
> >>>> > >>>> > 3.5.0.1551730.
> >>>> > >>>> > When I try to replace Node2 with the same zookeeper version,
> >>>> > >>>> > I can't start the zookeeper server on Node2, as it is
> >>>> > >>>> > continuously stuck in a leader election loop printing the
> >>>> > >>>> > following messages:
> >>>> > >>>> >
> >>>> > >>>> > 2014-02-26 02:45:23,709 [myid:3] - INFO  [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@837] - Notification time out: 60000
> >>>> > >>>> > 2014-02-26 02:45:23,710 [myid:3] - INFO  [WorkerSender[myid=3]:QuorumCnxManager@195] - Have smaller server identifier, so dropping the connection: (5, 3)
> >>>> > >>>> > 2014-02-26 02:45:23,712 [myid:3] - INFO  [WorkerReceiver[myid=3]:FastLeaderElection@605] - Notification: 3 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 3 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)
> >>>> > >>>> >
> >>>> > >>>> >
> >>>> > >>>> > Network connections and configuration of the node being
> >>>> > >>>> > upgraded are fine.
> >>>> > >>>> > The other 2 nodes in the quorum are fine and serving requests.
> >>>> > >>>> >
> >>>> > >>>> > Any idea what might be causing this?
> >>>> > >>>> >
> >>>> > >>>> > Thanks & Regards,
> >>>> > >>>> > Deepak
> >>>> > >>>> >
> >>>> > >>>>
> >>>> > >>>
> >>>> > >>>
> >>>> > >>
> >>>> > >
> >>>> >
> >>>>
> >>>
> >>>
> >>
>
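
The "Notification time out" values quoted in this thread double on each retry (400, 800, 1600, 3200, 6400 in one log) and show up as 51200 and then 60000 in the older logs. As a rough illustration only, here is a minimal Java sketch of a doubling-with-cap schedule that reproduces those numbers. The 200 ms starting value and the 60000 ms cap are assumptions inferred from the logged values, and the class and method names are hypothetical; this is not taken from the ZooKeeper source.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a doubling timeout with an upper cap.
final class NotificationBackoffSketch {
    // Assumed values, inferred from the log lines quoted in this thread.
    static final int INITIAL_TIMEOUT_MS = 200;
    static final int MAX_TIMEOUT_MS = 60000;

    // Doubles the current timeout, never exceeding the cap.
    static int next(int currentMs) {
        return Math.min(currentMs * 2, MAX_TIMEOUT_MS);
    }

    // Returns the first `steps` timeouts that would be logged after each expiry.
    static List<Integer> schedule(int steps) {
        List<Integer> out = new ArrayList<>();
        int timeout = INITIAL_TIMEOUT_MS;
        for (int i = 0; i < steps; i++) {
            timeout = next(timeout);
            out.add(timeout);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the timeouts for ten consecutive expiries.
        System.out.println(schedule(10));
    }
}
```

Under these assumptions, a node that logs 60000 has been waiting through at least nine expiries without completing an election, which fits the "stuck in leader election loop" description.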

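German's suggestion in this thread was to check whether the Notifications carry different vote values across the members of the ensemble. A minimal sketch of how that check could be scripted follows. It assumes each log entry has been rejoined onto a single line and that the "Notification:" line format matches the entries quoted in this thread; the class name, method names and regular expression are hypothetical helpers, not part of ZooKeeper.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts (n.sid -> n.leader) from Notification log lines.
final class NotificationVoteSketch {
    // Assumed format, copied from the log lines quoted in this thread.
    private static final Pattern NOTIFICATION = Pattern.compile(
        "Notification: (\\d+) \\(n\\.leader\\), \\S+ \\(n\\.zxid\\), "
        + "\\S+ \\(n\\.round\\), (\\w+) \\(n\\.state\\), (\\d+) \\(n\\.sid\\)");

    // Sample lines taken from the 2014-03-04 log in this thread (prefix trimmed).
    static final List<String> SAMPLE = Arrays.asList(
        "Notification time out: 51200",
        "Notification: 2 (n.leader), 0x0 (n.zxid), 0x1 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
        "Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0x2 (n.round), FOLLOWING (n.state), 1 (n.sid), 0x1 (n.peerEPoch), LOOKING (my state)1 (n.config version)",
        "Notification: 3 (n.leader), 0x100003e84 (n.zxid), 0xffffffffffffffff (n.round), LEADING (n.state), 3 (n.sid), 0x2 (n.peerEPoch), LOOKING (my state)1 (n.config version)");

    // Maps each sender id (n.sid) to the leader it most recently voted for (n.leader).
    static Map<Long, Long> latestVoteBySid(List<String> lines) {
        Map<Long, Long> votes = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = NOTIFICATION.matcher(line);
            if (m.find()) {
                votes.put(Long.parseLong(m.group(3)), Long.parseLong(m.group(1)));
            }
        }
        return votes;
    }

    // More than one distinct value means the members disagree on the vote.
    static Set<Long> distinctVotes(Map<Long, Long> votes) {
        return new TreeSet<>(votes.values());
    }

    public static void main(String[] args) {
        Map<Long, Long> votes = latestVoteBySid(SAMPLE);
        System.out.println("votes by sid: " + votes);
        System.out.println("distinct votes: " + distinctVotes(votes));
    }
}
```

For the sample lines, servers 1 and 3 report a vote for 3 while the joining server 2 is still voting for itself, so the set of distinct votes has two entries. Whether that by itself indicates the ZOOKEEPER-1805 situation is not something this sketch can decide.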