zookeeper-user mailing list archives

From David Nickerson <davidnickerson4mailingli...@gmail.com>
Subject Re: Leader election failure
Date Fri, 27 Jul 2012 17:01:09 GMT
>
> Sorry for the spam.


Other people may run into the same problem with leader election, search for
it, and come across your post. On behalf of them, thanks for posting your
solution.
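
For anyone who lands here with the same symptom, the firewall cause described below can be checked with a plain TCP connection test run from each node against every other node. This is only a rough sketch: the function names `port_reachable` and `check_peers` are made up for illustration, and the ports you test must come from your own server entries (2888 and 3888 are just the values commonly used in examples, not something stated in this thread).

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    A False result means the connection was refused, timed out, or the host
    could not be resolved. It cannot tell a firewall drop apart from a
    server process that is simply not listening.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


def check_peers(hosts, ports, timeout=3.0):
    """Try every (host, port) combination and map it to True or False."""
    return {
        (host, port): port_reachable(host, port, timeout)
        for host in hosts
        for port in ports
    }
```

Calling something like `check_peers(["10.10.5.27"], [2888, 3888])` from each of the other nodes (with your real addresses and ports substituted) should show whether one direction of the quorum or election traffic is being blocked. Run it in both directions, since a firewall rule on a single node may only block inbound connections to that node.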

On Fri, Jul 27, 2012 at 10:54 AM, Jared Cantwell
<jared.cantwell@gmail.com> wrote:

> It looks like I had a firewall issue with the leader election port on one
> of the nodes.  Sorry for the spam.
>
> ~Jared
>
> On Thu, Jul 26, 2012 at 11:20 PM, Jared Cantwell
> <jared.cantwell@gmail.com> wrote:
>
> > We are currently testing out 3.5.0.  If the fix made it into 3.4.4, I
> > assume that issue is also fixed in 3.5.0?
> >
> > ~Jared
> >
> >
> > On Thu, Jul 26, 2012 at 11:11 PM, Hanno Schlichting
> > <hanno@hannosch.eu> wrote:
> >
> >> What version of ZK are you using? There's a bug in 3.4.x with 5 node
> >> clusters failing to agree on a leader. That's only solved in the yet
> >> unreleased 3.4.4.
> >>
> >> Hanno
> >>
> >> On 27.07.2012, at 06:40, Jared Cantwell <jared.cantwell@gmail.com>
> >> wrote:
> >>
> >> > I have a 5 node cluster configured using dynamic zookeeper.  It has been
> >> > through several reconfigurations, but at the moment I am simply trying to
> >> > start 3 of the nodes to get ZK accessible.  I have confirmed that the myid
> >> > files match the entries in the dynamic membership file for the 3 nodes in
> >> > question.  However, when I start up the three nodes I get the following
> >> > error:
> >> >
> >> > 2012-07-26 22:26:01,037 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@445] - LEADING - LEADER ELECTION TOOK - 13
> >> > 2012-07-26 22:26:01,039 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:FileSnap@83] - Reading snapshot /sf/data/zookeeper/10.10.5.27/version-2/snapshot.3000001e3
> >> > 2012-07-26 22:26:01,065 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:FileTxnSnapLog@270] - Snapshotting: 0x3000001e3 to /sf/data/zookeeper/10.10.5.27/version-2/snapshot.3000001e3
> >> > 2012-07-26 22:26:10,837 [myid:8] - INFO  [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8 (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9 (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)
> >> > 2012-07-26 22:26:20,849 [myid:8] - INFO  [WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8 (n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9 (n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)
> >> > 2012-07-26 22:26:21,083 [myid:8] - WARN  [QuorumPeer[myid=8]/10.10.5.27:2181:QuorumPeer@949] - Unexpected exception
> >> > java.lang.InterruptedException: *Timeout while waiting for epoch from quorum*
> >> >        at org.apache.zookeeper.server.quorum.Leader.getEpochToPropose(Leader.java:1207)
> >> >        at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:464)
> >> >        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:946)
> >> > 2012-07-26 22:26:21,083 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@614] - Shutting down
> >> > 2012-07-26 22:26:21,083 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:Leader@620] - Shutdown called
> >> > java.lang.Exception: shutdown Leader! reason: Forcing shutdown
> >> >        at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:620)
> >> >        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:952)
> >> > 2012-07-26 22:26:21,084 [myid:8] - INFO  [QuorumPeer[myid=8]/10.10.5.27:2181:ZooKeeperServer@413] - shutting down
> >> > 2012-07-26 22:26:21,084 [myid:8] - INFO  [LearnerCnxAcceptor-0.0.0.0/0.0.0.0:2182:Leader$LearnerCnxAcceptor@407] - exception while shutting down acceptor: java.net.SocketException: Socket closed
> >> >
> >> > I am not sure what to make of it or how to debug from here.  Any pointers
> >> > or suggestions on how to debug what might be wrong, or simply some usual
> >> > causes of this error would be appreciated.
> >> >
> >> > Thanks!
> >> > Jared
> >>
> >
> >
>
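
When reading logs like the one quoted above, it can help to pull the fields out of the FastLeaderElection "Notification:" lines and see which peer keeps announcing itself as LOOKING. The sketch below is an illustration only: `parse_notification` and `peer_stuck_looking` are hypothetical helper names, the regular expression is written against the line format shown in this thread (with the mail client's line wrapping removed), and other ZooKeeper versions may log these lines differently.

```python
import re

# Matches "<value> (<label>)" pairs such as "8 (n.leader)" or "LEADING (my state)".
FIELD_RE = re.compile(r"(\w+)\s*\(([^)]+)\)")


def parse_notification(line):
    """Return a {label: value} dict for one "Notification:" log line.

    Returns an empty dict if the line contains no "Notification:" marker.
    """
    _, marker, tail = line.partition("Notification:")
    if not marker:
        return {}
    return {label: value for value, label in FIELD_RE.findall(tail)}


def peer_stuck_looking(fields):
    """True if the sender reports LOOKING while the local node is LEADING."""
    return fields.get("n.state") == "LOOKING" and fields.get("my state") == "LEADING"


# Sample line copied from the log in this thread, unwrapped onto one line.
SAMPLE = (
    "2012-07-26 22:26:10,837 [myid:8] - INFO  "
    "[WorkerReceiver[myid=8]:FastLeaderElection@635] - Notification: 8 "
    "(n.leader), 0x3000001e3 (n.zxid), 0x1 (n.round), LOOKING (n.state), 9 "
    "(n.sid), 0x3 (n.peerEPoch), LEADING (my state)300000147 (n.config version)"
)
```

For the sample line, `parse_notification(SAMPLE)` reports `n.sid` as `9` and `n.state` as `LOOKING` while the local state is `LEADING`. One possible reading, consistent with the firewall cause reported in this thread, is that server 9 can send its vote but never completes the exchange with the leader, so the connectivity between those two nodes is a reasonable first thing to check.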
