zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Subramanyeswara Bhavirisetty <bs...@yahoo-inc.com>
Subject Zookeepers going out of sync
Date Sun, 31 Mar 2013 19:54:51 GMT
Hi,

I have 5 box zookeeper quorum. 2 / 5 are unable to deserialize the message from leader and
hence they are not coming up. I tried restarting few times but still seeing the same issue.
This issue is observed when we were trying to do the following:


1>   We have a script which brings down the zookeeper server on the leader. We are trying
to test if new follower takes over once leader is down.

2>   The script has been running since 2-3 days. Surprisingly we don't see any issue for
the first 2 days but after that we hit the below issue wherein 2 out 5 zookeeper machines
are unable to deserialize the message.

zookeeper server = 3.4.5

Following are the zookeeper settings:

tickTime=4000
initLimit=20
syncLimit=15
snapCount=100000

Logs from Follower

---------------

Thu Mar 28 19:14:14 2013: 2013-03-28 19:14:14,158 - INFO  [WorkerReceiver
Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 0 (n.zxid), 3
(n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,811 - WARN
[QuorumPeer:/0.0.0.0:2181:Follower@82] - Exception when following the leader
Thu Mar 28 19:14:41 2013: java.io.EOFException
Thu Mar 28 19:14:41 2013:       at
java.io.DataInputStream.readInt(DataInputStream.java:392)
Thu Mar 28 19:14:41 2013:       at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
Thu Mar 28 19:14:41 2013:       at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:84)
Thu Mar 28 19:14:41 2013:       at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
Thu Mar 28 19:14:41 2013:       at
org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:148)
Thu Mar 28 19:14:41 2013:       at
org.apache.zookeeper.server.quorum.Learner.syncWithLeader(Learner.java:332)
Thu Mar 28 19:14:41 2013:       at
org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:75)
Thu Mar 28 19:14:41 2013:       at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:645)
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,811 - INFO
[QuorumPeer:/0.0.0.0:2181:Follower@165] - shutdown called
Thu Mar 28 19:14:41 2013: java.lang.Exception: shutdown Follower
Thu Mar 28 19:14:41 2013:       at
org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:165)
Thu Mar 28 19:14:41 2013:       at
org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:649)
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,812 - INFO
[QuorumPeer:/0.0.0.0:2181:QuorumPeer@621] - LOOKING
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,812 - INFO
[QuorumPeer:/0.0.0.0:2181:FastLeaderElection@663] - New election. My id =  1,
Proposed zxid = 0
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,813 - INFO  [WorkerReceiver
Thread:FastLeaderElection@496] - Notification: 1 (n.leader), 0 (n.zxid), 3
(n.round), LOOKING (n.state), 1 (n.sid), LOOKING (my state)
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,814 - INFO  [WorkerReceiver
Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 85899346051
(n.zxid), 2 (n.round), FOLLOWING (n.state), 3 (n.sid), LOOKING (my state)
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,814 - INFO  [WorkerReceiver
Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 85899346051
(n.zxid), 2 (n.round), FOLLOWING (n.state), 2 (n.sid), LOOKING (my state)
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,814 - INFO  [WorkerReceiver
Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 85899346051
(n.zxid), 2 (n.round), LEADING (n.state), 4 (n.sid), LOOKING (my state)
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,814 - INFO  [WorkerReceiver
Thread:FastLeaderElection@496] - Notification: 4 (n.leader), 85899346051
(n.zxid), 2 (n.round), FOLLOWING (n.state), 5 (n.sid), FOLLOWING (my state)
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,814 - INFO
[QuorumPeer:/0.0.0.0:2181:QuorumPeer@643] - FOLLOWING
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,815 - INFO
[QuorumPeer:/0.0.0.0:2181:ZooKeeperServer@151] - Created server with tickTime
2000 minSessionTimeout 4000 maxSessionTimeout 40000 datadir
/home/y/var/zookeeper/version-2 snapdir /home/y/var/zookeeper/version-2
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,817 - INFO
[QuorumPeer:/0.0.0.0:2181:Learner@294] - Getting a snapshot from leader
Thu Mar 28 19:14:41 2013: 2013-03-28 19:14:41,819 - INFO
[QuorumPeer:/0.0.0.0:2181:Learner@325] - Setting leader epoch 15
Thu Mar 28 19:14:54 2013: 2013-03-28 19:14:54,208 - INFO  [WorkerReceiver
Thread:FastLeaderElection@496] - Notification: 2 (n.leader), 0 (n.zxid), 3
(n.round), LOOKING (n.state), 2 (n.sid), FOLLOWING (my state)


Logs from Leader

----------------------------------------------------

Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,416 - ERROR
[LearnerHandler-/98.139.207.73:37505:LearnerHandler@562] - Unexpected exception
causing shutdown while sock still open
Thu Mar 28 19:18:14 2013: java.net.SocketTimeoutException: Read timed out
Thu Mar 28 19:18:14 2013:       at
java.net.SocketInputStream.socketRead0(Native Method)
Thu Mar 28 19:18:14 2013:       at
java.net.SocketInputStream.read(SocketInputStream.java:150)
Thu Mar 28 19:18:14 2013:       at
java.net.SocketInputStream.read(SocketInputStream.java:121)
Thu Mar 28 19:18:14 2013:       at
java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
Thu Mar 28 19:18:14 2013:       at
java.io.BufferedInputStream.read(BufferedInputStream.java:254)
Thu Mar 28 19:18:14 2013:       at
java.io.DataInputStream.readInt(DataInputStream.java:387)
Thu Mar 28 19:18:14 2013:       at
org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
Thu Mar 28 19:18:14 2013:       at
org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
Thu Mar 28 19:18:14 2013:       at
org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
Thu Mar 28 19:18:14 2013:       at
org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:450)
Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,417 - WARN
[LearnerHandler-/98.139.207.73:37505:LearnerHandler@575] - ******* GOODBYE
/98.139.207.73:37505 ********
Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,420 - INFO
[WorkerReceiver[myid=4]:FastLeaderElection$Messenger$WorkerReceiver@273] -
Backward compatibility mode, server id=2
Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,420 - INFO
[WorkerReceiver[myid=4]:FastLeaderElection@542] - Notification: 2 (n.leader),
0x0 (n.zxid), 0x3 (n.round), LOOKING (n.state), 2 (n.sid), 0x0 (n.peerEPoch),
LEADING (my state)
Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,423 - INFO
[LearnerHandler-/98.139.207.73:37511:LearnerHandler@263] - Follower sid: 2 :
info : org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@265d646b<mailto:org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@265d646b>
Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,424 - INFO
[LearnerHandler-/98.139.207.73:37511:LearnerHandler@318] - Synchronizing with
Follower sid: 2 maxCommittedLog=0x15000000d8 minCommittedLog=0x1400000001
peerLastZxid=0x0
Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,424 - WARN
[LearnerHandler-/98.139.207.73:37511:LearnerHandler@379] - Unhandled proposal
scenario
Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,424 - INFO
[LearnerHandler-/98.139.207.73:37511:LearnerHandler@395] - Sending SNAP
Thu Mar 28 19:18:14 2013: 2013-03-28 19:18:14,424 - INFO
[LearnerHandler-/98.139.207.73:37511:LearnerHandler@419] - Sending snapshot
last zxid of peer is 0x0  zxid of leader is 0x15000000d8sent zxid of db as
0x15000000d8

Can you please help how we can fix this issue or if this behavior is expected?

~Subramanyam

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message