zookeeper-user mailing list archives

From Jung Young Seok <jung.youngs...@gmail.com>
Subject [Zookeeper] Zookeeper Cluster broken due to snapshot corrupted error
Date Fri, 21 Mar 2014 07:14:43 GMT
Dear ZooKeeper user group members,

I have some questions.

We're currently using ZooKeeper 3.4.5 in a 3-node cluster.
The ZooKeeper service suddenly stopped, so clients weren't able to
connect to the ZooKeeper servers.
In that state, the servers couldn't elect a leader among themselves.

Then I restarted the ZooKeeper service on all three nodes, but they still
couldn't elect a leader and take on follower roles.
So I rebooted the Linux hosts, but the same thing happened. (Unfortunately,
I lost the ZooKeeper logs at this point.)
When I removed the snapshot files from the data directory, ZooKeeper worked fine again.
I've uploaded my ZooKeeper snapshot here:
 - https://s3-ap-northeast-1.amazonaws.com/zookeeper-logs/data_org_b1.tar

If I put that snapshot back into the data directory, the cluster failure
reappears.
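To see which snapshot files the servers load on restart, a small read-only listing helps. The sketch below assumes the ZooKeeper 3.4 on-disk layout, where snapshots live under `<dataDir>/version-2/` and are named `snapshot.<zxid in hex>`; the class and method names (`SnapshotLister`, `zxidOf`, `snapshotsNewestFirst`) are hypothetical. It only lists and decodes file names and does not modify anything.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class SnapshotLister {
    private static final String PREFIX = "snapshot.";

    // Returns the zxid encoded in a name like "snapshot.c600000001", or -1 if the name doesn't match.
    static long zxidOf(String name) {
        if (!name.startsWith(PREFIX)) {
            return -1;
        }
        try {
            return Long.parseLong(name.substring(PREFIX.length()), 16);
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    // Lists snapshot file names under <dataDir>/version-2 (assumed layout), highest zxid first. Read-only.
    static List<String> snapshotsNewestFirst(File dataDir) {
        List<String> names = new ArrayList<>();
        File[] files = new File(dataDir, "version-2").listFiles();
        if (files == null) {
            return names; // directory missing or unreadable
        }
        for (File f : files) {
            if (zxidOf(f.getName()) >= 0) {
                names.add(f.getName());
            }
        }
        names.sort((a, b) -> Long.compare(zxidOf(b), zxidOf(a)));
        return names;
    }

    public static void main(String[] args) {
        // Defaults to the dataDir from the zoo.cfg in this message.
        File dataDir = new File(args.length > 0 ? args[0] : "/home/zookeeper/data");
        for (String name : snapshotsNewestFirst(dataDir)) {
            long zxid = zxidOf(name);
            // High 32 bits of a zxid are the epoch, low 32 bits a counter within that epoch.
            System.out.printf("%s epoch=0x%x counter=%d%n", name, zxid >>> 32, zxid & 0xffffffffL);
        }
    }
}
```

Comparing this listing across the three nodes shows which server holds the newest snapshot.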

My questions are:
 1. Why was the snapshot suddenly corrupted?
 2. Is there any way I can avoid this snapshot corruption issue?

I've attached zoo.cfg and part of the error log.

I'd be happy to hear any opinions.
Thank you.

Best Regards
Youngseok Jung


#zoo.cfg (pretty much the default settings)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/home/zookeeper/data
clientPort=2181

server.1=192.168.33.1:2888:3888
server.2=192.168.33.129:2888:3888
server.3=192.168.161.1:2888:3888
autopurge.snapRetainCount=3
autopurge.purgeInterval=1
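With three `server.N` entries, ZooKeeper's default majority quorum needs strictly more than half of the voting servers, i.e. two. So servers 1 and 2 should in principle be able to form a quorum even while server 3's election port refuses connections, as the log shows. The sketch below counts the `server.N` entries and computes the majority size; it reads the config with `java.util.Properties`, which is an assumption of this sketch rather than ZooKeeper's own config parser, and the names `QuorumCheck`, `countServers`, `quorumSize` and `hasQuorum` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class QuorumCheck {
    // Counts "server.N" entries in zoo.cfg-style text (read here as a Java properties file).
    static int countServers(String cfg) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(cfg));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        int n = 0;
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                n++;
            }
        }
        return n;
    }

    // Majority quorum: strictly more than half of the voting servers.
    static int quorumSize(int servers) {
        return servers / 2 + 1;
    }

    // True if the given number of mutually reachable servers is enough for a majority.
    static boolean hasQuorum(int reachable, int servers) {
        return reachable >= quorumSize(servers);
    }

    public static void main(String[] args) {
        String cfg = String.join("\n",
                "tickTime=2000",
                "initLimit=10",
                "syncLimit=5",
                "dataDir=/home/zookeeper/data",
                "clientPort=2181",
                "server.1=192.168.33.1:2888:3888",
                "server.2=192.168.33.129:2888:3888",
                "server.3=192.168.161.1:2888:3888");
        int n = countServers(cfg);
        System.out.println(n + " servers, quorum needs " + quorumSize(n));
        // Servers 1 and 2 without server 3:
        System.out.println("2 reachable -> quorum? " + hasQuorum(2, n));
    }
}
```

For this zoo.cfg it prints "3 servers, quorum needs 2" followed by "2 reachable -> quorum? true".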


#Part of the error log
2014-03-19 17:56:24,737 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 2 (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING (n.state), 2 (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
2014-03-19 17:56:24,737 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 3 at election address /10.0.161.1:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
        at java.lang.Thread.run(Thread.java:724)
2014-03-19 17:56:25,537 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 1600
2014-03-19 17:56:25,538 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 1 (n.leader), 0xc200000001 (n.zxid), 0x145 (n.round), LOOKING (n.state), 1 (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
2014-03-19 17:56:25,540 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 2 (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING (n.state), 2 (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
2014-03-19 17:56:25,540 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 3 at election address /10.0.161.1:3888
java.net.ConnectException: Connection refused
        at java.net.PlainSocketImpl.socketConnect(Native Method)
        at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
        at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
        at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
        at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
        at java.net.Socket.connect(Socket.java:579)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
        at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
        at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
        at java.lang.Thread.run(Thread.java:724)
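The `n.zxid` values in these notifications pack two numbers: the high 32 bits are the epoch and the low 32 bits a transaction counter, so 0xc600000001 is epoch 0xc6, counter 1, and 0xc200000001 is epoch 0xc2, counter 1. The sketch below decodes them and compares two votes using our reading of the ordering in 3.4's `FastLeaderElection` (higher peer epoch first, then higher zxid, then higher server id). It is simplified: the weight check used with hierarchical quorums is omitted. The class and method names (`VoteOrder`, `epochOf`, `counterOf`, `newVoteWins`) are hypothetical, not ZooKeeper's API.

```java
public class VoteOrder {
    // High 32 bits of a zxid: the epoch.
    static long epochOf(long zxid) {
        return zxid >>> 32;
    }

    // Low 32 bits of a zxid: the counter within that epoch.
    static long counterOf(long zxid) {
        return zxid & 0xffffffffL;
    }

    // Simplified vote ordering (assumed from 3.4's FastLeaderElection, weights ignored):
    // higher peer epoch wins, then higher zxid, then higher server id.
    static boolean newVoteWins(long newEpoch, long newZxid, long newSid,
                               long curEpoch, long curZxid, long curSid) {
        return newEpoch > curEpoch
                || (newEpoch == curEpoch
                    && (newZxid > curZxid || (newZxid == curZxid && newSid > curSid)));
    }

    public static void main(String[] args) {
        long fromSid2 = 0xc600000001L; // n.zxid in the notification from sid 2 (LEADING)
        long fromSid1 = 0xc200000001L; // n.zxid in sid 1's own vote (LOOKING)
        System.out.printf("sid 2: epoch=0x%x counter=%d%n", epochOf(fromSid2), counterOf(fromSid2));
        System.out.printf("sid 1: epoch=0x%x counter=%d%n", epochOf(fromSid1), counterOf(fromSid1));
        // Both notifications carry n.peerEPoch 0xc6, so the zxid decides the ranking.
        System.out.println("sid 2 ranks higher: " + newVoteWins(0xc6, fromSid2, 2, 0xc6, fromSid1, 1));
    }
}
```

By this ordering, server 2's vote (zxid 0xc600000001) outranks server 1's own vote (zxid 0xc200000001).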
