zookeeper-user mailing list archives

From Michi Mutsuzaki <mi...@cs.stanford.edu>
Subject Re: [Zookeeper] Zookeeper Cluster broken due to snapshot corrupted error
Date Sun, 23 Mar 2014 03:05:17 GMT
Hi Youngseok,

Could you post the log file from 192.168.161.1? The log file you
posted indicates that 192.168.33.1 is not able to connect to
192.168.161.1.
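
One quick way to check this from 192.168.33.1 is to try a plain TCP connection to the election port (3888 in the zoo.cfg quoted below) on each peer. The following is only a minimal sketch in Python, not part of ZooKeeper; the helper name `can_connect` and the 3-second default timeout are assumptions made for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests whether something
    accepts connections on the port, not whether ZooKeeper itself is healthy.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False
```

For example, `can_connect("192.168.161.1", 3888)` returning False would match the "Connection refused" warnings in the log. The quorum port (2888) is normally opened only by the current leader, so a refused connection on that port is not by itself a sign of trouble.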

Thanks!
--Michi


On Fri, Mar 21, 2014 at 12:14 AM, Jung Young Seok
<jung.youngseok@gmail.com> wrote:
> Dear Zookeeper usergroup members,
>
> I have some questions.
>
> We're currently running ZooKeeper 3.4.5 as a 3-node cluster.
> The ZooKeeper service stopped all of a sudden, so clients weren't able to
> connect to the ZooKeeper servers.
> In that situation, the ZooKeeper nodes couldn't elect a leader among themselves.
>
> Then I restarted the ZooKeeper service (on all of them), but they still
> couldn't elect a leader or become followers.
> So I rebooted Linux, but the same thing happened. (I lost the ZooKeeper log here t.t)
> When I removed the snapshot files in the data directory, ZooKeeper worked okay.
> I have uploaded my zookeeper snapshot here
>  - https://s3-ap-northeast-1.amazonaws.com/zookeeper-logs/data_org_b1.tar
>
> If I put the snapshot back into the data directory, the ZooKeeper cluster
> failure reappears.
>
> My questions are:
>  1. Why was the snapshot corrupted all of a sudden?
>  2. Is there any way I can avoid this snapshot corruption issue?
>
> I've attached zoo.cfg and part of the error log.
>
> I'd be happy to get any opinions.
> Thank You.
>
> Best Regards
> Youngseok Jung
>
>
> #zoo.cfg (pretty much default setting)
> tickTime=2000
> initLimit=10
> syncLimit=5
> dataDir=/home/zookeeper/data
> clientPort=2181
>
> server.1=192.168.33.1:2888:3888
> server.2=192.168.33.129:2888:3888
> server.3=192.168.161.1:2888:3888
> autopurge.snapRetainCount=3
> autopurge.purgeInterval=1
>
>
> #Part of the error log
> 2014-03-19 17:56:24,737 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 2 (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING (n.state), 2 (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
> 2014-03-19 17:56:24,737 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 3 at election address /10.0.161.1:3888
> java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:579)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
>         at java.lang.Thread.run(Thread.java:724)
> 2014-03-19 17:56:25,537 [myid:1] - INFO  [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - Notification time out: 1600
> 2014-03-19 17:56:25,538 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 1 (n.leader), 0xc200000001 (n.zxid), 0x145 (n.round), LOOKING (n.state), 1 (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
> 2014-03-19 17:56:25,540 [myid:1] - INFO  [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 2 (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING (n.state), 2 (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
> 2014-03-19 17:56:25,540 [myid:1] - WARN  [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 3 at election address /10.0.161.1:3888
> java.net.ConnectException: Connection refused
>         at java.net.PlainSocketImpl.socketConnect(Native Method)
>         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>         at java.net.Socket.connect(Socket.java:579)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
>         at java.lang.Thread.run(Thread.java:724)
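
A note on reading the notifications quoted above: a ZooKeeper zxid is a 64-bit number whose upper 32 bits hold the leader epoch and whose lower 32 bits hold a counter within that epoch. The small Python sketch below splits the two zxid values that appear in the log; the helper name `split_zxid` is made up for illustration and is not a ZooKeeper API.

```python
def split_zxid(zxid):
    """Split a 64-bit zxid into (epoch, counter).

    Hypothetical helper: the epoch is the high 32 bits,
    the counter is the low 32 bits.
    """
    epoch = zxid >> 32
    counter = zxid & 0xFFFFFFFF
    return epoch, counter


# The two n.zxid values taken from the quoted notifications.
for zxid in (0xc600000001, 0xc200000001):
    epoch, counter = split_zxid(zxid)
    print(f"zxid={zxid:#x} epoch={epoch:#x} counter={counter}")
```

Read this way, the notification from server 2 carries a zxid in epoch 0xc6, while the notification from server 1 carries a zxid in epoch 0xc2; both have counter 1.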
