zookeeper-user mailing list archives

From Michi Mutsuzaki <mi...@cs.stanford.edu>
Subject Re: [Zookeeper] Zookeeper Cluster broken due to snapshot corrupted error
Date Wed, 26 Mar 2014 23:00:35 GMT
Hi Youngseok,

Don't rotate transaction logs yourself. At the very minimum, you need
to keep the most recent snapshot file and the most recent transaction
log file. You can use snapCount to control the size of the
transaction log files. However, be aware that a smaller snapCount means
more frequent snapshotting, which affects ZooKeeper performance. Do
test it before changing snapCount in production.

http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_advancedConfiguration
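
As a rough illustration of the "keep the most recent snapshot and the most recent transaction log" rule above, here is a minimal Python sketch that picks those two files out of a directory listing. It assumes the data files are named snapshot.<zxid> and log.<zxid> with the zxid in hex (as in snapshot.c200000001 from the logs in this thread). The function name files_to_keep is hypothetical, and the sketch only reports files; it does not delete anything.

```python
import re

# Assumption: data files are named "snapshot.<zxid>" or "log.<zxid>",
# where <zxid> is hexadecimal (e.g. snapshot.c200000001).
_DATA_FILE = re.compile(r"^(snapshot|log)\.([0-9a-fA-F]+)$")


def files_to_keep(names):
    """Return the newest snapshot and newest transaction log in `names`.

    Hypothetical helper: "newest" means the largest hex zxid suffix.
    Anything that does not look like a data file is ignored.
    """
    newest = {}
    for name in names:
        match = _DATA_FILE.match(name)
        if not match:
            continue  # e.g. zookeeper.out, myid
        kind = match.group(1)
        zxid = int(match.group(2), 16)
        if kind not in newest or zxid > newest[kind][0]:
            newest[kind] = (zxid, name)
    return {kind: name for kind, (_, name) in newest.items()}


if __name__ == "__main__":
    listing = [
        "snapshot.c100000005",
        "snapshot.c200000001",
        "log.c100000001",
        "log.c200000002",
        "zookeeper.out",
    ]
    print(files_to_keep(listing))
```

This is only meant for inspecting a data directory. For actual cleanup, the autopurge settings already used in this thread are the safer route.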

For zookeeper.out, it's really up to you to decide how much data to
retain. Personally, I would keep at least a day's worth of INFO logs.


On Wed, Mar 26, 2014 at 3:36 PM, Jung Young Seok
<jung.youngseok@gmail.com> wrote:
> I have a couple of questions regarding log and snapshot management.
>
> I use the autopurge feature, so I keep only 3 snapshots and 3 transaction logs.
> The ZooKeeper log (zookeeper.out) is rotated daily.
>
> What I would like to do is keep the log and snapshot files as small as
> possible.
>
> Would it be okay to manage snapshot files and transaction logs with
> logrotate.d or Log4j rolling so that they do not grow beyond 50 MB?
>
> Thank you in advance.
> Youngseok
> On Mar 24, 2014, at 4:43 PM, "Rakesh R" <rakeshr@huawei.com> wrote:
>
>>
>> From the latest log shared by YoungSeok,
>>
>> [1] I could see the LearnerHandler fails to get a Leader.ACKEPOCH response
>> from the Followers and is failing with the following exception.
>>
>> 2014-03-19 17:29:19,312 [myid:3] - INFO
>> [LearnerHandler-/10.0.33.1:58547:LearnerHandler@263] - Follower sid: 1 : info :
>> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@3c966db5
>> 2014-03-19 17:29:19,314 [myid:3] - INFO
>> [LearnerHandler-/10.0.33.129:49810:LearnerHandler@263] - Follower sid: 2 : info :
>> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@466b56b
>> 2014-03-19 17:29:19,475 [myid:3] - ERROR
>> [LearnerHandler-/10.0.33.1:58547:LearnerHandler@562] - Unexpected
>> exception causing shutdown while sock still open
>> java.io.EOFException
>>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>>         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>>         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>>         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:290)
>>
>> Hi Michi, it would be great if you could help me understand
>> ZOOKEEPER-1697. As I understand it, that issue concerns Leader.ACK; is
>> that correct? If so, I am confused to see the exception on the
>> Leader.ACKEPOCH path on the LearnerHandler side [1]. From what I have seen
>> in the code, Leader.ACKEPOCH is sent to the Leader during
>> Learner#registerWithLeader().
>>
>> Thanks in advance.
>> Rakesh
>>
>> -----Original Message-----
>> From: Jung Young Seok [mailto:jung.youngseok@gmail.com]
>> Sent: 24 March 2014 10:20
>> To: michi@cs.stanford.edu
>> Cc: user@zookeeper.apache.org
>> Subject: Re: [Zookeeper] Zookeeper Cluster broken due to snapshot
>> corrupted error
>>
>> I'm not sure if my issue is related to
>> https://issues.apache.org/jira/browse/ZOOKEEPER-1697
>> but I think I should try upgrading ZooKeeper to version 3.4.6 (stable).
>>
>> I'm hoping version 3.4.6 will prevent my issue from happening again.
>>
>> Thank you for your answer.
>> Have a great day.
>>
>> Best Regards,
>> Youngseok Jung
>>
>>
>> 2014-03-24 13:05 GMT+09:00 Michi Mutsuzaki <michi@cs.stanford.edu>:
>>
>> > I wonder if this is related to ZOOKEEPER-1697.
>> >
>> > https://issues.apache.org/jira/browse/ZOOKEEPER-1697
>> >
>> > --Michi
>> >
>> > On Sun, Mar 23, 2014 at 6:15 PM, Jung Young Seok
>> > <jung.youngseok@gmail.com> wrote:
>> > > I've added the ZooKeeper log (192.168.161.1).
>> > > The timestamps in the log look different, but you can ignore that.
>> > > The log on 192.168.161.1 kept repeating the pattern below.
>> > >
>> > > Thank you for asking.
>> > >
>> > >
>> > ----------------------------------------------------------------------
>> > ----------------------------------------------------------------------
>> > ----------------------------------------------------------
>> > > 2014-03-19 17:28:06,105 [myid:3] - INFO
>> > > [LearnerHandler-/10.0.33.129:49809:LearnerHandler@395] - Sending
>> > > DIFF
>> > > 2014-03-19 17:28:07,414 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41252
>> > > 2014-03-19 17:28:07,415 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:07,415 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41252 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:12,173 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41255
>> > > 2014-03-19 17:28:12,174 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:12,174 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41255 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:14,558 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41258
>> > > 2014-03-19 17:28:14,559 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:14,559 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41258 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:18,585 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41261
>> > > 2014-03-19 17:28:18,586 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:18,586 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41261 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:20,067 [myid:3] - WARN
>> > > [LearnerHandler-/10.0.33.1:58546:Leader@574] - Commiting zxid
>> > 0xc500000000
>> > > from /10.0.161.1:2888 not first!
>> > > 2014-03-19 17:28:20,067 [myid:3] - WARN
>> > > [LearnerHandler-/10.0.33.1:58546:Leader@576] - First is 0x0
>> > > 2014-03-19 17:28:20,068 [myid:3] - INFO
>> > > [LearnerHandler-/10.0.33.1:58546:Leader@598] - Have quorum of
>> > supporters;
>> > > starting up and setting last processed zxid: 0xc500000000
>> > > 2014-03-19 17:28:22,312 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Leader@490] - Shutting down
>> > > 2014-03-19 17:28:22,312 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Leader@496] - Shutdown
>> > > called
>> > > java.lang.Exception: shutdown Leader! reason: Only 1 followers, need 1
>> > >         at org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
>> > >         at org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
>> > >         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:753)
>> > > 2014-03-19 17:28:22,313 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] -
>> > > shutting down
>> > > 2014-03-19 17:28:22,320 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] -
>> > Shutting
>> > > down
>> > > 2014-03-19 17:28:22,320 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@743] -
>> > > Shutting down
>> > > 2014-03-19 17:28:22,321 [myid:3] - INFO  [ProcessThread(sid:3
>> > > cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited
>> loop!
>> > > 2014-03-19 17:28:22,321 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88
>> > > ] - Shutting down
>> > > 2014-03-19 17:28:22,322 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] -
>> > > Shutting down
>> > > 2014-03-19 17:28:22,322 [myid:3] - INFO
>> > > [CommitProcessor:3:CommitProcessor@150] - CommitProcessor exited loop!
>> > > 2014-03-19 17:28:22,322 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProcessor@655]
>> > > - Shutting down
>> > > 2014-03-19 17:28:22,322 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415]
>> > > - shutdown of request processor complete
>> > > 2014-03-19 17:28:22,323 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@175] -
>> > > Shutting down
>> > > 2014-03-19 17:28:22,323 [myid:3] - INFO
>> > > [SyncThread:3:SyncRequestProcessor@155] - SyncRequestProcessor exited!
>> > > 2014-03-19 17:28:22,325 [myid:3] - WARN
>> > > [LearnerHandler-/10.0.33.1:58546:LearnerHandler@575] - *******
>> > > GOODBYE
>> > > /10.0.33.1:58546 ********
>> > > 2014-03-19 17:28:22,326 [myid:3] - WARN
>> > > [LearnerHandler-/10.0.33.129:49809:LearnerHandler@575] - *******
>> > > GOODBYE
>> > > /10.0.33.129:49809 ********
>> > > 2014-03-19 17:28:22,327 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
>> > > 2014-03-19 17:28:22,328 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading
>> > > snapshot
>> > > /home/zookeeper/data/version-2/snapshot.c200000001
>> > > 2014-03-19 17:28:22,332 [myid:3] - INFO
>> > > [Thread-140:Leader$LearnerCnxAcceptor@309] - exception while
>> > > shutting
>> > down
>> > > acceptor: java.net.SocketException: Socket closed
>> > > 2014-03-19 17:28:24,004 [myid:3] - INFO
>> > > [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited
>> > loop!
>> > > 2014-03-19 17:28:27,398 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41264
>> > > 2014-03-19 17:28:27,399 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:27,399 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41264 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:34,987 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41267
>> > > 2014-03-19 17:28:34,988 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:34,988 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41267 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:35,218 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] -
>> > > New election. My id =  3, proposed zxid=0xc200000001
>> > > 2014-03-19 17:28:35,219 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:35,420 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:35,420 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
>> > > Notification time out: 400
>> > > 2014-03-19 17:28:35,821 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:35,822 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
>> > > Notification time out: 800
>> > > 2014-03-19 17:28:36,623 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:36,623 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
>> > > Notification time out: 1600
>> > > 2014-03-19 17:28:36,800 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING
>> > > (n.state),
>> > 1
>> > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:37,096 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING
>> > > (n.state),
>> > 2
>> > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:37,097 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING
>> > > (n.state),
>> > 2
>> > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:38,698 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:38,698 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
>> > > Notification time out: 3200
>> > > 2014-03-19 17:28:38,700 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING
>> > > (n.state),
>> > 1
>> > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:38,705 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING
>> > > (n.state),
>> > 2
>> > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:39,408 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41270
>> > > 2014-03-19 17:28:39,409 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:39,409 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41270 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:41,906 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:41,906 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
>> > > Notification time out: 6400
>> > > 2014-03-19 17:28:42,390 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41273
>> > > 2014-03-19 17:28:42,390 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:42,391 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41273 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:44,729 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41276
>> > > 2014-03-19 17:28:44,730 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:44,730 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41276 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:48,307 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
>> > > Notification time out: 12800
>> > > 2014-03-19 17:28:48,308 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:49,840 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 1
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 1 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:49,841 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 1 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
>> > > 2014-03-19 17:28:50,042 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@750] - LEADING
>> > > 2014-03-19 17:28:50,042 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] -
>> > > Created server with tickTime 2000 minSessionTimeout 4000
>> > > maxSessionTimeout 40000 datadir /home/zookeeper/data/version-2
>> > > snapdir
>> > > /home/zookeeper/data/version-2
>> > > 2014-03-19 17:28:50,042 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Leader@345] - LEADING -
>> > > LEADER ELECTION TOOK - 27714
>> > > 2014-03-19 17:28:50,045 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading
>> > > snapshot
>> > > /home/zookeeper/data/version-2/snapshot.c200000001
>> > > 2014-03-19 17:28:50,540 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41279
>> > > 2014-03-19 17:28:50,541 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:50,541 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41279 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:51,406 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 2 (n.sid), 0xc5 (n.peerEPoch), LEADING (my state)
>> > > 2014-03-19 17:28:51,406 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
>> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
>> > > (n.state), 2 (n.sid), 0xc5 (n.peerEPoch), LEADING (my state)
>> > > 2014-03-19 17:28:53,526 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41282
>> > > 2014-03-19 17:28:53,526 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:53,527 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41282 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:28:59,322 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41285
>> > > 2014-03-19 17:28:59,323 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:28:59,323 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41285 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:29:00,253 [myid:3] - INFO
>> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] -
>> > Snapshotting:
>> > > 0xc200000001 to /home/zookeeper/data/version-2/snapshot.c200000001
>> > > 2014-03-19 17:29:04,860 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:29:04,860 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41288 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:29:11,031 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41291
>> > > 2014-03-19 17:29:11,032 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:29:11,032 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41291 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:29:16,490 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41294
>> > > 2014-03-19 17:29:16,491 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:29:16,491 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41294 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:29:19,064 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197]
>> > > - Accepted socket connection from /10.0.160.243:41297
>> > > 2014-03-19 17:29:19,065 [myid:3] - WARN
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
>> > Exception
>> > > causing close of session 0x0 due to java.io.IOException:
>> > > ZooKeeperServer
>> > not
>> > > running
>> > > 2014-03-19 17:29:19,065 [myid:3] - INFO
>> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
>> > > Closed socket connection for client /10.0.160.243:41297 (no session
>> > established for
>> > > client)
>> > > 2014-03-19 17:29:19,312 [myid:3] - INFO
>> > > [LearnerHandler-/10.0.33.1:58547:LearnerHandler@263] - Follower sid: 1 : info :
>> > > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@3c966db5
>> > > 2014-03-19 17:29:19,314 [myid:3] - INFO
>> > > [LearnerHandler-/10.0.33.129:49810:LearnerHandler@263] - Follower sid: 2 : info :
>> > > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@466b56b
>> > > 2014-03-19 17:29:19,475 [myid:3] - ERROR
>> > > [LearnerHandler-/10.0.33.1:58547:LearnerHandler@562] - Unexpected
>> > > exception causing shutdown while sock still open
>> > > java.io.EOFException
>> > >         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>> > >         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>> > >         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>> > >         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>> > >         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:290)
>> > > 2014-03-19 17:29:19,476 [myid:3] - WARN
>> > > [LearnerHandler-/10.0.33.1:58547:LearnerHandler@575] - *******
>> > > GOODBYE
>> > > /10.0.33.1:58547 ********
>> > > 2014-03-19 17:29:19,476 [myid:3] - ERROR
>> > > [LearnerHandler-/10.0.33.129:49810:LearnerHandler@562] - Unexpected
>> > > exception causing shutdown while sock still open
>> > > java.io.EOFException
>> > >         at java.io.DataInputStream.readInt(DataInputStream.java:392)
>> > >         at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>> > >         at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
>> > >         at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>> > >         at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:290)
>> > > 2014-03-19 17:29:19,477 [myid:3] - WARN
>> > > [LearnerHandler-/10.0.33.129:49810:LearnerHandler@575] - *******
>> > > GOODBYE
>> > > /10.0.33.129:49810 ********
>> > > 2014-03-19 17:29:21,757 [myid:3] - INFO
>> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 1
>> > > (n.leader), 0xc200000001 (n.zxid), 0x128 (n.round), LOOKING
>> > > (n.state), 1 (n.sid), 0xc5 (n.peerEPoch), LEADING (my state)
>> > >
>> > >
>> > >
>> > > 2014-03-23 12:05 GMT+09:00 Michi Mutsuzaki <michi@cs.stanford.edu>:
>> > >
>> > >> Hi Youngseok,
>> > >>
>> > >> Could you post the log file from 192.168.161.1? The log file you
>> > >> posted indicates that 192.168.33.1 is not able to connect to
>> > >> 192.168.161.1.
>> > >>
>> > >> Thanks!
>> > >> --Michi
>> > >>
>> > >>
>> > >> On Fri, Mar 21, 2014 at 12:14 AM, Jung Young Seok
>> > >> <jung.youngseok@gmail.com> wrote:
>> > >> > Dear Zookeeper usergroup members,
>> > >> >
>> > >> > I have some questions.
>> > >> >
>> > >> > We currently run ZooKeeper 3.4.5 as a 3-node cluster.
>> > >> > The ZooKeeper service stopped all of a sudden, so clients were not
>> > >> > able to connect to the ZooKeeper servers.
>> > >> > In that situation, the ZooKeeper servers could not elect a leader.
>> > >> >
>> > >> > Then I restarted the ZooKeeper service (on all nodes), but they still
>> > >> > could not elect a leader or become followers.
>> > >> > So I rebooted Linux, but the same thing happened. (I lost the ZooKeeper
>> > >> > log here t.t) When I removed the snapshot files in the data directory,
>> > >> > ZooKeeper worked okay.
>> > >> > I have uploaded my zookeeper snapshot here
>> > >> >  -
>> > >> >
>> > https://s3-ap-northeast-1.amazonaws.com/zookeeper-logs/data_org_b1.tar
>> > >> >
>> > >> > If I put the snapshot back into the data directory, the ZooKeeper
>> > >> > clustering failure reappears.
>> > >> >
>> > >> > My questions are:
>> > >> >  1. Why was the snapshot corrupted all of a sudden?
>> > >> >  2. Is there any way I can avoid this snapshot corruption issue?
>> > >> >
>> > >> > I've attached zoo.cfg and some of the error log.
>> > >> >
>> > >> > I'd be glad to hear any opinions.
>> > >> > Thank You.
>> > >> >
>> > >> > Best Regards
>> > >> > Youngseok Jung
>> > >> >
>> > >> >
>> > >> > #zoo.cfg (pretty much default setting)
>> > >> > tickTime=2000
>> > >> > initLimit=10
>> > >> > syncLimit=5
>> > >> > dataDir=/home/zookeeper/data
>> > >> > clientPort=2181
>> > >> >
>> > >> > server.1=192.168.33.1:2888:3888
>> > >> > server.2=192.168.33.129:2888:3888
>> > >> > server.3=192.168.161.1:2888:3888
>> > >> > autopurge.snapRetainCount=3
>> > >> > autopurge.purgeInterval=1
>> > >> >
>> > >> >
>> > >> > #Some of error log
>> > >> > 2014-03-19 17:56:24,737 [myid:1] - INFO
>> > >> > [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 2
>> > >> > (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING (n.state), 2
>> > >> > (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
>> > >> > 2014-03-19 17:56:24,737 [myid:1] - WARN
>> > >> > [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 3
>> > >> > at election address /10.0.161.1:3888
>> > >> > java.net.ConnectException: Connection refused
>> > >> >         at java.net.PlainSocketImpl.socketConnect(Native Method)
>> > >> >         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>> > >> >         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>> > >> >         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>> > >> >         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>> > >> >         at java.net.Socket.connect(Socket.java:579)
>> > >> >         at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
>> > >> >         at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
>> > >> >         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
>> > >> >         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
>> > >> >         at java.lang.Thread.run(Thread.java:724)
>> > >> > 2014-03-19 17:56:25,537 [myid:1] - INFO
>> > >> > [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774]
>> > >> > - Notification time out: 1600
>> > >> > 2014-03-19 17:56:25,538 [myid:1] - INFO
>> > >> > [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 1
>> > >> > (n.leader), 0xc200000001 (n.zxid), 0x145 (n.round), LOOKING (n.state), 1
>> > >> > (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
>> > >> > 2014-03-19 17:56:25,540 [myid:1] - INFO
>> > >> > [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification: 2
>> > >> > (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING (n.state), 2
>> > >> > (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
>> > >> > 2014-03-19 17:56:25,540 [myid:1] - WARN
>> > >> > [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open channel to 3
>> > >> > at election address /10.0.161.1:3888
>> > >> > java.net.ConnectException: Connection refused
>> > >> >         at java.net.PlainSocketImpl.socketConnect(Native Method)
>> > >> >         at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
>> > >> >         at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
>> > >> >         at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
>> > >> >         at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
>> > >> >         at java.net.Socket.connect(Socket.java:579)
>> > >> >         at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:354)
>> > >> >         at org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxManager.java:327)
>> > >> >         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.process(FastLeaderElection.java:393)
>> > >> >         at org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$WorkerSender.run(FastLeaderElection.java:365)
>> > >> >         at java.lang.Thread.run(Thread.java:724)
>> > >
>> > >
>> >
>>
