zookeeper-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jung Young Seok <jung.youngs...@gmail.com>
Subject Re: [Zookeeper] Zookeeper Cluster broken due to snapshot corrupted error
Date Wed, 26 Mar 2014 23:23:52 GMT
Thank you, Michi

Does snapshotting means dumping snap data on memory into disk?

Small snapCount would take little time for snapshotting, it means shorter
stopping world. Do I understand properly?

Thank you in advance.
2014. 3. 27. 오전 8:01에 "Michi Mutsuzaki" <michi@cs.stanford.edu>님이 작성:

> Hi Youngseok,
>
> Don't rotate transaction logs yourself. At the very minimum, you need
> to keep the most recent snapshot file and the most recent transaction
> log file. You can use the snapCount to control the size of the
> transaction log files. However be aware that smaller snapCount means
> more frequent snapshotting, which affects ZooKeeper performance. Do
> test it before changing snapCount in production.
>
>
> http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html#sc_advancedConfiguration
>
> For zookeeper.out, it's really up to you to decide how much data to
> retain. Personally I would keep at least a day worth of INFO log.
>
>
> On Wed, Mar 26, 2014 at 3:36 PM, Jung Young Seok
> <jung.youngseok@gmail.com> wrote:
> > I have a couple of question regarding log and snapshot management.
> >
> > I use auto purge feature so I keep only 3 snapshots and 3 transaction
> logs.
> > Zookeeper log(zookeeper.out) is rotated daily.
> >
> > What I would like to do is keep the log and snapshot files as small as
> > possible.
> >
> > Would it be okay to manage snapshot files and transaction logs with
> > logrotate.d or Log4j rolling not to grow more than 50MB ?
> >
> > Thank you in advance.
> > Youngseok
> > 2014. 3. 24. 오후 4:43에 "Rakesh R" <rakeshr@huawei.com>님이 작성:
> >
> >>
> >> From the latest log shared by YoungSeok,
> >>
> >> [1] I could see the LearnerHandler fails to get a Leader.ACKEPOCH
> response
> >> from the Followers and is failing with the following exception.
> >>
> >> 2014-03-19 17:29:19,312 [myid:3] - INFO [LearnerHandler-/
> 10.0.33.1:58547
> >> :LearnerHandler@263] - Follower sid: 1 :info :
> >> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@3c966db5
> >> 2014-03-19 17:29:19,314 [myid:3] - INFO [LearnerHandler-/
> 10.0.33.129:49810
> >> :LearnerHandler@263] - Follower sid: 2 :info :
> >> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@466b56b
> >> 2014-03-19 17:29:19,475 [myid:3] - ERROR [LearnerHandler-/
> 10.0.33.1:58547
> >> :LearnerHandler@562] - Unexpected
> >> exception causing shutdown while sock still open java.io.EOFException
> >>         at java.io.DataInputStream.readInt(DataInputStream.java:392)
> >>         at
> >> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >>         at
> >>
> >>
> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
> >>         at
> >>
> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
> >>         at
> >> org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.j
> >> ava:290)
> >>
> >> Hi Michi, It would be great if you can help to know more on ZK-1697. As
> I
> >> understood this is talking about the Leader.ACK, am I correct ?, if so I
> >> got confused by seeing the Leader.ACKEPOCH exception in LearnerHandler
> side
> >> [1]. From the code what I've seen Leader.ACKEPOCH would sent to the
> Leader
> >> at the time of Learner# registerWithLeader().
> >>
> >> Thanks in advance.
> >> Rakesh
> >>
> >> -----Original Message-----
> >> From: Jung Young Seok [mailto:jung.youngseok@gmail.com]
> >> Sent: 24 March 2014 10:20
> >> To: michi@cs.stanford.edu
> >> Cc: user@zookeeper.apache.org
> >> Subject: Re: [Zookeeper] Zookeeper Cluster broken due to snapshot
> >> corrupted error
> >>
> >> I'm not sure if my issue is related to
> >> https://issues.apache.org/jira/browse/ZOOKEEPER-1697
> >> but I think I should try with  Zookeeer version to 3.4.6(stable).
> >>
> >> I'm just hoping 3.4.6 version  would prevent happening my issue again.
> >>
> >> Thank you for your answer.
> >> Have a great day.
> >>
> >> Best Regards,
> >> Youngseok Jung
> >>
> >>
> >> 2014-03-24 13:05 GMT+09:00 Michi Mutsuzaki <michi@cs.stanford.edu>:
> >>
> >> > I wonder if this is related to ZOOKEEPER-1697.
> >> >
> >> > https://issues.apache.org/jira/browse/ZOOKEEPER-1697
> >> >
> >> > --Michi
> >> >
> >> > On Sun, Mar 23, 2014 at 6:15 PM, Jung Young Seok
> >> > <jung.youngseok@gmail.com> wrote:
> >> > > I've added zookeeper log (192.168.161.1).
> >> > > The time that the log was written look different but you might
> ignore
> >> it.
> >> > > Logs on 192.168.161.1 had been repeated with below pattern.
> >> > >
> >> > > Thank you for your asking.
> >> > >
> >> > >
> >> > ----------------------------------------------------------------------
> >> > ----------------------------------------------------------------------
> >> > ----------------------------------------------------------
> >> > > 2014-03-19 17:28:06,105 [myid:3] - INFO
> >> > > [LearnerHandler-/10.0.33.129:49809:LearnerHandler@395] - Sending
> >> > > DIFF
> >> > > 2014-03-19 17:28:07,414 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41252
> >> > > 2014-03-19 17:28:07,415 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:07,415 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41252 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:12,173 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41255
> >> > > 2014-03-19 17:28:12,174 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:12,174 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41255 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:14,558 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41258
> >> > > 2014-03-19 17:28:14,559 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:14,559 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41258 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:18,585 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41261
> >> > > 2014-03-19 17:28:18,586 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:18,586 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41261 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:20,067 [myid:3] - WARN
> >> > > [LearnerHandler-/10.0.33.1:58546:Leader@574] - Commiting zxid
> >> > 0xc500000000
> >> > > from /10.0.161.1:2888 not first!
> >> > > 2014-03-19 17:28:20,067 [myid:3] - WARN
> >> > > [LearnerHandler-/10.0.33.1:58546:Leader@576] - First is 0x0
> >> > > 2014-03-19 17:28:20,068 [myid:3] - INFO
> >> > > [LearnerHandler-/10.0.33.1:58546:Leader@598] - Have quorum of
> >> > supporters;
> >> > > starting up and setting last processed zxid: 0xc500000000
> >> > > 2014-03-19 17:28:22,312 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Leader@490] - Shutting
> down
> >> > > 2014-03-19 17:28:22,312 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Leader@496] - Shutdown
> >> > > called
> >> > > java.lang.Exception: shutdown Leader! reason: Only 1 followers,
> need 1
> >> > >         at
> >> > > org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496)
> >> > >         at
> >> > org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471)
> >> > >         at
> >> > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:75
> >> > > 3)
> >> > > 2014-03-19 17:28:22,313 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] -
> >> > > shutting down
> >> > > 2014-03-19 17:28:22,320 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] -
> >> > Shutting
> >> > > down
> >> > > 2014-03-19 17:28:22,320 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@743]
> -
> >> > > Shutting down
> >> > > 2014-03-19 17:28:22,321 [myid:3] - INFO  [ProcessThread(sid:3
> >> > > cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited
> >> loop!
> >> > > 2014-03-19 17:28:22,321 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@88
> >> > > ] - Shutting down
> >> > > 2014-03-19 17:28:22,322 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] -
> >> > > Shutting down
> >> > > 2014-03-19 17:28:22,322 [myid:3] - INFO
> >> > > [CommitProcessor:3:CommitProcessor@150] - CommitProcessor exited
> loop!
> >> > > 2014-03-19 17:28:22,322 [myid:3] - INFO
> >> > >
> >> > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestProc
> >> > essor@655
> >> > ]
> >> > > - Shutting down
> >> > > 2014-03-19 17:28:22,322 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415]
> >> > > - shutdown of request processor complete
> >> > > 2014-03-19 17:28:22,323 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@175]
> -
> >> > > Shutting down
> >> > > 2014-03-19 17:28:22,323 [myid:3] - INFO
> >> > > [SyncThread:3:SyncRequestProcessor@155] - SyncRequestProcessor
> exited!
> >> > > 2014-03-19 17:28:22,325 [myid:3] - WARN
> >> > > [LearnerHandler-/10.0.33.1:58546:LearnerHandler@575] - *******
> >> > > GOODBYE
> >> > > /10.0.33.1:58546 ********
> >> > > 2014-03-19 17:28:22,326 [myid:3] - WARN
> >> > > [LearnerHandler-/10.0.33.129:49809:LearnerHandler@575] - *******
> >> > > GOODBYE
> >> > > /10.0.33.129:49809 ********
> >> > > 2014-03-19 17:28:22,327 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING
> >> > > 2014-03-19 17:28:22,328 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading
> >> > > snapshot
> >> > > /home/zookeeper/data/version-2/snapshot.c200000001
> >> > > 2014-03-19 17:28:22,332 [myid:3] - INFO
> >> > > [Thread-140:Leader$LearnerCnxAcceptor@309] - exception while
> >> > > shutting
> >> > down
> >> > > acceptor: java.net.SocketException: Socket closed
> >> > > 2014-03-19 17:28:24,004 [myid:3] - INFO
> >> > > [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited
> >> > loop!
> >> > > 2014-03-19 17:28:27,398 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41264
> >> > > 2014-03-19 17:28:27,399 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:27,399 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41264 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:34,987 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41267
> >> > > 2014-03-19 17:28:34,988 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:34,988 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41267 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:35,218 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] -
> >> > > New election. My id =  3, proposed zxid=0xc200000001
> >> > > 2014-03-19 17:28:35,219 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:35,420 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:35,420 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
> >> > > Notification time out: 400
> >> > > 2014-03-19 17:28:35,821 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:35,822 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
> >> > > Notification time out: 800
> >> > > 2014-03-19 17:28:36,623 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:36,623 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
> >> > > Notification time out: 1600
> >> > > 2014-03-19 17:28:36,800 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING
> >> > > (n.state),
> >> > 1
> >> > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:37,096 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING
> >> > > (n.state),
> >> > 2
> >> > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:37,097 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING
> >> > > (n.state),
> >> > 2
> >> > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:38,698 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:38,698 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
> >> > > Notification time out: 3200
> >> > > 2014-03-19 17:28:38,700 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING
> >> > > (n.state),
> >> > 1
> >> > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:38,705 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING
> >> > > (n.state),
> >> > 2
> >> > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:39,408 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41270
> >> > > 2014-03-19 17:28:39,409 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:39,409 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41270 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:41,906 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:41,906 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
> >> > > Notification time out: 6400
> >> > > 2014-03-19 17:28:42,390 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41273
> >> > > 2014-03-19 17:28:42,390 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:42,391 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41273 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:44,729 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41276
> >> > > 2014-03-19 17:28:44,730 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:44,730 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41276 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:48,307 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] -
> >> > > Notification time out: 12800
> >> > > 2014-03-19 17:28:48,308 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:49,840 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 1
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 1 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:49,841 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 1 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state)
> >> > > 2014-03-19 17:28:50,042 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@750] - LEADING
> >> > > 2014-03-19 17:28:50,042 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] -
> >> > > Created server with tickTime 2000 minSessionTimeout 4000
> >> > > maxSessionTimeout 40000 datadir /home/zookeeper/data/version-2
> >> > > snapdir
> >> > > /home/zookeeper/data/version-2
> >> > > 2014-03-19 17:28:50,042 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:Leader@345] - LEADING -
> >> > > LEADER ELECTION TOOK - 27714
> >> > > 2014-03-19 17:28:50,045 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading
> >> > > snapshot
> >> > > /home/zookeeper/data/version-2/snapshot.c200000001
> >> > > 2014-03-19 17:28:50,540 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41279
> >> > > 2014-03-19 17:28:50,541 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:50,541 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41279 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:51,406 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 2
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 2 (n.sid), 0xc5 (n.peerEPoch), LEADING (my state)
> >> > > 2014-03-19 17:28:51,406 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 3
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING
> >> > > (n.state), 2 (n.sid), 0xc5 (n.peerEPoch), LEADING (my state)
> >> > > 2014-03-19 17:28:53,526 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41282
> >> > > 2014-03-19 17:28:53,526 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:53,527 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41282 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:28:59,322 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41285
> >> > > 2014-03-19 17:28:59,323 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:28:59,323 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41285 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:29:00,253 [myid:3] - INFO
> >> > > [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] -
> >> > Snapshotting:
> >> > > 0xc200000001 to /home/zookeeper/data/version-2/snapshot.c200000001
> >> > > 2014-03-19 17:29:04,860 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:29:04,860 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41288 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:29:11,031 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41291
> >> > > 2014-03-19 17:29:11,032 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:29:11,032 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41291 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:29:16,490 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41294
> >> > > 2014-03-19 17:29:16,491 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:29:16,491 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41294 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:29:19,064 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197
> ]
> >> > > - Accepted socket connection from /10.0.160.243:41297
> >> > > 2014-03-19 17:29:19,065 [myid:3] - WARN
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] -
> >> > Exception
> >> > > causing close of session 0x0 due to java.io.IOException:
> >> > > ZooKeeperServer
> >> > not
> >> > > running
> >> > > 2014-03-19 17:29:19,065 [myid:3] - INFO
> >> > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] -
> >> > > Closed socket connection for client /10.0.160.243:41297 (no session
> >> > established for
> >> > > client)
> >> > > 2014-03-19 17:29:19,312 [myid:3] - INFO
> >> > > [LearnerHandler-/10.0.33.1:58547:LearnerHandler@263] - Follower
> sid:
> >> 1 :
> >> > > info :
> >> > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@3c966db5
> >> > > 2014-03-19 17:29:19,314 [myid:3] - INFO
> >> > > [LearnerHandler-/10.0.33.129:49810:LearnerHandler@263] - Follower
> sid:
> >> > 2 :
> >> > > info :
> >> > > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@466b56b
> >> > > 2014-03-19 17:29:19,475 [myid:3] - ERROR
> >> > > [LearnerHandler-/10.0.33.1:58547:LearnerHandler@562] - Unexpected
> >> > exception
> >> > > causing shutdown while sock still open java.io.EOFException
> >> > >         at java.io.DataInputStream.readInt(DataInputStream.java:392)
> >> > >         at
> >> > >
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPack
> >> > et.java:83)
> >> > >         at
> >> > >
> >> > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:
> >> > 108)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.j
> >> > ava:290)
> >> > > 2014-03-19 17:29:19,476 [myid:3] - WARN
> >> > > [LearnerHandler-/10.0.33.1:58547:LearnerHandler@575] - *******
> >> > > GOODBYE
> >> > > /10.0.33.1:58547 ********
> >> > > 2014-03-19 17:29:19,476 [myid:3] - ERROR
> >> > > [LearnerHandler-/10.0.33.129:49810:LearnerHandler@562] - Unexpected
> >> > > exception causing shutdown while sock still open
> >> > > java.io.EOFException
> >> > >         at java.io.DataInputStream.readInt(DataInputStream.java:392)
> >> > >         at
> >> > >
> org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPack
> >> > et.java:83)
> >> > >         at
> >> > >
> >> > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:
> >> > 108)
> >> > >         at
> >> > >
> >> > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.j
> >> > ava:290)
> >> > > 2014-03-19 17:29:19,477 [myid:3] - WARN
> >> > > [LearnerHandler-/10.0.33.129:49810:LearnerHandler@575] - *******
> >> > > GOODBYE
> >> > > /10.0.33.129:49810 ********
> >> > > 2014-03-19 17:29:21,757 [myid:3] - INFO
> >> > > [WorkerReceiver[myid=3]:FastLeaderElection@542] - Notification: 1
> >> > > (n.leader), 0xc200000001 (n.zxid), 0x128 (n.round), LOOKING
> >> > > (n.state), 1 (n.sid), 0xc5 (n.peerEPoch), LEADING (my state)
> >> > >
> >> > >
> >> > >
> >> > > 2014-03-23 12:05 GMT+09:00 Michi Mutsuzaki <michi@cs.stanford.edu>:
> >> > >
> >> > >> Hi Youngseok,
> >> > >>
> >> > >> Could you post the log file from 192.168.161.1? The log file you
> >> > >> posted indicates that 192.168.33.1 is not able to connect to
> >> > >> 192.168.161.1.
> >> > >>
> >> > >> Thanks!
> >> > >> --Michi
> >> > >>
> >> > >>
> >> > >> On Fri, Mar 21, 2014 at 12:14 AM, Jung Young Seok
> >> > >> <jung.youngseok@gmail.com> wrote:
> >> > >> > Dear Zookeeper usergroup members,
> >> > >> >
> >> > >> > I have some questions.
> >> > >> >
> >> > >> > We're currently use Zookeeper 3.4.5 with clustering 3 nodes.
> >> > >> > We got zookeeper service stopped all of sudden so client
wasn't
> >> > >> > able
> >> > to
> >> > >> > connect to zookeeper server.
> >> > >> > In that situation,  zookeepers couldn't elect leader each
other.
> >> > >> >
> >> > >> > Then I restarted zookeeper service (all of them) but could't
> >> > >> > elect leader and be follower.
> >> > >> > So I rebooted linux but same happened. (I lost zookeeper
log here
> >> > >> > t.t) When I removed snapshot files in data directory, the
> >> > >> > zookeeper worked okay.
> >> > >> > I have uploaded my zookeeper snapshot here
> >> > >> >  -
> >> > >> >
> >> >
> https://s3-ap-northeast-1.amazonaws.com/zookeeper-logs/data_org_b1.tar
> >> > >> >
> >> > >> > If I push the snapshot into data directory, zookeeper clustering
> >> > >> > fail reappears again.
> >> > >> >
> >> > >> > My question is
> >> > >> >  1. why the snapshot was corrupted all of sudden?
> >> > >> >  2. Is there any way I can avoid this snapshot corruption
issue?
> >> > >> >
> >> > >> > I've attached zoo.cfg and some of error log.
> >> > >> >
> >> > >> > I'd be happy if I get any opinion.
> >> > >> > Thank You.
> >> > >> >
> >> > >> > Best Regards
> >> > >> > Youngseok Jung
> >> > >> >
> >> > >> >
> >> > >> > #zoo.cfg (pretty much default setting)
> >> > >> > tickTime=2000
> >> > >> > initLimit=10
> >> > >> > syncLimit=5
> >> > >> > dataDir=/home/zookeeper/data
> >> > >> > clientPort=2181
> >> > >> >
> >> > >> > server.1=192.168.33.1:2888:3888
> >> > >> > server.2=192.168.33.129:2888:3888
> >> > >> > server.3=192.168.161.1:2888:3888
> >> > >> > autopurge.snapRetainCount=3
> >> > >> > autopurge.purgeInterval=1
> >> > >> >
> >> > >> >
> >> > >> > #Some of error log
> >> > >> > 2014-03-19 17:56:24,737 [myid:1] - INFO
> >> > >> > [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification:
> 2
> >> > >> > (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING
> >> > (n.state), 2
> >> > >> > (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
> >> > >> > 2014-03-19 17:56:24,737 [myid:1] - WARN
> >> > >> > [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open
> channel
> >> > to 3
> >> > >> > at
> >> > >> > election address /10.0.161.1:3888
> >> > >> > java.net.ConnectException: Connection refused
> >> > >> >         at java.net.PlainSocketImpl.socketConnect(Native
Method)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.jav
> >> > a:339)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketI
> >> > mpl.java:200)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:
> >> > 182)
> >> > >> >         at
> >> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> >> > >> >         at java.net.Socket.connect(Socket.java:579)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumC
> >> > nxManager.java:354)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxMa
> >> > nager.java:327)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$Worker
> >> > Sender.process(FastLeaderElection.java:393)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$Worker
> >> > Sender.run(FastLeaderElection.java:365)
> >> > >> >         at java.lang.Thread.run(Thread.java:724)
> >> > >> > 2014-03-19 17:56:25,537 [myid:1] - INFO
> >> > >> > [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774]
> >> > >> > - Notification time out: 1600
> >> > >> > 2014-03-19 17:56:25,538 [myid:1] - INFO
> >> > >> > [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification:
> 1
> >> > >> > (n.leader), 0xc200000001 (n.zxid), 0x145 (n.round), LOOKING
> >> > (n.state), 1
> >> > >> > (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
> >> > >> > 2014-03-19 17:56:25,540 [myid:1] - INFO
> >> > >> > [WorkerReceiver[myid=1]:FastLeaderElection@542] - Notification:
> 2
> >> > >> > (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING
> >> > (n.state), 2
> >> > >> > (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state)
> >> > >> > 2014-03-19 17:56:25,540 [myid:1] - WARN
> >> > >> > [WorkerSender[myid=1]:QuorumCnxManager@368] - Cannot open
> channel
> >> > to 3
> >> > >> > at
> >> > >> > election address /10.0.161.1:3888
> >> > >> > java.net.ConnectException: Connection refused
> >> > >> >         at java.net.PlainSocketImpl.socketConnect(Native
Method)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.jav
> >> > a:339)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketI
> >> > mpl.java:200)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:
> >> > 182)
> >> > >> >         at
> >> java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
> >> > >> >         at java.net.Socket.connect(Socket.java:579)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumC
> >> > nxManager.java:354)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxMa
> >> > nager.java:327)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$Worker
> >> > Sender.process(FastLeaderElection.java:393)
> >> > >> >         at
> >> > >> >
> >> > >> >
> >> > org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$Worker
> >> > Sender.run(FastLeaderElection.java:365)
> >> > >> >         at java.lang.Thread.run(Thread.java:724)
> >> > >
> >> > >
> >> >
> >>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message