Return-Path: X-Original-To: apmail-zookeeper-user-archive@www.apache.org Delivered-To: apmail-zookeeper-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id A6C8A1044B for ; Wed, 26 Mar 2014 22:37:14 +0000 (UTC) Received: (qmail 36832 invoked by uid 500); 26 Mar 2014 22:37:11 -0000 Delivered-To: apmail-zookeeper-user-archive@zookeeper.apache.org Received: (qmail 36675 invoked by uid 500); 26 Mar 2014 22:37:11 -0000 Mailing-List: contact user-help@zookeeper.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@zookeeper.apache.org Delivered-To: mailing list user@zookeeper.apache.org Received: (qmail 36651 invoked by uid 99); 26 Mar 2014 22:37:10 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 26 Mar 2014 22:37:10 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,NORMAL_HTTP_TO_IP,RCVD_IN_DNSWL_LOW,SPF_PASS,WEIRD_PORT X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of jung.youngseok@gmail.com designates 209.85.219.52 as permitted sender) Received: from [209.85.219.52] (HELO mail-oa0-f52.google.com) (209.85.219.52) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 26 Mar 2014 22:37:01 +0000 Received: by mail-oa0-f52.google.com with SMTP id l6so3332138oag.25 for ; Wed, 26 Mar 2014 15:36:39 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :content-type; bh=EG6RA8VqdtV5UjXTzJOUSnvSB+HceSr467A8QE+QjTw=; b=Z7gtn/Bg4cPgD4/Mm/HZe4iFuzqBWloU154h/3KW1jLIOE5fXIiqMTrMPMHe1eQhHs DTeHD+BiVN6fcDttfltvTn5Kc64B6k0Vb55PU0LrWp8t80ngzMsM+DDxKR4gY+R5qn4x mXda/tsCPzLDjIFzgPSuJbU75avL7sHlNyMzCvDaEllzsB/lfKs6XKJ+7ZQh3cBzp+iP SB0ZP0bfSVKDwLUKGO2pdX4cHf6Lx7Oiopq1/wZ+v5yAasUOfYpGbt/fxUFbyca2F1Uo I+66751EOThzWk2jf6GSAceSsC5HPAPRdkr7lpXeI0IFfxnjZGT+e7zraG88VcG468Dl lrCg== MIME-Version: 1.0 X-Received: by 10.60.174.170 with SMTP id bt10mr18482204oec.47.1395873399440; Wed, 26 Mar 2014 15:36:39 -0700 (PDT) Received: by 10.182.88.3 with HTTP; Wed, 26 Mar 2014 15:36:39 -0700 (PDT) Received: by 10.182.88.3 with HTTP; Wed, 26 Mar 2014 15:36:39 -0700 (PDT) In-Reply-To: References: Date: Thu, 27 Mar 2014 07:36:39 +0900 Message-ID: Subject: RE: [Zookeeper] Zookeeper Cluster broken due to snapshot corrupted error From: Jung Young Seok To: user@zookeeper.apache.org Content-Type: multipart/alternative; boundary=047d7bd6c52c200d8b04f58a1a7f X-Virus-Checked: Checked by ClamAV on apache.org --047d7bd6c52c200d8b04f58a1a7f Content-Type: text/plain; charset=EUC-KR Content-Transfer-Encoding: quoted-printable I have a couple of question regarding log and snapshot management. I use auto purge feature so I keep only 3 snapshots and 3 transaction logs. Zookeeper log(zookeeper.out) is rotated daily. What I would like to do is keep the log and snapshot files as small as possible. Would it be okay to manage snapshot files and transaction logs with logrotate.d or Log4j rolling not to grow more than 50MB ? Thank you in advance. Youngseok 2014. 3. 24. =BF=C0=C8=C4 4:43=BF=A1 "Rakesh R" =B4=D4= =C0=CC =C0=DB=BC=BA: > > From the latest log shared by YoungSeok, > > [1] I could see the LearnerHandler fails to get a Leader.ACKEPOCH respons= e > from the Followers and is failing with the following exception. > > 2014-03-19 17:29:19,312 [myid:3] - INFO [LearnerHandler-/10.0.33.1:58547 > :LearnerHandler@263] - Follower sid: 1 :info : > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@3c966db5 > 2014-03-19 17:29:19,314 [myid:3] - INFO [LearnerHandler-/10.0.33.129:4981= 0 > :LearnerHandler@263] - Follower sid: 2 :info : > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@466b56b > 2014-03-19 17:29:19,475 [myid:3] - ERROR [LearnerHandler-/10.0.33.1:58547 > :LearnerHandler@562] - Unexpected > exception causing shutdown while sock still open java.io.EOFException > at java.io.DataInputStream.readInt(DataInputStream.java:392) > at > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63) > at > > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.= java:83) > at > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108= ) > at > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.j > ava:290) > > Hi Michi, It would be great if you can help to know more on ZK-1697. As I > understood this is talking about the Leader.ACK, am I correct ?, if so I > got confused by seeing the Leader.ACKEPOCH exception in LearnerHandler si= de > [1]. From the code what I've seen Leader.ACKEPOCH would sent to the Leade= r > at the time of Learner# registerWithLeader(). > > Thanks in advance. > Rakesh > > -----Original Message----- > From: Jung Young Seok [mailto:jung.youngseok@gmail.com] > Sent: 24 March 2014 10:20 > To: michi@cs.stanford.edu > Cc: user@zookeeper.apache.org > Subject: Re: [Zookeeper] Zookeeper Cluster broken due to snapshot > corrupted error > > I'm not sure if my issue is related to > https://issues.apache.org/jira/browse/ZOOKEEPER-1697 > but I think I should try with Zookeeer version to 3.4.6(stable). > > I'm just hoping 3.4.6 version would prevent happening my issue again. > > Thank you for your answer. > Have a great day. > > Best Regards, > Youngseok Jung > > > 2014-03-24 13:05 GMT+09:00 Michi Mutsuzaki : > > > I wonder if this is related to ZOOKEEPER-1697. > > > > https://issues.apache.org/jira/browse/ZOOKEEPER-1697 > > > > --Michi > > > > On Sun, Mar 23, 2014 at 6:15 PM, Jung Young Seok > > wrote: > > > I've added zookeeper log (192.168.161.1). > > > The time that the log was written look different but you might ignore > it. > > > Logs on 192.168.161.1 had been repeated with below pattern. > > > > > > Thank you for your asking. > > > > > > > > ---------------------------------------------------------------------- > > ---------------------------------------------------------------------- > > ---------------------------------------------------------- > > > 2014-03-19 17:28:06,105 [myid:3] - INFO > > > [LearnerHandler-/10.0.33.129:49809:LearnerHandler@395] - Sending > > > DIFF > > > 2014-03-19 17:28:07,414 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41252 > > > 2014-03-19 17:28:07,415 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:07,415 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41252 (no session > > established for > > > client) > > > 2014-03-19 17:28:12,173 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41255 > > > 2014-03-19 17:28:12,174 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:12,174 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41255 (no session > > established for > > > client) > > > 2014-03-19 17:28:14,558 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41258 > > > 2014-03-19 17:28:14,559 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:14,559 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41258 (no session > > established for > > > client) > > > 2014-03-19 17:28:18,585 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41261 > > > 2014-03-19 17:28:18,586 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:18,586 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41261 (no session > > established for > > > client) > > > 2014-03-19 17:28:20,067 [myid:3] - WARN > > > [LearnerHandler-/10.0.33.1:58546:Leader@574] - Commiting zxid > > 0xc500000000 > > > from /10.0.161.1:2888 not first! > > > 2014-03-19 17:28:20,067 [myid:3] - WARN > > > [LearnerHandler-/10.0.33.1:58546:Leader@576] - First is 0x0 > > > 2014-03-19 17:28:20,068 [myid:3] - INFO > > > [LearnerHandler-/10.0.33.1:58546:Leader@598] - Have quorum of > > supporters; > > > starting up and setting last processed zxid: 0xc500000000 > > > 2014-03-19 17:28:22,312 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:Leader@490] - Shutting dow= n > > > 2014-03-19 17:28:22,312 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:Leader@496] - Shutdown > > > called > > > java.lang.Exception: shutdown Leader! reason: Only 1 followers, need = 1 > > > at > > > org.apache.zookeeper.server.quorum.Leader.shutdown(Leader.java:496) > > > at > > org.apache.zookeeper.server.quorum.Leader.lead(Leader.java:471) > > > at > > > org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:75 > > > 3) > > > 2014-03-19 17:28:22,313 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@419] - > > > shutting down > > > 2014-03-19 17:28:22,320 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:SessionTrackerImpl@225] - > > Shutting > > > down > > > 2014-03-19 17:28:22,320 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:PrepRequestProcessor@743] = - > > > Shutting down > > > 2014-03-19 17:28:22,321 [myid:3] - INFO [ProcessThread(sid:3 > > > cport:-1)::PrepRequestProcessor@143] - PrepRequestProcessor exited > loop! > > > 2014-03-19 17:28:22,321 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:ProposalRequestProcessor@8= 8 > > > ] - Shutting down > > > 2014-03-19 17:28:22,322 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:CommitProcessor@181] - > > > Shutting down > > > 2014-03-19 17:28:22,322 [myid:3] - INFO > > > [CommitProcessor:3:CommitProcessor@150] - CommitProcessor exited loop= ! > > > 2014-03-19 17:28:22,322 [myid:3] - INFO > > > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:Leader$ToBeAppliedRequestPro= c > > essor@655 > > ] > > > - Shutting down > > > 2014-03-19 17:28:22,322 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FinalRequestProcessor@415] > > > - shutdown of request processor complete > > > 2014-03-19 17:28:22,323 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:SyncRequestProcessor@175] = - > > > Shutting down > > > 2014-03-19 17:28:22,323 [myid:3] - INFO > > > [SyncThread:3:SyncRequestProcessor@155] - SyncRequestProcessor exited= ! > > > 2014-03-19 17:28:22,325 [myid:3] - WARN > > > [LearnerHandler-/10.0.33.1:58546:LearnerHandler@575] - ******* > > > GOODBYE > > > /10.0.33.1:58546 ******** > > > 2014-03-19 17:28:22,326 [myid:3] - WARN > > > [LearnerHandler-/10.0.33.129:49809:LearnerHandler@575] - ******* > > > GOODBYE > > > /10.0.33.129:49809 ******** > > > 2014-03-19 17:28:22,327 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@670] - LOOKING > > > 2014-03-19 17:28:22,328 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading > > > snapshot > > > /home/zookeeper/data/version-2/snapshot.c200000001 > > > 2014-03-19 17:28:22,332 [myid:3] - INFO > > > [Thread-140:Leader$LearnerCnxAcceptor@309] - exception while > > > shutting > > down > > > acceptor: java.net.SocketException: Socket closed > > > 2014-03-19 17:28:24,004 [myid:3] - INFO > > > [SessionTracker:SessionTrackerImpl@162] - SessionTrackerImpl exited > > loop! > > > 2014-03-19 17:28:27,398 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41264 > > > 2014-03-19 17:28:27,399 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:27,399 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41264 (no session > > established for > > > client) > > > 2014-03-19 17:28:34,987 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41267 > > > 2014-03-19 17:28:34,988 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:34,988 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41267 (no session > > established for > > > client) > > > 2014-03-19 17:28:35,218 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@740] - > > > New election. My id =3D 3, proposed zxid=3D0xc200000001 > > > 2014-03-19 17:28:35,219 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:35,420 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:35,420 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - > > > Notification time out: 400 > > > 2014-03-19 17:28:35,821 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:35,822 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - > > > Notification time out: 800 > > > 2014-03-19 17:28:36,623 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:36,623 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - > > > Notification time out: 1600 > > > 2014-03-19 17:28:36,800 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING > > > (n.state), > > 1 > > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:37,096 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING > > > (n.state), > > 2 > > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:37,097 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING > > > (n.state), > > 2 > > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:38,698 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:38,698 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - > > > Notification time out: 3200 > > > 2014-03-19 17:28:38,700 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING > > > (n.state), > > 1 > > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:38,705 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x126 (n.round), FOLLOWING > > > (n.state), > > 2 > > > (n.sid), 0xc4 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:39,408 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41270 > > > 2014-03-19 17:28:39,409 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:39,409 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41270 (no session > > established for > > > client) > > > 2014-03-19 17:28:41,906 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:41,906 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - > > > Notification time out: 6400 > > > 2014-03-19 17:28:42,390 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41273 > > > 2014-03-19 17:28:42,390 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:42,391 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41273 (no session > > established for > > > client) > > > 2014-03-19 17:28:44,729 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41276 > > > 2014-03-19 17:28:44,730 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:44,730 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41276 (no session > > established for > > > client) > > > 2014-03-19 17:28:48,307 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] - > > > Notification time out: 12800 > > > 2014-03-19 17:28:48,308 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 3 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:49,840 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 1 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 1 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:49,841 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 1 (n.sid), 0xc5 (n.peerEPoch), LOOKING (my state) > > > 2014-03-19 17:28:50,042 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:QuorumPeer@750] - LEADING > > > 2014-03-19 17:28:50,042 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@162] - > > > Created server with tickTime 2000 minSessionTimeout 4000 > > > maxSessionTimeout 40000 datadir /home/zookeeper/data/version-2 > > > snapdir > > > /home/zookeeper/data/version-2 > > > 2014-03-19 17:28:50,042 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:Leader@345] - LEADING - > > > LEADER ELECTION TOOK - 27714 > > > 2014-03-19 17:28:50,045 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FileSnap@83] - Reading > > > snapshot > > > /home/zookeeper/data/version-2/snapshot.c200000001 > > > 2014-03-19 17:28:50,540 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41279 > > > 2014-03-19 17:28:50,541 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:50,541 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41279 (no session > > established for > > > client) > > > 2014-03-19 17:28:51,406 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 2 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 2 (n.sid), 0xc5 (n.peerEPoch), LEADING (my state) > > > 2014-03-19 17:28:51,406 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 3 > > > (n.leader), 0xc200000001 (n.zxid), 0x127 (n.round), LOOKING > > > (n.state), 2 (n.sid), 0xc5 (n.peerEPoch), LEADING (my state) > > > 2014-03-19 17:28:53,526 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41282 > > > 2014-03-19 17:28:53,526 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:53,527 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41282 (no session > > established for > > > client) > > > 2014-03-19 17:28:59,322 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41285 > > > 2014-03-19 17:28:59,323 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:28:59,323 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41285 (no session > > established for > > > client) > > > 2014-03-19 17:29:00,253 [myid:3] - INFO > > > [QuorumPeer[myid=3D3]/0:0:0:0:0:0:0:0:2181:FileTxnSnapLog@240] - > > Snapshotting: > > > 0xc200000001 to /home/zookeeper/data/version-2/snapshot.c200000001 > > > 2014-03-19 17:29:04,860 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:29:04,860 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41288 (no session > > established for > > > client) > > > 2014-03-19 17:29:11,031 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41291 > > > 2014-03-19 17:29:11,032 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:29:11,032 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41291 (no session > > established for > > > client) > > > 2014-03-19 17:29:16,490 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41294 > > > 2014-03-19 17:29:16,491 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:29:16,491 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41294 (no session > > established for > > > client) > > > 2014-03-19 17:29:19,064 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] > > > - Accepted socket connection from /10.0.160.243:41297 > > > 2014-03-19 17:29:19,065 [myid:3] - WARN > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@354] - > > Exception > > > causing close of session 0x0 due to java.io.IOException: > > > ZooKeeperServer > > not > > > running > > > 2014-03-19 17:29:19,065 [myid:3] - INFO > > > [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1001] - > > > Closed socket connection for client /10.0.160.243:41297 (no session > > established for > > > client) > > > 2014-03-19 17:29:19,312 [myid:3] - INFO > > > [LearnerHandler-/10.0.33.1:58547:LearnerHandler@263] - Follower sid: > 1 : > > > info : > > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@3c966db5 > > > 2014-03-19 17:29:19,314 [myid:3] - INFO > > > [LearnerHandler-/10.0.33.129:49810:LearnerHandler@263] - Follower sid= : > > 2 : > > > info : > > > org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@466b56b > > > 2014-03-19 17:29:19,475 [myid:3] - ERROR > > > [LearnerHandler-/10.0.33.1:58547:LearnerHandler@562] - Unexpected > > exception > > > causing shutdown while sock still open java.io.EOFException > > > at java.io.DataInputStream.readInt(DataInputStream.java:392) > > > at > > > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63= ) > > > at > > > > > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPack > > et.java:83) > > > at > > > > > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java: > > 108) > > > at > > > > > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.j > > ava:290) > > > 2014-03-19 17:29:19,476 [myid:3] - WARN > > > [LearnerHandler-/10.0.33.1:58547:LearnerHandler@575] - ******* > > > GOODBYE > > > /10.0.33.1:58547 ******** > > > 2014-03-19 17:29:19,476 [myid:3] - ERROR > > > [LearnerHandler-/10.0.33.129:49810:LearnerHandler@562] - Unexpected > > > exception causing shutdown while sock still open > > > java.io.EOFException > > > at java.io.DataInputStream.readInt(DataInputStream.java:392) > > > at > > > org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63= ) > > > at > > > > > org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPack > > et.java:83) > > > at > > > > > org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java: > > 108) > > > at > > > > > org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.j > > ava:290) > > > 2014-03-19 17:29:19,477 [myid:3] - WARN > > > [LearnerHandler-/10.0.33.129:49810:LearnerHandler@575] - ******* > > > GOODBYE > > > /10.0.33.129:49810 ******** > > > 2014-03-19 17:29:21,757 [myid:3] - INFO > > > [WorkerReceiver[myid=3D3]:FastLeaderElection@542] - Notification: 1 > > > (n.leader), 0xc200000001 (n.zxid), 0x128 (n.round), LOOKING > > > (n.state), 1 (n.sid), 0xc5 (n.peerEPoch), LEADING (my state) > > > > > > > > > > > > 2014-03-23 12:05 GMT+09:00 Michi Mutsuzaki : > > > > > >> Hi Youngseok, > > >> > > >> Could you post the log file from 192.168.161.1? The log file you > > >> posted indicates that 192.168.33.1 is not able to connect to > > >> 192.168.161.1. > > >> > > >> Thanks! > > >> --Michi > > >> > > >> > > >> On Fri, Mar 21, 2014 at 12:14 AM, Jung Young Seok > > >> wrote: > > >> > Dear Zookeeper usergroup members, > > >> > > > >> > I have some questions. > > >> > > > >> > We're currently use Zookeeper 3.4.5 with clustering 3 nodes. > > >> > We got zookeeper service stopped all of sudden so client wasn't > > >> > able > > to > > >> > connect to zookeeper server. > > >> > In that situation, zookeepers couldn't elect leader each other. > > >> > > > >> > Then I restarted zookeeper service (all of them) but could't > > >> > elect leader and be follower. > > >> > So I rebooted linux but same happened. (I lost zookeeper log here > > >> > t.t) When I removed snapshot files in data directory, the > > >> > zookeeper worked okay. > > >> > I have uploaded my zookeeper snapshot here > > >> > - > > >> > > > https://s3-ap-northeast-1.amazonaws.com/zookeeper-logs/data_org_b1.tar > > >> > > > >> > If I push the snapshot into data directory, zookeeper clustering > > >> > fail reappears again. > > >> > > > >> > My question is > > >> > 1. why the snapshot was corrupted all of sudden? > > >> > 2. Is there any way I can avoid this snapshot corruption issue? > > >> > > > >> > I've attached zoo.cfg and some of error log. > > >> > > > >> > I'd be happy if I get any opinion. > > >> > Thank You. > > >> > > > >> > Best Regards > > >> > Youngseok Jung > > >> > > > >> > > > >> > #zoo.cfg (pretty much default setting) > > >> > tickTime=3D2000 > > >> > initLimit=3D10 > > >> > syncLimit=3D5 > > >> > dataDir=3D/home/zookeeper/data > > >> > clientPort=3D2181 > > >> > > > >> > server.1=3D192.168.33.1:2888:3888 > > >> > server.2=3D192.168.33.129:2888:3888 > > >> > server.3=3D192.168.161.1:2888:3888 > > >> > autopurge.snapRetainCount=3D3 > > >> > autopurge.purgeInterval=3D1 > > >> > > > >> > > > >> > #Some of error log > > >> > 2014-03-19 17:56:24,737 [myid:1] - INFO > > >> > [WorkerReceiver[myid=3D1]:FastLeaderElection@542] - Notification: = 2 > > >> > (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING > > (n.state), 2 > > >> > (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state) > > >> > 2014-03-19 17:56:24,737 [myid:1] - WARN > > >> > [WorkerSender[myid=3D1]:QuorumCnxManager@368] - Cannot open channe= l > > to 3 > > >> > at > > >> > election address /10.0.161.1:3888 > > >> > java.net.ConnectException: Connection refused > > >> > at java.net.PlainSocketImpl.socketConnect(Native Method) > > >> > at > > >> > > > >> > > > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.jav > > a:339) > > >> > at > > >> > > > >> > > > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketI > > mpl.java:200) > > >> > at > > >> > > > >> > > > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java: > > 182) > > >> > at > java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > > >> > at java.net.Socket.connect(Socket.java:579) > > >> > at > > >> > > > >> > > > org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumC > > nxManager.java:354) > > >> > at > > >> > > > >> > > > org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxMa > > nager.java:327) > > >> > at > > >> > > > >> > > > org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$Worker > > Sender.process(FastLeaderElection.java:393) > > >> > at > > >> > > > >> > > > org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$Worker > > Sender.run(FastLeaderElection.java:365) > > >> > at java.lang.Thread.run(Thread.java:724) > > >> > 2014-03-19 17:56:25,537 [myid:1] - INFO > > >> > [QuorumPeer[myid=3D1]/0:0:0:0:0:0:0:0:2181:FastLeaderElection@774] > > >> > - Notification time out: 1600 > > >> > 2014-03-19 17:56:25,538 [myid:1] - INFO > > >> > [WorkerReceiver[myid=3D1]:FastLeaderElection@542] - Notification: = 1 > > >> > (n.leader), 0xc200000001 (n.zxid), 0x145 (n.round), LOOKING > > (n.state), 1 > > >> > (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state) > > >> > 2014-03-19 17:56:25,540 [myid:1] - INFO > > >> > [WorkerReceiver[myid=3D1]:FastLeaderElection@542] - Notification: = 2 > > >> > (n.leader), 0xc600000001 (n.zxid), 0x144 (n.round), LEADING > > (n.state), 2 > > >> > (n.sid), 0xc6 (n.peerEPoch), LOOKING (my state) > > >> > 2014-03-19 17:56:25,540 [myid:1] - WARN > > >> > [WorkerSender[myid=3D1]:QuorumCnxManager@368] - Cannot open channe= l > > to 3 > > >> > at > > >> > election address /10.0.161.1:3888 > > >> > java.net.ConnectException: Connection refused > > >> > at java.net.PlainSocketImpl.socketConnect(Native Method) > > >> > at > > >> > > > >> > > > java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.jav > > a:339) > > >> > at > > >> > > > >> > > > java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketI > > mpl.java:200) > > >> > at > > >> > > > >> > > > java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java: > > 182) > > >> > at > java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392) > > >> > at java.net.Socket.connect(Socket.java:579) > > >> > at > > >> > > > >> > > > org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumC > > nxManager.java:354) > > >> > at > > >> > > > >> > > > org.apache.zookeeper.server.quorum.QuorumCnxManager.toSend(QuorumCnxMa > > nager.java:327) > > >> > at > > >> > > > >> > > > org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$Worker > > Sender.process(FastLeaderElection.java:393) > > >> > at > > >> > > > >> > > > org.apache.zookeeper.server.quorum.FastLeaderElection$Messenger$Worker > > Sender.run(FastLeaderElection.java:365) > > >> > at java.lang.Thread.run(Thread.java:724) > > > > > > > > > --047d7bd6c52c200d8b04f58a1a7f--