From: Stuti Awasthi
To: "user@hbase.apache.org"
Date: Mon, 19 Sep 2011 10:45:02 +0530
Subject: Unexpected shutdown of Zookeeper

Hi All,

I was running a 2-node cluster with 1 ZooKeeper node and 2 region server nodes. I had also set up cluster replication with another single-node HBase-Hadoop cluster. Replication was successful, and I left the cluster running over the weekend with no data to replicate.

Today I can see that ZooKeeper is dead on the master cluster. The region server that was running on the slave machine is also dead. The cluster to which I was replicating is running fine.

My queries are:

1. Can ZooKeeper die because there has been no replication traffic over the network for a long time?
2. How should I cater for these situations? Will running 3-4 ZooKeeper nodes help?
3. If I run multiple ZooKeeper nodes, will the cluster keep running normally even if 2-3 ZooKeeper nodes are dead?
4. In my case, 1 of the 2 region servers is dead but 1 is still working. If my ZooKeeper node were running, would I be able to access HBase properly?
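As background for queries 2 and 3: a ZooKeeper ensemble keeps serving requests only while a strict majority of its configured servers is up. The following is a minimal sketch of that arithmetic, assuming the majority rule is the only constraint; the function name `tolerated_failures` is illustrative and not part of any ZooKeeper or HBase API.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Return how many servers can fail while a strict majority stays up.

    Assumes the only availability rule is "more than half of the
    configured servers must be running".
    """
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    quorum = ensemble_size // 2 + 1  # smallest strict majority
    return ensemble_size - quorum


# Compare a few ensemble sizes, including the 3-4 nodes asked about above.
for size in (1, 2, 3, 4, 5):
    print(f"{size} server(s): tolerates {tolerated_failures(size)} failure(s)")
```

Under this rule a 4-server ensemble tolerates no more failures than a 3-server one, which is why odd ensemble sizes are the usual choice.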
Logs:

hbase-root-zookeeper-master.log:

2011-09-19 10:07:55,753 INFO org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection from /10.33.64.235:44706
2011-09-19 10:07:55,758 INFO org.apache.zookeeper.server.NIOServerCnxn: Client attempting to establish new session at /10.33.64.235:44706
2011-09-19 10:07:55,761 INFO org.apache.zookeeper.server.NIOServerCnxn: Established session 0x13271b6c4f1000c with negotiated timeout 180000 for client /10.33.64.235:44706
2011-09-19 10:10:48,318 WARN org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException: Unable to read additional data from client sessionid 0x13271b6c4f1000c, likely client has closed socket
2011-09-19 10:10:48,319 INFO org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection for client /10.33.64.235:44706 which had sessionid 0x13271b6c4f1000c
2011-09-19 10:12:57,002 INFO org.apache.zookeeper.server.ZooKeeperServer: Expiring session 0x13271b6c4f1000c, timeout of 180000ms exceeded
2011-09-19 10:12:57,002 INFO org.apache.zookeeper.server.PrepRequestProcessor: Processed session termination for sessionid: 0x13271b6c4f1000c

hbase-root-regionserver-slave.log:

2011-09-16 16:00:50,354 WARN org.apache.hadoop.ipc.HBaseServer: IPC Server listener on 60020: readAndProcess threw exception java.io.IOException: Connection reset by peer. Count of bytes read: 0
java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcher.read0(Native Method)
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
        at sun.nio.ch.IOUtil.read(IOUtil.java:175)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
2011-09-16 16:00:51,058 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication slave%3A60020.1316168146136 at 663246
2011-09-16 16:00:51,064 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:5003 and seenEntries:0 and size: 0
2011-09-16 16:00:51,064 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #slave%3A60020.1316168146136 for position 663246 in hdfs://master:54310/hbase/.logs/slave,60020,1316168145427/slave%3A60020.1316168146136
2011-09-16 16:00:51,066 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Removing 0 logs in the list: []
2011-09-16 16:00:51,066 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Nothing to replicate, sleeping 1000 times 2
2011-09-16 16:00:53,068 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication slave%3A60020.1316168146136 at 663246
..................................
2011-09-16 17:14:49,440 WARN org.apache.zookeeper.ClientCnxn: Session 0x13271b5395c0007 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection timed out
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
2011-09-16 17:14:51,039 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: /hbase/rs/master,60020,1316167798366 znode expired, trying to lock it
2011-09-16 17:14:51,088 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server slave1/172.28.96.239:2181
2011-09-16 17:14:51,089 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to slave1/172.28.96.239:2181, initiating session
2011-09-16 17:14:51,093 INFO org.apache.zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 0x13271b5395c0007 has expired, closing socket connection
2011-09-16 17:14:51,094 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=slave,60020,1316168145427, load=(requests=0, regions=6, usedHeap=29, maxHeap=996): connection to cluster: 1-0x13271b5395c0007 connection to cluster: 1-0x13271b5395c0007 received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
2011-09-16 17:14:51,094 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: requests=0, regions=6, stores=6, storefiles=5, storefileIndexSize=0, memstoreSize=0, compactionQueueSize=0, flushQueueSize=0, usedHeap=29, maxHeap=996, blockCacheSize=982352, blockCacheFree=208064384, blockCacheCount=2, blockCacheHitCount=31, blockCacheMissCount=2, blockCacheEvictedCount=0, blockCacheHitRatio=93, blockCacheHitCachingRatio=93
2011-09-16 17:14:51,094 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: connection to cluster: 1-0x13271b5395c0007 connection to cluster: 1-0x13271b5395c0007 received expired from ZooKeeper, aborting
2011-09-16 17:14:51,094 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2011-09-16 17:14:51,114 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Source exiting 1
2011-09-16 17:14:52,476 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 60020
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 0 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 2 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 1 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 0 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 2 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 9 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 3 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 8 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 6 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 4 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 5 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 7 on 60020: exiting
2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 6 on 60020: exiting
2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 8 on 60020: exiting
2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server handler 9 on 60020: exiting
2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 1 on 60020: exiting
2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 3 on 60020: exiting
2011-09-16 17:14:52,478 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping infoServer
2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server listener on 60020
2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 4 on 60020: exiting
2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 5 on 60020: exiting
2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC Server Responder
2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer: PRI IPC Server handler 7 on 60020: exiting
2011-09-16 17:14:52,481 INFO org.mortbay.log: Stopped SelectChannelConnector@0.0.0.0:60030
2011-09-16 17:14:52,585 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver60020.compactor exiting
2011-09-16 17:14:52,585 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: regionserver60020.cacheFlusher exiting
2011-09-16 17:14:52,586 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting.
2011-09-16 17:14:52,586 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker: regionserver60020.majorCompactionChecker exiting
2011-09-16 17:14:52,587 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of backup,,1315992791196.e5ff1d9eb66e1157d0ca8bfaaf493480.
2011-09-16 17:14:52,588 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: regionserver60020.logSyncer interrupted while waiting for sync requests
2011-09-16 17:14:52,588 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing backup,,1315992791196.e5ff1d9eb66e1157d0ca8bfaaf493480.: disabling compactions & flushes
2011-09-16 17:14:52,588 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of testArchiveBackup,,1315915407547.e05ec3159a022f28aa92e1a01ca50fec.
2011-09-16 17:14:52,588 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of replication,,1316166014290.5937efd76493915556d3641aa9c0b6df.
2011-09-16 17:14:52,589 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: regionserver60020.logSyncer exiting
2011-09-16 17:14:52,588 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler: Processing close of -ROOT-,,0.70236052
2011-09-16 17:14:52,589 DEBUG org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in hdfs://master:54310/hbase/.logs/slave,60020,1316168145427
2011-09-16 17:14:52,589 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing replication,,1316166014290.5937efd76493915556d3641aa9c0b6df.: disabling compactions & flushes
............................
2011-09-16 17:14:52,602 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2011-09-16 17:14:52,602 INFO org.apache.zookeeper.ZooKeeper: Session: 0x13271b6c4f10003 closed
2011-09-16 17:14:52,605 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2011-09-16 17:14:52,605 INFO org.apache.zookeeper.ZooKeeper: Session: 0x13271b6c4f10005 closed
2011-09-16 17:14:52,605 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Closing source 1 because: Region server is closing
2011-09-16 17:14:52,605 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
2011-09-16 17:14:53,040 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Not transferring queue since we are shutting down
2011-09-16 17:14:53,042 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-14,5,main]
2011-09-16 17:14:53,042 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
2011-09-16 17:14:53,042 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
2011-09-16 17:14:53,042 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.

Please suggest.

Thanks