hbase-user mailing list archives

From Stuti Awasthi <stutiawas...@hcl.com>
Subject RE: Unexpected shutdown of Zookeeper
Date Tue, 20 Sep 2011 05:37:57 GMT
Thanks Lars,
I will also try to test this on my end, and will post an update if I face further issues.

-----Original Message-----
From: lars hofhansl [mailto:lhofhansl@yahoo.com]
Sent: Tuesday, September 20, 2011 11:05 AM
To: user@hbase.apache.org
Subject: Re: Unexpected shutdown of Zookeeper

I think the fix is mostly good.
Chris is working on a test. This will be in 0.92, but can probably be backported.


-- Lars


----- Original Message -----
From: Stuti Awasthi <stutiawasthi@hcl.com>
To: "user@hbase.apache.org" <user@hbase.apache.org>
Cc:
Sent: Monday, September 19, 2011 9:25 PM
Subject: RE: Unexpected shutdown of Zookeeper

Hi JD,

Thanks for your response. I was planning to use replication for my production/development
servers, but it seems that work on this issue is still ongoing. I would like to know which
release the fix for this bug is planned for. Currently I'm using HBase 0.90.3.

I also have a query:
1.       Will running 3-4 ZooKeeper nodes help if 1-2 of them fail? Will
the cluster keep running, or will it go down?

Thanks
-Stuti
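
On the quorum question above: a ZooKeeper ensemble stays available only while a strict majority of its servers is up. The minimal Python sketch below works through that arithmetic for a few ensemble sizes. The function names are illustrative and not part of any ZooKeeper or HBase API; it models majority quorums only and says nothing about the bug discussed in this thread.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Servers that may fail while a strict majority is still alive."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return (ensemble_size - 1) // 2


def has_quorum(ensemble_size: int, failed: int) -> bool:
    """True if the surviving servers still form a strict majority."""
    alive = ensemble_size - failed
    return alive > ensemble_size // 2


if __name__ == "__main__":
    for n in (1, 3, 4, 5):
        print(f"{n} server(s): tolerates {tolerated_failures(n)} failure(s)")
```

Under this model a 4-node ensemble tolerates no more failures than a 3-node one, which is why odd ensemble sizes are the usual recommendation, and losing 2 nodes out of 3 or 4 leaves no quorum.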

-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Monday, September 19, 2011 11:04 PM
To: user@hbase.apache.org
Subject: Re: Unexpected shutdown of Zookeeper

I think this is just:

https://issues.apache.org/jira/browse/HBASE-3130

J-D

On Sun, Sep 18, 2011 at 10:15 PM, Stuti Awasthi <stutiawasthi@hcl.com> wrote:
> Hi All,
>
> I was running a 2-node cluster with 1 ZooKeeper node and 2 region server nodes. I had
> also set up cluster replication with another single-node HBase-Hadoop cluster. Replication
> was successful and I left the cluster running over the weekend with no data to replicate.
>
> Today I can see that ZooKeeper is dead in the master cluster. One region server, which was
> running on the slave machine, is also dead. The cluster to which I was replicating is running fine.
>
> My queries are:
>
> 1.       Can ZooKeeper die because there was no replication traffic over the network for a
> long time?
>
> 2.       How should I handle these situations? Will running 3-4 ZooKeeper nodes help?
>
> 3.       If I run multiple ZooKeeper nodes, will the cluster keep running normally
> even if 2-3 of them are dead?
>
> 4.       In my case, 1 of the 2 region servers is dead but 1 is still working. If my
> ZooKeeper node were running, would I be able to access HBase properly?
>
> Logs :
> hbase-root-zookeeper-master.log :
>
> 2011-09-19 10:07:55,753 INFO
> org.apache.zookeeper.server.NIOServerCnxn: Accepted socket connection
> from /10.33.64.235:44706
> 2011-09-19 10:07:55,758 INFO
> org.apache.zookeeper.server.NIOServerCnxn: Client attempting to
> establish new session at /10.33.64.235:44706
> 2011-09-19 10:07:55,761 INFO
> org.apache.zookeeper.server.NIOServerCnxn: Established session
> 0x13271b6c4f1000c with negotiated timeout 180000 for client
> /10.33.64.235:44706
> 2011-09-19 10:10:48,318 WARN
> org.apache.zookeeper.server.NIOServerCnxn: EndOfStreamException:
> Unable to read additional data from client sessionid
> 0x13271b6c4f1000c, likely client has closed socket
> 2011-09-19 10:10:48,319 INFO
> org.apache.zookeeper.server.NIOServerCnxn: Closed socket connection
> for client /10.33.64.235:44706 which had sessionid 0x13271b6c4f1000c
> 2011-09-19 10:12:57,002 INFO
> org.apache.zookeeper.server.ZooKeeperServer: Expiring session
> 0x13271b6c4f1000c, timeout of 180000ms exceeded
> 2011-09-19 10:12:57,002 INFO
> org.apache.zookeeper.server.PrepRequestProcessor: Processed session
> termination for sessionid: 0x13271b6c4f1000c
>
> hbase-root-regionserver-slave.log:
>
> 2011-09-16 16:00:50,354 WARN org.apache.hadoop.ipc.HBaseServer: IPC
>Server listener on 60020: readAndProcess threw exception
> java.io.IOException: Connection reset by peer. Count of bytes read: 0
> java.io.IOException: Connection reset by peer
>       at sun.nio.ch.FileDispatcher.read0(Native Method)
>       at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
>       at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:202)
>       at sun.nio.ch.IOUtil.read(IOUtil.java:175)
>       at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
> 2011-09-16 16:00:51,058 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> Opening log for replication slave%3A60020.1316168146136 at 663246
> 2011-09-16 16:00:51,064 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> currentNbOperations:5003 and seenEntries:0 and size: 0
> 2011-09-16 16:00:51,064 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager:
> Going to report log #slave%3A60020.1316168146136 for position 663246 in
> hdfs://master:54310/hbase/.logs/slave,60020,1316168145427/slave%3A60020.1316168146136
> 2011-09-16 16:00:51,066 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager:
> Removing 0 logs in the list: []
> 2011-09-16 16:00:51,066 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> Nothing to replicate, sleeping 1000 times 2
> 2011-09-16 16:00:53,068 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication
> slave%3A60020.1316168146136 at 663246 ..................................
> 2011-09-16 17:14:49,440 WARN org.apache.zookeeper.ClientCnxn: Session
> 0x13271b5395c0007 for server null, unexpected error, closing socket
>connection and attempting reconnect
> java.net.ConnectException: Connection timed out
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:567)
>       at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1119)
> 2011-09-16 17:14:51,039 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager:
> /hbase/rs/master,60020,1316167798366 znode expired, trying to lock it
> 2011-09-16 17:14:51,088 INFO org.apache.zookeeper.ClientCnxn: Opening
>socket connection to server slave1/172.28.96.239:2181
> 2011-09-16 17:14:51,089 INFO org.apache.zookeeper.ClientCnxn: Socket
>connection established to slave1/172.28.96.239:2181, initiating
>session
> 2011-09-16 17:14:51,093 INFO org.apache.zookeeper.ClientCnxn: Unable
>to reconnect to ZooKeeper service, session 0x13271b5395c0007 has
>expired, closing socket connection
> 2011-09-16 17:14:51,094 FATAL
> org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region
>server serverName=slave,60020,1316168145427, load=(requests=0,
>regions=6, usedHeap=29, maxHeap=996): connection to cluster:
> 1-0x13271b5395c0007 connection to cluster: 1-0x13271b5395c0007
>received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
>       at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:343)
>       at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:261)
>       at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:530)
>       at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:506)
> 2011-09-16 17:14:51,094 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics:
> requests=0, regions=6, stores=6, storefiles=5, storefileIndexSize=0,
>memstoreSize=0, compactionQueueSize=0, flushQueueSize=0, usedHeap=29,
>maxHeap=996, blockCacheSize=982352, blockCacheFree=208064384,
>blockCacheCount=2, blockCacheHitCount=31, blockCacheMissCount=2,
>blockCacheEvictedCount=0, blockCacheHitRatio=93,
> blockCacheHitCachingRatio=93
> 2011-09-16 17:14:51,094 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED:
> connection to cluster: 1-0x13271b5395c0007 connection to cluster:
> 1-0x13271b5395c0007 received expired from ZooKeeper, aborting
> 2011-09-16 17:14:51,094 INFO org.apache.zookeeper.ClientCnxn:
> EventThread shut down
> 2011-09-16 17:14:51,114 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> Source exiting 1
> 2011-09-16 17:14:52,476 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping server on 60020
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>Server handler 0 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI
>IPC Server handler 2 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>Server handler 1 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI
>IPC Server handler 0 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>Server handler 2 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI
>IPC Server handler 9 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>Server handler 3 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI
>IPC Server handler 8 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: PRI
>IPC Server handler 6 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>Server handler 4 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>Server handler 5 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>Server handler 7 on 60020: exiting
> 2011-09-16 17:14:52,477 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>Server handler 6 on 60020: exiting
> 2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>Server handler 8 on 60020: exiting
> 2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: IPC
>Server handler 9 on 60020: exiting
> 2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: PRI
>IPC Server handler 1 on 60020: exiting
> 2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer: PRI
>IPC Server handler 3 on 60020: exiting
> 2011-09-16 17:14:52,478 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: Stopping
>infoServer
> 2011-09-16 17:14:52,478 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping IPC Server listener on 60020
> 2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer: PRI
>IPC Server handler 4 on 60020: exiting
> 2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer: PRI
>IPC Server handler 5 on 60020: exiting
> 2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer:
> Stopping IPC Server Responder
> 2011-09-16 17:14:52,479 INFO org.apache.hadoop.ipc.HBaseServer: PRI
>IPC Server handler 7 on 60020: exiting
> 2011-09-16 17:14:52,481 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:60030
> 2011-09-16 17:14:52,585 INFO
> org.apache.hadoop.hbase.regionserver.CompactSplitThread:
> regionserver60020.compactor exiting
> 2011-09-16 17:14:52,585 INFO
> org.apache.hadoop.hbase.regionserver.MemStoreFlusher:
> regionserver60020.cacheFlusher exiting
> 2011-09-16 17:14:52,586 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller
> exiting.
> 2011-09-16 17:14:52,586 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer$MajorCompactionChecker:
> regionserver60020.majorCompactionChecker exiting
> 2011-09-16 17:14:52,587 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
> Processing close of backup,,1315992791196.e5ff1d9eb66e1157d0ca8bfaaf493480.
> 2011-09-16 17:14:52,588 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> regionserver60020.logSyncer interrupted while waiting for sync
>requests
> 2011-09-16 17:14:52,588 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: Closing
> backup,,1315992791196.e5ff1d9eb66e1157d0ca8bfaaf493480.: disabling
>compactions & flushes
> 2011-09-16 17:14:52,588 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
> Processing close of testArchiveBackup,,1315915407547.e05ec3159a022f28aa92e1a01ca50fec.
> 2011-09-16 17:14:52,588 DEBUG org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
> Processing close of replication,,1316166014290.5937efd76493915556d3641aa9c0b6df.
> 2011-09-16 17:14:52,589 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog:
> regionserver60020.logSyncer exiting
> 2011-09-16 17:14:52,588 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler:
> Processing close of -ROOT-,,0.70236052
> 2011-09-16 17:14:52,589 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLog: closing hlog writer in
> hdfs://master:54310/hbase/.logs/slave,60020,1316168145427
> 2011-09-16 17:14:52,589 DEBUG
> org.apache.hadoop.hbase.regionserver.HRegion: Closing replication,,1316166014290.5937efd76493915556d3641aa9c0b6df.:
> disabling compactions & flushes ............................
> 2011-09-16 17:14:52,602 INFO org.apache.zookeeper.ClientCnxn:
> EventThread shut down
> 2011-09-16 17:14:52,602 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x13271b6c4f10003 closed
> 2011-09-16 17:14:52,605 INFO org.apache.zookeeper.ClientCnxn:
> EventThread shut down
> 2011-09-16 17:14:52,605 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x13271b6c4f10005 closed
> 2011-09-16 17:14:52,605 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
> Closing source 1 because: Region server is closing
> 2011-09-16 17:14:52,605 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020
>exiting
> 2011-09-16 17:14:53,040 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager:
> Not transferring queue since we are shutting down
> 2011-09-16 17:14:53,042 INFO
> org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook
>starting; hbase.shutdown.hook=true;
>fsShutdownHook=Thread[Thread-14,5,main]
> 2011-09-16 17:14:53,042 INFO
> org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown
>hook
> 2011-09-16 17:14:53,042 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting
> fs shutdown hook thread.
> 2011-09-16 17:14:53,042 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown
> hook finished.
>
> Please suggest.
>
> Thanks
>

