hbase-user mailing list archives

From Marcos Ortiz <mlor...@uci.cu>
Subject Re: hbase-master-server slept
Date Fri, 08 Feb 2013 09:26:18 GMT
Regards, So,
Can you provide more information about your setup?
- HBase version
- Hadoop version
- Operating System
- Java version

On 02/08/2013 03:55 AM, So Hibino wrote:
> Our HBase master server shut down with the following message.
> HBase is running in distributed mode on a single node.
Can you share your .conf files?
> I checked that GC completed in a very short time when the WARN was
> logged.
> In addition, another system running on the same architecture
> doesn't output the following WARN message and works well.
> So I think that this is not due to a long GC pause.
>
> Do you have any idea about the problem?
>
> 2013-01-30 03:07:48,582 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
> 28970ms instead of 1000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
Did you check the link?
Todd wrote a series of posts on Cloudera's blog about long Java GC
pauses, HBase and ZooKeeper.
It's a great read:
http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-1/
http://www.cloudera.com/blog/2011/02/avoiding-full-gcs-in-hbase-with-memstore-local-allocation-buffers-part-2/
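
To make the WARN above concrete, here is a rough sketch of the kind of check that produces such a message. This is not HBase's actual Sleeper code; the class name, method names and the 10000 ms warning threshold are assumptions made up for illustration. The point is that the check only measures how long the thread was away. It cannot tell whether the delay came from GC, swapping, or the whole VM being paused, so a short GC log does not rule out the other causes.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: names and threshold are hypothetical,
// not taken from the HBase source.
final class SleepOvershootSketch {

    // How much longer than requested the thread was actually away.
    static long overshootMillis(long expectedMs, long actualMs) {
        return actualMs - expectedMs;
    }

    // True when the overshoot is larger than the chosen warning threshold.
    static boolean isSuspicious(long expectedMs, long actualMs, long warnThresholdMs) {
        return overshootMillis(expectedMs, actualMs) > warnThresholdMs;
    }

    // Sleeps for periodMs and returns the elapsed wall time in milliseconds,
    // measured with the monotonic System.nanoTime() clock.
    static long sleepAndMeasure(long periodMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(periodMs);
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) throws InterruptedException {
        long periodMs = 1000;
        long warnThresholdMs = 10_000; // hypothetical threshold
        long slept = sleepAndMeasure(periodMs);
        if (isSuspicious(periodMs, slept, warnThresholdMs)) {
            System.out.println("We slept " + slept + "ms instead of " + periodMs + "ms");
        }
    }
}
```

With the numbers from your log, the first WARN corresponds to an overshoot of 27970 ms (28970 - 1000) and the second to 26902 ms (36902 - 10000), so both threads were stalled for roughly the same 27 seconds.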
> 2013-01-30 03:07:48,583 WARN org.apache.hadoop.hbase.util.Sleeper: We slept
> 36902ms instead of 10000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2013-01-30 03:07:48,585 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 39989ms for sessionid
> 0x13c84cebfce0000, closing socket connection and attempting reconnect
> 2013-01-30 03:07:48,586 INFO org.apache.zookeeper.ClientCnxn: Client session
> timed out, have not heard from server in 39987ms for sessionid
> 0x13c84cebfce0001, closing socket connection and attempting reconnect
> 2013-01-30 03:07:52,779 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server VM_11/192.168.152.1:2181
> 2013-01-30 03:07:52,789 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to VM_11/192.168.152.1:2181, initiating session
> 2013-01-30 03:07:52,777 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server VM_11/192.168.152.1:2181
> 2013-01-30 03:07:52,793 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to VM_11/192.168.152.1:2181, initiating session
> 2013-01-30 03:07:52,794 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x13c84cebfce0001 has expired,
> closing socket connection
> 2013-01-30 03:07:52,794 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> This client just lost it's session with ZooKeeper, trying to reconnect.
> 2013-01-30 03:07:52,794 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Trying to reconnect to zookeeper.
> 2013-01-30 03:07:52,795 INFO org.apache.zookeeper.ZooKeeper: Initiating
> client connection, connectString=VM_11:2181 sessionTimeout=180000
> watcher=hconnection
> 2013-01-30 03:07:52,812 INFO org.apache.zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x13c84cebfce0000 has expired,
> closing socket connection
> 2013-01-30 03:07:52,813 FATAL org.apache.hadoop.hbase.master.HMaster:
> master:60000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000
> master:60000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000
> received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
> 	at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:361)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:279)
> 	at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
> 2013-01-30 03:07:52,813 INFO org.apache.hadoop.hbase.master.HMaster:
> Aborting
> 2013-01-30 03:07:52,813 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2013-01-30 03:07:52,813 INFO org.apache.zookeeper.ClientCnxn: Opening socket
> connection to server VM_11/192.168.152.1:2181
> 2013-01-30 03:07:52,814 INFO org.apache.zookeeper.ClientCnxn: Socket
> connection established to VM_11/192.168.152.1:2181, initiating session
> 2013-01-30 03:07:52,815 ERROR org.apache.hadoop.hbase.master.HMaster: Region
> server serverName=VM_11,60020,1359437833300, load=(requests=0, regions=3,
> usedHeap=45, maxHeap=997) reported a fatal error:
> ABORTING region server serverName=VM_11,60020,1359437833300,
> load=(requests=0, regions=3, usedHeap=45, maxHeap=997):
> regionserver:60020-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002
> regionserver:60020-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002-0x13c84cebfce0002
> received expired from ZooKeeper, aborting
> Cause:
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired
> 	at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:361)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:279)
> 	at
> org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:526)
> 	at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:502)
>
> 2013-01-30 03:07:52,820 INFO org.apache.zookeeper.ClientCnxn: Session
> establishment complete on server VM_11/192.168.152.1:2181, sessionid =
> 0x13c84cebfce0005, negotiated timeout = 40000
> 2013-01-30 03:07:52,841 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Reconnected successfully. This disconnect could have been caused by a
> network partition or a long-running GC pause, either way it's recommended
> that you verify your environment.
> 2013-01-30 03:07:52,841 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2013-01-30 03:07:53,614 INFO org.apache.hadoop.hbase.master.LogCleaner:
> master-VM_11:60000.oldLogCleaner exiting
> 2013-01-30 03:07:54,251 INFO org.apache.hadoop.hbase.master.HMaster$2:
> VM_11:60000-BalancerChore exiting
> 2013-01-30 03:07:54,251 DEBUG org.apache.hadoop.hbase.master.HMaster:
> Stopping service threads
> 2013-01-30 03:07:54,251 INFO org.apache.hadoop.ipc.HBaseServer: Stopping
> server on 60000
> 2013-01-30 03:07:54,252 INFO org.apache.hadoop.hbase.master.HMaster:
> Stopping infoServer
> 2013-01-30 03:07:54,325 INFO org.mortbay.log: Stopped
> SelectChannelConnector@0.0.0.0:60010
> 2013-01-30 03:07:54,326 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 5 on 60000: exiting
> 2013-01-30 03:07:54,326 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
> Server listener on 60000
> 2013-01-30 03:07:54,327 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 9 on 60000: exiting
> 2013-01-30 03:07:54,327 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 8 on 60000: exiting
> 2013-01-30 03:07:54,327 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 7 on 60000: exiting
> 2013-01-30 03:07:54,327 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 6 on 60000: exiting
> 2013-01-30 03:07:54,327 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 4 on 60000: exiting
> 2013-01-30 03:07:54,327 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 3 on 60000: exiting
> 2013-01-30 03:07:54,327 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 2 on 60000: exiting
> 2013-01-30 03:07:54,327 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 1 on 60000: exiting
> 2013-01-30 03:07:54,327 INFO org.apache.hadoop.hbase.master.CatalogJanitor:
> VM_11:60000-CatalogJanitor exiting
> 2013-01-30 03:07:54,328 INFO org.apache.hadoop.ipc.HBaseServer: IPC Server
> handler 0 on 60000: exiting
> 2013-01-30 03:07:54,328 INFO org.apache.hadoop.ipc.HBaseServer: Stopping IPC
> Server Responder
> 2013-01-30 03:07:54,337 WARN org.apache.hadoop.hbase.zookeeper.ZKUtil:
> master:60000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000
> Unable to get data of znode /hbase/master
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/master
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> 	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:577)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:554)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:648)
> 	at
> org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:202)
> 	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:318)
> 2013-01-30 03:07:54,337 ERROR
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
> master:60000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000
> Received unexpected KeeperException, re-throwing exception
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/master
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> 	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:577)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:554)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:648)
> 	at
> org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:202)
> 	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:318)
> 2013-01-30 03:07:54,337 ERROR
> org.apache.hadoop.hbase.master.ActiveMasterManager:
> master:60000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000-0x13c84cebfce0000
> Error deleting our own master address node
> org.apache.zookeeper.KeeperException$SessionExpiredException:
> KeeperErrorCode = Session expired for /hbase/master
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:118)
> 	at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)
> 	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataInternal(ZKUtil.java:577)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAndWatch(ZKUtil.java:554)
> 	at
> org.apache.hadoop.hbase.zookeeper.ZKUtil.getDataAsAddress(ZKUtil.java:648)
> 	at
> org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:202)
> 	at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:318)
> 2013-01-30 03:07:54,337 DEBUG
> org.apache.hadoop.hbase.catalog.CatalogTracker: Stopping catalog tracker
> org.apache.hadoop.hbase.catalog.CatalogTracker@4743bf3d
> 2013-01-30 03:07:54,337 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> The connection to hconnection-0x13c84cebfce0005 has been closed.
> 2013-01-30 03:07:54,338 INFO
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> Closed zookeeper sessionid=0x13c84cebfce0005
> 2013-01-30 03:07:54,339 INFO org.apache.zookeeper.ZooKeeper: Session:
> 0x13c84cebfce0005 closed
> 2013-01-30 03:07:54,339 DEBUG
> org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
> The connection to null has been closed.
> 2013-01-30 03:07:54,339 INFO org.apache.hadoop.hbase.master.HMaster: HMaster
> main thread exiting
> 2013-01-30 03:07:54,339 INFO org.apache.zookeeper.ClientCnxn: EventThread
> shut down
> 2013-01-30 03:07:54,339 INFO
> org.apache.hadoop.hbase.master.AssignmentManager$TimeoutMonitor:
> VM_11:60000.timeoutMonitor exiting
>
>
>
> --
> View this message in context: http://apache-hbase.679495.n3.nabble.com/hbase-master-server-slept-tp4038192.html
> Sent from the HBase User mailing list archive at Nabble.com.
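
One more detail from the log: the client asks for sessionTimeout=180000, but the server answers with negotiated timeout = 40000. The ZooKeeper server bounds the requested timeout between minSessionTimeout and maxSessionTimeout, which default to 2x and 20x tickTime. A tickTime of 2000 ms would be consistent with the 40000 you see, but that is an assumption about your setup until we see the .conf files. A small sketch of that bounding (class and method names are my own, not ZooKeeper API):

```java
// Illustrative sketch only: mirrors the documented min/max session timeout
// bounds; the class and method names are hypothetical.
final class SessionTimeoutSketch {

    // Bounds the requested timeout using explicit min and max values.
    static int negotiate(int requestedMs, int minMs, int maxMs) {
        return Math.max(minMs, Math.min(requestedMs, maxMs));
    }

    // Uses the default bounds: 2 * tickTime and 20 * tickTime.
    static int negotiateWithDefaults(int requestedMs, int tickTimeMs) {
        return negotiate(requestedMs, 2 * tickTimeMs, 20 * tickTimeMs);
    }

    public static void main(String[] args) {
        int requestedMs = 180_000; // sessionTimeout from the quoted log
        int tickTimeMs = 2_000;    // assumed; check your ZooKeeper configuration
        System.out.println(negotiateWithDefaults(requestedMs, tickTimeMs));
    }
}
```

So a stall of about 40 seconds is already enough to expire the session, even though 180 seconds were requested on the HBase side.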

-- 
Marcos Ortiz Valmaseda,
Product Manager && Data Scientist at UCI
Blog: http://marcosluis2186.posterous.com
Twitter: @marcosluis2186 <http://twitter.com/marcosluis2186>
