hbase-user mailing list archives

From Leonid Fedotov <lfedo...@hortonworks.com>
Subject Re: help why do my regionservers shut themselves down?
Date Tue, 23 Apr 2013 15:59:36 GMT
This could be a reason as well:
2013-04-22 16:47:21,900 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Too many consecutive RollWriter requests, it's a sign of the total number of live datanodes is lower than the tolerable replicas.
Make sure your cluster is healthy, and in particular that enough datanodes are alive to satisfy the replication factor.
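
A rough sketch of how one might scan a regionserver log for these signs, in case it helps. The patterns and sample lines are copied from the log quoted below; the function and variable names (`scan_log`, `SIGNS`, `SAMPLE`) are made up for illustration, and it assumes the raw log file, where each entry sits on one line. It also pulls out the requested and negotiated ZooKeeper session timeouts, since the quoted log asks for sessionTimeout=180000 but reports negotiated timeout = 40000, which may mean the ZooKeeper server is capping the session timeout.

```python
import re

# Signatures copied from the log lines quoted in this thread.
SIGNS = {
    "regionserver_abort": re.compile(r"FATAL \S+HRegionServer: ABORTING"),
    "zk_session_expired": re.compile(r"SessionExpiredException"),
    "wal_roll_warning": re.compile(r"Too many consecutive RollWriter requests"),
}
REQUESTED = re.compile(r"sessionTimeout=(\d+)")
NEGOTIATED = re.compile(r"negotiated timeout = (\d+)")


def scan_log(text):
    """Count warning signs and collect ZooKeeper timeouts (ms) from log text."""
    counts = {name: len(pattern.findall(text)) for name, pattern in SIGNS.items()}
    requested = [int(ms) for ms in REQUESTED.findall(text)]
    negotiated = [int(ms) for ms in NEGOTIATED.findall(text)]
    return counts, requested, negotiated


# Sample input: a few entries from the quoted log, one entry per line.
SAMPLE = """\
2013-04-22 16:47:21,843 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=d1r1n17.prod.plutoz.com,60020,1366657358443
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
2013-04-22 16:47:21,900 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Too many consecutive RollWriter requests, it's a sign of the total number of live datanodes is lower than the tolerable replicas.
2013-04-22 16:47:22,341 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=180000 watcher=hconnection
2013-04-22 16:47:22,397 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server d1r2n2.prod.plutoz.com/10.0.0.66:2181, sessionid = 0x13dd980d2abbf93, negotiated timeout = 40000
"""

if __name__ == "__main__":
    counts, requested, negotiated = scan_log(SAMPLE)
    for name, count in counts.items():
        print(f"{name}: {count}")
    # A negotiated value below the requested one suggests the server
    # did not grant the timeout the client asked for.
    for req, neg in zip(requested, negotiated):
        if neg < req:
            print(f"requested {req} ms, negotiated only {neg} ms")
```

To run it against a real log, read the file into a string and pass it to `scan_log`; nonzero counts only tell you where to look, not the root cause.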


Thank you!

Sincerely,
Leonid Fedotov
On Apr 22, 2013, at 6:25 PM, kaveh minooie wrote:

> 
> Hi
> 
> after a few mapreduce jobs my regionservers shut themselves down. this is the latest time that this has happened:
> 
> 2013-04-22 16:47:21,843 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: This client just lost it's session with ZooKeeper, trying to reconnect.
> 2013-04-22 16:47:21,843 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server serverName=d1r1n17.prod.plutoz.com,60020,1366657358443, load=(requests=5392, regions=196, usedHeap=1063, maxHeap=3966): regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from ZooKeeper, aborting
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:352)
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:270)
>        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:523)
>        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:499)
> 2013-04-22 16:47:21,843 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Trying to reconnect to zookeeper.
> 2013-04-22 16:47:21,844 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Dump of metrics: requests=1794, regions=196, stores=1561, storefiles=1585, storefileIndexSize=104, memstoreSize=306, compactionQueueSize=10, flushQueueSize=0, usedHeap=1073, maxHeap=3966, blockCacheSize=661986032, blockCacheFree=169901776, blockCacheCount=7242, blockCacheHitCount=910925, blockCacheMissCount=1558134, blockCacheEvictedCount=1344753, blockCacheHitRatio=36, blockCacheHitCachingRatio=40
> 2013-04-22 16:47:21,844 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 regionserver:60020-0x13dd980d2ab8661-0x13dd980d2ab8661 received expired from ZooKeeper, aborting
> 2013-04-22 16:47:21,844 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2013-04-22 16:47:21,900 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: Too many consecutive RollWriter requests, it's a sign of the total number of live datanodes is lower than the tolerable replicas.
> 2013-04-22 16:47:22,341 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=zk1:2181 sessionTimeout=180000 watcher=hconnection
> 2013-04-22 16:47:22,357 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Waiting on 1 regions to close
> 2013-04-22 16:47:22,394 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server d1r2n2.prod.plutoz.com/10.0.0.66:2181. Will not attempt to authenticate using SASL (unknown error)
> 2013-04-22 16:47:22,395 INFO org.apache.zookeeper.ClientCnxn: Socket connection established to d1r2n2.prod.plutoz.com/10.0.0.66:2181, initiating session
> 2013-04-22 16:47:22,397 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete on server d1r2n2.prod.plutoz.com/10.0.0.66:2181, sessionid = 0x13dd980d2abbf93, negotiated timeout = 40000
> 2013-04-22 16:47:22,400 INFO org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Reconnected successfully. This disconnect could have been caused by a network partition or a long-running GC pause, either way it's recommended that you verify your environment.
> 2013-04-22 16:47:22,400 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
> 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction interrupted by user:
> java.io.InterruptedIOException: Aborting compaction of store f in region t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23. because user requested stop.
>        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:998)
>        at org.apache.hadoop.hbase.regionserver.Store.compact(Store.java:779)
>        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:776)
>        at org.apache.hadoop.hbase.regionserver.HRegion.compactStores(HRegion.java:721)
>        at org.apache.hadoop.hbase.regionserver.CompactSplitThread.run(CompactSplitThread.java:81)
> 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.HRegion: aborted compaction on region t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23. after 5mins, 58sec
> 2013-04-22 16:47:56,830 INFO org.apache.hadoop.hbase.regionserver.CompactSplitThread: regionserver60020.compactor exiting
> 2013-04-22 16:47:56,832 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed t1_webpage,com.pandora.www:http/shaggy,1366670139658.9f565d5da3468c0725e590dc232abc23.
> 2013-04-22 16:47:57,363 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: regionserver60020.logSyncer exiting
> 2013-04-22 16:47:57,366 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closing leases
> 2013-04-22 16:47:57,366 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020 closed leases
> 2013-04-22 16:47:57,366 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
> 2013-04-22 16:47:57,497 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-15,5,main]
> 2013-04-22 16:47:57,497 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: Shutdown hook
> 2013-04-22 16:47:57,497 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
> 2013-04-22 16:47:57,504 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020.leaseChecker closing leases
> 2013-04-22 16:47:57,504 INFO org.apache.hadoop.hbase.regionserver.Leases: regionserver60020.leaseChecker closed leases
> 2013-04-22 16:47:57,598 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
> 
> I would appreciate it very much if someone could explain to me what just happened here.
> 
> thanks,

